Simulated Humanist Mind

Games

Zipf’s Law, Style and the Literary – Introduction

As a scholar of literature, I often feel embarrassed by linguistic regularities. While the consistencies of language are the very thing that allows critical thought on literature to exist (in that they assure the critic that her readers will share at least some of the phenomenal experience of reading a given text), academic scholars are more likely to focus on the idiosyncratic moments of a work than on its ‘low level’ entropy-thwarting communicative structure. These moments of aberration more readily stick in the reader’s craw, demanding an explanation for their presence among the more pedestrian-seeming connectives and conjunctions that make up the ‘glue’ of written text. However, with the rise of techniques like topic modelling, the regularities of language are becoming a major focus of critical attention. This focus is not entirely novel. Texts from the critical canon of narratology, for instance, often focus on the regular as a matter of course. The paradigmatic example must be Roland Barthes’ classic study S/Z. In his minute analysis of the structure of Balzac’s short story Sarrasine, Barthes does not hesitate to direct his gaze to Balzac’s use of what Barthes calls the ‘action’ code, a voice that “[implies] a logic in human behavior.”[1] These actions structure the text in a way that offers a baseline regularity for the reader to grasp. Yet even this attempt to grasp the quotidian aspects of a text remains too high level. Barthes’ focus remains solidly on the level of narrative, a coarser grain than the verbs and connectives that produce the action seme itself. But where will we find any finer grain? As he often does, eminent Modernist scholar and avid Heathkit computer enthusiast Hugh Kenner provides us with a way forward. Kenner had a magpie mind, and […]

Posted by Craig on January 23rd, 2015


Zipf’s Law, Style, and the Literary — The Gory Details

(This is part 2 of my series on Zipf’s law and literary stylistics. See part one here.)

The Origins of Zipfian Regularity

In a short article for Discover magazine entitled “The Untidy Desk and the Larger Order of Things,” Hugh Kenner examines Zipf’s Law in the context of literary works like T.S. Eliot’s The Waste Land and Henry James’s The Ambassadors.[1] Kenner offers a succinct non-technical definition of the Zipf phenomenon by focusing on the 80-20 effect of such systems: “the greater part of any activity [80%] draws on but a small fraction of resources [20%].”[2] In linguistics, this time-saving phenomenon manifests itself as a correlation between the rank of a word (from most frequently used to least) and its usage in the text, expressed as a percentage or fraction of all the words used. Mathematician and occasional linguist Yuri Manin offers a technical definition of this phenomenon, noting that Zipf’s law “states that if words of a language are ranked in the order of decreasing frequency in texts, the frequency is inversely proportional to the rank (sequence number in the list).”[3] As Kenner more prosaically offers, if the largest city in the U.S. is New York, the “second largest has ½ the population of New York. Number three, ⅓ the population of New York” and so on and so forth until the last Montana holdout is counted.[4] Regardless of definition, this regularity appears in a staggering number of texts. Aside from the James and Eliot he surveys, Kenner notes that Zipf himself (as noted in my last post) discovered the regularity in a copy of the Word Index to James Joyce’s Ulysses. While Kenner points out that Zipf approached such a data point in a workmanlike fashion: “he was not surprised to find the same pattern […]
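For readers who would like to see the rank and frequency relationship in concrete terms, here is a minimal Python sketch of the definition Manin gives. It is only an illustration, not anything Kenner or Zipf used: the function names `rank_frequency` and `zipf_expected` are hypothetical, and the tokenisation (lowercasing, then keeping runs of letters and straight apostrophes) is a simplifying assumption of mine.

```python
import re
from collections import Counter


def rank_frequency(text):
    """Return (rank, word, count, share) rows, most frequent word first.

    `share` is the word's count as a fraction of all words in the text.
    Tokenisation here is a deliberately crude assumption.
    """
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words)
    ranked = Counter(words).most_common()
    return [
        (rank, word, count, count / total)
        for rank, (word, count) in enumerate(ranked, start=1)
    ]


def zipf_expected(top_count, rank):
    """Idealised Zipfian value: the top-ranked count divided by the rank."""
    return top_count / rank


if __name__ == "__main__":
    sample = "the cat and the dog and the bird"
    top_count = rank_frequency(sample)[0][2]
    for rank, word, count, share in rank_frequency(sample):
        # Compare each observed count with the idealised 1/rank prediction.
        print(rank, word, count, round(share, 3), zipf_expected(top_count, rank))
```

On a toy sentence the fit is rough at best; the inverse proportionality is a claim about large texts. Kenner’s city example is the same arithmetic: if the top-ranked value is P, the idealised value at rank r is P divided by r.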

Posted by Craig on January 23rd, 2015
