Research Project Abstract

Zipf's law states that, in a given text, the frequency of any word is inversely proportional to its rank in the frequency table. For example, if the most common word in an English text is “the”, it will appear roughly twice as often as the second most common word and three times as often as the third most common word.
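In its idealized form the law can be written as f(r) = f(1) / r, where f(r) is the frequency of the word at rank r. The minimal sketch below computes the counts this formula predicts for the top few ranks. The function name `zipf_expected_counts` and the starting count of 6000 are hypothetical, chosen only for illustration.

```python
def zipf_expected_counts(top_count: float, n_ranks: int) -> list[float]:
    """Expected frequency of ranks 1..n_ranks under ideal Zipf's law, f(r) = f(1) / r.

    top_count is the frequency of the rank-1 word (a hypothetical input here).
    """
    return [top_count / rank for rank in range(1, n_ranks + 1)]


counts = zipf_expected_counts(6000, 3)
print(counts)                                        # [6000.0, 3000.0, 2000.0]
print(counts[0] / counts[1], counts[0] / counts[2])  # 2.0 3.0
```

The two ratios printed at the end correspond to the "twice as often" and "three times as often" relationships described above. Real text follows this pattern only approximately.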

Zipf's law has also been shown to hold for two- and three-word sequences in all languages examined. In recent years, linguists have begun to notice that speech is structurally different from writing. However, to our knowledge, no one has yet demonstrated that Zipf's law describes the distribution of words in speech. To test this, we are building custom software in the Python programming language to investigate the distribution of words in the Buckeye Corpus, a collection of more than 300,000 words of American English speech.
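The analysis described above might begin along the following lines. This is a minimal sketch and not the project's actual software. It assumes the transcripts have already been turned into a flat list of lowercase word tokens; reading the Buckeye Corpus files themselves is not shown. The helper names `ngram_counts` and `zipf_slope` are hypothetical. The sketch counts contiguous n-word sequences and then fits a least-squares line to log frequency against log rank. An ideal Zipfian distribution gives a slope of -1. This log-log regression is a quick diagnostic only. Maximum-likelihood fitting is generally regarded as a more rigorous way to estimate the exponent.

```python
import math
from collections import Counter


def ngram_counts(tokens: list[str], n: int) -> Counter:
    """Count contiguous n-word sequences (n=1 for words, 2 for two-word sequences, ...)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def zipf_slope(counts: Counter) -> float:
    """Least-squares slope of log(frequency) against log(rank).

    Rank 1 is the most frequent item. Ideal Zipf's law (f proportional to 1/r)
    gives a slope of -1.
    """
    freqs = sorted(counts.values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct items to fit a slope")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Two-word sequences in a tiny hypothetical utterance (not corpus data).
bigrams = ngram_counts("you know i mean you know".split(), 2)
print(bigrams[("you", "know")])  # 2

# Synthetic tokens built to follow Zipf's law exactly: word r occurs 60 // r times.
tokens = [f"w{r}" for r in range(1, 6) for _ in range(60 // r)]
slope = zipf_slope(ngram_counts(tokens, 1))
print(round(slope, 6))  # -1.0
```

The same two functions apply to single words, two-word sequences, and three-word sequences by changing `n`. This keeps the comparison between word and phrase distributions uniform.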

Session Number

RS1

Location

Robinson 141

Abstract Number

RS1-a

Apr 23rd, 9:00 AM to 10:30 AM

Zipfian Distribution of Words and Word Phrases in American English Speech