Faculty Sponsor
Paul DePalma, Gonzaga University
Research Project Abstract
Zipf's law states that, in a given text, the frequency of any word is inversely proportional to its frequency rank. In English, for example, if the most common word happens to be “a”, it will appear about twice as often as the second most common word and about three times as often as the third most common word.
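The rank-frequency relationship described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `rank_frequency` and `zipf_expected` are hypothetical, the token list is a toy example built to follow the law exactly, and the exponent is assumed to be 1.

```python
from collections import Counter

def rank_frequency(tokens):
    """Return (rank, word, count) tuples, most frequent word first."""
    counts = Counter(tokens)
    return [(rank, word, count)
            for rank, (word, count) in enumerate(counts.most_common(), start=1)]

def zipf_expected(top_count, rank):
    """Count predicted for a given rank by an ideal Zipf curve (exponent 1)."""
    return top_count / rank

# Toy token list constructed for illustration; it is not real corpus data.
tokens = ["a"] * 6 + ["the"] * 3 + ["of"] * 2
table = rank_frequency(tokens)
for rank, word, count in table:
    # Print the observed count next to the count an ideal Zipf curve predicts.
    print(rank, word, count, zipf_expected(table[0][2], rank))
```

In this toy case the observed and predicted counts match at every rank; real text is expected to follow the curve only approximately.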
Zipf's law has been shown to hold for two- and three-word sequences in all languages examined. In recent years, linguists have begun to notice that speech is structurally different from writing. Nevertheless, no one, to our knowledge, has demonstrated that Zipf's law describes the distribution of words in speech. To test this, we are building custom software in the Python programming language to investigate the distribution of words in the Buckeye Corpus, a collection of more than 300,000 words of American English speech.
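One way such an analysis might be set up is sketched below. It is not the project's actual software: the helper names `ngrams` and `zipf_slope` are hypothetical, the input is assumed to be an already tokenized list of words, and the sample utterance is invented rather than taken from the Buckeye Corpus. The sketch counts word sequences and fits a straight line to log frequency against log rank; a slope near -1 would be consistent with Zipf's law.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def zipf_slope(counts):
    """Least-squares slope of log(frequency) against log(rank).

    Assumes at least two distinct items in counts.
    """
    freqs = sorted(counts.values(), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(freq) for freq in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

# Invented utterance for illustration only; not a corpus excerpt.
utterance = "you know i mean you know it was you know".split()
bigram_counts = Counter(ngrams(utterance, 2))
trigram_counts = Counter(ngrams(utterance, 3))
print(bigram_counts.most_common(1))
print(zipf_slope(bigram_counts))
```

A sample this small says nothing about the law itself; the slope only becomes meaningful when computed over a large body of transcribed speech.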
Session Number
RS1
Location
Robinson 141
Abstract Number
RS1-a
Zipfian Distribution of Words and Word Phrases in American English Speech