Corpus Analysis
Corpus analysis, often considered a linguistics topic, is the study of language use in a corpus, typically by analysing how various phenomena are distributed across it.
Frequency Distribution and Zipf's Law
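Zipf's law describes the frequency distribution of words in language: a word's frequency is roughly inversely proportional to its rank in the frequency table, so the second most frequent word occurs about half as often as the most frequent one.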
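As a rough illustration, the rank-frequency relation can be checked on any text with a few lines of Python. This is a minimal sketch, not a reference method: the function names `rank_frequency` and `zipf_expected` are hypothetical, the regex tokeniser is a deliberately crude assumption, and the exponent `s = 1.0` is the idealised value (real corpora usually fit a somewhat different exponent).

```python
import re
from collections import Counter


def rank_frequency(text):
    """Return (rank, word, count) tuples, most frequent word first."""
    # Assumption: lowercase runs of letters/apostrophes count as words.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    # Sort by descending count; break ties alphabetically for stable output.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, word, count)
            for rank, (word, count) in enumerate(ranked, start=1)]


def zipf_expected(top_count, rank, s=1.0):
    """Count predicted for a given rank by an idealised Zipf curve."""
    return top_count / rank ** s


sample = "the cat and the dog and the bird saw the fish"
table = rank_frequency(sample)
top_count = table[0][2]
for rank, word, count in table:
    # Print the observed count next to the idealised prediction.
    print(rank, word, count, round(zipf_expected(top_count, rank), 2))
```

A sample this small only shows the mechanics. On a real corpus, plotting observed counts against rank on log-log axes should give an approximately straight line if the text follows Zipf's law.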
Historical Papers
- Miller & Newman 1958 - Tests of a Statistical Explanation of the Rank-Frequency Relation for Words in Written English. A study carried out on the UNIVAC computer.
- Miller et al. 1959 - Length-frequency statistics for written English, available here. A study of word frequency statistics using the UNIVAC.
- Bull 1952 - Problems of Vocabulary Frequency and Distribution. An interesting read, from here.
nlp/corpus_analysis.1661645516.txt.gz · Last modified: 2023/06/15 07:36 (external edit)