nlp:corpus_analysis

Differences

This shows you the differences between two versions of the page.

nlp:corpus_analysis [2023/05/16 08:04] – [Frequency Distribution and Zipf's Law] jmflanig
nlp:corpus_analysis [2023/06/15 07:36] (current) – external edit 127.0.0.1
Line 1:
 ====== Corpus Analysis ======
-Often considered a linguistics topic, //**corpus analysis**// is the study of language use in a corpus, often analyzing the distribution of various phenomena (phonological, lexical, syntactic, etc). Sometimes the analysis is performed comparing across time, languages, or different genres.
+Often considered a linguistics topic, //**corpus analysis**// is the study of language in a corpus, often analyzing the distribution of various phenomena (phonological, lexical, syntactic, etc). Sometimes the analysis is performed comparing across time, languages, or different genres.
  
 ===== Frequency Distribution and Zipf's Law =====
Line 7:
 === Historical Papers ===
   * [[https://psycnet.apa.org/record/1935-04756-000|Zipf 1935 - The Psycho-Biology of Language (book)]]
 +  * [[https://aclanthology.org/1952.earlymt-1.17.pdf|Bull 1952 - Problems of Vocabulary Frequency and Distribution]] An interesting read, from [[https://aclanthology.org/events/earlymt-1952/|here]].
   * [[https://www.jstor.org/stable/1419208#metadata_info_tab_contents|Miller & Newman 1958 - Tests of a Statistical Explanation of the Rank-Frequency Relation for Words in Written English]] A study on the UNIVAC computer
   * [[https://www.sciencedirect.com/science/article/pii/S0019995858902298|Miller et al 1959 - Length-frequency statistics for written English]], available [[https://www.sciencedirect.com/journal/information-and-control/vol/1/issue/4|here]]. A study of frequency statistics of words using the UNIVAC. Talks about types and tokens.  Introduces the terms "function words" and "content words" on p. 377 (p. 8 in the pdf).
-  * [[https://aclanthology.org/1952.earlymt-1.17.pdf|Bull 1952 - Problems of Vocabulary Frequency and Distribution]] An interesting read, from [[https://aclanthology.org/events/earlymt-1952/|here]].
+
 +===== Books ===== 
 +  * [[https://books.google.com/books?id=fzkQPKoFEb0C&pg=PA1|Word Frequency Distributions]], R. Harald Baayen (2002)
 + 
 +===== People ===== 
 +  * [[https://en.wikipedia.org/wiki/George_Armitage_Miller|George Miller]] 
 +  * [[https://en.wikipedia.org/wiki/George_Kingsley_Zipf|George Zipf]]
  
nlp/corpus_analysis.1684224242.txt.gz · Last modified: 2023/06/15 07:36 (external edit)
