nlp:corpus_analysis

Differences

This shows you the differences between two versions of the page.

nlp:corpus_analysis [2023/05/16 08:08] – [People] jmflanig
nlp:corpus_analysis [2023/06/15 07:36] (current) – external edit 127.0.0.1
Line 1:
 ====== Corpus Analysis ======
-Often considered a linguistics topic, //**corpus analysis**// is the study of language use in a corpus, often analyzing the distribution of various phenomena (phonological, lexical, syntactic, etc). Sometimes the analysis is performed comparing across time, languages, or different genres.
+Often considered a linguistics topic, //**corpus analysis**// is the study of language in a corpus, often analyzing the distribution of various phenomena (phonological, lexical, syntactic, etc). Sometimes the analysis is performed comparing across time, languages, or different genres.
  
 ===== Frequency Distribution and Zipf's Law =====
Line 11:
   * [[https://www.sciencedirect.com/science/article/pii/S0019995858902298|Miller et al. 1958 - Length-frequency statistics for written English]], available [[https://www.sciencedirect.com/journal/information-and-control/vol/1/issue/4|here]]. A study of word frequency statistics computed on the UNIVAC. Discusses types and tokens. Introduces the terms "function words" and "content words" on p. 377 (p. 8 in the pdf).
  
-==== People =====
+===== Books =====
+  * [[https://books.google.com/books?id=fzkQPKoFEb0C&pg=PA1|Word Frequency Distributions]], Harald Baayen (2002)
+
+===== People =====
   * [[https://en.wikipedia.org/wiki/George_Armitage_Miller|George Miller]]
   * [[https://en.wikipedia.org/wiki/George_Kingsley_Zipf|George Zipf]]
  
nlp/corpus_analysis.1684224517.txt.gz · Last modified: 2023/06/15 07:36 (external edit)
