====== History of NLP ====== ===== Historical Surveys ===== * [[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.35.1156&rep=rep1&type=pdf|Zechner 1997 - A Literature Survey on Information Extraction and Text Summarization]] ===== Papers and Popular Descriptions ===== * Statistical NLP * [[https://www.aclweb.org/anthology/A88-1019.pdf|Church 1988 - A Stochastic Parts Program and Noun Phrase Parser for Unrestricted Text]] One of the first statistical POS taggers, one of the papers that started the statistical/machine learning revolution in NLP * {{papers:Statistical Techniques for NLP.pdf|Charniak 1998 - Statistical Techniques for NLP}} ===== Early Work (prior to 2000) ===== === Models of Language === * [[https://ieeexplore.ieee.org/document/1056813|Chomsky 1956 - Three Models for the Description of Language]] [[http://web.mit.edu/6.441/spring06/projects/1/aak@mit.edu.1.pdf|overview]] Introduced the [[https://en.wikipedia.org/wiki/Chomsky_hierarchy|Chomsky hierarchy]] * [[https://www.sciencedirect.com/science/article/pii/S0019995858900822|Chomsky & Miller 1958 - Finite State Languages]]. From [[https://www.sciencedirect.com/journal/information-and-control/vol/1/issue/2|here]]. * [[https://pdf.sciencedirectassets.com/273276/1-s2.0-S0019995800X0157X/1-s2.0-S0019995859903626/main.pdf?X-Amz-Security-Token=IQoJb3JpZ2luX2VjEID%2F%2F%2F%2F%2F%2F%2F%2F%2F%2FwEaCXVzLWVhc3QtMSJHMEUCIQD09vkBwVhm5rnhMeGy0RVxzbNlTmJIyhGvtUsWQN8qiwIgOFZhbef1NmFzZz1TR9mlys22qFGannbGcL18GQB6gWwq2wQI%2Bf%2F%2F%2F%2F%2F%2F%2F%2F%2F%2FARAFGgwwNTkwMDM1NDY4NjUiDAl5PazPWbisJcjwiSqvBGZHVVp54W5q8XVrXCQVIsLWCJVxkUT7NqxPE6%2BGSSZHiVq4UyKdznaia9qhCu5O%2B72jtH0cSbJdFepgRWCzIox9KDItVIRcwh4hsFsJL7rjZIUCIJuegBa2SDnJ%2BghJSbL8xzJ9OOL8j4%2FAynfILN9fxWD9%2BR52NVxxopwUMqM4bfYlqyD9fRoxVfCfIyX5ZUKOGALs%2BoHKpQAre8wACBuhYk%2F1gmHdgJhDbkMinJ0S1cZiL%2FWP6s%2BesOda3dcO9cTRu67yjRJpxprmM6HFKa2ZWkJ7DdQF83Mj0%2Femw5%2BJM4kiBHqSG3KaWg2sn5cUxwprJ3FtqQncF3ZrBf0rch3v1Q%2FGDR%2FsT532lFmzaNuZvTSZhy7im5I0X8JfVkZ4TWCWxQeV8ZkWoJNWpF%2F%2Fzu8%2FF60oUnM7Jn3nVXX1rI8V849Afai8z9cA7p02oO8OmoZH6wqcsB53X1wUR25k4rIfAmuamE%2Bt8Hv7g5IQtfLTLU%2BbWkTEu2eMCqCR1EOJHbRlVW%2BEZftq%2FKFNkSLAL%2F1E5iQ28BmgYXgRrjkAZEkzBd4sYVZ8nR2wtA6VqBixzPaQZmyWlg%2Bx1jLkqDi4gSJ7tJGhLnw9JVTxNgOu%2FghGJhzm9PqIVQF0zAAULYAD4tM8ikB6s5x%2BzNUDyJSCAyGFvb%2BpoKkBYAtVcbItu%2BUfdIilWfGJ6d8ZCcIWXh9DAmtXj8yNmw5mQano%2BRAZfZYm72Q%2FIJDi0nUcJkFFj%2Fcwm8%2BqmAY6qQExF1OQyq0lxVSNLCZaOkggiR%2Fmc30pTvPhNXTc4AifB8fAZPI4dhEoXUquP6NUCfkBON00QdS4CHOmgNDS8q023JC2V%2BZszhPoCdZA1NYG%2B%2BAlA2qtrLPzsqEwxEyRBL7QG1yxRosGF%2BYwc321hrMYEUz3TXiC7ZFc%2FZwpM02sHifk5eyYgfvjb3DBqyhi9lMRCwtS0gGteUQBcO9wVFNP4JQf4zu4lioj&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20220828T003003Z&X-Amz-SignedHeaders=host&X-Amz-Expires=300&X-Amz-Credential=ASIAQ3PHCVTYSEFI44MJ%2F20220828%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Signature=3f7a8a84c78623558304902421089f522b047e87054fa5c78828dd9a5dbaa698&hash=e7d5f8909080b4caec879073ccc6d2c6840b4411c3b59b5a8d736bd16af9724b&host=68042c943591013ac2b2430a89b270f6af2c76d8dfd086a07176afe7c76c2c61&pii=S0019995859903626&tid=spdf-759690fa-e320-4e4d-9b21-928d9892cc6d&sid=072472816e3c7840759a50441ebf483750d5gxrqa&type=client&ua=4d525f5c05080e56590707&rr=7418efa59dd89e5f|Chomsky 1959 - On Certain Formal Properties of Grammars]] * [[http://www-igm.univ-mlv.fr/~berstel/Mps/Travaux/A/1963-7ChomskyAlgebraic.pdf|Chomsky & Schützenberger 1963 - The Algebraic Theory of Context Free Languages]] === Machine Translation === * Interlingua-based * [[https://aclanthology.org/1988.tmi-1.4.pdf|Nirenburg et al 1988 - Lexical Realization in Natural Language Generation]] Describes the generation system of DIOGENES. === Question Answering === * [[https://aclanthology.org/T75-1005.pdf|Lehnert 1975 - What Makes Sam Run? Script Based Techniques for Question Answering]] Very cool work. * [[https://aclanthology.org/W00-0603.pdf|Riloff & Thelen 2000 - A Rule-based Question Answering System for Reading Comprehension Tests]] === Dialog === * ELIZA: [[https://cse.buffalo.edu/~rapaport/572/S02/weizenbaum.eliza.1966.pdf|Weizenbaum 1966 - ELIZA - A Computer Program For the Study of Natural Language Communication Between Man And Machine]] [[https://en.wikipedia.org/wiki/ELIZA|wikipedia]] [[https://sites.google.com/view/elizagen-org/the-original-eliza|original source code]] [[https://github.com/jeffshrager/elizagen.org/blob/master/1965_Weizenbaum_MAD-SLIP/ELIZA_transcription_annotated_20220216.txt|commented source]] [[https://github.com/wadetb/eliza|Python reimplementation]] [[https://sites.google.com/view/elizagen-org/commonly-known-eliza-clones?authuser=0|other versions]] * PERRY: [[https://en.wikipedia.org/wiki/PARRY|wikipedia]] [[http://www.cs.cmu.edu/afs/cs/project/ai-repository/ai/areas/classics/parry/|source code]] PERRY was the first program to pass the Turing test. * [[https://nlp.stanford.edu/acvogel/gus.pdf|Bobrow 1977 - GUS, A Frame-Driven Dialog System]] * [[https://aclanthology.org/P94-1009.pdf|Green & Carberry 1994 - A Hybrid Reasoning Model For Indirect Answers]] === Word-Sense Disambiguation === * [[https://www.ijcai.org/Proceedings/77-1/Papers/027A.pdf|Granger 1977 - FOUL-UP: A Program that Figures Out Meanings of Worcds from Context]] Really cool work. Uses "knowledge embodied in scripts to figure out likely definitions for unknown words." Related to recent (2020s) work in common-sense reasoning. === Speech Recognition === Included here since some of the algorithms are shared with statistical NLP methods * [[https://asa.scitation.org/doi/pdf/10.1121/1.1907936|Denes 1960 - Spoken Digit Recognition Using Time‐Frequency Pattern Matching ]] Word-based matching, cited by [[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=1171882|Bridle 1982]] * [[https://link.springer.com/content/pdf/10.1007/BF01074755.pdf|Vintsyuk 1968 - Speech Discrimination by Dynamic Programming]] [[https://drive.google.com/uc?export=view&id=1RWVDPJIqSdW-9S3u4DAUYHkoiNK0IKuP|pdf (UCSC only)]] [[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=1171882|Bridle 1982]] says this was pioneering work which was not well known in the West. * [[https://www.sciencedirect.com/science/article/pii/S0020737370800086|Velichko & Zagoruyko 1970 - Automatic Recognition of 200 Words]] [[https://drive.google.com/uc?export=view&id=1OgInzWSk1nmb7ESTDPYKKYugWjHtu_ee|pdf (UCSC only)]] Cited by [[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=1171882|Bridle 1982]] * [[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=1454428|Jelinek 1976 - Continuous Speech Recognition by Statistical Methods]] [[https://drive.google.com/uc?export=view&id=1szYEgJmf39fs1IrMBdMDNMG6Lf028JbO|pdf (UCSC only)]] * [[https://apps.dtic.mil/sti/citations/ADA035146|Lowerre 1976 - The Harpy Speech Recognition System]] (Ph.D. Thesis) {{papers:1976_-_harpy.pdf|pdf}} [[https://stacks.stanford.edu/file/druid:rq916rn6924/rq916rn6924.pdf|Summary]], missing one page. Cited Bridle 1982 (and Ney 1992) for the term "beam search" * [[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=1171882|Bridle et al 1982 - An Algorithm for Connected Word Recognition]] [[https://drive.google.com/uc?export=view&id=1_6CtQWkLFSAKMitHUALnw_jgEA_KocYx|pdf (UCSC only)]] Cited by Ney 1992 for beam search * [[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=124938|Ney et al. 1992 - Data Driven Search Organization for Continuous Speech Recognition in the SPICOS System]] [[https://drive.google.com/uc?export=view&id=1wsoLDFcAIuVNVCPBVGXMaa0err2zgXKm|pdf (UCSC only)]] See p. 4 bottom for a history of beam search, which it says is called "beam search, DP beam search, or pruned DP search." === Generation === * [[https://aclanthology.org/1988.tmi-1.4.pdf|Nirenburg et al 1988 - Lexical Realization in Natural Language Generation]] Describes the generation part of the interlingua MT system DIOGENES. === Text Understanding === * [[https://files.eric.ed.gov/fulltext/ED161024.pdf|Shank 1975 - SAM - A Story Understander]] === Syntactic Parsing === === Semantic Parsing === * [[https://aclanthology.org/C69-0201.pdf|Schank & Tesler 1969 - A Conceptual Dependency Parser for Natural Language]] === Grammar Induction === === Systems that Learned === === Reasoning Systems === * [[https://onlinelibrary.wiley.com/doi/pdfdirect/10.1207/s15516709cog0201_3|Carbonell 1978 - POLITICS: Automated Ideological Reasoning]] * [[https://apps.dtic.mil/sti/pdfs/ADA064962.pdf|Carbonell 1979 - Subjective Understanding: Computer Models of Belief Systems]] PhD Thesis === Language Acquisition === * Smith 1980 - FOCUSER: A strategic interaction paradigm for language acquisition. AAIII 1980, cited by [[http://www.cs.cmu.edu/~tom/pubs/NeedForBias_1980.pdf|Mitchell 1980]]. Also published as a [[https://www.worldcat.org/title/focuser-a-strategic-interaction-paradigm-for-language-acquisition/oclc/11463422?referer=di&ht=edition|PhD thesis]]. ===== Early Machine Learning or Corpus-based Methods in NLP ===== * Overviews * [[https://www2.informatik.uni-hamburg.de/wtm/ps/book96.pdf|Wermter et al 1996 - Connectionist, Statistical and Symbolic Approaches to Learning for Natural Language Processing]] Nice overview, part of a larger book * Parsing * [[https://aclanthology.org/P79-1002.pdf|Carbonell 1979 - Towards a Self-Extending Parser]] * [[https://aclanthology.org/C86-1033.pdf|Sampson 1986 - A stochastic approach to parsing]] Learns statistical rules from a manually annotated corpus. Uses simulated annealing to find the most probable parse. (Randomized inference, similar to later work in NLP in 2014 [[https://aclanthology.org/D14-1109.pdf|here]]) "We have built up a database of manually-parsed sentences, from which we extract statistics that allow a likelihood measure to be determined for any logically possible non-leaf constituent of a parse-tree. That is, given a pairing of a mother-label with a sequence of daughter-labels, say the pair , the likelihood function will return a figure for the relative frequency with which (in this case) an adjective phrase consists of singular common noun + adjective + prepositional phrase." "The most direct way... would be to generate all possible tree-structures for a given sentence taken as a sequence of word-tags, and all possible labellings of each of those structures, and choose the tree whose overall plausibility figure is highest. Unlike in the case of word-tagging, however, for parsing this approach is wholly impractical... I have therefore begun to experiment with simulated annealing as a solution to the problem." * [[https://aclanthology.org/H90-1050.pdf|1990 - Session 9: Automatic Acquisition of Linguistic Structure]] From HLT (became NAACL) [[https://aclanthology.org/events/hlt-1990/|1990]], see also [[https://dblp.org/db/conf/naacl/naacl1990.html|dblp]] (and Mitch Marcus's google scholar) * Machine Translation * [[https://aclanthology.org/1988.tmi-1.19.pdf|Brown et al 1988 - A Statistical Approach to French/English Translation]] Cited by [[https://aclanthology.org/H90-1056.pdf|Gale & Church 1990]]. Reflection on the work [[https://aclanthology.org/2007.tmi-plenaries.2.pdf|here]] ===== Neural Networks in NLP ==== ==== Very Early Work ==== Work prior to 2000s. * Often called "Artificial Neural Networks (ANNs)" or "connectionist approach" in the old literature * Overviews * **[[https://ni.cmu.edu/~plaut/papers/pdf/RohdePlaut03CogStu.connModelsLang.pdf|Rohde & Plaut 2003 - Connectionist Models of Language Processing]]** Great overview of early work * [[https://link.springer.com/content/pdf/10.1007/BF00139194.pdf|Selman 1989 - Connectionist systems for natural language understanding]] * [[https://www2.informatik.uni-hamburg.de/wtm/ps/book96.pdf|Wermter et al 1996 - Connectionist, Statistical and Symbolic Approaches to Learning for Natural Language Processing]] Nice overview, part of a larger book * POS Tagging * Parsing * [[https://www.aaai.org/Papers/AAAI/1982/AAAI82-059.pdf|Small et al 1982 - Towards Connectionist Parsing]] * [[https://www.cs.toronto.edu/pub/gh/Selman-MSc-1985.pdf|Selman 1985 - Rule-based Processing in a Connectionist System for Natural Language Understanding]] (Tech Report CSRI-168 U. Toronto. {{papers:selman-msc-1985.pdf|local copy}}) Wow, foundational work. Way ahead of it's time. Used a heuristic method to set the weights, since this was before backprop was invented. Discusses learning the weights p. 37, bottom (p. 44 in pdf). Hinton was on the thesis committee. * Selman, B. & Hirst, G. (1985). A Rule-Based Connectionist Parsing System, Proceedings of the Seventh Annual Conference of the Cognitive Science Society, Irvine, CA, August 1985, 212-219. An extended version entitled 'Parsing as an Energy Minimization Problem' appeared in Genetic Algorithms and Simulated Annealing (ed.) Lawrence Davis, Pitman, London. 155-168. * [[https://www.google.com/books/edition/Program_of_the_Ninth_Annual_Conference_o/KnVBTk-hS10C?hl=en&gbpv=1&pg=PA70&printsec=frontcover|Charniak & Santos 1987 - A connectionist context-free parser which is not context-free but then it is not really connectionist either]] * [[https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.1024.5449&rep=rep1&type=pdf|Jain 1991 - Parsing Complex Sentences with Structured Connectionist Networks]] {{papers:jain-1991-parsing.pdf|local copy}} * Semantic Parsing (including shallow semantic parsing) * [[https://www.taylorfrancis.com/chapters/edit/10.4324/9781315807997-12/implementing-semantic-networks-parallel-hardware|Hinton 1981 - Implementing semantic networks in parallel hardware]] Cited by [[https://stanford.edu/~jlmcc/papers/PDP/Chapter19.pdf|McClelland 1986]] as Hinton 1981. * [[https://stanford.edu/~jlmcc/papers/PDP/Chapter19.pdf|McClelland & Kawamoto 1986 - Mechanisms of sentence processing: Assigning roles to constituents]] ([[https://apps.dtic.mil/sti/pdfs/ADA168571.pdf|another copy with references]]) * Machine Translation * [[https://aclanthology.org/www.mt-archive.info/90/TMI-1992-McLean.pdf|McLean 1992 - Example-Based Machine Translation using Connectionist Matching]] * [[https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.466.8488&rep=rep1&type=pdf|Castaño et al 1997 - A Connectionist Approach to Machine Translation]] {{papers:castano_1997_connectionist.pdf|local copy}} * [[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.134.1431&rep=rep1&type=pdf|Castaño et al 1997 - Machine Translation using Neural Networks and Finite-State Models]] {{papers:castano_1997_mt_nn_fsm.pdf|local copy}} Good references to early literature * [[https://www.dlsi.ua.es/~mlf/docum/forcada97p.pdf|Forcada & Neco 1997 - Recursive Hetero-Associative Memories for Translation]] {{papers:forcada97p.pdf|local copy}} Introduced the encoder-decoder RNN architeture for NMT * Inference and Reasoning * [[http://www.cs.toronto.edu/~hinton/absps/symbolsIJCAI.pdf|Touretzky & Hinton 1985 - Symbols Among the Neurons: Details of a Connectionist Inference Architecture]] ==== Early Deep Learning in NLP ==== Work since 2000s, but prior to 2014. * [[https://proceedings.neurips.cc/paper/2000/file/728f206c2a01bf572b5940d7d9a8fa4c-Paper.pdf|Bengio et al 2003 - A Neural Probabilistic Language Model]] * [[https://www.jmlr.org/papers/volume12/collobert11a/collobert11a.pdf|Collobert et al 2011 - Natural Language Processing (Almost) from Scratch]] ===== Related Pages ===== * [[ml:History of ML]] * [[Key Papers in NLP]]