History of NLP
Historical Surveys
Papers and Popular Descriptions
- Statistical NLP
- Church 1988 - A Stochastic Parts Program and Noun Phrase Parser for Unrestricted Text. One of the first statistical POS taggers, and one of the papers that started the statistical/machine learning revolution in NLP.
Early Work (prior to 2000)
Machine Translation
- Interlingua-based
- Nirenburg et al 1988 - Lexical Realization in Natural Language Generation Describes the generation system of DIOGENES.
Question Answering
Dialog
- PARRY: Wikipedia, source code
Word-Sense Disambiguation
- Granger 1977 - FOUL-UP: A Program that Figures Out Meanings of Words from Context. Really cool work. Uses "knowledge embodied in scripts to figure out likely definitions for unknown words." Related to recent (2020s) work in common-sense reasoning.
Speech Recognition
Included here because some of the algorithms are shared with statistical NLP methods.
- Lowerre 1976 - The Harpy Speech Recognition System (Ph.D. thesis; the linked copy is missing one page). Cited by Ney 1992 for beam search.
- Bridle et al 1982 - An Algorithm for Connected Word Recognition (pdf, UCSC only). Cited by Ney 1992 for beam search.
- Ney et al 1992 - Data Driven Search Organization for Continuous Speech Recognition in the SPICOS System (pdf, UCSC only). See the bottom of p. 4 for a history of beam search, which the paper says is variously called "beam search, DP beam search, or pruned DP search."
Generation
- Nirenburg et al 1988 - Lexical Realization in Natural Language Generation Describes the generation part of the interlingua MT system DIOGENES.
Text Understanding
Syntactic Parsing
Semantic Parsing
Grammar Induction
Systems that Learned
Reasoning Systems
Language Acquisition
- Smith 1980 - FOCUSER: A strategic interaction paradigm for language acquisition. AAAI 1980, cited by Mitchell 1980. Also published as a PhD thesis.
Early Machine Learning or Corpus-based Methods in NLP
- Overviews
- Wermter et al 1996 - Connectionist, Statistical and Symbolic Approaches to Learning for Natural Language Processing Nice overview, part of a larger book
- Parsing
- Sampson 1986 - A stochastic approach to parsing. Learns statistical rules from a manually annotated corpus and uses simulated annealing to find the most probable parse (randomized inference, similar to later NLP work from 2014). "We have built up a database of manually-parsed sentences, from which we extract statistics that allow a likelihood measure to be determined for any logically possible non-leaf constituent of a parse-tree. That is, given a pairing of a mother-label with a sequence of daughter-labels, say the pair <J, NN JJ P>, the likelihood function will return a figure for the relative frequency with which (in this case) an adjective phrase consists of singular common noun + adjective + prepositional phrase." "The most direct way… would be to generate all possible tree-structures for a given sentence taken as a sequence of word-tags, and all possible labellings of each of those structures, and choose the tree whose overall plausibility figure is highest. Unlike in the case of word-tagging, however, for parsing this approach is wholly impractical… I have therefore begun to experiment with simulated annealing as a solution to the problem."
- HLT 1990 - Session 9: Automatic Acquisition of Linguistic Structure. HLT later merged with NAACL. See also DBLP (and Mitch Marcus's Google Scholar page).
- Machine Translation
- Brown et al 1988 - A Statistical Approach to French/English Translation. Cited by Gale & Church 1990. See also a later reflection on this work.
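The statistics-extraction step described in the Sampson 1986 entry above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the "treebank" is a made-up list of constituents (the labels J, NN, JJ and P come from Sampson's example; N, AT and all counts are hypothetical), each likelihood is the relative frequency of a daughter sequence among constituents with the same mother label, and scoring a whole tree as the product of its constituents' likelihoods (as in a probabilistic context-free grammar) is an assumption, not necessarily Sampson's exact plausibility measure. The simulated-annealing search over candidate trees is not shown.

```python
from collections import Counter
from math import prod

# Hypothetical toy "treebank": each entry is one non-leaf constituent,
# written as (mother_label, tuple_of_daughter_labels). Counts are made up.
TOY_CONSTITUENTS = [
    ("J", ("JJ",)),
    ("J", ("JJ",)),
    ("J", ("NN", "JJ", "P")),  # Sampson's example pair <J, NN JJ P>
    ("N", ("AT", "NN")),
    ("N", ("NN",)),
]


def rule_likelihoods(constituents):
    """Relative frequency of each daughter sequence given its mother label."""
    pair_counts = Counter(constituents)
    mother_counts = Counter(mother for mother, _ in constituents)
    return {
        (mother, daughters): count / mother_counts[mother]
        for (mother, daughters), count in pair_counts.items()
    }


def tree_score(tree_constituents, likelihoods):
    """Score a candidate parse as the product of its constituents' likelihoods.

    Assumption: product combination; unseen constituents get 0.0 here,
    where a real system would need some form of smoothing.
    """
    return prod(likelihoods.get(c, 0.0) for c in tree_constituents)


likelihoods = rule_likelihoods(TOY_CONSTITUENTS)
# One of the three J constituents is NN JJ P, so its relative frequency is 1/3.
print(likelihoods[("J", ("NN", "JJ", "P"))])
```

A search procedure such as simulated annealing would then perturb candidate trees and prefer those with higher `tree_score`, which avoids enumerating every possible tree as the quoted passage describes.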
Neural Networks in NLP
Very Early Work
Work prior to the 2000s.
- Often called “Artificial Neural Networks (ANNs)” or “connectionist approach” in the old literature
- Overviews
- Rohde & Plaut 2003 - Connectionist Models of Language Processing Great overview of early work
- Wermter et al 1996 - Connectionist, Statistical and Symbolic Approaches to Learning for Natural Language Processing Nice overview, part of a larger book
- POS Tagging
- Parsing
- Selman 1985 - Rule-based Processing in a Connectionist System for Natural Language Understanding (Tech Report CSRI-168, U. Toronto; local copy). Wow, foundational work. Way ahead of its time. Used a heuristic method to set the weights, since this was before backpropagation was popularized (Rumelhart, Hinton & Williams 1986). Discusses learning the weights at the bottom of p. 37 (p. 44 of the PDF). Hinton was on the thesis committee.
- Selman & Hirst 1985 - A Rule-Based Connectionist Parsing System. Proceedings of the Seventh Annual Conference of the Cognitive Science Society, Irvine, CA, August 1985, pp. 212-219. An extended version, 'Parsing as an Energy Minimization Problem', appeared in Lawrence Davis (ed.), Genetic Algorithms and Simulated Annealing, Pitman, London, pp. 155-168.
- Semantic Parsing (including shallow semantic parsing)
- Hinton 1981 - Implementing semantic networks in parallel hardware Cited by McClelland 1986 as Hinton 1981.
- Machine Translation
- Castaño et al 1997 - Machine Translation using Neural Networks and Finite-State Models (local copy). Good references to the early literature.
- Forcada & Neco 1997 - Recursive Hetero-Associative Memories for Translation (local copy). Introduced the encoder-decoder RNN architecture for NMT.
- Inference and Reasoning
Early Deep Learning in NLP
Work from the 2000s up to 2014.
Related Pages
Last modified: 2023/06/15 07:36 (external edit)