Discourse Analysis
Introductions and Overviews
- Introductions
- PDTB
Discourse Parsing
- Marcu & Echihabi 2002 - An Unsupervised Approach to Recognizing Discourse Relations. The first discourse parser.
- Perret et al 2016 - Integer Linear Programming for Discourse Parsing. Good intro to discourse parsing.
- Stab & Gurevych 2017 - Parsing Argumentation Structures in Persuasive Essays (contains an overview of related work in discourse parsing in Sec. 2.5). Another version is available here.
For Dialog Systems
- Moore & Paris 1993 - Planning Text for Advisory Dialogues: Capturing Intentional and Rhetorical Information. Old paper, but very good. Discusses how RST could be used in a dialog system planner. Contains a mapping between desired belief states in the listener and RST relations (Table 1, p. 20).
- https://github.com/derekmma/dialogue-discourse-relation (annotated corpus)
- https://github.com/jfainberg/self_dialogue_corpus (full corpus) (paper)
RST
- A theory from linguistics; see also Wikipedia - Rhetorical Structure Theory
- Mann & Thompson 1988 - Rhetorical Structure Theory: Toward a functional theory of text organization. The paper usually cited as introducing RST.
RST-DT
- Dataset
- Paper: Carlson et al 2001 - Building a Discourse-Tagged Corpus in the Framework of Rhetorical Structure Theory. Created the RST Discourse Treebank (RST-DT), which annotates full RST discourse structures.
- Guideline: RST Annotation Guideline and another guideline
- Annotation tool: rstWeb, a web-based annotation tool
- Systems
- Perret et al 2016 - Integer Linear Programming for Discourse Parsing. Good intro to discourse parsing.
Other RST Datasets
Penn Discourse Treebank (PDTB)
See also PDTB Publications. PDTB is a shallow discourse representation, in contrast to RST-DT. According to Perret et al 2016, “the PDTB does not provide full discourse structures for texts,” whereas the RST-DT does.
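To make the shallow-versus-full distinction concrete, here is a minimal Python sketch. The class names (`RSTNode`, `PDTBRelation`), their fields, and the three-clause example are hypothetical illustrations, not the release format or gold annotation of either corpus. The point is only that an RST analysis is one tree covering the whole text, while a PDTB-style annotation is a flat list of independent relations between argument spans.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RSTNode:
    """One node of a full RST-style tree (hypothetical, simplified)."""
    relation: Optional[str] = None    # relation label, internal nodes only
    nuclearity: Optional[str] = None  # e.g. "NS", "SN", "NN", internal nodes only
    text: Optional[str] = None        # elementary discourse unit (EDU), leaves only
    children: List["RSTNode"] = field(default_factory=list)

    def edus(self) -> List[Optional[str]]:
        """Return the leaf EDUs in left-to-right order."""
        if not self.children:
            return [self.text]
        return [edu for child in self.children for edu in child.edus()]


@dataclass
class PDTBRelation:
    """One shallow PDTB-style relation (hypothetical, simplified)."""
    arg1: str
    arg2: str
    connective: Optional[str]  # None would mark an implicit relation
    sense: str


# Full structure: every EDU is attached somewhere in a single tree.
# Toy analysis for illustration, not a gold annotation.
tree = RSTNode(
    relation="Result",
    nuclearity="NS",
    children=[
        RSTNode(
            relation="Explanation",
            nuclearity="NS",
            children=[
                RSTNode(text="The parser failed"),
                RSTNode(text="because the input was empty."),
            ],
        ),
        RSTNode(text="So we added a check."),
    ],
)

# Shallow structure: a flat list of relations with no tree connecting them.
relations = [
    PDTBRelation(
        arg1="The parser failed",
        arg2="the input was empty",
        connective="because",
        sense="Contingency.Cause.Reason",
    ),
    PDTBRelation(
        arg1="The parser failed because the input was empty",
        arg2="we added a check",
        connective="So",
        sense="Contingency.Cause.Result",
    ),
]
```

Calling `tree.edus()` recovers the full text in order because the tree spans every unit, whereas nothing in `relations` says how the two relations relate to each other.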
- Dataset
- Systems
- Lin et al 2012 - A PDTB-Styled End-to-End Discourse Parser (parser). The parser that Stab & Gurevych 2017 used.
- Extensions

Figure: PDTB sense hierarchy, from Prasad 2014. There are three levels; most systems predict level-2 senses but not the finer-grained level-3 senses.
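As a small illustration of working at a given level of the hierarchy, the sketch below truncates a sense label to its first two levels. It assumes senses are written as dot-separated strings such as `Contingency.Cause.Reason`; that string format and the helper name `truncate_sense` are assumptions for illustration, not part of any particular PDTB tool.

```python
def truncate_sense(sense: str, level: int = 2) -> str:
    """Keep only the first `level` components of a dot-separated sense label."""
    if level < 1:
        raise ValueError("level must be at least 1")
    return ".".join(sense.split(".")[:level])


# A level-3 sense collapsed to level 2 and to level 1.
level2 = truncate_sense("Contingency.Cause.Reason")
level1 = truncate_sense("Contingency.Cause.Reason", level=1)

# A label that is already at level 2 is returned unchanged.
unchanged = truncate_sense("Expansion.Conjunction")
```

Mapping both gold and predicted labels through the same truncation before scoring is one simple way to evaluate a system at level 2 when the data carries level-3 labels.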
Miscellaneous Datasets
- DialogBank: Bunt et al 2016 - The DialogBank
Unsupervised Discourse Parsing
- Marcu & Echihabi 2002 - An Unsupervised Approach to Recognizing Discourse Relations. The first discourse parser.
- Nishida & Nakayama - Unsupervised Discourse Constituency Parsing Using Viterbi EM. Rivals supervised methods. Doesn't seem to cite Marcu & Echihabi 2002!
Discourse Coherence
Document-Level Coherence
- Mesgar & Strube 2016 - Lexical Coherence Graph Modeling Using Word Embeddings. Evaluates on a readability ranking task.
- Lai & Tetreault 2018 - Discourse Coherence in the Wild: A Dataset, Evaluation and Methods. Introduces the Grammarly Corpus of Discourse Coherence: 1,000 documents with human judgments of coherence at the document level.
Coherence in Dialog
- Gandhe & Traum 2008 - An Evaluation Understudy for Dialogue Coherence Models. Evaluates measures of dialog coherence against human judgements.
- Cervone & Riccardi 2020 - Is this Dialogue Coherent? Learning from Dialogue Acts and Entities. Introduces the Switchboard Coherence (SWBD-Coh) corpus (dataset), a dataset of human-human spoken dialogues from the Switchboard corpus annotated with human coherence ratings for each turn.
Linguistic Topics in Discourse Analysis
- Textual cohesion (linguistics topic). There are metrics for this (ask Pranav for details).
- Centering theory. See this paper.
Applications
- Summarization
Related Pages
nlp/discourse_analysis.1657143069.txt.gz · Last modified: 2023/06/15 07:36 (external edit)