Discourse Analysis
Introductions and Overviews
- Introductions
- PDTB
Discourse Parsing
- Marcu & Echihabi 2002 - An Unsupervised Approach to Recognizing Discourse Relations The first discourse parser
- Perret et al 2016 - Integer Linear Programming for Discourse Parsing Good intro to discourse parsing
- Stab & Gurevych 2017 - Parsing Argumentation Structures in Persuasive Essays Contains an overview of related work in discourse parsing in sec 2.5. Another version is available here.
For Dialog Systems
- Moore & Paris 1993 - Planning Text for Advisory Dialogues: Capturing Intentional and Rhetorical Information Old paper, but very good. Talks about how one could use RST in a dialog system planner. Contains a mapping between desired belief states in the listener and RST relations (in table 1, p. 20).
- https://github.com/derekmma/dialogue-discourse-relation (annotated corpus)
- https://github.com/jfainberg/self_dialogue_corpus (full corpus) (paper)
RST
- Linguistics theory, see also Wikipedia - Rhetorical Structure Theory
- Mann 1984 - Discourse Structures for Text Generation Discusses RST and says “The descriptive portion of RST has been developed over the past two years by Sandra Thompson and me, with major contributions by Christian Matthiessen and Barbara Fox” (footnote 1).
- Mann & Thompson 1988 - Rhetorical Structure Theory: Toward a functional theory of text organization The paper that is usually cited for RST
RST-DT
- Dataset
- Paper: Carlson et al 2001 - Building a Discourse-Tagged Corpus in the Framework of Rhetorical Structure Theory Created the RST Discourse Treebank (RST-DT), which is annotated with full RST discourse structures.
- Guideline: RST Annotation Guideline and another guideline
- Annotation tool: rstWeb Web-based annotation tool
- Systems
- Perret et al 2016 - Integer Linear Programming for Discourse Parsing Good intro to discourse parsing
Other RST Datasets
Penn Discourse Treebank (PDTB)
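See also PDTB Publications. PDTB is a shallow discourse representation: it annotates individual discourse relations rather than a single structure covering the whole text. RST-DT, by contrast, provides full discourse structures. According to Perret et al 2016, “the PDTB does not provide full discourse structures for texts.”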
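To make the shallow vs. full distinction concrete, here is a minimal sketch of the two kinds of annotation as Python data structures. The class names, field names, relation labels, and example sentences are all hypothetical illustrations; they are not the file formats or label sets of either corpus.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ShallowRelation:
    """PDTB-style: one relation between two argument spans, with no tree."""
    arg1: str
    arg2: str
    sense: str
    connective: Optional[str] = None  # None models an implicit relation


@dataclass
class RSTNode:
    """RST-style: a node in a single tree that spans the whole text."""
    relation: Optional[str] = None  # relation joining the children; None for a leaf
    text: Optional[str] = None      # text of the discourse unit; set on leaves only
    children: List["RSTNode"] = field(default_factory=list)

    def leaves(self) -> List[str]:
        """Return the leaf texts in left-to-right order."""
        if not self.children:
            return [self.text] if self.text is not None else []
        result: List[str] = []
        for child in self.children:
            result.extend(child.leaves())
        return result


# Shallow view: a flat list of relations; nothing links them into one structure.
shallow = [
    ShallowRelation("It rained.", "The game was cancelled.", "Contingency.Cause", "so"),
]

# Full view: every unit of the text is a leaf of one tree.
tree = RSTNode(
    relation="Elaboration",
    children=[
        RSTNode(
            relation="Cause",
            children=[RSTNode(text="It rained."), RSTNode(text="The game was cancelled.")],
        ),
        RSTNode(text="Fans were refunded."),
    ],
)
```

In the shallow view, the third sentence may simply not take part in any annotated relation, whereas the tree has to attach it somewhere.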
- Dataset
- Systems
- Lin et al 2012 - A PDTB-Styled End-to-End Discourse Parser (parser) The parser that Stab & Gurevych 2017 used.
- Extensions

PDTB sense hierarchy from Prasad 2014. The hierarchy has three levels. Most systems predict level 2 senses but not the finer-grained level 3 senses.
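As a rough illustration of what "doing level 2 senses" involves, the sketch below truncates a sense label to a chosen level of the hierarchy. It assumes senses are written as dot-separated strings such as `Contingency.Cause.Reason`; the function name `truncate_sense` and the example labels are illustrative assumptions, not part of any PDTB tooling.

```python
def truncate_sense(sense: str, level: int = 2) -> str:
    """Keep at most the first `level` components of a dot-separated sense label.

    Labels that are already coarser than `level` are returned unchanged.
    """
    if level < 1:
        raise ValueError("level must be >= 1")
    return ".".join(sense.split(".")[:level])


# Hypothetical gold labels at mixed depths of the hierarchy.
labels = ["Contingency.Cause.Reason", "Comparison.Contrast", "Temporal"]
level2_labels = [truncate_sense(label, level=2) for label in labels]
```

Collapsing labels this way before training or scoring is one simple way to compare systems at the same granularity.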
Other Datasets
- DialogBank website
- Bunt et al 2019 - The DialogBank: Dialogues with Interoperable Annotations Longer paper, great
Unsupervised Discourse Parsing
- Marcu & Echihabi 2002 - An Unsupervised Approach to Recognizing Discourse Relations The first discourse parser
- Nishida & Nakayama 2020 - Unsupervised Discourse Constituency Parsing Using Viterbi EM Rivals supervised methods. Doesn't seem to cite Marcu & Echihabi 2002!
Discourse Coherence
Document-Level Coherence
- Mesgar & Strube 2016 - Lexical Coherence Graph Modeling Using Word Embeddings Uses a readability ranking task.
- Lai & Tetreault 2018 - Discourse Coherence in the Wild: A Dataset, Evaluation and Methods Introduces the Grammarly Corpus of Discourse Coherence of 1,000 documents with human evaluation of coherence (at the document-level).
Coherence in Dialog
- Gandhe & Traum 2008 - An Evaluation Understudy for Dialogue Coherence Models Evaluates measures of dialog coherence against human judgements.
- Cervone & Riccardi 2020 - Is this Dialogue Coherent? Learning from Dialogue Acts and Entities. Introduces the Switchboard Coherence (SWBD-Coh) corpus (dataset), a dataset of human-human spoken dialogues from the Switchboard corpus annotated with human coherence ratings for each turn.
Linguistic Topics in Discourse Analysis
- Textual cohesion (linguistics topic). There are metrics for this (ask Pranav about this)
- Centering theory. See this paper.
Applications
- Summarization
Related Pages
Last modified: 2023/06/15 07:36 (external edit)