====== Summarization ======

===== Overviews =====
Best overviews (as of 2021): {{papers:Klymenko_2020_Automatic_Text_Summarization.pdf|Klymenko & Braun 2020 - Automatic Text Summarization: A State-of-the-Art Review}} and {{papers:el-kassas_2021_automatic_text_summarization.pdf|El-Kassas et al 2021 - Automatic Text Summarization: A Comprehensive Survey}}

  * **General**
    * [[https://arxiv.org/pdf/1707.02268.pdf|Allahyari et al 2017 - Text Summarization Techniques: A Brief Survey]]
    * [[http://jad.shahroodut.ac.ir/article_1189_28715967fcd8b7bfb463ab90aca5a9f7.pdf|Nazari & Mahdavi 2019 - A survey on Automatic Text Summarization]] Kind of a weird survey
    * **{{papers:Klymenko_2020_Automatic_Text_Summarization.pdf|Klymenko & Braun 2020 - Automatic Text Summarization: A State-of-the-Art Review}}** Good overview of the work in recent years.
    * {{papers:pramita_2020_review_of_automatic.pdf|Widyassari et al 2020 - Review of Automatic Text Summarization Techniques & Methods}} Not the best review: I can't believe they didn't include Rush et al 2015.
    * **{{papers:el-kassas_2021_automatic_text_summarization.pdf|El-Kassas et al 2021 - Automatic Text Summarization: A Comprehensive Survey}}**
  * **Abstractive**
    * [[https://www.worldscientific.com/doi/abs/10.1142/9789813274884_0006|Baumel & Elhadad 2019 - A Survey of Neural Models for Abstractive Summarization]]
    * **{{papers:lin_2019_abstractive_summarization.pdf|Lin & Ng 2019 - Abstractive Summarization: A Survey of the State of the Art}}**
    * [[https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9328413|Syed et al 2021 - A Survey of the State-of-the-Art Models in Neural Abstractive Text Summarization]]
  * **Multi-document**
    * [[https://arxiv.org/pdf/2011.04843.pdf|Ma et al 2020 - Multi-document Summarization via Deep Learning Techniques: A Survey]]
  * **Datasets**
    * [[https://arxiv.org/pdf/2411.04585|Dahan & Stanovsky 2024 - The State and Fate of Summarization Datasets: A Survey]]
  * **Older surveys**
    * [[https://www.cs.cmu.edu/~afm/Home_files/Das_Martins_survey_summarization.pdf|Das & Martins 2009 - A Survey on Automatic Text Summarization]]

===== Papers =====
  * [[https://arxiv.org/pdf/1509.00685.pdf|Rush et al 2015 - A Neural Attention Model for Abstractive Sentence Summarization]]
  * [[https://arxiv.org/pdf/1602.06023.pdf|Nallapati et al 2016 - Abstractive Text Summarization Using Sequence-to-Sequence RNNs and Beyond]]
  * [[https://arxiv.org/pdf/1704.04368.pdf|See et al 2017 - Get To The Point: Summarization with Pointer-Generator Networks]]
  * [[https://arxiv.org/pdf/1904.02321.pdf|Arumae & Liu 2019 - Guiding Extractive Summarization with Question-Answering Rewards]]
  * [[https://www.aclweb.org/anthology/2021.eacl-main.213.pdf|Padmakumar & He 2021 - Unsupervised Extractive Summarization using Pointwise Mutual Information]]

===== Multi-Document =====
  * [[https://www.aclweb.org/anthology/D18-1446.pdf|Lebanoff et al 2018 - Adapting the Neural Encoder-Decoder Framework from Single to Multi-Document Summarization]]

===== Datasets =====
  * **Surveys**
    * [[https://arxiv.org/pdf/2411.04585|Dahan & Stanovsky 2024 - The State and Fate of Summarization Datasets: A Survey]]
  * CNN / Daily Mail summarization dataset
    * Paper: [[https://arxiv.org/pdf/1602.06023.pdf|Nallapati et al 2016 - Abstractive Text Summarization Using Sequence-to-Sequence RNNs and Beyond]]
    * Processed Dataset: [[https://github.com/harvardnlp/sent-summary|here]] or [[https://github.com/abisee/cnn-dailymail|here]] or [[https://github.com/JafferWilson/Process-Data-of-CNN-DailyMail|here]], Original Dataset: [[https://github.com/alesee/abstractive-text-summarization|github]]
  * [[https://arxiv.org/pdf/1804.11283.pdf|Grusky et al 2018 - NEWSROOM: A Dataset of 1.3 Million Summaries with Diverse Extractive Strategies]]
  * [[https://arxiv.org/pdf/2010.03093.pdf|Ladhak et al 2020 - WikiLingua: A New Benchmark Dataset for Cross-Lingual Abstractive Summarization]]
  * Medical Domain
    * [[https://www.aclweb.org/anthology/2021.naacl-main.395.pdf|Devaraj et al 2021 - Paragraph-level Simplification of Medical Texts]] (Actually a [[text simplification]] task) Dataset size: 3,500 training, 400 dev, 400 test

===== Evaluation =====
  * [[https://arxiv.org/pdf/1906.00318.pdf|Eyal et al 2019 - Question Answering as an Automatic Evaluation Metric for News Article Summarization]]

===== People =====
  * [[https://scholar.google.com/citations?user=22ohn6AAAAAJ&hl=en|Fei Liu]]
  * [[https://scholar.google.com/citations?user=LIjnUGgAAAAJ&hl=en|Alexander Rush]]
  * [[https://scholar.google.com/citations?user=Is0pLz0AAAAJ&hl=en|Michael Elhadad]]
  * [[https://scholar.google.com/citations?user=uczqEdUAAAAJ&hl=en|Lu Wang]]

===== Related Pages =====
  * [[Keyphrase Generation]]
  * [[Text Simplification]]