====== Scientific Text Processing ======

===== Overviews =====
  * [[http://faculty.pucit.edu.pk/swjaffry/rpr/Scientometrics18InformationMining.pdf|Nasar et al 2018 - Information Extraction From Scientific Articles: A Survey]]

===== Papers =====
  * [[https://www.aclweb.org/anthology/N13-1067.pdf|2013 - Purpose and Polarity of Citation: Towards NLP-based Bibliometrics]]
  * [[http://lrec-conf.org/workshops/lrec2018/W24/pdf/5_W24.pdf|2018 - Goal-Oriented Representation of Scientific Papers]]
  * **Predicting Co-authors**
  * **Graph Structure**
    * [[https://benty-fields.com/static/include/files/Arxiv_Citation_Graph.pdf|Vedak 2022 - ArXiv Citation Graph]]
  * **Citation Processing**
    * [[https://www.aclweb.org/anthology/N12-1009.pdf|2012 - Reference Scope Identification in Citing Sentences]]
  * **Knowledge Base Construction**
    * [[https://arxiv.org/pdf/2010.03824.pdf|Hope et al 2021 - Extracting a Knowledge Base of Mechanisms from COVID-19 Papers]]
  * **Summarization**
    * [[https://arxiv.org/pdf/2004.15011.pdf|Cachola et al 2021 - TLDR: Extreme Summarization of Scientific Documents]]
  * **Generating Papers**
    * [[https://arxiv.org/pdf/1905.07870.pdf|Wang et al 2019 - PaperRobot: Incremental Draft Generation of Scientific Ideas]]
    * [[https://arxiv.org/pdf/2110.10774.pdf|Chen et al 2021 - SciXGen: A Scientific Paper Dataset for Context-Aware Text Generation]]
  * **Generating Hypotheses**
    * [[https://arxiv.org/pdf/2309.02726.pdf|Yang et al 2023 - Large Language Models for Automated Open-domain Scientific Hypotheses Discovery]]

===== Datasets =====
  * [[https://allenai.org/data/cord-19|CORD-19: Covid-19 Dataset]] [[https://arxiv.org/pdf/2004.10706.pdf|paper]]
  * [[https://amr.isi.edu/download.html|Bio AMR Corpus]]

===== Tools =====
  * [[https://arxiv.org/pdf/2310.01206.pdf|Yamaguchi & Morishita 2023 - appjsonify: An Academic Paper PDF-to-JSON Conversion Toolkit]]
  * See the methods section of [[https://benty-fields.com/static/include/files/Arxiv_Citation_Graph.pdf|Vedak 2022 - ArXiv Citation Graph]]

===== Resources =====
  * [[https://scinlp.org/|SciNLP Workshop]]
  * [[https://sdproc.org/|Scholarly Document Processing Workshop]] [[https://sdproc.org/2022/|2022]]

===== People =====
  * [[https://scholar.google.com/citations?user=vIqWvgwAAAAJ&hl=en|Dragomir Radev]]
  * [[https://scholar.google.com/citations?user=IUZgJogAAAAJ&hl=en|Daniel Weld]]

===== Related Pages =====
  * [[Patent Domain NLP]]: Some overlap with scientific-domain NLP, especially for information extraction from patents.