Scientific Text Processing
Overviews
Nasar et al 2018 - Information Extraction From Scientific Articles: A Survey
Papers
2013 - Purpose and Polarity of Citation: Towards NLP-based Bibliometrics
2018 - Goal-Oriented Representation of Scientific Papers
Predicting Co-authors
Graph Structure
Vedak 2022 - ArXiv Citation Graph
Citation Processing
2012 - Reference Scope Identification in Citing Sentences
Knowledge Base Construction
Hope et al 2021 - Extracting a Knowledge Base of Mechanisms from COVID-19 Papers
Summarization
Cachola et al 2021 - TLDR: Extreme Summarization of Scientific Documents
Generating Papers
Wang et al 2019 - PaperRobot: Incremental Draft Generation of Scientific Ideas
Chen et al 2021 - SciXGen: A Scientific Paper Dataset for Context-Aware Text Generation
Generating Hypotheses
Yang et al 2023 - Large Language Models for Automated Open-domain Scientific Hypotheses Discovery
Datasets
CORD-19: COVID-19 Dataset
paper
Bio AMR Corpus
Tools
Yamaguchi & Morishita 2023 - appjsonify: An Academic Paper PDF-to-JSON Conversion Toolkit
See methods section here:
Vedak 2022 - ArXiv Citation Graph
Resources
SciNLP Workshop
Scholarly Document Processing Workshop
2022
People
Dragomir Radev
Daniel Weld
Related Pages
Patent Domain NLP
Overlaps somewhat with scientific-domain NLP, especially in information extraction from patents.