nlp:topic_modeling

nlp:topic_modeling [2021/11/12 06:19] – [Papers] jmflanig
nlp:topic_modeling [2023/09/06 09:33] (current) – [Overviews] jmflanig
Line 1: Line 1:
 ====== Topic Modeling ====== ====== Topic Modeling ======
 +Topic modeling is used to analyze the distribution of words in documents.  It assigns sets of words to "topics," where each document contains one or more topics.  Usually the topic assignments for words and documents are made using unsupervised methods, so the resulting topics don't necessarily correspond to any particular predefined notion of a topic.  A popular method for topic modeling is Latent Dirichlet Allocation (LDA, [[https://jmlr.org/papers/volume3/blei03a/blei03a.pdf|Blei 2003]]).
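As a rough illustration of how unsupervised topic assignment can work, here is a minimal sketch of LDA in pure Python. It uses collapsed Gibbs sampling, which is one common inference method for LDA and not the variational method of the Blei 2003 paper. The function names (''lda_gibbs'', ''top_words''), the hyperparameter defaults, and the toy corpus are all assumptions made for illustration.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative sketch, not optimized).

    docs: list of tokenized documents (each a list of str).
    Returns (doc_topic, topic_word, vocab), where doc_topic[d][k] counts the
    tokens of document d assigned to topic k, and topic_word[k][w] counts how
    often word w is assigned to topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [{} for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assign = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] = topic_word[k].get(w, 0) + 1
            topic_total[k] += 1
        assign.append(zs)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1

                # Unnormalized conditional probability of each topic:
                # (how much doc d uses topic t) * (how much topic t uses word w).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t].get(w, 0) + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]

                # Record the newly sampled assignment.
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] = topic_word[k].get(w, 0) + 1
                topic_total[k] += 1

    return doc_topic, topic_word, vocab


def top_words(topic_word, n=3):
    """Return the n most frequently assigned words for each topic."""
    return [
        [w for w, c in sorted(counts.items(), key=lambda wc: -wc[1]) if c > 0][:n]
        for counts in topic_word
    ]


if __name__ == "__main__":
    # Hypothetical toy corpus, already tokenized.
    toy_docs = [
        ["cat", "dog", "pet", "cat"],
        ["dog", "pet", "vet"],
        ["stock", "market", "bond", "stock"],
        ["bond", "market", "trade"],
    ]
    doc_topic, topic_word, vocab = lda_gibbs(toy_docs, num_topics=2)
    print(top_words(topic_word))
    print(doc_topic)
```

Each row of ''doc_topic'' shows how a document's tokens are split across topics, and ''top_words'' lists the words most strongly associated with each topic. On a corpus this small the result depends heavily on the random seed; for real data, a library such as Gensim is the more practical choice.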
 ===== Overviews ===== ===== Overviews =====
   * [[https://en.wikipedia.org/wiki/Topic_model|Wikipedia - Topic Model]]   * [[https://en.wikipedia.org/wiki/Topic_model|Wikipedia - Topic Model]]
   * [[https://oar.princeton.edu/bitstream/88435/pr1bv3w/1/OA_IntroductionProbabilisticTopicModels.pdf|Blei 2012 - Introduction to Probabilistic Topic Models]]   * [[https://oar.princeton.edu/bitstream/88435/pr1bv3w/1/OA_IntroductionProbabilisticTopicModels.pdf|Blei 2012 - Introduction to Probabilistic Topic Models]]
 +  * Surveys
 +    * [[https://dl.acm.org/doi/pdf/10.1145/3507900?casa_token=dix6D7eQu6QAAAAA:ibflHjUrVvZVK50u_z43d9bIuxt4W8_i9qEF55lpvYe1Up5_Dp_VWf7f-iV4Uz81ac9uf6Y-fZKMfw|Churchill & Singh 2022 - The Evolution of Topic Modeling]]
  
 ===== Papers ===== ===== Papers =====
Line 10: Line 12:
   * Recent papers   * Recent papers
     * [[https://aclanthology.org/E17-2069.pdf|Schofield et al 2017 - Pulling Out the Stops: Rethinking Stopword Removal for Topic Models]]     * [[https://aclanthology.org/E17-2069.pdf|Schofield et al 2017 - Pulling Out the Stops: Rethinking Stopword Removal for Topic Models]]
 +    * [[https://arxiv.org/pdf/1809.02687.pdf|Ding et al 2018 - Coherence-Aware Neural Topic Modeling]]
 +    * [[https://arxiv.org/pdf/2006.03354.pdf|Song et al 2020 - Classification Aware Neural Topic Model and its Application on a New COVID-19 Disinformation Corpus]]
     * [[https://arxiv.org/pdf/2108.10755.pdf|Cheevaprawatdomrong et al 2021 - More Than Words: Collocation Tokenization for Latent Dirichlet Allocation Models]]     * [[https://arxiv.org/pdf/2108.10755.pdf|Cheevaprawatdomrong et al 2021 - More Than Words: Collocation Tokenization for Latent Dirichlet Allocation Models]]
 +    * **[[https://arxiv.org/pdf/2203.05794.pdf|Grootendorst 2022 - BERTopic: Neural topic modeling with a class-based TF-IDF procedure]]**
 +  * Applications
 +    * [[https://aclanthology.org/P11-1098.pdf|Chambers & Jurafsky 2011 - Template-Based Information Extraction without the Templates]]
 +
 +===== Software =====
 +  * [[https://radimrehurek.com/gensim/|Gensim]] [[https://www.analyticsvidhya.com/blog/2016/08/beginners-guide-to-topic-modeling-in-python/|blog post]]
  
 ===== People ===== ===== People =====
   * [[https://scholar.google.com/citations?user=8OYE6iEAAAAJ&hl=en|David Blei]]   * [[https://scholar.google.com/citations?user=8OYE6iEAAAAJ&hl=en|David Blei]]
   * [[https://scholar.google.com/citations?user=4vvf4GIAAAAJ&hl=en|Alexandra Schofield]]   * [[https://scholar.google.com/citations?user=4vvf4GIAAAAJ&hl=en|Alexandra Schofield]]
 +
 +===== Related Pages =====
 +  * [[ml:Graphical Models]]
  
nlp/topic_modeling.1636697971.txt.gz · Last modified: 2023/06/15 07:36 (external edit)
