====== Examples of Good Papers ======
This is a list of some good (well-written) papers in NLP.
  
  * Introduction of a new method
    * [[https://arxiv.org/pdf/1409.0473.pdf|Bahdanau et al 2014 - Neural Machine Translation by Jointly Learning to Align and Translate]]
    * [[https://arxiv.org/pdf/2203.08568.pdf|Hu et al 2022 - In-Context Learning for Few-Shot Dialogue State Tracking]]
  * Systematic exploration papers
    * [[https://arxiv.org/pdf/1804.06323.pdf|Qi et al 2018 - When and Why are Pre-trained Word Embeddings Useful for Neural Machine Translation?]]

===== Not-So-Good Examples =====
Here are some papers that perhaps could have been better presented.
  
  * [[https://arxiv.org/pdf/1706.03762.pdf|Vaswani et al 2017 - Attention Is All You Need]]  The main issue with the paper is that it doesn't give any ablation studies or experimental justification for its design choices.  Even if the authors didn't have the computing resources for ablation studies, they still could have reported the incremental gains as each component was added (for example, as in [[https://arxiv.org/pdf/1504.06665.pdf|this paper]]).  The paper is also written in a way that is hard to follow: it slowly reveals the architecture without giving a good high-level overview at the beginning.  It also doesn't clearly specify which parts of the architecture are novel (invented by the authors) and which come from prior work.  For example, the authors invented multi-head attention, but they never clearly state this.
  * [[https://arxiv.org/pdf/2501.12948|DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning]]  This paper is well-written, but it has a major flaw: it ignores prior work on what it does.  There is no related work section, and aside from the authors citing their own work, there are no citations in the main body of the paper (Section 2: Approach).  They are missing important prior work such as [[https://arxiv.org/pdf/2403.04642|Havrilla et al 2024 - Teaching Large Language Models to Reason with Reinforcement Learning]].  This incorrectly makes it look like they invented everything.
  