  * [[https://arxiv.org/pdf/1706.03762.pdf|Vaswani et al 2017 - Attention Is All You Need]] The main issue with this paper is that it gives no ablation studies or experimental justification for its design decisions. Even if the authors lacked the computing resources for full ablation studies, they could still have reported the incremental gains as each improvement was added (as in, for example, [[https://arxiv.org/pdf/1504.06665.pdf|this paper]]). The paper is also written in a way that is hard to follow: it reveals the architecture piece by piece without first giving a good high-level overview. It also fails to clearly distinguish which parts of the architecture are novel (their own invention) and which come from prior work. For example, the authors invented multi-head attention, but they never clearly state this.
  * [[https://arxiv.org/pdf/2501.12948|DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning]] This paper is well written but has a major flaw: it ignores the prior work underlying what the authors have done. There is no related-work section, and aside from citations of the authors' own work, there are no citations in the main body of the paper (Section 2: Approach). Important prior work is missing, such as [[https://arxiv.org/pdf/2403.04642|Havrilla - Teaching Large Language Models to Reason with Reinforcement Learning]]. This incorrectly makes it look as though the authors invented everything.
  
nlp:paper_examples · Last modified: 2025/05/31 18:40 by jmflanig