Differences

--- nlp:large_reasoning_models [2025/10/08 08:57] – [Papers] jmflanig
+++ nlp:large_reasoning_models [2025/10/10 09:05] (current) – [Papers] jmflanig
@@ Line 37: / Line 37: @@
   * **Problems, Criticisms and Insights**
     * [[https://arxiv.org/pdf/2505.22756|Qin et al 2025 - Decomposing Elements of Problem Solving: What "Math" Does RL Teach?]] "RL-trained models struggle with fundamentally new problems, hitting a ‘coverage wall’ due to insufficient planning skills"
-    * [[https://ml-site.cdn-apple.com/papers/the-illusion-of-thinking.pdf|Shojaee et al 2025 - The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity]]
+    * [[https://arxiv.org/pdf/2506.06941|Shojaee et al 2025 - The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity]]
-    * **[[https://arxiv.org/pdf/2507.10532|Wu et al 2025 - Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination]]** Very important paper
+    * **[[https://arxiv.org/pdf/2507.10532|Wu et al 2025 - Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination]]** Very important paper. "By auditing the MATH-500 dataset and introducing a clean benchmark, we demonstrate that Qwen’s successes with spurious reward were driven by memorization of benchmark problems rather than genuine reasoning skills."
   * **Models**
     * Phi-4-Reasoning: [[https://arxiv.org/pdf/2504.21318|Abdin et al 2025 - Phi-4-reasoning Technical Report]]