nlp:large_reasoning_models
    * [[https://arxiv.org/pdf/2502.03387|Ye et al 2025 - LIMO: Less is More for Reasoning]]
    * [[https://arxiv.org/pdf/2502.08235|2025 - The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks]]
    * [[https://arxiv.org/pdf/2502.12215|Zeng et al 2025 - Revisiting the Test-Time Scaling of o1-like Models: Do they Truly Possess Test-Time Scaling Capabilities?]]
    * [[https://arxiv.org/pdf/2503.14337|Yang et al 2025 - PENCIL: Long Thoughts with Short Memory]]
    * [[https://arxiv.org/pdf/2504.04022|Essential AI 2025 - Rethinking Reflection in Pre-Training]]
    * [[https://arxiv.org/pdf/2504.06261|Rodionov et al 2025 - Hogwild! Inference: Parallel LLM Generation via Concurrent Attention]]
    * [[https://arxiv.org/pdf/2505.07787|Luo et al 2025 - Learning from Peers in Reasoning Models]]
    * [[https://arxiv.org/pdf/2509.04475|Wen et al 2025 - ParaThinker: Native Parallel Thinking as a New Paradigm to Scale LLM Test-time Compute]]
  * **Problems, Criticisms and Insights**
    * [[https://arxiv.org/pdf/2505.22756|Qin et al 2025 - Decomposing Elements of Problem Solving: What "Math" Does RL Teach?]] "RL-trained models struggle with fundamentally new problems, hitting a ‘coverage wall’ due to insufficient planning skills"
    * [[https://arxiv.org/pdf/2506.06941|Shojaee et al 2025 - The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity]]
    * **[[https://arxiv.org/pdf/2507.10532|Wu et al 2025 - Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination]]** Very important paper. "By auditing the MATH-500 dataset and introducing a clean benchmark, we demonstrate that Qwen’s successes with spurious reward were driven by memorization of benchmark problems rather than genuine reasoning skills."
  * **Models**
    * Phi-4-Reasoning: [[https://arxiv.org/pdf/2504.21318|Abdin et al 2025 - Phi-4-reasoning Technical Report]]
nlp/large_reasoning_models.1749861874.txt.gz · Last modified: 2025/06/14 00:44 by jmflanig
