nlp:large_reasoning_models (current revision 2025/10/10 09:05 by jmflanig)
===== Papers =====
  * [[https://arxiv.org/pdf/2403.04642|Havrilla et al 2024 - Teaching Large Language Models to Reason with Reinforcement Learning]]
  * **OpenAI o1**
    * [[https://openai.com/index/learning-to-reason-with-llms/|Learning to Reason with LLMs]] Has examples of the full reasoning chains.
    * [[https://cdn.openai.com/o1-system-card-20241205.pdf|OpenAI o1 System Card]] [[https://arxiv.org/pdf/2412.16720?|arXiv]] (There is a lot of information to be gleaned about the training process if you read section 2 carefully.)
    * R1 replication on small datasets
      * [[https://hkust-nlp.notion.site/simplerl-reason#18439bdc1c6b8083ba31f9cc912cf7f0|Zheng et al 2025 - 7B Model and 8K Examples: Emerging Reasoning with Reinforcement Learning is Both Effective and Efficient]]
  * **General papers**
    * [[https://arxiv.org/pdf/2501.19393|Muennighoff et al 2025 - s1: Simple test-time scaling]]
    * [[https://arxiv.org/pdf/2502.03387|Ye et al 2025 - LIMO: Less is More for Reasoning]]
    * [[https://arxiv.org/pdf/2502.08235|2025 - The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks]]
    * [[https://arxiv.org/pdf/2502.12215|Zeng et al 2025 - Revisiting the Test-Time Scaling of o1-like Models: Do they Truly Possess Test-Time Scaling Capabilities?]]
    * [[https://arxiv.org/pdf/2503.14337|Yang et al 2025 - PENCIL: Long Thoughts with Short Memory]]
    * [[https://arxiv.org/pdf/2504.04022|Essential AI 2025 - Rethinking Reflection in Pre-Training]]
    * [[https://arxiv.org/pdf/2504.12329|Yang et al 2025 - Speculative Thinking: Enhancing Small-Model Reasoning with Large Model Guidance at Inference Time]]
    * [[http://arxiv.org/pdf/2504.13837|Yue et al 2025 - Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?]]
    * [[https://arxiv.org/pdf/2505.16552|Tan et al 2025 - Think Silently, Think Fast: Dynamic Latent Compression of LLM Reasoning Chains]]
  * **Concise Reasoning**
    * Using RL
      * [[https://arxiv.org/pdf/2505.21178|Song & Zheng 2025 - Walk Before You Run! Concise LLM Reasoning via Reinforcement Learning]]
  * **Parallel and Collaborative Thinking**
    * [[https://arxiv.org/pdf/2504.06261|Rodionov et al 2025 - Hogwild! Inference: Parallel LLM Generation via Concurrent Attention]]
    * [[https://arxiv.org/pdf/2505.07787|Luo et al 2025 - Learning from Peers in Reasoning Models]]
    * [[https://arxiv.org/pdf/2509.04475|Wen et al 2025 - ParaThinker: Native Parallel Thinking as a New Paradigm to Scale LLM Test-time Compute]]
  * **Problems, Criticisms and Insights**
    * [[https://arxiv.org/pdf/2505.22756|Qin et al 2025 - Decomposing Elements of Problem Solving: What "Math" Does RL Teach?]] "RL-trained models struggle with fundamentally new problems, hitting a ‘coverage wall’ due to insufficient planning skills"
    * [[https://arxiv.org/pdf/2506.06941|Shojaee et al 2025 - The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity]]
    * **[[https://arxiv.org/pdf/2507.10532|Wu et al 2025 - Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination]]** Very important paper. "By auditing the MATH-500 dataset and introducing a clean benchmark, we demonstrate that Qwen’s successes with spurious reward were driven by memorization of benchmark problems rather than genuine reasoning skills."
  * **Models**
    * Phi-4-Reasoning: [[https://arxiv.org/pdf/2504.21318|Abdin et al 2025 - Phi-4-reasoning Technical Report]]
    * [[https://arxiv.org/pdf/2505.22375|Chen et al 2025 - Pangu Embedded: An Efficient Dual-system LLM Reasoner with Metacognition]] Has a "fast" mode for routine queries and a deeper "slow" mode for complex inference.
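The dual-system idea in the last entry above can be sketched in a few lines: estimate how hard a query looks, then dispatch it to a cheap "fast" configuration or an expensive "slow" one. This is only a toy illustration of the general routing pattern, not the metacognition mechanism from the Pangu Embedded paper; the ''estimate_complexity'' heuristic, the threshold, and the mode settings are all invented for the sketch.

```python
# Toy sketch of a dual-system "fast/slow" dispatcher. The complexity heuristic
# and mode settings are illustrative assumptions, not any paper's actual method.
from dataclasses import dataclass


@dataclass
class ModeConfig:
    name: str
    max_new_tokens: int      # generation budget for this mode
    emit_reasoning_chain: bool  # whether to produce an explicit chain of thought


FAST = ModeConfig("fast", max_new_tokens=256, emit_reasoning_chain=False)
SLOW = ModeConfig("slow", max_new_tokens=4096, emit_reasoning_chain=True)


def estimate_complexity(query: str) -> float:
    """Crude stand-in for difficulty estimation: longer queries and
    math/logic keywords push the score toward 1.0."""
    keywords = ("prove", "derive", "step by step", "integral", "algorithm")
    score = min(len(query) / 400.0, 0.5)
    score += 0.25 * sum(k in query.lower() for k in keywords)
    return min(score, 1.0)


def choose_mode(query: str, threshold: float = 0.3) -> ModeConfig:
    """Route easy queries to FAST and hard ones to SLOW."""
    return SLOW if estimate_complexity(query) >= threshold else FAST


# usage
assert choose_mode("What is the capital of France?").name == "fast"
assert choose_mode("Prove that the sum of two odd integers is even, step by step.").name == "slow"
```

In a real system the router would itself be learned (or the model would decide internally), and the two "modes" would differ in sampling strategy and thinking-token budget rather than a single token cap.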
  
===== Related Pages =====
  * [[Reasoning]]
  * [[Reasoning#Reasoning Chains|Reasoning - Reasoning Chains]]
  * [[ml:reinforcement_learning#Reinforcement Learning with Verifiable Rewards]]
  * [[Test-Time Scaling]]