nlp:large_reasoning_models
Large Reasoning Models
o1- or R1-style LLMs, often called "large reasoning models" (LRMs); see Cuadron et al. 2025.
Overviews
Papers
- OpenAI o1
- Learning to Reason with LLMs. Includes examples of full reasoning chains.
- OpenAI o1 System Card (arXiv). Section 2, read carefully, reveals a good deal about the training process.
- R1 replication on small datasets
- General papers
- Concise Reasoning
- Using RL
- Parallel and Collaborative Thinking
- Problems, Criticisms and Insights
- Qin et al 2025 - Decomposing Elements of Problem Solving: What "Math" Does RL Teach? “RL-trained models struggle with fundamentally new problems, hitting a ‘coverage wall’ due to insufficient planning skills”
- Wu et al 2025 - Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination Very important paper. “By auditing the MATH-500 dataset and introducing a clean benchmark, we demonstrate that Qwen’s successes with spurious reward were driven by memorization of benchmark problems rather than genuine reasoning skills.”
- Models
- Phi-4-Reasoning: Abdin et al 2025 - Phi-4-reasoning Technical Report
- Chen et al 2025 - Pangu Embedded: An Efficient Dual-system LLM Reasoner with Metacognition Has a “fast” mode for routine queries and a deeper “slow” mode for complex inference
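The dual-system idea above (a cheap "fast" path for routine queries, a deliberate "slow" path for hard ones) can be sketched as a simple router. This is an illustrative toy, not the Pangu Embedded method: the complexity heuristic, cue list, and threshold are all invented for the example.

```python
# Hypothetical sketch of a dual-system ("fast"/"slow") reasoner router,
# in the spirit of the Pangu Embedded description above. The scoring
# heuristic and threshold are illustrative assumptions, not the paper's.

def estimate_complexity(query: str) -> float:
    """Toy complexity score: longer queries and reasoning cues score higher."""
    cues = ("prove", "derive", "step by step", "why", "how many")
    score = min(len(query) / 200, 1.0)
    score += 0.5 * sum(cue in query.lower() for cue in cues)
    return score

def route(query: str, threshold: float = 0.6) -> str:
    """Send routine queries to the fast path, hard ones to slow deliberation."""
    return "slow" if estimate_complexity(query) >= threshold else "fast"
```

A real system would learn the routing decision (metacognition) rather than hard-code it, but the control flow is the same: score the query, then pick an inference budget.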
Related Pages
nlp/large_reasoning_models.txt · Last modified: 2025/10/10 09:05 by jmflanig