nlp:large_reasoning_models
Large Reasoning Models
o1- or R1-style LLMs, often called "large reasoning models" (LRMs); see Cuadron et al. 2025.
Overviews
Papers
- OpenAI o1
- Learning to Reason with LLMs. Includes examples of full reasoning chains.
- OpenAI o1 System Card (arXiv). Section 2, read carefully, reveals a good deal about the training process.
- R1 replication on small datasets
- General papers
- Concise Reasoning
- Using RL
- Parallel and Collaborative Thinking
- Problems, Criticisms and Insights
- Qin et al 2025 - Decomposing Elements of Problem Solving: What "Math" Does RL Teach? “RL-trained models struggle with fundamentally new problems, hitting a ‘coverage wall’ due to insufficient planning skills”
- Wu et al 2025 - Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination Very important paper. “By auditing the MATH-500 dataset and introducing a clean benchmark, we demonstrate that Qwen’s successes with spurious reward were driven by memorization of benchmark problems rather than genuine reasoning skills.”
- Models
- Phi-4-Reasoning: Abdin et al 2025 - Phi-4-reasoning Technical Report
- Chen et al 2025 - Pangu Embedded: An Efficient Dual-system LLM Reasoner with Metacognition Has a “fast” mode for routine queries and a deeper “slow” mode for complex inference
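The dual-system idea above (a cheap "fast" path for routine queries, a deliberate "slow" path for hard ones) can be sketched as a simple router. This is an illustrative toy, not the Pangu Embedded method: the complexity heuristic, cue list, and threshold are all invented for the example.

```python
# Hypothetical sketch of a dual-system ("fast"/"slow") reasoner router,
# in the spirit of the Pangu Embedded description above. The scoring
# heuristic and threshold are illustrative assumptions, not the paper's.

def estimate_complexity(query: str) -> float:
    """Toy complexity score: longer queries and reasoning cues score higher."""
    cues = ("prove", "derive", "step by step", "why", "how many")
    score = min(len(query) / 200, 1.0)
    score += 0.5 * sum(cue in query.lower() for cue in cues)
    return score

def route(query: str, threshold: float = 0.6) -> str:
    """Send routine queries to the fast path, hard ones to slow deliberation."""
    return "slow" if estimate_complexity(query) >= threshold else "fast"
```

A real system would learn the routing decision (metacognition) rather than hard-code it, but the control flow is the same: score the query, then pick an inference budget.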
Related Pages
nlp/large_reasoning_models.txt · Last modified: 2025/10/10 09:05 by jmflanig