Table of Contents
Large Reasoning Models
Overviews
Papers
Related Pages
Large Reasoning Models
o1- or R1-style LLMs, often called “large reasoning models” (LRMs) (see Cuadron et al 2025)
Overviews
Xu et al 2025 - Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models
Kumar et al 2025 - LLM Post-Training: A Deep Dive into Reasoning Large Language Models
Sui et al 2025 - Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models
Papers
Havrilla et al 2024 - Teaching Large Language Models to Reason with Reinforcement Learning
OpenAI o1
Learning to Reason with LLMs
Has examples of full reasoning chains.
OpenAI o1 System Card (arXiv)
(Section 2, read carefully, reveals a lot about the training process.)
DeepSeek-AI et al 2025 - DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
See also
Reinforcement Learning with Verifiable Rewards
R1 replication on small datasets
Zheng et al 2025 - 7B Model and 8K Examples: Emerging Reasoning with Reinforcement Learning is Both Effective and Efficient
General papers
Muennighoff et al 2025 - s1: Simple test-time scaling
Ye et al 2025 - LIMO: Less is More for Reasoning
Cuadron et al 2025 - The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks
Zeng et al 2025 - Revisiting the Test-Time Scaling of o1-like Models: Do they Truly Possess Test-Time Scaling Capabilities?
Yang et al 2025 - PENCIL: Long Thoughts with Short Memory
Essential AI 2025 - Rethinking Reflection in Pre-Training
Ma et al 2025 - Reasoning Models Can Be Effective Without Thinking
Yang et al 2025 - Speculative Thinking: Enhancing Small-Model Reasoning with Large Model Guidance at Inference Time
Yue et al 2025 - Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
Tan et al 2025 - Think Silently, Think Fast: Dynamic Latent Compression of LLM Reasoning Chains
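The test-time scaling recipe in s1 (Muennighoff et al 2025) is “budget forcing”: intervening on the end-of-thinking delimiter to keep the reasoning trace within a token budget, appending a token like “Wait” to extend thinking when the model tries to stop too early. A minimal sketch of that control loop, where `generate_step` and the delimiter strings are hypothetical stand-ins for a real decoding loop:

```python
# Sketch of budget forcing (per s1): cap or extend the thinking phase by
# intervening on the end-of-thinking delimiter. The delimiter and the
# continuation token below are assumptions, not the model's actual vocabulary.

END_THINK = "</think>"  # assumed end-of-thinking delimiter
WAIT = "Wait"           # token appended to suppress an early stop

def budget_forced_think(generate_step, prompt, min_tokens, max_tokens):
    """Run a thinking loop forced to emit between min_tokens and max_tokens tokens.

    generate_step(prompt, trace) -> next token (a callable supplied by the caller).
    """
    trace = []
    while len(trace) < max_tokens:
        tok = generate_step(prompt, trace)
        if tok == END_THINK:
            if len(trace) < min_tokens:
                trace.append(WAIT)  # too early: force continued reasoning
                continue
            break                   # budget satisfied: allow the stop
        trace.append(tok)
    return trace                    # capped at max_tokens either way
```

A model that tries to stop immediately is pushed to at least `min_tokens` of thinking; one that rambles is cut off at `max_tokens`.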
Concise Reasoning
Using RL
Song & Zheng 2025 - Walk Before You Run! Concise LLM Reasoning via Reinforcement Learning
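A common shape for concise-reasoning RL objectives of this kind is to reward correctness while penalizing chain-of-thought length. The sketch below is an illustrative reward of that general form, not the specific objective used by Song & Zheng 2025:

```python
# Illustrative length-penalized reward for concise-reasoning RL.
# The penalty coefficient `lam` is an assumed hyperparameter.

def concise_reward(is_correct: bool, n_think_tokens: int,
                   lam: float = 1e-4) -> float:
    """Correct answers earn 1.0 minus a per-token penalty; wrong answers earn 0."""
    if not is_correct:
        return 0.0
    return 1.0 - lam * n_think_tokens
```

Gating the penalty on correctness avoids teaching the model that the shortest way to reduce loss is to stop reasoning and answer wrongly.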
Parallel and Collaborative Thinking
Rodionov et al 2025 - Hogwild! Inference: Parallel LLM Generation via Concurrent Attention
Luo et al 2025 - Learning from Peers in Reasoning Models
Wen et al 2025 - ParaThinker: Native Parallel Thinking as a New Paradigm to Scale LLM Test-time Compute
Problems, Criticisms and Insights
Qin et al 2025 - Decomposing Elements of Problem Solving: What "Math" Does RL Teach?
“RL-trained models struggle with fundamentally new problems, hitting a ‘coverage wall’ due to insufficient planning skills”
Shojaee et al 2025 - The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity
Wu et al 2025 - Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination
Very important paper. “By auditing the MATH-500 dataset and introducing a clean benchmark, we demonstrate that Qwen’s successes with spurious reward were driven by memorization of benchmark problems rather than genuine reasoning skills.”
Models
Phi-4-Reasoning:
Abdin et al 2025 - Phi-4-reasoning Technical Report
Chen et al 2025 - Pangu Embedded: An Efficient Dual-system LLM Reasoner with Metacognition
Has a “fast” mode for routine queries and a deeper “slow” mode for complex inference
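A dual-system reasoner of the kind described above can be pictured as a dispatcher: cheap “fast” decoding for routine queries, an extended “slow” thinking budget for hard ones. The heuristic and mode parameters below are illustrative assumptions, not Pangu Embedded's actual metacognition mechanism:

```python
# Hypothetical sketch of dual-system dispatch: route by estimated difficulty.
# `model` is any callable taking (query, thinking, max_new_tokens).

def estimate_difficulty(query: str) -> float:
    """Toy heuristic: longer, symbol-heavy queries are treated as harder."""
    signals = sum(query.count(c) for c in "=+*/^")
    return min(1.0, len(query) / 500 + 0.2 * signals)

def answer(query: str, model, threshold: float = 0.5) -> str:
    if estimate_difficulty(query) < threshold:
        # fast mode: direct answer, small token budget
        return model(query, thinking=False, max_new_tokens=256)
    # slow mode: extended chain-of-thought budget
    return model(query, thinking=True, max_new_tokens=4096)
```

The interesting design question is the router itself; in the paper this is learned (“metacognition”) rather than a fixed heuristic like the one sketched here.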
Related Pages
Reasoning
Reasoning - Reasoning Chains
Reinforcement Learning with Verifiable Rewards
Test-Time Scaling