Large Reasoning Models
o1- or R1-style LLMs, often called “large reasoning models” (LRMs); see Cuadron et al. 2025.
Overviews
Papers
- OpenAI o1
- Learning to Reason with LLMs: includes examples of the full reasoning chains.
- OpenAI o1 System Card (arXiv). There is a lot of information about the training process to be gleaned from a careful read of Section 2.
- R1 replication on small datasets
- General papers
- Concise Reasoning
- Using RL
- Parallel and Collaborative Thinking
- Problems, Criticisms and Insights
- Qin et al. 2025 - Decomposing Elements of Problem Solving: What "Math" Does RL Teach? “RL-trained models struggle with fundamentally new problems, hitting a ‘coverage wall’ due to insufficient planning skills”
- Models
- Phi-4-Reasoning: Abdin et al. 2025 - Phi-4-reasoning Technical Report
- Chen et al. 2025 - Pangu Embedded: An Efficient Dual-system LLM Reasoner with Metacognition. Provides a “fast” mode for routine queries and a deeper “slow” mode for complex inference.
Related Pages
nlp/large_reasoning_models.1749861874.txt.gz · Last modified: 2025/06/14 00:44 by jmflanig