
Large Reasoning Models

o1- or R1-style LLMs are often called “large reasoning models” (LRMs) (see Cuadron 2025).

Overviews

Xu et al 2025 - Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models
Kumar et al 2025 - LLM Post-Training: A Deep Dive into Reasoning Large Language Models
Sui et al 2025 - Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models

Papers

OpenAI o1
- Learning to Reason with LLMs (has examples of the full reasoning chains)
- OpenAI o1 System Card, arXiv (a careful reading of Section 2 reveals a lot about the training process)
DeepSeek-AI et al 2025 - DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
- See also Reinforcement Learning with Verifiable Rewards
- R1 replication on small datasets
  - Zheng et al 2025 - 7B Model and 8K Examples: Emerging Reasoning with Reinforcement Learning is Both Effective and Efficient
General papers
Concise Reasoning
- Using RL
  - Song & Zheng 2025 - Walk Before You Run! Concise LLM Reasoning via Reinforcement Learning
Models
- Phi-4-Reasoning: Abdin et al 2025 - Phi-4-reasoning Technical Report

Related Pages

nlp/large_reasoning_models.1748503581.txt.gz · Last modified: 2025/05/29 07:26 by jmflanig