ml:reinforcement_learning

Differences

This shows you the differences between two versions of the page.

Previous revision: ml:reinforcement_learning [2025/07/14 05:39] – [Reinforcement Learning with Verifiable Rewards] jmflanig
Current revision: ml:reinforcement_learning [2025/07/14 05:40] (current) – [Reinforcement Learning with Verifiable Rewards] jmflanig
Line 36:
  
  
- ==== Reinforcement Learning with Verifiable Rewards ====
+ ===== Reinforcement Learning with Verifiable Rewards =====
 DeepSeek-R1-Zero-style reinforcement learning is sometimes called **"reinforcement learning (RL) on verifiable rewards"** (see for example [[https://arxiv.org/pdf/2505.21493|Zhou 2025]]) or **"RL with outcome supervision."**
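
To make the term concrete, here is a minimal sketch of what a "verifiable" outcome reward might look like. It is an illustration only, not taken from the page or from any particular paper's code. The function names (''extract_final_answer'', ''verifiable_reward'') are hypothetical, and so is the assumption that the model writes its final answer inside ''\boxed{...}''. Only the final outcome is checked; the intermediate reasoning is not scored.

```python
import re
from typing import Optional


def extract_final_answer(completion: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in the completion, or None.

    Assumption: the final answer is wrapped in \\boxed{...} with no nested braces.
    """
    matches = re.findall(r"\\boxed\{([^{}]*)\}", completion)
    return matches[-1].strip() if matches else None


def verifiable_reward(completion: str, reference: str) -> float:
    """Outcome-only reward: 1.0 if the extracted answer equals the reference, else 0.0."""
    answer = extract_final_answer(completion)
    if answer is None:
        return 0.0  # no parseable final answer, so nothing can be verified
    return 1.0 if answer == reference.strip() else 0.0


if __name__ == "__main__":
    sample = "First, 6 * 7 = 42. So the answer is \\boxed{42}."
    print(verifiable_reward(sample, "42"))
```

The exact string match is a deliberate simplification; a practical checker would likely normalize equivalent answers (for example ''0.5'' and ''1/2'') before comparing.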
  
ml/reinforcement_learning.1752471581.txt.gz · Last modified: 2025/07/14 05:39 by jmflanig
