nlp:human-in-the-loop

Revisions compared: 2025/05/01 12:46 – [RLHF] jmflanig → 2025/05/31 07:43 (current) – [RLHF] jmflanig

Line 53:
  * [[https://arxiv.org/pdf/2306.01693.pdf|Wu et al 2023 - Fine-Grained Human Feedback Gives Better Rewards for Language Model Training]]
  * [[https://arxiv.org/pdf/2410.04612|Gao et al 2024 - Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF]]
  * [[https://arxiv.org/pdf/2505.22338|Wang et al 2025 - Text2Grad: Reinforcement Learning from Natural Language Feedback]]
  
=== Crowdsourcing & Data Collection ===