nlp:instruction-tuning

    * [[https://arxiv.org/pdf/2311.09528|Wang et al 2023 - HelpSteer: Multi-attribute Helpfulness Dataset for SteerLM]] Very high quality dataset (10k examples) that outperforms much larger (~700K-example) datasets of lower quality.
    * [[https://arxiv.org/pdf/2402.18571|Wang et al 2024 - Arithmetic Control of LLMs for Diverse User Preferences: Directional Preference Alignment with Multi-Objective Rewards]]
  * **Analyzing, Filtering, or Improving Preference Data**
    * [[https://arxiv.org/pdf/2505.23114|Lee et al 2025 - Dataset Cartography for Large Language Model Alignment: Mapping and Diagnosing Preference Data]] Applies dataset cartography ([[https://arxiv.org/pdf/2009.10795|Swayamdipta et al 2020]]) to preference data; a sketch of the underlying cartography computation follows below.
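
The cartography coordinates themselves are simple to compute: each example is placed by the mean (confidence) and standard deviation (variability) of the probability the model assigns to its gold outcome across training epochs. A minimal sketch, assuming per-epoch probabilities for the chosen response have already been logged during preference-model training; the ''cartography_coordinates'' helper and the example array are illustrative, not taken from either paper:

<code python>
import numpy as np

def cartography_coordinates(epoch_probs):
    """Per-example dataset-cartography coordinates.

    epoch_probs: array of shape (n_epochs, n_examples); entry [e, i] is the
    probability the model assigned after epoch e to the gold outcome for
    example i (for preference data, e.g. the probability that the chosen
    response beats the rejected one).
    Returns (confidence, variability): mean and std-dev over epochs.
    """
    probs = np.asarray(epoch_probs, dtype=float)
    confidence = probs.mean(axis=0)   # high = easy-to-learn examples
    variability = probs.std(axis=0)   # high = ambiguous examples
    return confidence, variability

# Illustrative usage: 4 training epochs, 5 preference pairs.
epoch_probs = np.array([
    [0.90, 0.60, 0.20, 0.50, 0.80],
    [0.95, 0.40, 0.10, 0.60, 0.85],
    [0.97, 0.70, 0.15, 0.40, 0.90],
    [0.98, 0.50, 0.10, 0.55, 0.95],
])
confidence, variability = cartography_coordinates(epoch_probs)
# Low-confidence, low-variability pairs are candidates for mislabeled or
# hard-to-learn preferences; high-variability pairs mark ambiguous comparisons.
</code>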
  
===== Datasets =====
  * [[Alignment]]
  * [[Human-In-The-Loop]]
  * [[ml:reinforcement_learning#Reinforcement Learning with Verifiable Rewards]]
  * [[human-in-the-loop#RLHF]]
  