Human-In-The-Loop, RLHF and Interactive Methods
Overviews
See also Interactive NLP Workshop - References and Awesome RLHF
- Blog posts
General Papers
- Interactive AI Model Debugging and Correction (2022 Thesis) (talk)
- VAL: Interactive Task Learning with GPT Dialog Parsing (published at an HCI conference)
Classification
Semantic Parsing
Machine Translation
Evaluation Tasks
RLHF
RLHF: Reinforcement Learning from Human Feedback. Read literally, this term would cover any method of reinforcement learning that uses human feedback, but in practice RLHF usually refers to a specific recipe: fit a reward model to human preference data, then optimize a language model against it with RL. For a quick overview, see section 3 of Rafailov 2023.
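The first step of the recipe, fitting the reward model, is usually framed as a Bradley-Terry pairwise preference loss: given a chosen and a rejected response, minimize -log sigmoid(r_chosen - r_rejected). A minimal sketch (function name is illustrative, not from any of the papers above):

```python
import math

def preference_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry negative log-likelihood that the chosen
    response outranks the rejected one: -log sigmoid(r_c - r_r)."""
    margin = r_chosen - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# At equal rewards the model is indifferent: loss = log 2 ~ 0.693.
# As the margin grows in favor of the chosen response, the loss shrinks.
print(preference_loss(0.0, 0.0))
print(preference_loss(2.0, 0.0))
```

In the real setting r_chosen and r_rejected come from a learned scalar reward head, and the loss is averaged over a dataset of human preference pairs.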
Overviews
- Quick overview: section 3 of Rafailov 2023 or section 1.1 of Chowdhury 2024
- Chaudhari et al 2024 - RLHF Deciphered: A Critical Analysis of Reinforcement Learning from Human Feedback for LLMs Really good explanation of PPO in section 6 (why each piece is necessary)
- Lambert - Reinforcement Learning from Human Feedback Nathan Lambert's RLHF book; very good website
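The overviews above all describe the same optimization stage: the policy is trained (e.g. with PPO) to maximize the reward-model score minus a KL penalty that keeps it close to the reference (SFT) model. A minimal sketch of the per-sequence reward; the function name and the default beta are illustrative:

```python
def kl_penalized_reward(rm_score: float,
                        logprob_policy: float,
                        logprob_ref: float,
                        beta: float = 0.1) -> float:
    """Per-sequence RLHF reward: reward-model score minus a
    KL-style penalty, beta * (log pi(y|x) - log pi_ref(y|x)),
    discouraging the policy from drifting off the reference model."""
    kl_term = logprob_policy - logprob_ref
    return rm_score - beta * kl_term
```

If the policy assigns its sample a higher log-probability than the reference model does (it has drifted toward that output), the penalty reduces the effective reward; when the two models agree, the reward-model score passes through unchanged.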
Papers
- Christiano et al 2017 - Deep Reinforcement Learning from Human Preferences Introduced the setup of RLHF
- Ziegler et al 2019 - Fine-Tuning Language Models from Human Preferences Early RLHF paper, before it was called RLHF
- Askell et al 2021 - A General Language Assistant as a Laboratory for Alignment Introduced the acronym RLHF
- InstructGPT: Ouyang et al 2022 - Training language models to follow instructions with human feedback Essentially inverse reinforcement learning applied to LMs. Background papers:
Crowdsourcing & Data Collection
Conferences and Workshops
People
Related Pages
nlp/human-in-the-loop.txt · Last modified: 2025/05/31 07:43 by jmflanig