Table of Contents
Alignment (AI)
Overviews
Blog Posts, etc.
Papers
Workshops, Conferences, and Websites
People
Related Pages
Alignment (AI)
Overviews
Shen et al 2023 - Large Language Model Alignment: A Survey
Ji et al 2023 - AI Alignment: A Comprehensive Survey
Anwar et al 2024 - Foundational Challenges in Assuring Alignment and Safety of Large Language Models
Blog Posts, etc.
2022 - What Everyone in Alignment is Doing and Why
Papers
Sun et al 2023 - Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision
Notably reminiscent of Asimov's Three Laws of Robotics.
Zhou et al 2023 - LIMA: Less Is More for Alignment
Sun et al 2023 - SALMON: Self-Alignment with Principle-Following Reward Models
Workshops, Conferences, and Websites
AI Alignment Forum
ML Alignment and Theory Scholars (MATS)
People
Sam Bowman
Dan Hendrycks
Related Pages
AGI
Instruction-Tuning
Instruction-tuning is closely related to alignment, but alignment is the broader concept; instruction-tuning methods often fall under the alignment umbrella.
Language Model
LLM Safety
Mechanistic Interpretability