Table of Contents
Alignment (AI)
Overviews
Blog Posts, etc.
Papers
Workshops, Conferences, and Websites
People
Related Pages
Alignment (AI)
Overviews
Shen et al 2023 - Large Language Model Alignment: A Survey
Ji et al 2023 - AI Alignment: A Comprehensive Survey
Anwar et al 2024 - Foundational Challenges in Assuring Alignment and Safety of Large Language Models
Blog Posts, etc.
2022 - What Everyone in Alignment is Doing and Why
Papers
Sun et al 2023 - Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision
Notably reminiscent of Asimov's Three Laws of Robotics.
Zhou et al 2023 - LIMA: Less Is More for Alignment
Sun et al 2023 - SALMON: Self-Alignment with Principle-Following Reward Models
Workshops, Conferences, and Websites
AI Alignment Forum
ML Alignment and Theory Scholars (MATS)
People
Sam Bowman
Dan Hendrycks
Related Pages
AGI
Instruction-Tuning
Instruction-tuning is closely related to alignment, but alignment is the broader concept; instruction-tuning methods often fall under the alignment umbrella.
Language Model
LLM Safety
Mechanistic Interpretability