====== Alignment (AI) ======

===== Overviews =====

  * [[https://arxiv.org/pdf/2309.15025.pdf|Shen et al 2023 - Large Language Model Alignment: A Survey]]
  * [[https://arxiv.org/pdf/2310.19852|Ji et al 2023 - AI Alignment: A Comprehensive Survey]]
  * [[https://arxiv.org/pdf/2404.09932|Anwar et al 2024 - Foundational Challenges in Assuring Alignment and Safety of Large Language Models]]

===== Blog Posts, etc =====

  * [[https://www.lesswrong.com/posts/QBAjndPuFbhEXKcCr/my-understanding-of-what-everyone-in-technical-alignment-is|2022 - What Everyone in Alignment is Doing and Why]]

===== Papers =====

  * [[https://arxiv.org/pdf/2305.03047.pdf|Sun et al 2023 - Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision]] Wow, like Asimov's Three Laws
  * [[https://arxiv.org/pdf/2305.11206.pdf|Zhou et al 2023 - LIMA: Less Is More for Alignment]]
  * **[[https://arxiv.org/pdf/2310.05910.pdf|Sun et al 2023 - SALMON: Self-Alignment with Principle-Following Reward Models]]**

===== Workshops, Conferences, and Websites =====

  * [[https://www.alignmentforum.org/|AI Alignment Forum]]
  * [[https://www.matsprogram.org/|ML Alignment and Theory Scholars (MATS)]]

===== People =====

  * [[https://scholar.google.com/citations?user=kV9XRxYAAAAJ&hl=en|Sam Bowman]]
  * [[https://scholar.google.com/citations?user=czyretsAAAAJ&hl=en|Dan Hendrycks]]

===== Related Pages =====

  * [[AGI]]
  * [[Instruction-Tuning]] Instruction-tuning is closely related to alignment, but alignment is broader; instruction-tuning methods often fall under alignment.
  * [[Language Model]]
  * [[LLM Safety]]
  * [[ml:Mechanistic Interpretability]]