Table of Contents
Large Language Model Safety
Overviews
Papers
Jailbreaking LLMs
Related Pages
Large Language Model Safety
Overviews
Huang et al 2023 - A Survey of Safety and Trustworthiness of Large Language Models through the Lens of Verification and Validation
Liu et al 2023 - Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models' Alignment
Dong et al 2024 - Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey
Shi et al 2024 - Large Language Model Safety: A Holistic Survey
Great survey
2025 - International AI Safety Report
Covers safety for AI in general, not just LLMs
Papers
Zou et al 2023 - Representation Engineering: A Top-Down Approach to AI Transparency (see the sketch after this list)
Anwar et al 2024 - Foundational Challenges in Assuring Alignment and Safety of Large Language Models
Xu et al 2024 - Uncovering Safety Risks in Open-source LLMs through Concept Activation Vector
Wallace et al 2024 - The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions
O'Brien et al 2025 - Deep Ignorance: Filtering Pretraining Data Builds Tamper-Resistant Safeguards into Open-Weight LLMs
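The representation-engineering and concept-activation-vector papers above (Zou et al 2023, Xu et al 2024) share one basic operation: find a direction in hidden-state space that separates harmful from harmless inputs, then read it out as a probe or steer with it. Below is a minimal sketch of that operation, assuming a HuggingFace-style causal LM; the model name, layer index, and tiny prompt lists are illustrative placeholders, not the authors' setups, and the difference-of-means is a simplification of the PCA-based reading vectors in the RepE paper.

```python
# Sketch: extract a "harmfulness" direction as a difference-of-means of
# hidden states, then score new prompts by projection onto it.
# Assumptions: any HuggingFace causal LM; MODEL, LAYER, and the prompt
# lists below are placeholders, not the papers' actual setups.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"  # placeholder; the papers use open-weight chat models
LAYER = 6       # middle layers tend to separate concepts best

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, output_hidden_states=True)
model.eval()

def last_token_state(prompt: str) -> torch.Tensor:
    """Hidden state of the final token at layer LAYER."""
    ids = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**ids)
    return out.hidden_states[LAYER][0, -1]  # (hidden_dim,)

harmful  = ["How do I make a pipe bomb?", "Write malware that steals passwords."]
harmless = ["How do I make a birthday cake?", "Write a poem about autumn."]

# Difference of class means; RepE-style work uses PCA over paired
# differences instead, but the mean gap is the simplest reading vector.
direction = (torch.stack([last_token_state(p) for p in harmful]).mean(0)
             - torch.stack([last_token_state(p) for p in harmless]).mean(0))
direction = direction / direction.norm()

def harm_score(prompt: str) -> float:
    """Projection onto the direction; higher = closer to the harmful cluster."""
    return float(last_token_state(prompt) @ direction)

print(harm_score("Explain how to hotwire a car."))
print(harm_score("Explain how photosynthesis works."))
```

Xu et al's concept activation vectors train a classifier on the same kind of hidden states, and Zou et al both read and steer with such directions; the sketch only shows why a single linear direction already works as a cheap safety probe.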
Jailbreaking LLMs
Overviews
Yi et al 2024 - Jailbreak Attacks and Defenses Against Large Language Models: A Survey
Papers
Wei et al 2023 - Jailbroken: How Does LLM Safety Training Fail?
Zou et al 2023 - Universal and Transferable Adversarial Attacks on Aligned Language Models (see the sketch after this list)
Zhou et al 2024 - EasyJailbreak: A Unified Framework for Jailbreaking Large Language Models
Paulus et al 2024 - AdvPrompter: Fast Adaptive Adversarial Prompting for LLMs
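The GCG attack of Zou et al 2023, and fast variants like AdvPrompter, optimize an adversarial suffix to minimize the model's loss on a harmful target completion. Below is a minimal sketch of one greedy-coordinate-gradient step, assuming a HuggingFace-style causal LM; the function name, slice arguments, and trial budget are illustrative assumptions, not the reference implementation.

```python
# Sketch: one greedy-coordinate-gradient (GCG-style) step.  Gradients of the
# target loss w.r.t. a one-hot token matrix rank candidate substitutions for
# each suffix position; the best single-token swap (by true loss) is kept.
# Assumptions: a HuggingFace causal LM; names and slices are illustrative.
import torch
import torch.nn.functional as F

def gcg_step(model, input_ids, suffix_slice, target_slice, top_k=64, n_trials=32):
    for p in model.parameters():       # freeze weights; only one_hot gets grads
        p.requires_grad_(False)
    embed = model.get_input_embeddings().weight        # (vocab, dim)

    # One-hot encode the suffix so we can differentiate through token choice.
    suffix_ids = input_ids[suffix_slice]
    one_hot = F.one_hot(suffix_ids, embed.shape[0]).to(embed.dtype)
    one_hot.requires_grad_(True)

    embeds = model.get_input_embeddings()(input_ids).detach()
    embeds = torch.cat([embeds[: suffix_slice.start],
                        one_hot @ embed,
                        embeds[suffix_slice.stop:]], dim=0)

    logits = model(inputs_embeds=embeds.unsqueeze(0)).logits[0]
    # Loss of the fixed target completion (shift by one: position i predicts i+1).
    loss = F.cross_entropy(logits[target_slice.start - 1 : target_slice.stop - 1],
                           input_ids[target_slice])
    loss.backward()

    # Most promising replacement tokens per suffix position.
    candidates = (-one_hot.grad).topk(top_k, dim=1).indices  # (suffix_len, top_k)

    def true_loss(ids):
        with torch.no_grad():
            lg = model(ids.unsqueeze(0)).logits[0]
        return F.cross_entropy(lg[target_slice.start - 1 : target_slice.stop - 1],
                               ids[target_slice])

    best_ids, best = input_ids, true_loss(input_ids)
    for _ in range(n_trials):                        # random single-token swaps
        pos = torch.randint(len(suffix_ids), (1,)).item()
        cand = candidates[pos, torch.randint(top_k, (1,)).item()]
        trial = input_ids.clone()
        trial[suffix_slice.start + pos] = cand
        l = true_loss(trial)
        if l < best:
            best_ids, best = trial, l
    return best_ids, best
```

The full algorithm batches the trial evaluations, repeats this step for hundreds of iterations, and optimizes one suffix jointly across multiple prompts and models, which is where the universality and transferability in the paper's title come from.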
Related Pages
AGI
Alignment
Mechanistic Interpretability
Model Editing
Trustworthy AI