====== Trustworthy AI ======

===== Overviews =====
  * [[https://arxiv.org/pdf/2107.06641|Liu et al 2021 - Trustworthy AI: A Computational Perspective]]
  * [[https://arxiv.org/pdf/2110.01167|Li et al 2021 - Trustworthy AI: From Principles to Practices]]
  * [[https://dl.acm.org/doi/10.1145/3491209|Kaur et al 2022 - Trustworthy Artificial Intelligence: A Review]]
  * [[https://arxiv.org/pdf/2306.00380|Wu et al 2023 - Survey of Trustworthy AI: A Meta Decision of AI]]
  * **LLMs**
    * [[https://arxiv.org/abs/2308.05374|Liu et al 2023 - Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models' Alignment]]

===== Papers =====
  * [[https://arxiv.org/pdf/2010.07487.pdf|Jacovi et al 2020 - Formalizing Trust in Artificial Intelligence: Prerequisites, Causes and Goals of Human Trust in AI]]

===== LLM Trustworthiness =====
{{media:llm_trustworthy_pillars.png}}\\
Figure from [[https://trustllm.ai/|trustllm.ai]].

===== Related Pages =====
  * [[nlp:Explainability]]
  * [[nlp:Hallucination and Factivity]]
  * [[Mechanistic Interpretability]]