====== Autonomous Language Agents ======
LLM agents, etc.

===== Overviews =====
  * See the related work of [[https://arxiv.org/pdf/2310.05915|Chen et al 2023]] for a nice overview.
  * [[https://arxiv.org/pdf/2308.11432.pdf|Wang et al 2023 - A Survey on Large Language Model based Autonomous Agents]]
    * [[https://github.com/Paitesanshi/LLM-Agent-Survey|LLM Agent Survey (github)]] - from the above survey, continuously updated
  * [[https://arxiv.org/pdf/2406.05804|Li et al 2025 - A Review of Prominent Paradigms for LLM-Based Agents: Tool Use (Including RAG), Planning, and Feedback Learning]]
    * [[https://github.com/xinzhel/LLM-Agent-Survey|LLM Agent Survey (github)]] - from the above survey, continuously updated
  * [[https://arxiv.org/pdf/2309.07864.pdf|Xi et al 2023 - The Rise and Potential of Large Language Model Based Agents: A Survey]]
  * [[https://arxiv.org/pdf/2402.01680|Wang et al 2024 - Large Language Model based Multi-Agents: A Survey of Progress and Challenges]]
  * [[https://arxiv.org/pdf/2404.11584|Masterman et al 2024 - The Landscape of Emerging AI Agent Architectures for Reasoning, Planning, and Tool Calling: A Survey]]
  * **Architectures**
    * [[https://arxiv.org/pdf/2309.02427|Sumers et al 2024 - Cognitive Architectures for Language Agents]]
  * **Memory Architectures**
    * [[https://arxiv.org/pdf/2501.13956|Rasmussen et al 2025 - Zep: A Temporal Knowledge Graph Architecture for Agent Memory]]
  * **Multi-Agents**
    * [[https://arxiv.org/pdf/2501.06322|Tran et al 2025 - Multi-Agent Collaboration Mechanisms: A Survey of LLMs]]
  * **Applications**
    * GUI Agents
      * [[https://arxiv.org/pdf/2411.04890|Wang et al 2024 - GUI Agents with Foundation Models: A Comprehensive Survey]]
      * [[https://arxiv.org/pdf/2411.18279|Zhang et al 2024 - Large Language Model-Brained GUI Agents: A Survey]]
    * Personal Agents
      * [[https://arxiv.org/pdf/2401.05459|Li et al 2024 - Personal LLM Agents: Insights and Survey about the Capability, Efficiency and Security]]

===== Papers =====
  * **Key Method Papers**
    * [[https://arxiv.org/pdf/2210.03629.pdf|Yao et al 2022 - ReAct: Synergizing Reasoning and Acting in Language Models]] - The basis of LangChain. See the Webshop experiments section 4 and appendix D.3.
      * Follow-up work: [[https://arxiv.org/pdf/2402.00658|Jiao et al 2024 - Learning Planning-based Reasoning via Trajectories Collection and Process Reward Synthesizing]]
    * [[https://arxiv.org/pdf/2303.11366|Shinn et al 2023 - Reflexion: Language Agents with Verbal Reinforcement Learning]]
    * [[https://arxiv.org/pdf/2308.10144|Zhao et al 2023 - ExpeL: LLM Agents Are Experiential Learners]]
    * [[https://arxiv.org/pdf/2310.05915|Chen et al 2023 - FireAct: Toward Language Agent Fine-tuning]] - Fine-tunes the LLM agent
    * CodeAct: [[https://arxiv.org/pdf/2402.01030|Wang et al 2024 - Executable Code Actions Elicit Better LLM Agents]]
    * AutoGPT: [[https://arxiv.org/pdf/2306.02224.pdf|Yang et al 2023 - Auto-GPT for Online Decision Making: Benchmarks and Additional Opinions]] [[https://github.com/Significant-Gravitas/AutoGPT|github]]
    * [[https://arxiv.org/pdf/2309.07870.pdf|Zhou et al 2023 - Agents: An Open-source Framework for Autonomous Language Agents]]
    * [[https://arxiv.org/pdf/2403.12881|Chen et al 2024 - Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models]]
    * [[https://arxiv.org/pdf/2502.04644|Wu et al 2025 - Agentic Reasoning: Reasoning LLMs with Tools for the Deep Research]]
    * [[https://openai.com/index/introducing-deep-research/|OpenAI 2025 - Deep Research]]
    * [[https://arxiv.org/pdf/2505.21963|Yano et al 2025 - LaMDAgent: An Autonomous Framework for Post-Training Pipeline Optimization via LLM Agents]]
    * [[https://arxiv.org/pdf/2505.22571|Pham et al 2025 - Agent-UniRAG: A Trainable Open-Source LLM Agent Framework for Unified Retrieval-Augmented Generation Systems]]
  * **Tool Use and Agent Skills**
    * "Agent Skills are instructions, scripts, and resources that agents can discover and use to do things more accurately and efficiently" (from [[https://agentskills.io/home|here]])
    * [[https://platform.claude.com/docs/en/agents-and-tools/agent-skills/overview|Claude - Agent Skills]] (I believe "agent skills" was introduced in Claude)
    * [[https://agentskills.io/home|Agent Skills (website)]]
    * [[https://arxiv.org/pdf/2602.12670|Li et al 2026 - SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks]]
  * **Software Engineering (SWE) Agents**
    * See also [[Software Engineering]]
    * [[https://arxiv.org/pdf/2310.06770|Jimenez et al 2023 - SWE-bench: Can Language Models Resolve Real-World GitHub Issues?]]
    * [[https://arxiv.org/pdf/2405.15793|Yang et al 2024 - SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering]]
    * [[https://arxiv.org/pdf/2505.23422|Lindenbauer et al 2025 - From Knowledge to Noise: CTIM-Rover and the Pitfalls of Episodic Memory in Software Engineering Agents]]
  * **Web Agents**
    * MiniWoB: [[https://proceedings.mlr.press/v70/shi17a/shi17a.pdf|Shi et al 2017 - World of Bits: An Open-Domain Platform for Web-Based Agents]]
    * MiniWoB++: [[https://arxiv.org/pdf/1802.08802|Liu et al 2018 - Reinforcement Learning on Web Interfaces Using Workflow-Guided Exploration]] [[https://miniwob.farama.org/index.html|MiniWoB++]]
    * [[https://arxiv.org/pdf/2307.13854|Zhou et al 2023 - WebArena: A Realistic Web Environment for Building Autonomous Agents]]
    * [[https://arxiv.org/pdf/2401.13649|Koh et al 2024 - VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks]]
    * [[https://arxiv.org/pdf/2412.05467|De Chezelles et al 2024 - The BrowserGym Ecosystem for Web Agent Research]]
  * **Mobile UI Agents**
    * [[https://arxiv.org/pdf/2404.05719|You et al 2024 - Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs]]
  * **OS Agents**
    * [[https://arxiv.org/pdf/2402.07456|Wu et al 2024 - OS-Copilot: Towards Generalist Computer Agents with Self-Improvement]]
    * [[https://arxiv.org/pdf/2404.07972|Xie et al 2024 - OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments]]

===== Multi-Agents =====
  * **Overviews**
    * [[https://arxiv.org/pdf/2501.06322|Tran et al 2025 - Multi-Agent Collaboration Mechanisms: A Survey of LLMs]]

===== People =====
  * [[https://scholar.google.com/citations?user=qJBXk9cAAAAJ&hl=en|Shunyu Yao]] [[https://ysymyth.github.io/|website]]

===== Related Pages =====
  * [[ml:Computer Use Agents]]
  * [[Dialog]]
  * [[Language Model]]
  * [[Prompting]]