Vision and Language
This page covers vision-and-language tasks, as distinct from visual question answering (which deals only with question answering) and grounded language learning (which adds a learning component to the task).
Overviews
- Multimodal Large Language Models (MLLMs)
- Yin et al. (2023), A Survey on Multimodal Large Language Models, with a comprehensive, continuously updated GitHub repository
- For Visual QA:
- Evaluation of MLLMs:
Multimodal Foundation Models (Visual Language Models)
Prompting Methods
Multimodal Dialog Agents
- Overviews
- Diana
Navigation Tasks
See also this bibliography.
Multimodal Pretraining
Bibliographies
- Vision-and-Language: a curated list of vision and language resources.
People
Related Pages
nlp/vision_and_language.txt · Last modified: 2025/07/03 04:05 by jmflanig