Vision and Language
This page covers vision and language tasks other than visual question answering (which deals only with question answering) and grounded language learning (which includes a learning component in the task).
Overviews
Multimodal Foundation Models (Visual Language Models)
Multimodal Dialog Agents
- Overviews
- Diana
Navigation Tasks
See also this bibliography.
Multimodal Pretraining
Bibliographies
- Vision-and-Language A curated list of vision and language resources.
People
Related Pages
nlp/vision_and_language.1714467232.txt.gz · Last modified: 2024/04/30 08:53 by jmflanig