Vision and Language

This page covers vision and language tasks other than visual question answering (which is limited to question answering) and grounded language learning (which adds a learning component to the task).

Overviews

Multimodal Foundation Models (Visual Language Models)

Prompting Methods

Multimodal Dialog Agents

See also this bibliography.

Multimodal Pretraining

See also Awesome Vision & Language Pretraining Papers.

Bibliographies

People