nlp:vision_and_language
Differences
This shows you the differences between two versions of the page.
| nlp:vision_and_language [2024/04/30 08:53] – [Multimodal Foundation Models (Visual Language Models)] jmflanig | nlp:vision_and_language [2025/07/03 04:05] (current) – [Overviews] jmflanig | ||
|---|---|---|---|
| Line 4: | Line 4: | ||
| ===== Overviews ===== | ===== Overviews ===== | ||
| * [[https:// | * [[https:// | ||
| + | * **Multimodal Large Language Models (MLLMs)** | ||
| + | * [[https:// | ||
| + | * [[https:// | ||
| + | * For Visual QA: | ||
| + | * [[https:// | ||
| + | * Evaluation of MLLMs: | ||
| + | * [[https:// | ||
| ===== Multimodal Foundation Models (Visual Language Models) ===== | ===== Multimodal Foundation Models (Visual Language Models) ===== | ||
| Line 12: | Line 19: | ||
| * LLaVA: **[[https:// | * LLaVA: **[[https:// | ||
| * [[https:// | * [[https:// | ||
| + | * [[https:// | ||
| * [[https:// | * [[https:// | ||
| * [[https:// | * [[https:// | ||
| + | * [[https:// | ||
| + | |||
| + | ==== Prompting Methods ==== | ||
| + | * [[https:// | ||
| ===== Multimodal Dialog Agents ===== | ===== Multimodal Dialog Agents ===== | ||
| Line 53: | Line 65: | ||
| ===== Related Pages ===== | ===== Related Pages ===== | ||
| * [[robotics: | * [[robotics: | ||
| + | * [[Grounding]] | ||
| * [[Grounded Language Learning]] | * [[Grounded Language Learning]] | ||
| * [[Image Captioning]] | * [[Image Captioning]] | ||
nlp/vision_and_language.1714467232.txt.gz · Last modified: 2024/04/30 08:53 by jmflanig