nlp:vision_and_language

Differences

This shows you the differences between two versions of the page.

nlp:vision_and_language [2025/06/05 21:44] – [Overviews] jmflanig
nlp:vision_and_language [2025/07/03 04:05] (current) – [Overviews] jmflanig
Line 6:
   * **Multimodal Large Language Models (MLLMs)**
     * [[https://arxiv.org/pdf/2306.13549|Yin et al 2023 - A Survey on Multimodal Large Language Models]] Comprehensive [[https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models|github]] (continuously updated)
 +    * [[https://arxiv.org/pdf/2501.02189|Li et al 2025 - A Survey of State of the Art Large Vision Language Models: Alignment, Benchmark, Evaluations and Challenges]]
     * For Visual QA:
       * [[https://arxiv.org/pdf/2411.17558|Kuang et al 2024 - Natural Language Understanding and Inference with MLLM in Visual Question Answering: A Survey]]
 +    * Evaluation of MLLMs:
 +      * [[https://arxiv.org/pdf/2408.15769|Huang & Zhang 2024 - A Survey on Evaluation of Multimodal Large Language Models]]

 ===== Multimodal Foundation Models (Visual Language Models) =====
nlp/vision_and_language.1749159840.txt.gz · Last modified: 2025/06/05 21:44 by jmflanig
