nlp:vision_and_language

Differences

This shows you the differences between two versions of the page.

Previous revision: nlp:vision_and_language [2025/06/05 21:47] – [Overviews] jmflanig
Current revision: nlp:vision_and_language [2025/07/03 04:05] (current) – [Overviews] jmflanig
Line 6:
   * **Multimodal Large Language Models (MLLMs)**
     * [[https://arxiv.org/pdf/2306.13549|Yin et al 2023 - A Survey on Multimodal Large Language Models]] Comprehensive [[https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models|github]] (continuously updated)
 +   * [[https://arxiv.org/pdf/2501.02189|Li et al 2025 - A Survey of State of the Art Large Vision Language Models: Alignment, Benchmark, Evaluations and Challenges]]
     * For Visual QA:
       * [[https://arxiv.org/pdf/2411.17558|Kuang et al 2024 - Natural Language Understanding and Inference with MLLM in Visual Question Answering: A Survey]]
nlp/vision_and_language.1749160072.txt.gz · Last modified: 2025/06/05 21:47 by jmflanig
