====== Visual Question Answering ======

===== Papers =====
  * [[https://arxiv.org/pdf/2210.02928.pdf|Chen et al 2022 - MuRAG: Multimodal Retrieval-Augmented Generator for Open Question Answering over Images and Text]]
  * Neural Module Networks
    * [[https://openaccess.thecvf.com/content_cvpr_2016/papers/Andreas_Neural_Module_Networks_CVPR_2016_paper.pdf|Andreas et al 2016 - Neural Module Networks]]
    * [[https://openaccess.thecvf.com/content_ICCV_2017/papers/Hu_Learning_to_Reason_ICCV_2017_paper.pdf|Hu et al 2017 - Learning to Reason: End-to-End Module Networks for Visual Question Answering]]

===== Datasets =====
  * Visual QA: [[https://visualqa.org/]]

===== People =====
  * [[https://scholar.google.com/citations?user=dnZ8udEAAAAJ&hl=en|Jacob Andreas]]
  * [[https://scholar.google.com/citations?user=_bs7PqgAAAAJ&hl=en|Dhruv Batra]]

===== Related Pages =====
  * [[Grounded Language Learning]]
  * [[Image Captioning]]
  * [[Question Answering]]