====== Image and Video Captioning ======

===== Overviews =====

  * Image captioning
    * [[https://arxiv.org/pdf/1810.04020.pdf|Hossain et al 2018 - A Comprehensive Survey of Deep Learning for Image Captioning]]
    * [[https://ieeexplore.ieee.org/document/9087226|Sharma et al 2020 - Image Captioning: A Comprehensive Survey]]
  * Video captioning
    * [[https://arxiv.org/pdf/2101.06072.pdf|Apostolidis et al 2021 - Video Summarization Using Deep Neural Networks: A Survey]]

===== Image Captioning =====

See also this [[https://github.com/zhjohnchan/awesome-image-captioning|bibliography]].

  * [[https://openaccess.thecvf.com/content_cvpr_2018/papers/Lu_Neural_Baby_Talk_CVPR_2018_paper.pdf|Lu et al 2018 - Neural Baby Talk]]

===== Video Captioning =====

  * [[https://arxiv.org/pdf/1412.4729.pdf|Venugopalan et al 2014 - Translating Videos to Natural Language Using Deep Recurrent Neural Networks]]
  * [[https://arxiv.org/pdf/1505.00487.pdf|Venugopalan et al 2015 - Sequence to Sequence - Video to Text]]
  * [[https://dl.acm.org/doi/abs/10.1145/2733373.2806314|Li et al 2015 - Summarization-based Video Caption via Deep Neural Networks]]
  * [[https://www.csd.uoc.gr/~hy474/papers/VideoCaptionig.pdf|Gao et al 2017 - Video Captioning With Attention-Based LSTM and Semantic Consistency]]
  * [[https://arxiv.org/pdf/1711.11135.pdf|Wang et al 2017 - Video Captioning via Hierarchical Reinforcement Learning]]

===== Related Pages =====

  * [[Grounded Language Learning]]