This part covers research on visual captioning, including image captioning and video captioning. To relax the reliance on paired image-sentence data for image captioning training, unsupervised captioning without annotations is explored through two-stage memory mechanisms. A GAN-based method is proposed to explore the implicit semantic correlation between disjoint images and sentences by building a multimodal semantic-aware space. Beyond objective descriptions, recent works focus on emotional video captioning, which reveals the emotional state evoked by a video and depicts the video content with richer semantics.
Contextual Attention Network for Emotional Video Captioning Peipei Song, Dan Guo*, Jun Cheng, and Meng Wang* IEEE Transactions on Multimedia, 2022 [Paper] [BibTex]

Memorial GAN with Joint Semantic Optimization for Unpaired Image Captioning Peipei Song, Dan Guo*, Jinxing Zhou, Mingliang Xu, and Meng Wang* IEEE Transactions on Cybernetics, 2022 [Paper] [BibTex]

Cross-Lingual Image Captioning with Semantic Matching and Language Evaluation Jing Zhang, Dan Guo*, Peipei Song*, Kun Li, and Meng Wang Journal of Image and Graphics, 2021 [Paper] [BibTex]

Recurrent Relational Memory Network for Unsupervised Image Captioning Dan Guo, Yang Wang*, Peipei Song*, and Meng Wang International Joint Conference on Artificial Intelligence (IJCAI), 2020 [Paper] [BibTex]

Semantic Enhanced Encoder-Decoder Network (SEN) for Video Captioning Yuling Gui, Dan Guo, and Ye Zhao Workshop on Multimedia for Accessible Human Computer Interfaces (MAHCI), 2019 [Paper] [BibTex]
Waiting for updates...