This part covers research on visual dialog and video question answering. Visual dialog is a multi-round extension of visual question answering (VQA): the interactions between the image and the multi-round question-answer pairs evolve progressively, and the relationships among objects in the image depend on the current question. Video question answering aims to answer a question referring to a video, which requires both appearance and motion information; establishing the complex semantic connections between textual and diverse visual information remains difficult. The key to both tasks is effective relation reasoning, and current research is mainly based on graph neural networks and memory networks.
Pairwise VLAD Interaction Network for Video Question Answering
Hui Wang, Dan Guo, Xiansheng Hua, and Meng Wang
ACM International Conference on Multimedia (ACM MM), 2021 [Paper] [BibTex]

Context-Aware Graph Inference with Knowledge Distillation for Visual Dialog
Dan Guo, Hui Wang, and Meng Wang
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021 [Paper] [BibTex]

Iterative Context-Aware Graph Inference for Visual Dialog
Dan Guo, Hui Wang, Hanwang Zhang, Zhengjun Zha, and Meng Wang
Conference on Computer Vision and Pattern Recognition (CVPR), 2020 [Paper] [BibTex]

Textual-Visual Reference-Aware Attention Network for Visual Dialog
Dan Guo, Hui Wang, Shuhui Wang, and Meng Wang
IEEE Transactions on Image Processing (TIP), 2020 [Paper] [BibTex]

Dual Visual Attention Network for Visual Dialog
Dan Guo, Hui Wang, and Meng Wang
International Joint Conference on Artificial Intelligence (IJCAI), 2019 [Paper] [BibTex]
|
Waiting for updates...