Visual Understanding

This part covers research on visual understanding, focusing on Crowd Counting, Visual Grounding, Video Grounding, and Temporal Action Localization (TAL). Crowd Counting is the task of counting the people in an image. Unlike object detection, Crowd Counting must recognize targets of arbitrary size across varied situations, including both sparse and cluttered scenes. Visual Grounding aims to localize the region of a given image that a natural language query refers to. Video Grounding aims to locate the queried action or event in an untrimmed video based on rich linguistic descriptions.

Crowd Counting

DADNet: Dilated-Attention-Deformable ConvNet for Crowd Counting (Oral presentation)
Dan Guo, Kun Li*, Zheng-Jun Zha, Meng Wang.
ACM International Conference on Multimedia (ACM MM), 2019.

Temporal Action Detection

AOPNet: Anchor Offset Prediction Network for Temporal Action Proposal Generation
Fan Peng, Kun Li*, Xueliang Liu, Dan Guo.
International Conference on Signal Processing, Communications and Computing (ICSPCC), 2020.

Video Grounding

ViGT: Proposal-free Video Grounding with a Learnable Token in the Transformer
Kun Li, Dan Guo*, Meng Wang*.
Science China Information Sciences (SCIS), 2023.

Proposal-Free Video Grounding with Contextual Pyramid Network
Kun Li, Dan Guo*, Meng Wang*.
AAAI Conference on Artificial Intelligence (AAAI), 2021.

Proposal-free Video Grounding based on Motion Excitation
Yichen Guo, Kun Li, Dan Guo*.
Journal of Image and Graphics, 2023.

Spatiotemporal Contrastive Modeling for Video Moment Retrieval
Yi Wang, Kun Li*, Guoliang Chen*, Yan Zhang, Dan Guo, Meng Wang.
World Wide Web, 2023.

Visual Grounding

Transformer-based Visual Grounding with Cross-modality Interaction
Kun Li, Jiaxiu Li, Dan Guo*, Xun Yang*, Meng Wang*.
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), 2023.

Human Behavior Analysis

Joint Skeletal and Semantic Embedding Loss for Micro-gesture Classification
Kun Li, Dan Guo*, Guoliang Chen, Xinge Peng, Meng Wang*.
MiGA@IJCAI23: International IJCAI Workshop on Micro-gesture Analysis for Hidden Emotion Understanding, 2023.

Data Augmentation for Human Behavior Analysis in Multi-Person Conversations
Kun Li, Dan Guo*, Guoliang Chen, Feiyang Liu, Meng Wang*.
ACM International Conference on Multimedia (ACM MM), 2023.