I am a fifth-year Ph.D. candidate at the Institute of Artificial Intelligence and Robotics, Xi’an Jiaotong University, where I pursue my doctoral studies under the supervision of Prof. Le Wang. I also collaborate closely with Prof. Sanping Zhou, Prof. Gang Hua, and Prof. Wei Tang.

Previously, I received my B.Eng. degree in Robotics Engineering from Harbin Institute of Technology. I am currently a visiting Ph.D. student with the Multimedia and Human Understanding Group (MHUG) at the University of Trento, under the supervision of Prof. Nicu Sebe.

I expect to complete my Ph.D. in September 2026 and am currently seeking opportunities in Multimodal Large Language Model (MLLM) algorithms or applications. If you have any suitable openings, please feel free to reach out. My resume is available here: CV.

🔥 News

2026.03: 🎉🎉 One paper is accepted by TCSVT.
2026.02: 🎉🎉 One paper is accepted by CVPR 2026.
2025.10: 🎉🎉 One paper is accepted by AAAI 2026.
2025.07: 🎉🎉 One paper is accepted by ACM MM 2025.
2025.03: 🎉🎉 One paper is accepted by CVPR 2025.
2024.12: 🎉🎉 One paper is accepted by TMM.
2024.10: 🎉🎉 One paper is accepted by AAAI 2025.
2024.09: 🎉🎉 One paper is accepted by NeurIPS 2024.
2024.05: 🎉🎉 Two papers are accepted by TMM.
2024.03: 🎉🎉 One paper is accepted by CVPR 2024.
2023.03: 🎉🎉 One paper is accepted by CVPR 2023.

📖 Education

2025.09 - now, Visiting Ph.D. student, Artificial Intelligence, University of Trento.
2021.09 - now, Ph.D. student, Control Science and Engineering, Xi’an Jiaotong University.
2017.09 - 2021.06, B.S., Robotics Engineering, Harbin Institute of Technology.

💻 Internships

2025.03 - now, Ant Group, Multimodal Interaction Team, Research Intern in Multimodal Large Models.
2024.05 - 2025.02, Ant Group, Digital Human Algorithm Team, Research Intern in Digital Human Algorithms.
2022.05 - 2024.03, University of Illinois Chicago, Prof. Wei Tang’s Research Group, Research Intern in Computer Vision.

📝 Selected Publications

HumanSense: From Multimodal Perception to Empathetic Context-Aware Responses through Reasoning MLLMs

Zheng Qin, Ruobing Zheng, Yabing Wang, Tianqi Li, Yi Yuan, Jingdong Chen, Le Wang

AAAI26

Project Paper Code Huggingface

Versatile Multimodal Controls for Expressive Talking Human Animation

Zheng Qin, Ruobing Zheng, Yabing Wang, Tianqi Li, Zixin Zhu, Sanping Zhou, Ming Yang, Le Wang

ACM MM25

Project Paper

Towards Generalizable Multi-Object Tracking

Zheng Qin, Le Wang, Sanping Zhou, Panpan Fu, Gang Hua, Wei Tang

CVPR24

Video Paper Code

MotionTrack: Learning Robust Short-term and Long-term Motions for Multi-Object Tracking

Zheng Qin, Sanping Zhou, Le Wang, Jinghai Duan, Gang Hua, Wei Tang

CVPR23

Video Paper Demo

RSRNav: Reasoning Spatial Relationship for Image-Goal Navigation

Zheng Qin, Le Wang, Yabing Wang, Sanping Zhou, Gang Hua, Wei Tang

TCSVT

Paper Code Demo

Embracing Aleatoric Uncertainty: Generating Diverse 3D Human Motion

Zheng Qin, Yabing Wang, Minghui Yang, Sanping Zhou, Ming Yang, Le Wang

Arxiv25

Paper Demo1 Demo2

RefDetector: A Simple yet Effective Matching-based Method for Referring Expression Comprehension

Yabing Wang, Zhuotao Tian, Zheng Qin, Sanping Zhou, Le Wang

AAAI25

Paper

Referencing Where to Focus: Improving Visual Grounding with Referential Query

Yabing Wang, Zhuotao Tian, Qingpei Guo, Zheng Qin, Sanping Zhou, Ming Yang, Le Wang

NeurIPS24

Paper Code

Towards Precise Embodied Dialogue Localization via Causality Guided Diffusion

Haoyu Wang, Le Wang, Sanping Zhou, Jingyi Tian, Zheng Qin, Yabing Wang, Gang Hua, Wei Tang

CVPR25

Paper

Spatial Matters: Position-Guided 3D Referring Expression Segmentation

Yabing Wang, Zhuotao Tian, Le Wang, Zheng Qin, Sanping Zhou

CVPR26

Paper

Publications

MotionTrack: Learning Robust Short-term and Long-term Motions for Multi-Object Tracking (CVPR 2023)
Qin Z, Zhou S, Wang L, Duan J, Hua G, Tang W.
Towards Generalizable Multi-Object Tracking (CVPR 2024)
Qin Z, Wang L, Zhou S, Fu P, Hua G, Tang W.
Referencing Where to Focus: Improving Visual Grounding with Referential Query (NeurIPS 2024)
Wang Y, Tian Z, Guo Q, Qin Z, Zhou S, Yang M, Wang L.
RefDetector: A Simple yet Effective Matching-based Method for Referring Expression Comprehension (AAAI 2025)
Wang Y, Tian Z, Qin Z, Zhou S, Wang L.
Towards Precise Embodied Dialogue Localization via Causality Guided Diffusion (CVPR 2025)
Wang H, Wang L, Zhou S, Tian J, Qin Z, Wang Y, Hua G, Tang W.
Versatile Multimodal Controls for Whole-Body Talking Human Animation (ACM MM 2025)
Qin Z, Zheng R, Wang Y, Li T, Zhu Z, Yang M, Yang M, Wang L.
HumanSense: From Multimodal Perception to Empathetic Context-Aware Responses through Reasoning MLLMs (AAAI 2026)
Qin Z, Zheng R, Wang Y, Li T, Yuan Y, Chen J, Wang L.
Spatial Matters: Position-Guided 3D Referring Expression Segmentation (CVPR 2026)
Wang Y, Tian Z, Wang L, Qin Z, Zhou S.
Single-Shot and Multi-Shot Feature Learning for Multi-Object Tracking (TMM 2024)
Li Y, Zhou S, Qin Z, Wang L, Wang J, Zheng N.
Robust Noisy Label Learning via Two-Stream Sample Distillation (TMM 2025)
Bai S, Zhou S, Qin Z, Wang L, Zheng N.
Semantic and Kinematics Guidance for RMOT (TMM 2025)
Li Y, Zhou S, Qin Z, Wang L.
RSRNav: Reasoning Spatial Relationship for Image-Goal Navigation (TCSVT 2026)
Qin Z, Wang L, Wang Y, Zhou S, Hua G, Tang W.
Injecting Position and Relation Prior for Dense Video Captioning (Submitted to TIP)
Li Y, Zhou S, Qin Z, Lin J, Sun X, Wu K, Wang L.
From Mapping to Composing: A Two-Stage Framework for Zero-shot Composed Image Retrieval (Submitted to TCSVT)
Wang Y, Tian Z, Guo Q, Qin Z, Zhou S, Yang M, Wang L.
Embracing Aleatoric Uncertainty: Generating Diverse 3D Human Motion (Submitted to TCSVT)
Qin Z, Wang L, Wang Y, Yang M, Rong C, Yang M, Zheng N.

🎖 Honors and Scholarships

National Scholarship (PhD), 2025
Weichai Power Scholarship (PhD), 2025
Outstanding Graduate Student (PhD), 2023, 2024
First-Class Freshman Scholarship, 2021
Outstanding Graduate, Harbin Institute of Technology, 2021
First-Class Academic Scholarship, 2018, 2019, 2020

💬 Invited Talks

2024.07, Invited to present the paper GeneralTrack at the “Summer of the Institute of Human–Computer Interaction, XJTU”.
2023.07, Invited to present the paper MotionTrack at the “Summer of the Institute of Human–Computer Interaction, XJTU”.
2023.06, Invited to present the paper MotionTrack at the CVPR 2023 paper-sharing session hosted by Microsoft Research Asia (MSRA).

Services

Reviewer for CVPR, ICCV, ICML, ECCV, ICLR, NeurIPS, ACM MM, AAAI, TIP, TMM, PR, etc.