I am a fifth-year Ph.D. candidate at the Institute of Artificial Intelligence and Robotics, Xi’an Jiaotong University, where I pursue my doctoral studies under the supervision of Prof. Le Wang. I also collaborate closely with Prof. Sanping Zhou, Prof. Gang Hua, and Prof. Wei Tang.
Previously, I received my B.Eng. degree in Robotics Engineering from Harbin Institute of Technology. I am currently a visiting Ph.D. student with the Multimedia and Human Understanding Group (MHUG) at the University of Trento, under the supervision of Prof. Nicu Sebe.
I expect to complete my Ph.D. in September 2026 and am currently seeking opportunities in Multimodal Large Language Model (MLLM) algorithms or applications. If you have any suitable openings, please feel free to reach out. My resume is available here: CV.
🔥 News
- 2026.03: 🎉🎉 One paper is accepted by TCSVT.
- 2026.02: 🎉🎉 One paper is accepted by CVPR 2026.
- 2025.10: 🎉🎉 One paper is accepted by AAAI 2026.
- 2025.07: 🎉🎉 One paper is accepted by ACM MM 2025.
- 2025.03: 🎉🎉 One paper is accepted by CVPR 2025.
- 2024.12: 🎉🎉 One paper is accepted by TMM.
- 2024.10: 🎉🎉 One paper is accepted by AAAI 2025.
- 2024.09: 🎉🎉 One paper is accepted by NeurIPS 2024.
- 2024.05: 🎉🎉 Two papers are accepted by TMM.
- 2024.03: 🎉🎉 One paper is accepted by CVPR 2024.
- 2023.03: 🎉🎉 One paper is accepted by CVPR 2023.
📖 Education
- 2025.09 - now, Visiting Ph.D. student, Artificial Intelligence, University of Trento.
- 2021.09 - now, Ph.D. student, Control Science and Engineering, Xi’an Jiaotong University.
- 2017.09 - 2021.06, B.S., Robotics Engineering, Harbin Institute of Technology.
💻 Internships
- 2025.03 - now, Ant Group, Multimodal Interaction Team, Research Intern in Multimodal Large Models.
- 2024.05 - 2025.02, Ant Group, Digital Human Algorithm Team, Research Intern in Digital Human Algorithms.
- 2022.05 - 2024.03, University of Illinois Chicago, Prof. Wei Tang’s Research Group, Research Intern in Computer Vision.
📝 Selected Publications
- HumanSense: From Multimodal Perception to Empathetic Context-Aware Responses through Reasoning MLLMs
- RefDetector: A Simple yet Effective Matching-based Method for Referring Expression Comprehension
Publications
- MotionTrack: Learning Robust Short-term and Long-term Motions for Multi-Object Tracking (CVPR 2023)
  Qin Z, Zhou S, Wang L, Duan J, Hua G, Tang W.
- Towards Generalizable Multi-Object Tracking (CVPR 2024)
  Qin Z, Wang L, Zhou S, Fu P, Hua G, Tang W.
- Referencing Where to Focus: Improving Visual Grounding with Referential Query (NeurIPS 2024)
  Wang Y, Tian Z, Guo Q, Qin Z, Zhou S, Yang M, Wang L.
- RefDetector: A Simple yet Effective Matching-based Method for Referring Expression Comprehension (AAAI 2025)
  Wang Y, Tian Z, Qin Z, Zhou S, Wang L.
- Towards Precise Embodied Dialogue Localization via Causality Guided Diffusion (CVPR 2025)
  Wang H, Wang L, Qin Z, Wang Y, Hua G, Tang W.
- Versatile Multimodal Controls for Whole-Body Talking Human Animation (ACM MM 2025)
  Qin Z, Zheng R, Wang Y, Li T, Zhu Z, Yang M, Yang M, Wang L.
- HumanSense: From Multimodal Perception to Empathetic Context-Aware Responses through Reasoning MLLMs (AAAI 2026)
  Qin Z, Zheng R, Wang Y, Li T, Yuan Y, Chen J, Wang L.
- Spatial Matters: Position-Guided 3D Referring Expression Segmentation (CVPR 2026)
  Wang Y, Tian Z, Wang L, Qin Z, Zhou S.
- Single-Shot and Multi-Shot Feature Learning for Multi-Object Tracking (TMM 2024)
  Li Y, Zhou S, Qin Z, Wang L, Wang J, Zheng N.
- Robust Noisy Label Learning via Two-Stream Sample Distillation (TMM 2025)
  Bai S, Zhou S, Qin Z, Wang L, Zheng N.
- Semantic and Kinematics Guidance for RMOT (TMM 2025)
  Li Y, Zhou S, Qin Z, Wang L.
- RSRNav: Reasoning Spatial Relationship for Image-Goal Navigation (TCSVT 2026)
  Qin Z, Wang L, Wang Y, Zhou S, Hua G, Tang W.
- Injecting Position and Relation Prior for Dense Video Captioning (Submitted to TIP)
  Li Y, Zhou S, Qin Z, Lin J, Sun X, Wu K, Wang L.
- From Mapping to Composing: A Two-Stage Framework for Zero-shot Composed Image Retrieval (Submitted to TCSVT)
  Wang Y, Tian Z, Guo Q, Qin Z, Zhou S, Yang M, Wang L.
- Embracing Aleatoric Uncertainty: Generating Diverse 3D Human Motion (Submitted to TCSVT)
  Qin Z, Wang L, Wang Y, Yang M, Rong C, Yang M, Zheng N.
🎖 Honors and Scholarships
- National Scholarship (PhD), 2025
- Weichai Power Scholarship (PhD), 2025
- Outstanding Graduate Student (PhD), 2023, 2024
- First-Class Freshman Scholarship, 2021
- Outstanding Graduate, Harbin Institute of Technology, 2021
- First-Class Academic Scholarship, 2018, 2019, 2020
💬 Invited Talks
- 2024.07, Invited to present the paper GeneralTrack at the “Summer of the Institute of Human–Computer Interaction, XJTU”.
- 2023.07, Invited to present the paper MotionTrack at the “Summer of the Institute of Human–Computer Interaction, XJTU”.
- 2023.06, Invited to present the paper MotionTrack at the CVPR 2023 paper-sharing session hosted by Microsoft Research Asia (MSRA).
Services
Reviewer for CVPR, ICCV, ICML, ECCV, ICLR, NeurIPS, ACM MM, AAAI, TIP, TMM, PR, etc.