Hi, I’m Haonan Zhang (张浩楠 in Chinese). I am currently a third-year Ph.D. student in Computer Science and Technology at the University of Electronic Science and Technology of China (UESTC), supervised by Prof. Lianli Gao and Prof. Jingkuan Song. I received my Bachelor’s degree in Computer Science and Technology from Xidian University in 2020. That same year, I joined UESTC for a Master’s degree and later transferred to the Ph.D. program in 2022. Currently, I am also a visiting Ph.D. student at the Multimedia and Human Understanding Group (MHUG), University of Trento (Italy), supervised by Prof. Nicu Sebe.

My research interests focus on multi-modal learning and large language models (LLMs). I am currently exploring the fast-growing field of Vision-Language-Action (VLA) models.

For more information, please see my CV.

🔥 News

2025.09: 🎉🎉 Two papers are accepted by NeurIPS 2025.
2025.05: 🎉🎉 Two papers are accepted by ACL 2025.
2025.05: 🎉🎉 One paper is accepted by TIP 2025.
2024.07: 🎉🎉 One paper is accepted by TCSVT 2024.
2024.07: 🎉🎉 One paper is accepted by ACM Multimedia 2024.
2024.05: 🎉🎉 Joined Tongyi Lab (Beijing) for a summer internship.
2023.11: 🎉🎉 One paper is accepted by TCSVT 2023.
2023.07: 🎉🎉 Released Awesome-Embodied-Robotics-and-Agent, a curated list of research on “Embodied robotics or agent with Vision-Language Models (VLMs) and Large Language Models (LLMs)”!

📝 Publications

* indicates equal contribution

OpenOmni: Large Language Models Pivot Zero-shot Omnimodal Alignment across Language with Real-time Self-Aware Emotional Speech Synthesis. NeurIPS 2025.
Run Luo, Ting-En Lin, Haonan Zhang, Yuchuan Wu, Xiong Liu, Min Yang, Yongbin Li, Longze Chen, Jiaming Li, Lei Zhang, Yangyi Chen, Hamid Alinejad-Rokny, Fei Huang
[Paper] [Code]
Bipolar Self-attention for Spiking Transformers. NeurIPS 2025 (spotlight).
Shuai Wang, Malu Zhang, Jingya Wang, Dehao Zhang, Yimeng Shan, Jieyuan Zhang, Yichen Xiao, Honglin Cao, Haonan Zhang, Zeyu Ma, Yang Yang, Haizhou Li
OmniCharacter: Towards Immersive Role-Playing Agents with Seamless Speech-Language Personality Interaction. ACL 2025.
Haonan Zhang, Run Luo, Xiong Liu, Yuchuan Wu, Ting-En Lin, Pengpeng Zeng, Qiang Qu, Feiteng Fang, Min Yang, Lianli Gao, Jingkuan Song, Fei Huang, Yongbin Li
[Paper] [Code]
MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct. ACL 2025 (Findings).
Run Luo*, Haonan Zhang*, Longze Chen*, Ting-En Lin*, Xiong Liu, Yuchuan Wu, Min Yang, Minzheng Wang, Pengpeng Zeng, Lianli Gao, Heng Tao Shen, Yunshui Li, Xiaobo Xia, Fei Huang, Jingkuan Song, Yongbin Li
[Project Page] [Paper] [Code]
Text-Video Retrieval with Global-Local Semantic Consistent Learning. TIP 2025.
Haonan Zhang, Pengpeng Zeng, Lianli Gao, Jingkuan Song, Yihang Duan, Xinyu Lyu, Heng Tao Shen
[Paper] [Code]
UMP: Unified Modality-aware Prompt Tuning for Text-Video Retrieval. TCSVT 2024.
Haonan Zhang, Pengpeng Zeng, Lianli Gao, Jingkuan Song, Heng Tao Shen
[Paper] [Code]
MPT: Multi-grained Prompt Tuning for Text-Video Retrieval. ACM Multimedia 2024.
Haonan Zhang, Pengpeng Zeng, Lianli Gao, Jingkuan Song, Heng Tao Shen
[Paper] [Code]
SPT: Spatial pyramid transformer for image captioning. TCSVT 2023.
Haonan Zhang, Pengpeng Zeng, Lianli Gao, Xinyu Lyu, Jingkuan Song, Heng Tao Shen
[Paper] [Code]
Depth-aware sparse transformer for video-language learning. ACM Multimedia 2023.
Haonan Zhang, Lianli Gao, Pengpeng Zeng, Alan Hanjalic, Heng Tao Shen
[Paper] [Code]
Learning visual question answering on controlled semantic noisy labels. PR 2023.
Haonan Zhang, Pengpeng Zeng, Yuxuan Hu, Jin Qian, Jingkuan Song, Lianli Gao
[Paper] [Code]
Video Question Answering with Prior Knowledge and Object-sensitive Learning. TIP 2022.
Pengpeng Zeng, Haonan Zhang, Lianli Gao, Jingkuan Song, Heng Tao Shen
[Paper] [Code]
S2 Transformer for image captioning. IJCAI 2022.
Pengpeng Zeng*, Haonan Zhang*, Jingkuan Song, Lianli Gao
[Paper] [Code]

🎖 Honors and Scholarships

2025.03 “Academic Newcomer” Graduate Student Honor Award.
2024.12 National Scholarship.
2024.07 Ninth place in the Challenge of Black-box Adversarial Attacks on Vision Foundation Models at CVPR 2024.
2024.05 Shenzhen Stock Exchange Scholarship.
2024.04 First place in the Attribute Recognition track of MMVRAC, ICME 2024.
2023.11 First-class Scholarship.
2022.07 Outstanding Graduate Student Cadre.
2023.06 Outstanding Graduate Teaching Assistant Award.
2022.06 “Academic Youth” Graduate Student Honor Award.

📖 Education

2022.09 - present, University of Electronic Science and Technology of China, Chengdu, China, Ph.D. student in Computer Science and Technology.
2020.09 - 2022.06, University of Electronic Science and Technology of China, Chengdu, China, Master's student in Computer Technology (transferred to the Ph.D. program).
2016.09 - 2020.06, Xidian University, Xi'an, China, Bachelor's degree in Computer Science and Technology.

💻 Internships

2024.05 - 2025.04, Tongyi Lab, Alibaba Group, China.

💬 Services

Program Committee member for AAAI 2025; reviewer for CVPR 2023-2025, ICCV 2025, ICLR 2025, ACM MM, etc.

Reviewer for TIP, TCSVT, TMM, etc.
