Hi, I’m Haonan Zhang (张浩楠 in Chinese). I am currently a third-year Ph.D. student in Computer Science and Technology at the University of Electronic Science and Technology of China (UESTC), supervised by Prof. Lianli Gao and Prof. Jingkuan Song. I received my Bachelor’s degree in Computer Science and Technology from Xidian University in 2020. That same year, I joined UESTC for a Master’s degree and later transferred to the Ph.D. program in 2022. Currently, I am also a visiting Ph.D. student at the Multimedia and Human Understanding Group (MHUG), University of Trento (Italy), supervised by Prof. Nicu Sebe.

My research interests include multi-modal learning and large language models (LLMs). Recently, I have been exploring the fast-growing field of Vision-Language-Action (VLA) models.

For more information, please see my CV.

🔥 News

  • 2025.05:  🎉🎉 Two papers are accepted by ACL 2025.
  • 2025.05:  🎉🎉 One paper is accepted by TIP 2025.
  • 2024.07:  🎉🎉 One paper is accepted by TCSVT 2024.
  • 2024.07:  🎉🎉 One paper is accepted by ACM Multimedia 2024.
  • 2024.05:  🎉🎉 Joined Tongyi Lab (Beijing) for a summer internship.
  • 2023.11:  🎉🎉 One paper is accepted by TCSVT 2023.
  • 2023.07:  🎉🎉 Released Awesome-Embodied-Robotics-and-Agent, a curated list of “Embodied robotics or agent with Vision-Language Models (VLMs) and Large Language Models (LLMs)” research!

📝 Publications

* indicates equal contribution

  • OmniCharacter: Towards Immersive Role-Playing Agents with Seamless Speech-Language Personality Interaction. ACL 2025.
    Haonan Zhang, Run Luo, Xiong Liu, Yuchuan Wu, Ting-En Lin, Pengpeng Zeng, Qiang Qu, Feiteng Fang, Min Yang, Lianli Gao, Jingkuan Song, Fei Huang, Yongbin Li

  • MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct. ACL 2025 (Findings). [Project Page] [Paper] [Code]
    Run Luo*, Haonan Zhang*, Longze Chen*, Ting-En Lin*, Xiong Liu, Yuchuan Wu, Min Yang, Minzheng Wang, Pengpeng Zeng, Lianli Gao, Heng Tao Shen, Yunshui Li, Xiaobo Xia, Fei Huang, Jingkuan Song, Yongbin Li

  • Text-Video Retrieval with Global-Local Semantic Consistent Learning. TIP 2025. [Paper] [Code]
    Haonan Zhang, Pengpeng Zeng, Lianli Gao, Jingkuan Song, Yihang Duan, Xinyu Lyu, Heng Tao Shen

  • OpenOmni: Large Language Models Pivot Zero-shot Omnimodal Alignment across Language with Real-time Self-Aware Emotional Speech Synthesis. arXiv 2025.01. [Paper] [Code]
    Run Luo, Ting-En Lin, Haonan Zhang, Yuchuan Wu, Xiong Liu, Min Yang, Yongbin Li, Longze Chen, Jiaming Li, Lei Zhang, Yangyi Chen, Hamid Alinejad-Rokny, Fei Huang

  • UMP: Unified Modality-aware Prompt Tuning for Text-Video Retrieval. TCSVT 2024. [Paper] [Code]
    Haonan Zhang, Pengpeng Zeng, Lianli Gao, Jingkuan Song, Heng Tao Shen

  • MPT: Multi-grained Prompt Tuning for Text-Video Retrieval. ACM Multimedia 2024. [Paper] [Code]
    Haonan Zhang, Pengpeng Zeng, Lianli Gao, Jingkuan Song, Heng Tao Shen

  • SPT: Spatial pyramid transformer for image captioning. TCSVT 2023. [Paper] [Code]
    Haonan Zhang, Pengpeng Zeng, Lianli Gao, Xinyu Lyu, Jingkuan Song, Heng Tao Shen

  • Depth-aware sparse transformer for video-language learning. ACM Multimedia 2023. [Paper] [Code]
    Haonan Zhang, Lianli Gao, Pengpeng Zeng, Alan Hanjalic, Heng Tao Shen

  • Learning visual question answering on controlled semantic noisy labels. PR 2023. [Paper] [Code]
    Haonan Zhang, Pengpeng Zeng, Yuxuan Hu, Jin Qian, Jingkuan Song, Lianli Gao

  • Video Question Answering with Prior Knowledge and Object-sensitive Learning. TIP 2022. [Paper] [Code]
    Pengpeng Zeng, Haonan Zhang, Lianli Gao, Jingkuan Song, Heng Tao Shen

  • S2 Transformer for image captioning. IJCAI 2022. [Paper] [Code]
    Pengpeng Zeng*, Haonan Zhang*, Jingkuan Song, Lianli Gao

🎖 Honors and Scholarships

  • 2025.03 “Academic Newcomer” Graduate Student Honor Award.
  • 2024.12 National Scholarship.
  • 2024.07 Ninth-place winner in the Challenge of Black-box Adversarial Attacks on Vision Foundation Models, CVPR 2024.
  • 2024.05 Shenzhen Stock Exchange Scholarship.
  • 2024.04 First-place winner in the Attribute Recognition track of MMVRAC, ICME 2024.
  • 2023.11 First-class Scholarship.
  • 2023.06 Outstanding Graduate Teaching Assistant Award.
  • 2022.07 Outstanding Graduate Student Cadre.
  • 2022.06 “Academic Youth” Graduate Student Honor Award.

📖 Education

  • 2022.09 - present, University of Electronic Science and Technology of China, Chengdu, China, Ph.D. student in Computer Science and Technology.
  • 2020.09 - 2022.06, University of Electronic Science and Technology of China, Chengdu, China, Master's student in Computer Technology (transferred to the Ph.D. program).
  • 2016.09 - 2020.06, Xidian University, Xi'an, China, Bachelor's degree in Computer Science and Technology.

💻 Internships

  • 2024.05 - 2025.04, Tongyi Lab, Alibaba Group, China.

💬 Services

Program Committee member for AAAI 2025; reviewer for CVPR 2023-2025, ICCV 2025, ICLR 2025, ACM MM, etc.

Reviewer for TIP, TCSVT, TMM, etc.
