Haoye Dong

Sun Yat-sen University

🎓 Google Scholar | 💾 Demo videos | 🔗 LinkedIn

I am currently a tenure-track Associate Professor at Sun Yat-sen University.
I was a Senior Research Fellow at the National University of Singapore (2024-2026), working with Prof. Gim Hee Lee. I was a Postdoctoral Fellow at the Robotics Institute of Carnegie Mellon University (2022-2024), working with Prof. Fernando de la Torre and Dr. Dong Huang. I received my Ph.D. degree from Sun Yat-sen University, advised by Prof. Jian Yin, Prof. Xiaodan Liang, and Prof. Liang Lin.

My research interests mainly focus on
Human-centered Embodied AI and World Models

Humanoid WBC (humanoid robots)
3D Human & Scene Reconstruction
Video Generation and Understanding
2D/3D Virtual Try-on Networks

🙋 Looking for Fall 2026/2027 Master's and Fall 2027 PhD students.

donghaoye12@gmail.com

National University of Singapore
(NUS)

Carnegie Mellon University
(CMU)

Sun Yat-sen University
(SYSU)

Recent News

09/2025 Serving as Area Chair (AC) for CVPR 2026.

08/2025 Serving as Area Chair (AC) for ICLR 2026.

06/2025 Three papers accepted to ICCV 2025.

PS-Mamba: Spatial-Temporal Graph Mamba for Pose Sequence Refinement
DuCos: Duality Constrained Depth Super-Resolution via Foundation Model
CLIP-GS: Unifying Vision-Language Representation with 3D Gaussian Splatting

03/2025 Two papers accepted to CVPR 2025.

MV-SSM: Multi-View State Space Modeling for 3D Human Pose Estimation
Learnable Infinite Taylor Gaussian for Dynamic View Rendering

12/2024 Workshop Proposal accepted to CVPR 2025: Visual Modeling Challenges for 2D-3D Virtual Try-On, https://vto-at-cvpr25.github.io

09/2024 One paper accepted to NeurIPS 2024: Hamba: Single-view 3D Hand Reconstruction with Graph-guided Bi-Scanning Mamba

07/2024 One paper accepted to ACM MM 2024: DreamVTON: Customizing 3D Virtual Try-on with Personalized Diffusion Models

07/2024 One paper accepted to ECCV 2024: Generalizable Human Gaussians

Research Statement

Research Timeline

An overview of my past research, including controllable 2D/3D human image generation, realistic human try-on video synthesis, and accurate 3D human motion/generation using robust regressors and neural rendering.

Future Research Plan

The big picture of my future research plan has three parts. First, building large human models that recover accurate and robust 3D humans from a single image. Second, understanding and generating 3D humans in the scene. Last, leveraging Large Language Models (LLMs) and Large Vision Models (LVMs) to build Human-centered Artificial General Intelligence (HAGI).

Selected Publications

PS-Mamba: Spatial-Temporal Graph Mamba for Pose Sequence Refinement. Web, Code

Haoye Dong, Gim Hee Lee. ICCV 2025.

CLIP-GS: Unifying Vision-Language Representation with 3D Gaussian Splatting. Web

Siyu Jiao, Haoye Dong, Yuyang Yin, Zequn Jie, Yinlong Qian, Yao Zhao, Humphrey Shi, Yunchao Wei. ICCV 2025.

DuCos: Duality Constrained Depth Super-Resolution via Foundation Model. Web, Code

Zhiqiang Yan, Zhengxue Wang, Haoye Dong, Jun Li, Jian Yang, Gim Hee Lee. ICCV 2025.

MV-SSM: Multi-View State Space Modeling for 3D Human Pose Estimation. Project, Code

Aviral Chharia, Wenbo Gou, Haoye Dong*. CVPR 2025.

Learnable Infinite Taylor Gaussian for Dynamic View Rendering. Project, CODE

Bingbing Hu, Yanyan Li, Rui Xie, Bo Xu, Haoye Dong, Junfeng Yao, Gim Hee Lee. CVPR 2025.

Hamba: Single-view 3D Hand Reconstruction with Graph-guided Bi-Scanning Mamba. PDF, Video, Project, CODE

Haoye Dong*, Aviral Chharia*, Wenbo Gou*, Francisco Vicente Carrasco, Fernando De la Torre. NeurIPS 2024.

Universal Monocular 3D Human Recovery Engine. Project, Video

Haoye Dong, Jun Liu, Dong Huang. Proceedings of IEEE International Conference on Robotics and Automation (ICRA Video Track), 2024.

Generalizable Human Gaussians for Sparse View Synthesis. Project, CODE

Youngjoong Kwon, Baole Fang, Yixing Lu, Haoye Dong, Cheng Zhang, Francisco Vicente Carrasco, Albert Mosella-Montoro, Jianjin Xu, Shingo Takagi, Daeil Kim, Aayush Prakash, and Fernando De la Torre. ECCV 2024.

Physical-space Multi-body Mesh Detection Achieved by Local Alignment and Global Dense Learning. PDF

Haoye Dong, Tiange Xiang, Sravan Chittupalli, Jun Liu, Dong Huang. Proceedings of Winter Conference on Applications of Computer Vision (WACV), 2024:1267--1276.

DF-VTON: Dense Flow Guided Virtual Try-On Network. PDF

Haoye Dong, Jun Liu, Dong Huang. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024.

DreamVTON: Customizing 3D Virtual Try-on with Personalized Diffusion Models. PDF

Zhenyu Xie, Haoye Dong, Yufei Gao, Zehua Ma, Xiaodan Liang. Proceedings of 32nd ACM Multimedia Conference (ACM MM), 2024.

Coordinate Transformer: Achieving Single-stage Multi-person Mesh Recovery from Videos. PDF, CODE

Haoyuan Li*, Haoye Dong*, Hanchao Jia, Dong Huang, Michael C. Kampffmeyer, Liang Lin, Xiaodan Liang. Proceedings of International Conference on Computer Vision (ICCV), 2023:7744--8753.

Fashion Editing with Adversarial Parsing Learning. PDF, CODE & DATASET

Haoye Dong, Xiaodan Liang, Xiaohui Shen, Zhenyu Xie, Jian Yin, et al. Proceedings of International Conference on Computer Vision and Pattern Recognition (CVPR), 2020:8120--8128.

Towards Multi-pose Guided Virtual Try-on Network. PDF, CODE & DATASET

Haoye Dong, Xiaodan Liang, Xiaohui Shen, Bochao Wang, Hanjiang Lai, Jia Zhu, Zhiting Hu, Jian Yin. Proceedings of International Conference on Computer Vision (ICCV), 2019:9026--9035.

FW-GAN: Flow-navigated Warping GAN for Video Virtual Try-on. PDF, CODE & DATASET

Haoye Dong, Xiaodan Liang, Xiaohui Shen, Bowen Wu, Bing-Cheng Chen, Jian Yin. Proceedings of International Conference on Computer Vision (ICCV), 2019:1161--1170.

Part-Preserving Pose Manipulation for Person Image Synthesis. PDF

Haoye Dong, Xiaodan Liang, Chenxing Zhou, Hanjiang Lai, Jia Zhu, Jian Yin. Proceedings of International Conference on Multimedia and Expo (ICME), 2019:1234--1239.

Soft-Gated Warping-GAN for Pose-Guided Person Image Synthesis. PDF, CODE & DATASET

Haoye Dong, Xiaodan Liang, Ke Gong, Hanjiang Lai, Jia Zhu, Jian Yin. Proceedings of Annual Conference on Neural Information Processing Systems (NeurIPS), 2018: 472--482.

GP-VTON: Towards General Purpose Virtual Try-on via Collaborative Local-Flow Global-Parsing Learning. PDF, CODE

Zhenyu Xie, Zaiyu Huang, Xin Dong, Fuwei Zhao, Haoye Dong, Xijin Zhang, Feida Zhu, Xiaodan Liang. Proceedings of International Conference on Computer Vision and Pattern Recognition (CVPR), 2023.

Human MotionFormer: Transferring Human Motions with Vision Transformers. PDF, CODE

Hongyu Liu, Xintong Han, Chengbin Jin, Lihui Qian, Huawei Wei, Zhe Lin, Faqiang Wang, Haoye Dong, Yibing Song, Jia Xu, and Qifeng Chen. Proceedings of International Conference on Learning Representations (ICLR), 2023.

XFormer: Fast and Accurate Monocular 3D Body Capture. PDF

Lihui Qian, Xintong Han, Faqiang Wang, Hongyu Liu, Haoye Dong, Zhiwen Li, Huawei Wei, Zhe Lin, and Chengbin Jin. Proceedings of International Joint Conference on Artificial Intelligence (IJCAI), 2023.

Towards Scalable Unpaired Virtual Try-On via Patch-Routed Spatially-Adaptive GAN. PDF, CODE

Zhenyu Xie, Zaiyu Huang, Fuwei Zhao, Haoye Dong, Michael Kampffmeyer, Xiaodan Liang. Proceedings of Annual Conference on Neural Information Processing Systems (NeurIPS), 2021.

M3D-VTON: A Monocular-to-3D Virtual Try-On Network. PDF, CODE

Fuwei Zhao, Zhenyu Xie, Michael Kampffmeyer, Haoye Dong, Songfang Han, Tianxiang Zheng, Tao Zhang, Xiaodan Liang. Proceedings of International Conference on Computer Vision (ICCV), 2021:13239--13249.

Image Comes Dancing with Collaborative Parsing-Flow Video Synthesis. PDF

Bowen Wu, Zhenyu Xie, Xiaodan Liang, Yubei Xiao, Haoye Dong, Liang Lin. IEEE Transactions on Image Processing (TIP), 30:9259--9269 (2021).

WAS-VTON: Warping Architecture Search for Virtual Try-on Network. PDF, CODE

Zhenyu Xie, Xujie Zhang, Fuwei Zhao, Haoye Dong, Michael C. Kampffmeyer, Haonan Yan, Xiaodan Liang. ACM Multimedia 2021 (ACM MM):3350--3359

Deep Generative Models with Learnable Knowledge Constraints. PDF, SLIDES, POSTER

Zhiting Hu, Zichao Yang, Ruslan Salakhutdinov, Xiaodan Liang, Lianhui Qin, Haoye Dong, Eric Xing. Proceedings of Annual Conference on Neural Information Processing Systems (NeurIPS), 2018: 10501--10512.

Mentoring

Wenbo Gou (2023-Now): 3D Human Reconstruction
- Master at CMU
Aviral Chharia (2023-Now): 3D Hand Reconstruction
- Master at CMU
Zhenyu Xie (2019-Now): GP-VTON (CVPR23), WAS-VTON (ACM MM21)
- Ph.D. at Sun Yat-sen University, Visiting Student at CMU
Haoyuan Li (2022-Now): Coordinate Transformer (ICCV23, published as an undergraduate)
- Master at Sun Yat-sen University
Xujie Zhang (2020-Now): Fashion Editing (CVPR20), WarpDiffusion
- Ph.D. at Sun Yat-sen University, Research Intern at ByteDance
Fuwei Zhao (2019-2022): M3D-VTON (ICCV21)
- Researcher at ByteDance

I am very fortunate to have met you all, and I welcome more motivated friends to collaborate with us.

Academic Services

Organizer for CVPR 2025 Workshop on Visual Modeling Challenges for 2D-3D Virtual Try-On, https://vto-at-cvpr25.github.io
Organizer for CVPR 2020 Workshop on Human-centric Image/Video Synthesis. https://vuhcs.github.io
Organizer for CVPR 2019 Workshop on Augmented Human: Human-centric Understanding. https://vuhcs.github.io/vuhcs-2019/index.html
Reviewer for NeurIPS, CVPR, ICCV, ECCV, ICML, ICLR, etc.

donghaoye12 at gmail.com | wechat: humanmodeling