Haoye Dong
Carnegie Mellon University
🎓 Google Scholar | 💾 Demo videos | 🔗 LinkedIn
I am a Postdoctoral Fellow at the Robotics Institute of Carnegie Mellon University, where I have been working with Prof. Fernando de la Torre and Dr. Dong Huang since September 2022. I received my Ph.D. degree from Sun Yat-sen University, advised by Prof. Jian Yin and Prof. Xiaodan Liang.
My current research interests focus mainly on Human-centric Generative AI, including:
3D Human Reconstruction
2D/3D Virtual Try-on Networks
Personalized Human Generation
Video Generation and Understanding
Neural Representations and Rendering for Humans
Recent News
07/2024 One paper accepted to ACM MM 2024: DreamVTON: Customizing 3D Virtual Try-on with Personalized Diffusion Models
07/2024 One paper accepted to ECCV 2024: Generalizable Human Gaussians
Research Statement
Research Timeline
An overview of my past research, including controllable 2D/3D human image generation, realistic human try-on video synthesis, and accurate 3D human motion/generation using robust regressors and neural rendering.
Future Research Plan
The big picture of my future research plan has three parts. First, building large human models that recover accurate and robust 3D humans from a single image. Second, understanding and generating 3D humans in the scene. Finally, leveraging Large Language Models (LLMs) and Large Vision Models (LVMs) to build Human-centric Artificial General Intelligence (HAGI).
Latest Projects
DreamVTON: Customizing 3D Virtual Try-on with Personalized Diffusion Models
Key idea:
We propose DreamVTON, a novel model for customizing 3D human try-on that separately optimizes the geometry and texture of the 3D human. A personalized Stable Diffusion (SD) model with multi-concept LoRA provides the generative prior for the specific person and clothes. DreamVTON also introduces a template-based optimization mechanism, which employs mask templates to learn the geometric shape and normal/RGB templates to learn geometry and texture details.
Universal Monocular 3D Human Recovery Engine
Key idea:
We present a universal software engine for real-time 3D human perception in moving robots, stationary monitoring, and sports training. Our perception engine uses only a single monocular RGB camera, produces accurate 3D human meshes in physical sizes with 3D translations, and supports real-time deployment on both moving and stationary platforms. A live public demo based on the engine is installed on the 3rd floor of the NSH building on the Carnegie Mellon University campus, Pittsburgh, PA.
https://delightcmu.github.io/Hello3D/
Physical-space Multi-body Mesh Detection Achieved by Local Alignment and Global Dense Learning
Key idea:
We introduce Physical-space Multi-body Mesh Detection, in which (1) locally, we preserve the body aspect ratio, align the body-to-RoI layout, and densely refine the person-wise RoI features for robustness; and (2) globally, we learn dense-depth-guided features to amend the body-wise local features for physical depth estimation.
WarpDiffusion: Efficient Diffusion Model for High-Fidelity Virtual Try-on
Key idea:
We propose WarpDiffusion, which bridges the warping-based and diffusion-based paradigms via a novel informative and local garment feature attention mechanism. Specifically, WarpDiffusion incorporates local texture attention to reduce resource consumption and uses a novel auto-mask module that retains only the critical areas of the warped garment while disregarding unrealistic or erroneous portions. Notably, WarpDiffusion can be integrated as a plug-and-play component into existing VITON methods to improve their synthesis quality.
Selected Publications
- Human-VDM: Learning Single-Image 3D Human Gaussian Splatting from Video Diffusion Models. PDF, Project.
Haoye Dong*, Aviral Chharia*, Wenbo Gou*, Francisco Vicente Carrasco, Fernando De la Torre. Under review, 2024.
Haoyuan Li*, Haoye Dong*, Hanchao Jia, Dong Huang, Michael C. Kampffmeyer, Liang Lin, Xiaodan Liang. Proceedings of International Conference on Computer Vision (ICCV), 2023:7744--8753
- Fashion Editing with Adversarial Parsing Learning. PDF, CODE & DATASET
Haoye Dong, Xiaodan Liang, Xiaohui Shen, Bochao Wang, Hanjiang Lai, Jia Zhu, Zhiting Hu, Jian Yin. Proceedings of International Conference on Computer Vision (ICCV), 2019:9026--9035.
- FW-GAN: Flow-navigated Warping GAN for Video Virtual Try-on. PDF, CODE & DATASET
Haoye Dong, Xiaodan Liang, Ke Gong, Hanjiang Lai, Jia Zhu, Jian Yin. Proceedings of Annual Conference on Neural Information Processing Systems (NeurIPS), 2018: 472--482.
Haoye Dong, Jun Liu, Dong Huang. Proceedings of IEEE International Conference on Robotics and Automation (ICRA), 2024.
- Physical-space Multi-body Mesh Detection Achieved by Local Alignment and Global Dense Learning. PDF, CODE
- GP-VTON: Towards General Purpose Virtual Try-on via Collaborative Local-Flow Global-Parsing Learning. PDF, CODE
Hongyu Liu, Xintong Han, Chenbin Jin, Lihui Qian, Huawei Wei, Zhe Lin, Faqiang Wang, Haoye Dong, Yibing Song, Jia Xu, and Qifeng Chen. Proceedings of International Conference on Learning Representations (ICLR), 2023.
- XFormer: Fast and Accurate Monocular 3D Body Capture. PDF
Zhenyu Xie, Zaiyu Huang, Fuwei Zhao, Haoye Dong, Michael Kampffmeyer, Xiaodan Liang. Proceedings of Annual Conference on Neural Information Processing Systems (NeurIPS), 2021.
Fuwei Zhao, Zhenyu Xie, Michael Kampffmeyer, Haoye Dong, Songfang Han, Tianxiang Zheng, Tao Zhang, Xiaodan Liang. Proceedings of International Conference on Computer Vision (ICCV), 2021:13239--13249.
- Image Comes Dancing with Collaborative Parsing-Flow Video Synthesis. PDF
Zhenyu Xie, Xujie Zhang, Fuwei Zhao, Haoye Dong, Michael C. Kampffmeyer, Haonan Yan, Xiaodan Liang. ACM Multimedia 2021 (ACM MM):3350--3359.
Zhiting Hu, Zichao Yang, Ruslan Salakhutdinov, Xiaodan Liang, Lianhui Qin, Haoye Dong, Eric Xing. Proceedings of Annual Conference on Neural Information Processing Systems (NeurIPS), 2018: 10501--10512.
Mentoring
Wenbo Gou (2023-Now): 3D Human Reconstruction
Master at CMU
Aviral Chharia (2023-Now): 3D Hand Reconstruction
Master at CMU
Zhenyu Xie (2019-Now): GP-VTON (CVPR23), WAS-VTON (ACM MM21)
Ph.D. at Sun Yat-sen University, Visiting Student at CMU
Haoyuan Li (2022-Now): Coordinate Transformer (ICCV23, published as an undergraduate)
Master at Sun Yat-sen University
Xujie Zhang (2020-Now): Fashion Editing (CVPR20), WarpDiffusion
Ph.D. at Sun Yat-sen University, Research Intern at ByteDance
Fuwei Zhao (2019-2022): M3D-VTON (ICCV21)
Researcher at ByteDance
I am very fortunate to have met you all, and I welcome more motivated friends to collaborate with us.
Academic Services
Organizer for CVPR 2020 Workshop on Human-centric Image/Video Synthesis. https://vuhcs.github.io
Organizer for CVPR 2019 Workshop on Augmented Human: Human-centric Understanding. https://vuhcs.github.io/vuhcs-2019/index.html
Reviewer for NeurIPS, CVPR, ICCV, ECCV, ICML, ICLR, WACV, ACM MM.
donghaoye12 at gmail.com | wechat: humanmodeling
© 2024 Haoye Dong