Di Huang

AI Researcher.


I am a researcher at the Shanghai AI Laboratory, where I collaborate with a team on advanced AI research. Concurrently, I am a Ph.D. student at The University of Sydney, advised by Wanli Ouyang. I earned my B.E. degree from Zhejiang University, graduating with honors from Chu Kochen Honors College. During that time, I had the valuable experience of working with Xiaowei Zhou at the ZJU-3DV Lab, whom I also regard as an advisor.

My research focuses on building a capable super-intelligent system that can solve real-world perception, reasoning, understanding, and action problems, in essence pursuing AGI (Artificial General Intelligence). Given the difficulty of this goal, my current research primarily addresses spatial intelligence: enabling a general intelligence system to fully understand the 3D world (together with Tong He). I am still in the early stages of this long-term pursuit. Because my research interests are broad, I also explore several other directions out of curiosity :baby: and enjoyment :yum:. More details can be found below.

Prospective interns: We are looking for self-motivated research interns. If you are interested in working with us, please drop me an email. We highly value strong motivation and coding ability.

Email  /  Twitter  /  Github


May 11, 2024 FIT was accepted to ICML 2024.
Mar 3, 2024 Talks of 2023 have been uploaded. Talks of 2024 will start soon.
Feb 27, 2024 One paper accepted to CVPR 2024.
Dec 4, 2023 New personal website launched.

Research Summary

Figure 1: To empower foundation models with the abilities of perceiving, understanding, and acting in the 3D real world. Images are generated by OpenAI DALL·E 3.

3D Foundation Model

  1. Agent3D-Zero: An Agent for Zero-shot 3D Understanding
    Sha Zhang, Di Huang, Jiajun Deng, Shixiang Tang, Wanli Ouyang, Tong He, and Yanyong Zhang
    arXiv preprint, 2024
  2. UniPAD: A Universal Pre-training Paradigm for Autonomous Driving
    Honghui Yang, Sha Zhang, Di Huang, Xiaoyang Wu, Haoyi Zhu, Tong He, Shixiang Tang, Hengshuang Zhao, Qibo Qiu, Binbin Lin, and others
    CVPR, 2024
  3. PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm
    Haoyi Zhu*, Honghui Yang*, Xiaoyang Wu*, Di Huang*, Sha Zhang, Xianglong He, Tong He, Hengshuang Zhao, Chunhua Shen, Yu Qiao, and others
    arXiv preprint, 2023
    (* indicates equal contribution)
  4. Ponder: Point Cloud Pre-training via Neural Rendering
    Di Huang, Sida Peng, Tong He, Honghui Yang, Xiaowei Zhou, and Wanli Ouyang
    ICCV, 2023

Image/Video Generation

  1. FiT: Flexible Vision Transformer for Diffusion Model
    Zeyu Lu*, Zidong Wang*, Di Huang, Chengyue Wu, Xihui Liu, Wanli Ouyang, and Lei Bai
    ICML, 2024
    (* indicates equal contribution)
  2. Seeing is not always believing: Benchmarking Human and Model Perception of AI-Generated Images
    Zeyu Lu*, Di Huang*, Lei Bai*, Xihui Liu, Jingjing Qu, and Wanli Ouyang
    NeurIPS Datasets and Benchmarks Track, 2023
    (* indicates equal contribution)

Human Foundation Model

  1. MotionGPT: Finetuned LLMs are General-Purpose Motion Generators
    Yaqi Zhang, Di Huang, Bin Liu, Shixiang Tang, Yan Lu, Lu Chen, Lei Bai, Qi Chu, Nenghai Yu, and Wanli Ouyang
    AAAI, 2024


Tools of our research group: GitHub Link.