Di Huang

AI Researcher.

I am a researcher at the Shanghai AI Laboratory, where I work with a team on advanced AI research. Concurrently, I am a Ph.D. student at The University of Sydney, advised by Wanli Ouyang. I earned my B.E. degree from Zhejiang University, graduating with honors from Chu Kochen Honors College, and during my undergraduate studies I worked with Xiaowei Zhou at the ZJU-3DV Lab, whom I regard as another advisor.

My research interests center on building a capable super-intelligent system that can solve real-world perception, reasoning, understanding, and action problems, essentially pursuing AGI (Artificial General Intelligence). Given the difficulty of this goal, my current research primarily addresses spatial intelligence: enabling a general intelligence system to fully understand the 3D world (together with Tong He). I am still in the early stages of this long-term pursuit. Beyond this main thread, I also explore several other directions out of curiosity :baby: and enjoyment :yum:. More details can be found below.

Prospective interns: We are looking for self-motivated research interns. If you are interested in working with us, please drop me an email. We highly value strong motivation and coding ability.

Email  /  Scholar  /  Twitter  /  GitHub


News

Oct 21, 2024 The Hugging Face demo of Depth Any Video is now available.
Oct 15, 2024 Depth Any Video is released on arXiv.
Sep 27, 2024 Two submissions (NeuRodin, Point Cloud Matters) were accepted by NeurIPS 2024.
Jul 2, 2024 All three submissions (Agent3D-Zero, GVGEN, PredBench) were accepted by ECCV 2024.
Jun 13, 2024 FiT was selected as a spotlight paper at ICML 2024.

Research Summary

Figure 1: Empowering foundation models with the abilities of perceiving, understanding, and acting in the real 3D world. Images are generated by OpenAI DALL·E 3.

3D Foundation Model

  1. arXiv
    Depth Any Video with Scalable Synthetic Data
    Honghui Yang*, Di Huang*, Wei Yin, Chunhua Shen, Haifeng Liu, Xiaofei He, Binbin Lin, Wanli Ouyang, and Tong He
    arXiv preprint, 2024
    (* indicates equal contribution)
  2. NeurIPS
    NeuRodin: A Two-stage Framework for High-Fidelity Neural Surface Reconstruction
    Yifan Wang, Di Huang, Weicai Ye, Guofeng Zhang, Wanli Ouyang, and Tong He
NeurIPS, 2024
  3. ECCV
    Agent3D-Zero: An Agent for Zero-shot 3D Understanding
Sha Zhang, Di Huang, Jiajun Deng, Shixiang Tang, Wanli Ouyang, Tong He, and Yanyong Zhang
    ECCV, 2024
  4. CVPR
    UniPAD: A Universal Pre-training Paradigm for Autonomous Driving
Honghui Yang, Sha Zhang, Di Huang, Xiaoyang Wu, Haoyi Zhu, Tong He, Shixiang Tang, Hengshuang Zhao, Qibo Qiu, Binbin Lin, and others
    CVPR, 2024
  5. arXiv
PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm
Haoyi Zhu*, Honghui Yang*, Xiaoyang Wu*, Di Huang*, Sha Zhang, Xianglong He, Tong He, Hengshuang Zhao, Chunhua Shen, Yu Qiao, and others
    arXiv preprint, 2023
    (* indicates equal contribution)
  6. ICCV
Ponder: Point Cloud Pre-training via Neural Rendering
Di Huang, Sida Peng, Tong He, Honghui Yang, Xiaowei Zhou, and Wanli Ouyang
    ICCV, 2023

Image/Video/3D Generation

  1. arXiv
    MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers
Yiwen Chen, Tong He, Di Huang, Weicai Ye, Sijin Chen, Jiaxiang Tang, Xin Chen, Zhongang Cai, Lei Yang, Gang Yu, Guosheng Lin, and Chi Zhang
    arXiv preprint, 2024
  2. ECCV
GVGEN: Text-to-3D Generation with Volumetric Representation
Xianglong He*, Junyi Chen*, Sida Peng, Di Huang, Yangguang Li, Xiaoshui Huang, Chun Yuan, Wanli Ouyang, and Tong He
    ECCV, 2024
    (* indicates equal contribution)
  3. ICML
    FiT: Flexible Vision Transformer for Diffusion Model
Zeyu Lu*, Zidong Wang*, Di Huang, Chengyue Wu, Xihui Liu, Wanli Ouyang, and Lei Bai
    ICML, 2024
    (* indicates equal contribution)
  4. NeurIPS
    Seeing is not always believing: Benchmarking Human and Model Perception of AI-Generated Images
Zeyu Lu*, Di Huang*, Lei Bai*, Xihui Liu, Jingjing Qu, and Wanli Ouyang
NeurIPS Datasets and Benchmarks Track, 2023
    (* indicates equal contribution)

Human Foundation Model

  1. arXiv
    Holistic-Motion2D: Scalable Whole-body Human Motion Generation in 2D Space
Yuan Wang, Zhao Wang, Junhao Gong, Di Huang, Tong He, Wanli Ouyang, Jile Jiao, Xuetao Feng, Qi Dou, Shixiang Tang, and Dan Xu
    arXiv preprint, 2024
  2. AAAI
    MotionGPT: Finetuned LLMs are General-Purpose Motion Generators
    Yaqi Zhang, Di Huang, Bin Liu, Shixiang Tang, Yan Lu, Lu Chen, Lei Bai, Qi Chu, Nenghai Yu, and Wanli Ouyang
    AAAI, 2024

Others

Tools of our research group: GitHub Link.

Conference Reviewer for CVPR, ICCV, ECCV, NeurIPS, ICLR, ICML, 3DV, and ICRA.