
Prof. Yue Wang (USC)

  • Stanford University, Nvidia Auditorium, Stanford, CA 94305 USA

Title: 𝚿0: An Open Foundation Model Towards Universal Humanoid Loco-Manipulation


Speaker: Prof. Yue Wang (USC)

Location: Nvidia Auditorium

Attendance Link: https://tinyurl.com/robosem-win-26

Time: Friday Feb 20th, 3:00-4:00PM

Abstract: In this talk, I will share our recent work 𝚿0, an open foundation model for challenging humanoid loco-manipulation tasks. While existing approaches often attempt to address this fundamental problem by co-training on large and diverse human and humanoid data, we argue that this strategy is suboptimal due to the inherent kinematic and motion disparities between humans and humanoid robots; as a result, data efficiency and model performance remain unsatisfactory despite the considerable data volume. To address this challenge, our method decouples the learning process to maximize the utility of heterogeneous data sources. Specifically, we propose a staged training paradigm with distinct learning objectives: first, we autoregressively pre-train a VLM backbone on large-scale egocentric human videos to acquire generalizable visual-action representations; then, we post-train a flow-based action expert on high-quality humanoid robot data to learn precise robot joint control. Our research further identifies a critical yet often overlooked data recipe: in contrast to approaches that scale with noisy Internet clips or heterogeneous cross-embodiment robot datasets, we demonstrate that pre-training on high-quality egocentric human manipulation data followed by post-training on domain-specific real-world humanoid trajectories yields superior performance. Extensive real-world experiments demonstrate that 𝚿0 achieves the best performance using only about 800 hours of human video data and 30 hours of real-world robot data, outperforming baselines pre-trained on more than 10 times as much data by over 40% in overall success rate across multiple tasks. We will open-source the entire ecosystem to the community, including a data processing and training pipeline, a humanoid foundation model, and a real-time action inference engine.
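The staged recipe described above can be illustrated with a toy sketch. All names and structures here (pretrain_vlm, posttrain_action_expert, the dict-based "models") are hypothetical placeholders for illustration only, not the actual 𝚿0 pipeline or API:

```python
# Hypothetical sketch of the two-stage training recipe: autoregressive
# pre-training on egocentric human video, then post-training a flow-based
# action expert on humanoid robot trajectories. Names are illustrative.

def pretrain_vlm(human_video_batches):
    """Stage 1 (sketch): autoregressive pre-training of a VLM backbone
    on egocentric human videos to learn visual-action representations."""
    backbone = {"pretrain_steps": 0}
    for batch in human_video_batches:
        # stand-in for next-token prediction over visual-action sequences
        backbone["pretrain_steps"] += len(batch)
    return backbone

def posttrain_action_expert(backbone, robot_trajectories):
    """Stage 2 (sketch): post-train a flow-based action expert on
    high-quality humanoid robot data for precise joint control."""
    expert = {"backbone_steps": backbone["pretrain_steps"], "trajectories": 0}
    for _traj in robot_trajectories:
        # stand-in for fitting the flow-based action head on robot data
        expert["trajectories"] += 1
    return expert

# Stand-ins for ~800 h of human video and ~30 h of robot data:
human_data = [["clip"] * 4, ["clip"] * 4]
robot_data = [object() for _ in range(3)]

backbone = pretrain_vlm(human_data)
policy = posttrain_action_expert(backbone, robot_data)
print(policy["trajectories"])  # prints 3
```

The point of the sketch is the decoupling: the two stages consume different data sources with different objectives, rather than co-training on a single mixed pool.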

Bio: Yue Wang is an assistant professor at the University of Southern California (https://yuewang.xyz), leading the USC Physical Superintelligence Lab (https://psi-lab.ai). His research is dedicated to addressing physical superintelligence with state-of-the-art 3D computer vision and robotic algorithms. His lab focuses on three major directions: 1) neural scene representations for robotics and autonomous driving; 2) generative data synthesis for robotics; and 3) dexterous manipulation in open environments. He worked on 3D geometric deep learning during his PhD; his paper "Dynamic Graph CNN" is the most cited paper in ACM TOG. He received the Powell Faculty Research Award, the Toyota Young Faculty Researcher Award, an Nvidia Fellowship, the best paper award in geometric computing and graphics at the inaugural International Congress of Basic Science, a best paper nomination at the CVPR 2021 workshop on autonomous driving, and the best open-source award at the IROS RoboGen workshop. He was also named the first-place recipient of the William A. Martin Master's Thesis Award for 2021 at MIT. His students have been recognized with Nvidia, Qualcomm, Amazon, and Capital One fellowships. Yue received his bachelor's degree from Zhejiang University, his master's degree from UC San Diego, and his PhD from MIT.

Please visit https://stanfordasl.github.io/robotics_seminar/ for this quarter's lineup of speakers. Although we encourage live in-person attendance, recordings of the talks will also be posted.

If you’re interested, you’re welcome to join Yue for lunch at Blend Cafe at 12 PM. Please let Dian (dianwang@stanford.edu) know if you plan to join.
