Zhen Wu 「吴臻」
Email | Google Scholar | LinkedIn | CV
I am an Applied Scientist Intern at Amazon FAR (Frontier AI & Robotics), working with Carmelo Sferrazza, C. Karen Liu, Angjoo Kanazawa, Guanya Shi, Rocky Duan, and Pieter Abbeel.
Prior to that, I earned my Master's degree from Stanford University, advised by C. Karen Liu.
Before that, I received my Bachelor's degree from Peking University, where I worked with Libin Liu and Baoquan Chen.
My research interests are in humanoid robotics and character animation. I am particularly interested in how humans and robots perceive and interact with the world, with a focus on achieving human-level agility and dexterity.
I'm always open to discussion and collaboration—feel free to drop me an email if you're interested.
Email: zhenwu [AT] stanford.edu
Perceptive Humanoid Parkour: Chaining Dynamic Human Skills via Motion Matching
Zhen Wu*, Xiaoyu Huang*, Lujie Yang*, Yuanhang Zhang, Koushil Sreenath, Xi Chen,
Pieter Abbeel†, Rocky Duan†, Angjoo Kanazawa†, Carmelo Sferrazza†, Guanya Shi†, C. Karen Liu†
In Submission
webpage | interactive demo | pdf | abstract | twitter
While recent advances in humanoid locomotion have achieved stable walking on varied terrains, capturing the agility and adaptivity of highly dynamic human motions remains an open challenge. In particular, agile parkour in complex environments demands not only low-level robustness, but also human-like motion expressiveness, long-horizon skill composition, and perception-driven decision-making. In this paper, we present Perceptive Humanoid Parkour (PHP), a modular framework that enables humanoid robots to autonomously perform long-horizon, vision-based parkour across challenging obstacle courses. Our approach first leverages motion matching, formulated as nearest-neighbor search in a feature space, to compose retargeted atomic human skills into long-horizon kinematic trajectories. This framework enables the flexible composition and smooth transition of complex skill chains while preserving the elegance and fluidity of dynamic human motions. Next, we train motion-tracking reinforcement learning (RL) expert policies for these composed motions, and distill them into a single depth-based, multi-skill student policy, using a combination of DAgger and RL. Crucially, the combination of perception and skill composition enables autonomous, context-aware decision-making: using only onboard depth sensing and a discrete 2D velocity command, the robot selects and executes the appropriate skill, whether stepping over, climbing onto, vaulting over, or rolling off obstacles of varying geometries and heights. We validate our framework with extensive real-world experiments on a Unitree G1 humanoid robot, demonstrating highly dynamic parkour skills such as climbing tall obstacles up to 1.25 m (96% of the robot's height), as well as long-horizon multi-obstacle traversal with closed-loop adaptation to real-time obstacle perturbations.
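As a rough sketch of the motion-matching step described above, the search can be viewed as a weighted nearest-neighbor lookup over a library of retargeted atomic skills. The feature layout, weighting, and database format below are illustrative assumptions, not the paper's implementation.

# Illustrative sketch of motion matching as nearest-neighbor search in a
# feature space. The feature contents, weights, and database layout are
# assumptions for this example, not the paper's implementation.
import numpy as np

class MotionMatcher:
    def __init__(self, features, clip_ids, weights):
        self.features = np.asarray(features)   # (N, D): one row per candidate frame in the skill library
        self.clip_ids = clip_ids                # which atomic skill / frame each row comes from
        self.weights = np.asarray(weights)      # (D,): per-dimension scaling of the distance metric

    def query(self, query_feature):
        # Weighted squared Euclidean distance to every candidate frame.
        # A query might stack current pose features, the desired 2D velocity,
        # and upcoming obstacle geometry (all hypothetical here).
        diffs = (self.features - np.asarray(query_feature)) * self.weights
        best = int(np.argmin(np.sum(diffs * diffs, axis=1)))
        return self.clip_ids[best]              # next kinematic segment to transition into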
OmniRetarget: Interaction-Preserving Data Generation for Humanoid Whole-Body Loco-Manipulation and Scene Interaction
Lujie Yang*, Xiaoyu Huang*, Zhen Wu*, Angjoo Kanazawa†, Pieter Abbeel†, Carmelo Sferrazza†, C. Karen Liu†, Rocky Duan†, Guanya Shi†
ICRA 2026
webpage | pdf | abstract | dataset | code | twitter
A dominant paradigm for teaching humanoid robots complex skills is to retarget human motions as kinematic references to train reinforcement learning (RL) policies. However, existing retargeting pipelines often struggle with the significant embodiment gap between humans and robots, producing physically implausible artifacts like foot-skating and penetration. More importantly, common retargeting methods neglect the rich human-object and human-environment interactions essential for expressive locomotion and loco-manipulation. To address this, we introduce OmniRetarget, an interaction-preserving data generation engine based on an interaction mesh that explicitly models and preserves the crucial spatial and contact relationships between an agent, the terrain, and manipulated objects. By minimizing the Laplacian deformation between the human and robot meshes while enforcing kinematic constraints, OmniRetarget generates kinematically feasible trajectories. Moreover, preserving task-relevant interactions enables efficient data augmentation, from a single demonstration to different robot embodiments, terrains, and object configurations. We comprehensively evaluate OmniRetarget by retargeting motions from OMOMO, LAFAN1, and our in-house MoCap datasets, generating over 9 hours of trajectories that achieve better kinematic constraint satisfaction and contact preservation than widely used baselines. Such high-quality data enables proprioceptive RL policies to successfully execute long-horizon (up to 30 seconds) parkour and loco-manipulation skills on a Unitree G1 humanoid, trained with only 5 reward terms and simple domain randomization shared by all tasks, without any learning curriculum.
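To make the interaction-mesh idea concrete, the retargeting objective can be written schematically as below; the notation is chosen for illustration and simplifies whatever constraints the paper actually enforces.

% Schematic retargeting objective (notation illustrative, not the paper's exact formulation).
% q_t: robot configuration at frame t; V^h_t and V^r_t(q_t): human and robot interaction-mesh
% vertices (agent, terrain, and object points); L: Laplacian operator of the interaction mesh.
\min_{q_{1:T}} \; \sum_{t=1}^{T} \big\| L\, V^{r}_t(q_t) - L\, V^{h}_t \big\|^2
\quad \text{s.t.} \quad
q_t \in \mathcal{Q} \;\; (\text{joint limits}), \qquad
c(q_t) \ge 0 \;\; (\text{non-penetration and contact constraints}).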
Learning to Ball: Composing Policies for Long-Horizon Basketball Moves
Pei Xu, Zhen Wu, Ruocheng Wang, Vishnu Sarukkai, Kayvon Fatahalian, Ioannis Karamouzas, Victor Zordan, C. Karen Liu
ACM Transactions on Graphics (Proceedings of SIGGRAPH Asia 2025)
webpage | pdf | abstract | code | bilibili
Learning a control policy for a multi-phase, long-horizon task, such as basketball maneuvers, remains challenging for reinforcement learning approaches due to the need for seamless policy composition and transitions between skills. A long-horizon task typically consists of distinct subtasks with well-defined goals, separated by transitional subtasks with unclear goals but critical to the success of the entire task. Existing methods like the mixture of experts and skill chaining struggle with tasks where individual policies do not share significant commonly explored states or lack well-defined initial and terminal states between different phases. In this paper, we introduce a novel policy integration framework to enable the composition of drastically different motor skills in multi-phase long-horizon tasks with ill-defined intermediate states. Based on that, we further introduce a high-level soft router to enable seamless and robust transitions between the subtasks. We evaluate our framework on a set of fundamental basketball skills and challenging transitions. Policies trained by our approach can effectively control the simulated character to interact with the ball and accomplish the long-horizon task specified by real-time user commands, without relying on ball trajectory references.
@article{basketball,
  author    = {Xu, Pei and Wu, Zhen and Wang, Ruocheng and Sarukkai, Vishnu and Fatahalian, Kayvon and Karamouzas, Ioannis and Zordan, Victor and Liu, C. Karen},
  title     = {Learning to Ball: Composing Policies for Long-Horizon Basketball Moves},
  journal   = {ACM Transactions on Graphics},
  publisher = {ACM New York, NY, USA},
  year      = {2025},
  volume    = {44},
  number    = {6},
  doi       = {10.1145/3763367}
}
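As a rough illustration of the high-level soft router described in the abstract above, one can picture a gating network producing soft weights over phase-specific policies and blending their actions. The architecture, observation contents, and blending rule here are assumptions, not the paper's design.

# Rough illustration of a "soft router" over phase-specific expert policies.
# Network sizes, inputs, and the action-blending rule are assumptions,
# not the paper's architecture.
import torch
import torch.nn as nn

class SoftRouter(nn.Module):
    def __init__(self, obs_dim, act_dim, num_experts, hidden=256):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, act_dim))
            for _ in range(num_experts)
        )
        self.gate = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, num_experts))

    def forward(self, obs):
        weights = torch.softmax(self.gate(obs), dim=-1)               # (B, E) soft skill weights
        actions = torch.stack([e(obs) for e in self.experts], dim=1)  # (B, E, A) per-expert actions
        return (weights.unsqueeze(-1) * actions).sum(dim=1)           # blended action, (B, A)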
Human-Object Interaction from Human-Level Instructions
Zhen Wu, Jiaman Li, Pei Xu, C. Karen Liu
ICCV 2025
webpage | pdf | abstract | code
Intelligent agents must autonomously interact with their environments to perform daily tasks based on human-level instructions. They need a foundational understanding of the world to accurately interpret these instructions, along with precise low-level movement and interaction skills to execute the derived actions. In this work, we propose the first complete system for synthesizing physically plausible, long-horizon human-object interactions for object manipulation in contextual environments, driven by human-level instructions. We leverage large language models (LLMs) to interpret the input instructions into detailed execution plans. Unlike prior work, our system is capable of generating detailed finger-object interactions, in seamless coordination with full-body movements. We also train a policy to track generated motions in physics simulation via reinforcement learning (RL) to ensure physical plausibility of the motion. Our experiments demonstrate the effectiveness of our system in synthesizing realistic interactions with diverse objects in complex environments, highlighting its potential for real-world applications.
@inproceedings{wu2025human,
  title     = {Human-Object Interaction from Human-Level Instructions},
  author    = {Wu, Zhen and Li, Jiaman and Xu, Pei and Liu, C. Karen},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages     = {11176--11186},
  year      = {2025}
}
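Purely as an illustration of what an LLM-derived execution plan might look like before it is handed to motion synthesis and RL tracking, here is a toy schema; all field names, steps, and values are invented for this example and are not the system's actual format.

# Toy schema for an LLM-derived execution plan. Field names, step granularity,
# and values are invented for illustration, not the system's actual format.
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class PlanStep:
    action: str                                   # e.g. "walk_to", "grasp", "place"
    target_object: str                            # object identifier in the scene
    target_position: Tuple[float, float, float]   # goal position, if applicable
    hand: str = "right"                           # hand used for fine-grained finger-object interaction

@dataclass
class ExecutionPlan:
    instruction: str                              # original human-level instruction
    steps: List[PlanStep] = field(default_factory=list)

plan = ExecutionPlan(
    instruction="Put the mug on the top shelf",
    steps=[
        PlanStep("walk_to", "mug", (1.2, 0.4, 0.0)),
        PlanStep("grasp", "mug", (1.2, 0.4, 0.9)),
        PlanStep("place", "mug", (0.3, 2.0, 1.5)),
    ],
)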
Zero-Shot Human-Object Interaction Synthesis with Multimodal Priors
Yuke Lou*, Yiming Wang*, Zhen Wu, Rui Zhao, Wenjia Wang, Mingyi Shi, Taku Komura
In Submission
webpage | pdf | abstract
Human-object interaction (HOI) synthesis is important for various applications, ranging from virtual reality to robotics. However, acquiring 3D HOI data is challenging due to its complexity and high cost, limiting existing methods to the narrow diversity of object types and interaction patterns in training datasets. This paper proposes a novel zero-shot HOI synthesis framework without relying on end-to-end training on currently limited 3D HOI datasets. The core idea of our method lies in leveraging extensive HOI knowledge from pre-trained multimodal models. Given a text description, our system first obtains temporally consistent 2D HOI image sequences using image or video generation models, which are then uplifted to 3D HOI milestones of human and object poses. We employ pre-trained human pose estimation models to extract human poses and introduce a generalizable category-level 6-DoF estimation method to obtain the object poses from 2D HOI images. Our estimation method is adaptive to various object templates obtained from text-to-3D models or online retrieval. Physics-based tracking of the 3D HOI kinematic milestones is further applied to refine both body motions and object poses, yielding more physically plausible HOI generation results. The experimental results demonstrate that our method is capable of generating open-vocabulary HOIs with physical realism and semantic diversity.
@article{lou2025zero,
  title   = {Zero-Shot Human-Object Interaction Synthesis with Multimodal Priors},
  author  = {Lou, Yuke and Wang, Yiming and Wu, Zhen and Zhao, Rui and Wang, Wenjia and Shi, Mingyi and Komura, Taku},
  journal = {arXiv preprint arXiv:2503.20118},
  year    = {2025}
}
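As a generic stand-in for the object-pose step described in the abstract above (not the paper's category-level method), a 6-DoF object pose can be recovered from 2D-3D correspondences between a template and the generated image with a standard PnP solve. All points and intrinsics below are made-up placeholders.

# Generic illustration of recovering a 6-DoF object pose from 2D-3D
# correspondences via PnP; a standard stand-in, not the paper's category-level
# estimation method. Keypoints and camera intrinsics are made up.
import numpy as np
import cv2

object_points = np.array([   # 3D keypoints on the retrieved object template (meters)
    [0.00, 0.00, 0.00], [0.10, 0.00, 0.00], [0.00, 0.12, 0.00],
    [0.00, 0.00, 0.08], [0.10, 0.12, 0.08], [0.10, 0.00, 0.08],
], dtype=np.float64)
image_points = np.array([    # corresponding 2D detections in the generated HOI image (pixels)
    [320.0, 240.0], [360.0, 238.0], [322.0, 190.0],
    [318.0, 275.0], [365.0, 215.0], [362.0, 272.0],
], dtype=np.float64)
K = np.array([[600.0, 0.0, 320.0],   # assumed pinhole intrinsics
              [0.0, 600.0, 240.0],
              [0.0, 0.0, 1.0]])

ok, rvec, tvec = cv2.solvePnP(object_points, image_points, K, distCoeffs=None)
R, _ = cv2.Rodrigues(rvec)                    # object rotation in the camera frame
pose = np.eye(4)
pose[:3, :3], pose[:3, 3] = R, tvec.ravel()   # homogeneous 6-DoF object pose (camera frame)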
CS224R: Deep Reinforcement Learning - Spring 2025
CS229: Machine Learning - Winter 2025
CS248B: Fundamentals of Computer Graphics: Animation and Simulation - Fall 2025