[Paper Review] NIFTY: Neural Object Interaction Fields for Guided Human Motion Synthesis
NIFTY proposes an object-conditioned diffusion model that generates realistic 3D human motion interacting with objects. Sampling is guided by a learned object interaction field, and both models are trained on data from an automated synthetic data pipeline. The method improves motion quality and produces feasible object interactions for sitting and lifting tasks.
We address the problem of generating realistic 3D motions of humans interacting with objects in a scene. Our key idea is to create a neural interaction field attached to a specific object, which outputs the distance to the valid interaction manifold given a human pose as input. This interaction field guides the sampling of an object-conditioned human motion diffusion model, so as to encourage plausible contacts and affordance semantics. To support interactions with scarcely available data, we propose an automated synthetic data pipeline. For this, we seed a pre-trained motion model, which has priors for the basics of human movement, with interaction-specific anchor poses extracted from limited motion capture data. Using our guided diffusion model trained on generated synthetic data, we synthesize realistic motions for sitting and lifting with several objects, outperforming alternative approaches in terms of motion quality and successful action completion. We call our framework NIFTY: Neural Interaction Fields for Trajectory sYnthesis.
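The data pipeline described in the abstract can be illustrated with a minimal sketch: start from an interaction-specific anchor pose, roll a motion prior backward in time, then flip the sequence so the motion ends at the anchor. Everything here is an assumption made for illustration: `toy_reverse_motion_model` is a hypothetical random-walk stand-in for the pre-trained motion model, poses are short lists of floats, and the function names do not come from the paper.

```python
import random


def toy_reverse_motion_model(pose, rng):
    """Hypothetical stand-in for a pre-trained motion prior run backward in time.

    Given the pose at frame t, it proposes a pose for frame t-1.
    Here it is only a small random perturbation, not a learned model.
    """
    return [p + rng.gauss(0.0, 0.05) for p in pose]


def synthesize_motion(anchor_pose, num_frames, seed=0):
    """Generate a toy motion that ends exactly at the anchor pose."""
    rng = random.Random(seed)
    frames = [list(anchor_pose)]  # last frame of the final motion
    for _ in range(num_frames - 1):
        # Each new frame precedes the previous one in time.
        frames.append(toy_reverse_motion_model(frames[-1], rng))
    frames.reverse()  # restore chronological order
    return frames


anchor = [0.0, 1.0, 0.5]  # hypothetical 3-value "pose"
motion = synthesize_motion(anchor, num_frames=10, seed=1)
```

Running the sketch with different seeds yields different motions that all terminate at the same anchor pose, which is the property that lets a small anchor set be expanded into a larger training set.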
Motivation & Objective
- Generate realistic 3D human motions that involve object interactions, going beyond scene-agnostic motion synthesis
Proposed method
- Extend a human motion diffusion model to condition on object geometry
- Introduce an object-centric interaction field that, given a human pose, outputs its distance to the valid interaction manifold and is used to guide sampling
- Use classifier-free guidance to improve sampling quality
- Train the interaction field with data generated from a synthetic pipeline seeded by anchor poses
- Automatically synthesize large-scale interaction motion data from a small anchor-pose set using a time-reversed pre-trained motion model
- Evaluate sitting and lifting interactions across multiple objects with quantitative and perceptual metrics
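The guidance idea in the list above can be sketched as a toy loop: at each denoising step, the predicted clean pose is nudged in the direction that reduces its distance to the valid interaction manifold. This is a deliberately simplified sketch, not the paper's sampler. The assumptions are: `toy_denoiser` is a hypothetical stand-in for the diffusion model, `toy_interaction_field` is a hypothetical field whose manifold is a single anchor pose and which returns an offset vector, no noise is re-injected between steps, and `guidance_scale` is an illustrative parameter.

```python
import math
import random


def toy_interaction_field(pose, anchor):
    """Hypothetical field: offset vector from `pose` to the nearest valid pose.

    The valid interaction manifold is assumed to be the single point `anchor`.
    """
    return [a - p for p, a in zip(pose, anchor)]


def toy_denoiser(x_t, t):
    """Hypothetical clean-pose prediction: shrinks the sample toward the origin."""
    return [0.9 * v for v in x_t]


def guided_step(x_t, t, anchor, guidance_scale):
    x0_hat = toy_denoiser(x_t, t)
    offset = toy_interaction_field(x0_hat, anchor)
    # Moving along the offset is a descent step on 0.5 * ||offset||^2.
    return [x + guidance_scale * o for x, o in zip(x0_hat, offset)]


def sample(anchor, steps=20, guidance_scale=0.5, seed=0):
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in anchor]  # start from Gaussian noise
    for t in reversed(range(steps)):
        x = guided_step(x, t, anchor, guidance_scale)
    return x


def distance_to_manifold(pose, anchor):
    return math.sqrt(sum(o * o for o in toy_interaction_field(pose, anchor)))


anchor = [1.0, 2.0, 0.5]
guided = sample(anchor, guidance_scale=0.5)
unguided = sample(anchor, guidance_scale=0.0)
```

In this toy setup the guided sample ends much closer to the anchor than the unguided one, because the guidance term pulls against the denoiser's own bias. In a real system the field would be a trained network and the guidance gradient would typically be obtained by automatic differentiation.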
Experimental results
Research questions
- RQ1: How can we generate realistic human-object interaction motions conditioned on object geometry?
- RQ2: Can a learned interaction field guide diffusion sampling to reduce penetration and improve contact realism?
- RQ3: Does synthetic data generation from anchor poses suffice to train effective diffusion and interaction-field models for multiple objects?
- RQ4: How do sitting and lifting interactions compare across diverse objects using the NIFTY framework?
Key findings
- NIFTY outperforms baselines in terms of reaching the object with low penetration and realistic contacts.
- Guided diffusion with the learned interaction field yields higher user preferences in perceptual studies (88-97% against baselines).
- The approach achieves low skeleton distance and high contact IoU with synthetic interaction data.
- The ablation shows offset-vector interaction fields perform better than scalar-distance fields.
- A synthetic data pipeline seeded with anchor poses can generate large, diverse interaction datasets suitable for training.
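The review does not define the contact IoU metric mentioned above. As a rough illustration only, the sketch below assumes it is the intersection over union of binary contact labels (for example, one label per frame or per body vertex) between a generated motion and a reference. The function name and the label layout are hypothetical.

```python
def contact_iou(pred, ref):
    """Intersection over union of two equal-length binary contact label lists."""
    if len(pred) != len(ref):
        raise ValueError("label sequences must have the same length")
    intersection = sum(1 for p, r in zip(pred, ref) if p and r)
    union = sum(1 for p, r in zip(pred, ref) if p or r)
    # If neither sequence contains any contact, treat them as fully agreeing.
    return intersection / union if union else 1.0


pred_labels = [0, 1, 1, 1, 0]  # hypothetical predicted contact labels
ref_labels = [0, 0, 1, 1, 1]   # hypothetical reference contact labels
score = contact_iou(pred_labels, ref_labels)  # 2 shared / 4 in union = 0.5
```

A score of 1.0 means the predicted and reference contacts coincide exactly, while 0.0 means they never overlap.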
This review was created by AI and reviewed by human editors.