ThreeDWorld: A Platform for Interactive Multi-Modal Physical Simulation



We introduce ThreeDWorld (TDW), a platform for interactive multi-modal physical simulation. With TDW, users can simulate high-fidelity sensory data and physical interactions between mobile agents and objects in a wide variety of rich 3D environments. TDW has several unique properties: 1) real-time, near photo-realistic image rendering quality; 2) a library of objects and environments with materials for high-quality rendering, and routines enabling user customization of the asset library; 3) generative procedures for efficiently building classes of new environments; 4) high-fidelity audio rendering; 5) believable and realistic physical interactions for a wide variety of material types, including cloths, liquids, and deformable objects; 6) a range of “avatar” types that serve as embodiments of AI agents, with the option for user avatar customization; and 7) support for human interactions with VR devices. TDW also provides a rich API enabling multiple agents to interact within a simulation and return a range of sensor and physics data representing the state of the world. We present initial experiments enabled by the platform around emerging research directions in computer vision, machine learning, and cognitive science, including multi-modal physical scene understanding, multi-agent interactions, models that “learn like a child”, and attention studies in humans and neural networks. The simulation platform will be made publicly available.
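The API described above drives the simulation through structured commands exchanged between a controller and the simulation backend. As a rough illustration of this pattern only (the command names and fields below are hypothetical, not TDW's actual schema), a per-step batch of commands can be expressed as JSON-serializable messages:

```python
import json

# Hypothetical per-step command batch an agent controller might send
# to a simulation server; field names are illustrative, not TDW's
# documented command schema.
commands = [
    {"$type": "add_object", "name": "table", "position": {"x": 0, "y": 0, "z": 1.5}},
    {"$type": "apply_force", "id": 1, "force": {"x": 0, "y": 0, "z": 5.0}},
    {"$type": "capture_image", "avatar_id": "a"},
]

# The batch is serialized once per simulation step; the server replies
# with sensor and physics data for the new world state.
payload = json.dumps(commands)
assert json.loads(payload) == commands  # commands round-trip intact
```

This message-passing design is what lets multiple agents share one simulation: each agent contributes commands to the step, and each receives its own sensor observations back.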

Please cite our work using the BibTeX below.

@inproceedings{gan2021threedworld,
      title={ThreeDWorld: A Platform for Interactive Multi-Modal Physical Simulation},
      author={Chuang Gan and Jeremy Schwartz and Seth Alter and Martin Schrimpf and James Traer and Julian De Freitas and Jonas Kubilius and Abhishek Bhandwaldar and Nick Haber and Megumi Sano and Kuno Kim and Elias Wang and Damian Mrowca and Michael Lingelbach and Aidan Curtis and Kevin Feigelis and Daniel M. Bear and Dan Gutfreund and David Cox and James J. DiCarlo and Josh McDermott and Joshua B. Tenenbaum and Daniel L. K. Yamins},
      booktitle={Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks},
      year={2021}
}

This paper was published in the NeurIPS 2021 Datasets and Benchmarks track.