Antonio Torralba

Delta Electronics Professor of Electrical Engineering and Computer Science; Head, Faculty of Artificial Intelligence and Decision-Making, MIT EECS, MIT Schwarzman College of Computing

Antonio Torralba is the Delta Electronics Professor of Electrical Engineering and Computer Science at MIT and an investigator at the Computer Science and Artificial Intelligence Laboratory (CSAIL). He also heads the Faculty of Artificial Intelligence and Decision-Making in the MIT Schwarzman College of Computing. Previously, he served as the inaugural director of the MIT Quest for Intelligence and as MIT director of the MIT–IBM Watson AI Lab. Torralba's research spans computer vision, machine learning, and human visual perception, with a particular interest in building systems that perceive the world the way humans do. He has received an NSF CAREER Award, the International Association for Pattern Recognition's J.K. Aggarwal Prize, a Frank Quick Faculty Research Innovation Fellowship, and a Louis D. Smullin ('39) Award for Teaching Excellence. Torralba earned a BS from Telecom BCN, Spain, and a PhD from the Institut National Polytechnique de Grenoble, France.

Selected Publications

Top Work

GAN Dissection: Visualizing and Understanding Generative Adversarial Networks

Generative Models

Publications with the MIT-IBM Watson AI Lab

Physics-Driven Diffusion Models for Impact Sound Synthesis from Videos

FluidLab: A Differentiable Environment for Benchmarking Complex Fluid Manipulation

Learning Neural Acoustic Fields

Procedural Image Programs for Representation Learning

Music Gesture for Visual Sound Separation

Virtual Correspondence: Humans as a Cue for Extreme-View Geometry

Finding Fallen Objects Via Asynchronous Audio-Visual Integration

Fixing Malfunctional Objects With Learned Physical Simulation and Functional Prediction

Disentangling Visual and Written Concepts in CLIP

ComPhy: Compositional Physical Reasoning of Objects and Events from Videos

Natural Language Descriptions of Deep Visual Features

Measuring Generalization with Optimal Transport

PTR: A Benchmark for Part-based Conceptual, Relational, and Physical Reasoning

Debiased Contrastive Learning

Foley Music: Learning to Generate Music from Videos

CLEVRER: The First Video Dataset for Neuro-Symbolic Reasoning

Experiences and Insights for Collaborative Industry-Academic Research in Artificial Intelligence

Deep Audio Priors Emerge From Harmonic Convolutional Networks

The Sound of Motions

Self-supervised Moving Vehicle Tracking with Stereo Sound

Seeing What a GAN Cannot Generate

Grounding Spoken Words in Unlabeled Video

GAN Dissection: Visualizing and Understanding Generative Adversarial Networks

Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding

Jointly Discovering Visual Objects and Spoken Words from Raw Sensory Input

Interpretable Basis Decomposition for Visual Explanation