Multimodal Learning

Charting the future of AI, from safer answers to faster thinking

MIT News

Method teaches generative AI models to locate personalized objects

MIT News

AI-enabled control system helps autonomous drones stay on target in uncertain environments

MIT News

AI learns how vision and sound are connected, without human intervention

MIT News

IBM Granite now has eyes

IBM Research

From surf to satellites: Campbell Watson is bringing AI to Earth science

IBM Research

Participatory AI highlights paths to sustainability

MIT ILP

MIT researchers advance automated interpretability in AI models

MIT News

Multiple AI models help robots execute complex plans more transparently

MIT News

Using language to give robots a better grasp of an open-ended world

MIT News

Scaling audio-visual learning without labels

MIT News

Some glimpse AGI in ChatGPT. Others call it a mirage

WIRED

This AI can harness sound to reveal the structure of unseen spaces

Popular Science

Perceptron: AI that sees with sound, learns to walk and predicts seismic physics

TechCrunch

Using sound to model the world

MIT News

Daniel Huttenlocher

MIT ILP

Converting several audio streams into one voice makes it easier for AI to learn

IBM Research

More Language, Less Labeling with Kate Saenko

This Week in Machine Learning & AI (TWIML) podcast

Hallucinating to better text translation

MIT News

Artificial intelligence system learns concepts shared across video, audio, and text

MIT News

AdaShare: Learning What To Share For Efficient Deep Multi-Task Learning

Computer Vision, Multimodal Learning

Neural-Network Can Identify a Melody Through Musicians’ Body Movements

Interesting Engineering

Identifying a melody by studying a musician’s body language

MIT News

Undergraduates develop next-generation intelligence tools

MIT News

Self-supervised Moving Vehicle Tracking with Stereo Sound

Multimodal Learning, Computer Vision

The sound of motions

Computer Vision

New tricks from old dogs: multi-source transfer learning

Transfer Learning, Explainability

The Neuro-Symbolic Concept Learner: Interpreting Scenes, Words, and Sentences From Natural Supervision

Neuro-Symbolic AI, Computer Vision

Jointly Discovering Visual Objects and Spoken Words from Raw Sensory Input

ECCV, Multimodal Learning

Dialog-based Interactive Image Retrieval

NeurIPS, Computer Vision

The Sound of Pixels

Multimodal Learning, Computer Vision

Learning to Separate Object Sounds by Watching Unlabeled Video

Computer Vision