Computer Vision

A better method for planning complex visual tasks

MIT News

Antonio Torralba, three MIT alumni named 2025 ACM fellows

MIT News

Method teaches generative AI models to locate personalized objects

MIT News

MIT tool visualizes and edits “physically impossible” objects

MIT News

AI learns how vision and sound are connected, without human intervention

MIT News

IBM Granite now has eyes

IBM Research

Participatory AI highlights paths to sustainability

MIT ILP

MIT researchers advance automated interpretability in AI models

MIT News

Understanding the visual knowledge of language models

MIT News

Researchers use large language models to help robots navigate

MIT News

Looking for a specific action in a video? This AI-based method can find it for you

MIT News

Next Steps for AI: Creating 3D Understanding from 2D Images

MIT ILP

Reasoning and reliability in AI

MIT News

Image recognition accuracy: An unseen challenge confounding today’s AI

MIT News

A computer scientist pushes the boundaries of geometry

MIT News

Helping computer vision and language models understand what they see

MIT News

Computer vision system marries image recognition and generation

MIT News

A better way to match 3D volumes

MIT News

Creating space for the evolution of generative and trustworthy AI

MIT-IBM Watson AI Lab

Creating space for the evolution of generative and trustworthy AI

Generative Models AI for Good

S3-NeRF: Neural Reflectance Field from Shading and Shadow under a Single Viewpoint

NeurIPS Computer Vision

Bringing Image Scene Structure to Video via Frame-Clip Consistency of Object Tokens

Computer Vision NeurIPS

A simpler path to better computer vision

MIT News

In machine learning, synthetic data can offer real performance improvements

MIT News

Using sound to model the world

MIT News

Student-powered machine learning

MIT News

Artificial intelligence system learns concepts shared across video, audio, and text

MIT News

Can an Image Classifier Suffice For Action Recognition?

ICLR

Will transformers take over artificial intelligence?

Quanta Magazine

Putting AI in IoT chips? It’s a question of memory

Tech Monitor

Unlocking new doors to artificial intelligence

MIT News

Seven from MIT named 2022 Sloan Research Fellows

MIT News

TinyML is bringing neural networks to small microcontrollers

TechTalks

AI Researchers Fight Noise by Turning to Biology

Quanta Magazine

Tiny machine learning design alleviates a bottleneck in memory usage on internet-of-things devices

MIT News

Machines that see the world more like humans do

MIT News

Multi-Moments in Time: Learning and Interpreting Models for Multi-Action Video Understanding

Computer Vision Computational neuroscience

The Algonauts Project 2021 Challenge: How the Human Brain Makes Sense of a World in Motion

Computational neuroscience Computer Vision

These locations may look eerily familiar, but none actually exist

Fast Company

AI Learns to Predict Human Behavior from Videos

Columbia Engineering

Deep Analysis of CNN-based Spatio-temporal Representations for Action Recognition

CVPR Computer Vision

An Image Enhancing Pattern-based Sparsity for Real-time Inference on Mobile Devices

ECCV Computer Vision

NASTransfer: Analyzing Architecture Transferability in Large Scale Neural Architecture Search

AAAI Computer Vision

Non-Adversarial Video Synthesis with Learned Priors

CVPR Computer Vision

Camera On-boarding for Person Re-identification using Hypothesis Transfer Learning

CVPR Computer Vision

Semi-Supervised Action Recognition with Temporal Contrastive Learning

CVPR Computer Vision

StarNet: towards weakly supervised few-shot detection and explainable few-shot classification

AAAI Computer Vision

Fashion IQ: A New Dataset towards Retrieving Images by Natural Language Feedback

CVPR Computer Vision

Wasserstein Style Transfer

AISTATS Machine Learning

GAN Compression: Efficient Architectures for Interactive Conditional GANs

CVPR Computer Vision

RT3D: Achieving Real-Time Execution of 3D Convolutional Neural Networks on Mobile Devices

AAAI Machine Learning

On Fast Adversarial Robustness Adaptation in Model-Agnostic Meta-Learning

ICLR Machine Learning

Found a Reason for me? Weakly-supervised Grounded Visual Question Answering using Capsules

CVPR Computer Vision

The Lottery Tickets Hypothesis for Supervised and Self-supervised Pre-training in Computer Vision Models

CVPR Computer Vision

Why Do These Match? Explaining the Behavior of Image Similarity Models

ECCV Computer Vision

Searching Efficient 3D Architectures with Sparse Point-Voxel Convolution

ECCV Computer Vision

Fine-grained Angular Contrastive Learning with Coarse Labels

CVPR Computer Vision

PlasticineLab: A Soft-Body Manipulation Benchmark with Differentiable Physics

ICLR Machine Learning

Grounding Physical Concepts of Objects and Events Through Dynamic Visual Reasoning

ICLR Computer Vision

RSPNet: Relative Speed Perception for Unsupervised Video Representation Learning

AAAI Computer Vision

MVFNet: Multi-View Fusion Network for Efficient Video Recognition

AAAI Computer Vision

Black-box Explanation of Object Detectors via Saliency Maps

CVPR Computer Vision

Anycost GANs for Interactive Image Synthesis and Editing

CVPR Computer Vision

AGENT: A Benchmark for Core Psychological Reasoning

ICML Artificial Intelligence

AdaFuse: Adaptive Temporal Fusion Network for Efficient Action Recognition

ICLR Computer Vision

VA-RED^2: Video Adaptive Redundancy Reduction

ICLR Computer Vision

Facebook’s New AI Teaches Itself to See With Less Human Help

Wired

Is neuroscience the key to protecting AI from adversarial attacks?

TechTalks

Neuroscientists find a way to make object-recognition models perform better

MIT News

AdaShare: Learning What To Share For Efficient Deep Multi-Task Learning

Computer Vision Multimodal Learning

Adversarial T-shirt! Evading Person Detectors in A Physical World

ECCV Computer Vision

A Broader Study of Cross-Domain Few-Shot Learning

ECCV Computer Vision

OnlineAugment: Online Data Augmentation with Less Domain Knowledge

ECCV Computer Vision

TAFSSL: Task-Adaptive Feature Sub-Space Learning for few-shot classification

ECCV Computer Vision

Domain2Vec: Domain Embedding for Unsupervised Domain Adaptation

ECCV Computer Vision

We Have So Much in Common: Modeling Semantic Relational Set Abstractions in Videos

ECCV Computer Vision

AR-Net: Adaptive Frame Resolution for Efficient Action Recognition

ECCV Computer Vision

RetrieveGAN AI tool combines scene fragments to create new images

VentureBeat

Research Highlights: ExBERT

InsideBIGDATA

Here’s what’s stopping AI from reaching human-like understanding

TheNextWeb

Identifying a melody by studying a musician’s body language

MIT News

Relationship Matters: Relation Guided Knowledge Transfer for Incremental Learning of Object Detectors

CVPR Computer Vision

Undergraduates develop next-generation intelligence tools

MIT News

CLEVRER: The first video dataset for neuro-symbolic reasoning

Neuro-Symbolic AI Computer Vision

LaSO: Label-Set Operations networks for multi-label few-shot learning

Computer Vision Few-shot Learning

SpotTune: Transfer Learning through Adaptive Fine-tuning

Computer Vision Transfer Learning

RepMet: Representative-based metric learning for classification and one-shot object detection

Computer Vision

ObjectNet: A bias-controlled dataset for object recognition

Computer Vision NeurIPS

Self-supervised Moving Vehicle Tracking with Stereo Sound

Multimodal Learning Computer Vision

The sound of motions

Computer Vision

TSM: Temporal Shift Module for Efficient Video Understanding

Efficient AI Computer Vision

Graph Convolutional Networks for Temporal Action Localization

Graph Deep Learning Computer Vision

Why computer vision algorithms need new benchmarks

TechTalks

The mind-bending confusion of ‘hammer on a bed’ shows computer vision is far from solved

The Verge

Big-Little-Video-Net: Work smarter, not harder, for video understanding

Computer Vision

This object-recognition dataset stumped the world’s best computer vision models

MIT News

Cross-channel Communication Networks

NeurIPS Deep Learning

Reasoning about Human-Object Interactions through Dual Attention Networks

ICCV Computer Vision

Watch, Reason and Code: Learning to Represent Videos Using Program

Computer Vision

This Technique Can Make It Easier for AI to Understand Videos

Wired

Powerful computer vision algorithms are now small enough to run on your phone

MIT Technology Review

Faster video recognition for the smartphone era

MIT News

MIT-IBM developed a faster way to train video recognition AI

Engadget

The Algonauts Project: A Platform for Communication between the Sciences of Biological and Artificial Intelligence

Computational neuroscience Computer Vision

Local Unsupervised Learning for Image Analysis

Unsupervised Learning Computer Vision

Beyond RNNs: Positional Self-Attention with Co-Attention for Video Question Answering

Computer Vision Time Series

The Neuro-Symbolic Concept Learner: Interpreting Scenes, Words, and Sentences From Natural Supervision

Neuro-Symbolic AI Computer Vision

Revisiting RCNN: On Awakening the Classification Power of Faster RCNN

Computer Vision

Moments in Time Dataset: one million videos for event understanding

Computer Vision Multimodal Learning

Weakly Supervised Dense Event Captioning in Videos

Computer Vision

StNet: Local and Global Spatial-Temporal Modeling for Action Recognition

Computer Vision Time Series

Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding

Neuro-Symbolic AI Computer Vision

Machine learning system tackles speech and object recognition, all at once

MIT News

Delta-encoder: an effective sample synthesis method for few-shot object recognition

Computer Vision Few-shot Learning

Jointly Optimize Data Augmentation and Network Training: Adversarial Data Augmentation in Human Pose Estimation

CVPR Computer Vision

Dialog-based Interactive Image Retrieval

NeurIPS Computer Vision

The Sound of Pixels

Multimodal Learning Computer Vision

Learning to Separate Object Sounds by Watching Unlabeled Video

Computer Vision

Artificial intelligence in action

MIT News

BlockDrop: Dynamic Inference Paths in Residual Networks

Computer Vision