Computer Vision

All Work

Participatory AI highlights paths to sustainability
MIT ILP
MIT researchers advance automated interpretability in AI models
MIT News
Understanding the visual knowledge of language models
MIT News
Researchers use large language models to help robots navigate
MIT News
Looking for a specific action in a video? This AI-based method can find it for you
MIT News
Next Steps for AI: Creating 3D Understanding from 2D Images
MIT ILP
Reasoning and reliability in AI
MIT News
Image recognition accuracy: An unseen challenge confounding today’s AI
MIT News
A computer scientist pushes the boundaries of geometry
MIT News
Helping computer vision and language models understand what they see
MIT News
Computer vision system marries image recognition and generation
MIT News
A better way to match 3D volumes
MIT News
Creating space for the evolution of generative and trustworthy AI
MIT-IBM Watson AI Lab
S3-NeRF: Neural Reflectance Field from Shading and Shadow under a Single Viewpoint
 
Bringing Image Scene Structure to Video via Frame-Clip Consistency of Object Tokens
 
A simpler path to better computer vision
MIT News
In machine learning, synthetic data can offer real performance improvements
MIT News
Using sound to model the world
MIT News
Student-powered machine learning
MIT News
Artificial intelligence system learns concepts shared across video, audio, and text
MIT News
Can an Image Classifier Suffice For Action Recognition?
 
Will transformers take over artificial intelligence?
Quanta Magazine
Putting AI in IoT chips? It’s a question of memory
Tech Monitor
Unlocking new doors to artificial intelligence
MIT News
Seven from MIT named 2022 Sloan Research Fellows
MIT News
TinyML is bringing neural networks to small microcontrollers
TechTalks
AI Researchers Fight Noise by Turning to Biology
Quanta Magazine
Tiny machine learning design alleviates a bottleneck in memory usage on internet-of-things devices
MIT News
Machines that see the world more like humans do
MIT News
Multi-Moments in Time: Learning and Interpreting Models for Multi-Action Video Understanding
 
The Algonauts Project 2021 Challenge: How the Human Brain Makes Sense of a World in Motion
 
These locations may look eerily familiar, but none actually exist
Fast Company
AI Learns to Predict Human Behavior from Videos
Columbia Engineering
Deep Analysis of CNN-based Spatio-temporal Representations for Action Recognition
 
An Image Enhancing Pattern-based Sparsity for Real-time Inference on Mobile Devices
 
NASTransfer: Analyzing Architecture Transferability in Large Scale Neural Architecture Search
 
Non-Adversarial Video Synthesis with Learned Priors
 
Camera On-boarding for Person Re-identification using Hypothesis Transfer Learning
 
Semi-Supervised Action Recognition with Temporal Contrastive Learning
 
StarNet: towards weakly supervised few-shot detection and explainable few-shot classification
 
Fashion IQ: A New Dataset towards Retrieving Images by Natural Language Feedback
 
Wasserstein Style Transfer
 
GAN Compression: Efficient Architectures for Interactive Conditional GANs
 
RT3D: Achieving Real-Time Execution of 3D Convolutional Neural Networks on Mobile Devices
 
On Fast Adversarial Robustness Adaptation in Model-Agnostic Meta-Learning
 
Found a Reason for me? Weakly-supervised Grounded Visual Question Answering using Capsules
 
The Lottery Tickets Hypothesis for Supervised and Self-supervised Pre-training in Computer Vision Models
 
Why Do These Match? Explaining the Behavior of Image Similarity Models
 
Searching Efficient 3D Architectures with Sparse Point-Voxel Convolution
 
Fine-grained Angular Contrastive Learning with Coarse Labels
 
PlasticineLab: A Soft-Body Manipulation Benchmark with Differentiable Physics
 
Grounding Physical Concepts of Objects and Events Through Dynamic Visual Reasoning
 
RSPNet: Relative Speed Perception for Unsupervised Video Representation Learning
 
MVFNet: Multi-View Fusion Network for Efficient Video Recognition
 
Black-box Explanation of Object Detectors via Saliency Maps
 
Anycost GANs for Interactive Image Synthesis and Editing
 
AGENT: A Benchmark for Core Psychological Reasoning
 
AdaFuse: Adaptive Temporal Fusion Network for Efficient Action Recognition
 
VA-RED^2: Video Adaptive Redundancy Reduction
 
Facebook’s New AI Teaches Itself to See With Less Human Help
Wired
Is neuroscience the key to protecting AI from adversarial attacks?
TechTalks
Neuroscientists find a way to make object-recognition models perform better
MIT News
AdaShare: Learning What To Share For Efficient Deep Multi-Task Learning
 
Adversarial T-shirt! Evading Person Detectors in A Physical World
 
A Broader Study of Cross-Domain Few-Shot Learning
 
OnlineAugment: Online Data Augmentation with Less Domain Knowledge
 
TAFSSL: Task-Adaptive Feature Sub-Space Learning for few-shot classification
 
Domain2Vec: Domain Embedding for Unsupervised Domain Adaptation
 
We Have So Much in Common: Modeling Semantic Relational Set Abstractions in Videos
 
AR-Net: Adaptive Frame Resolution for Efficient Action Recognition
 
RetrieveGAN AI tool combines scene fragments to create new images
VentureBeat
Research Highlights: ExBERT
InsideBIGDATA
Here’s what’s stopping AI from reaching human-like understanding
The Next Web
Identifying a melody by studying a musician’s body language
MIT News
Relationship Matters: Relation Guided Knowledge Transfer for Incremental Learning of Object Detectors
 
Undergraduates develop next-generation intelligence tools
MIT News
CLEVRER: The first video dataset for neuro-symbolic reasoning
 
LaSO: Label-Set Operations networks for multi-label few-shot learning
 
SpotTune: Transfer Learning through Adaptive Fine-tuning
 
RepMet: Representative-based metric learning for classification and one-shot object detection
 
ObjectNet: A bias-controlled dataset for object recognition
 
Self-supervised Moving Vehicle Tracking with Stereo Sound
 
The sound of motions
 
TSM: Temporal Shift Module for Efficient Video Understanding
 
Graph Convolutional Networks for Temporal Action Localization
 
Why computer vision algorithms need new benchmarks
TechTalks
The mind-bending confusion of ‘hammer on a bed’ shows computer vision is far from solved
The Verge
Big-Little-Video-Net: Work smarter, not harder, for video understanding
 
This object-recognition dataset stumped the world’s best computer vision models
MIT News
Cross-channel Communication Networks
 
Reasoning about Human-Object Interactions through Dual Attention Networks
 
Watch, Reason and Code: Learning to Represent Videos Using Program
 
This Technique Can Make It Easier for AI to Understand Videos
Wired
Powerful computer vision algorithms are now small enough to run on your phone
MIT Technology Review
Faster video recognition for the smartphone era
MIT News
MIT-IBM developed a faster way to train video recognition AI
Engadget
The Algonauts Project: A Platform for Communication between the Sciences of Biological and Artificial Intelligence
 
Beyond RNNs: Positional Self-Attention with Co-Attention for Video Question Answering
 
The Neuro-Symbolic Concept Learner: Interpreting Scenes, Words, and Sentences From Natural Supervision
 
Revisiting RCNN: On Awakening the Classification Power of Faster RCNN
 
Moments in Time Dataset: one million videos for event understanding
 
Weakly Supervised Dense Event Captioning in Videos
 
StNet: Local and Global Spatial-Temporal Modeling for Action Recognition
 
Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding
 
Machine learning system tackles speech and object recognition, all at once
MIT News
Delta-encoder: an effective sample synthesis method for few-shot object recognition
 
Jointly Optimize Data Augmentation and Network Training: Adversarial Data Augmentation in Human Pose Estimation
 
Dialog-based Interactive Image Retrieval
 
Learning to Separate Object Sounds by Watching Unlabeled Video
 
Artificial intelligence in action
MIT News
BlockDrop: Dynamic Inference Paths in Residual Networks