Computer Vision

All Work

CLEVRER: The first video dataset for neuro-symbolic reasoning
Reasoning about Human-Object Interactions through Dual Attention Networks
LaSO: Label-Set Operations networks for multi-label few-shot learning
SpotTune: Transfer Learning through Adaptive Fine-tuning
RepMet: Representative-based metric learning for classification and one-shot object detection
ObjectNet: A bias-controlled dataset for object recognition
Moments in Time Dataset: one million videos for event understanding
Self-supervised Moving Vehicle Tracking with Stereo Sound
The sound of motions
TSM: Temporal Shift Module for Efficient Video Understanding
Graph Convolutional Networks for Temporal Action Localization
Why computer vision algorithms need new benchmarks
Tech Talks
The mind-bending confusion of ‘hammer on a bed’ shows computer vision is far from solved
The Verge
Big-Little-Video-Net: Work smarter, not harder, for video understanding
This object-recognition dataset stumped the world’s best computer vision models
MIT News
Watch, Reason and Code: Learning to Represent Videos Using Program
This Technique Can Make It Easier for AI to Understand Videos
Wired
Powerful computer vision algorithms are now small enough to run on your phone
MIT Technology Review
Faster video recognition for the smartphone era
MIT News
MIT-IBM developed a faster way to train video recognition AI
Engadget
Facial Image-to-Video Translation by a Hidden Affine Transformation
Beyond RNNs: Positional Self-Attention with Co-Attention for Video Question Answering
The Neuro-Symbolic Concept Learner: Interpreting Scenes, Words, and Sentences From Natural Supervision
Revisiting RCNN: On Awakening the Classification Power of Faster RCNN
Weakly Supervised Dense Event Captioning in Videos
StNet: Local and Global Spatial-Temporal Modeling for Action Recognition
Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding
Machine learning system tackles speech and object recognition, all at once
MIT News
Delta-encoder: an effective sample synthesis method for few-shot object recognition
Jointly Optimize Data Augmentation and Network Training: Adversarial Data Augmentation in Human Pose Estimation
Dialog-based Interactive Image Retrieval
Learning to Separate Object Sounds by Watching Unlabeled Video
Artificial intelligence in action
MIT News
BlockDrop: Dynamic Inference Paths in Residual Networks