Computer Vision

All Work

RetrieveGAN AI tool combines scene fragments to create new images
VentureBeat
Research Highlights: ExBERT
InsideBIGDATA
Here’s what’s stopping AI from reaching human-like understanding
TheNextWeb
Identifying a melody by studying a musician’s body language
MIT News
Undergraduates develop next-generation intelligence tools
MIT News
CLEVRER: The first video dataset for neuro-symbolic reasoning
 
Reasoning about Human-Object Interactions through Dual Attention Networks
 
LaSO: Label-Set Operations networks for multi-label few-shot learning
 
SpotTune: Transfer Learning through Adaptive Fine-tuning
 
RepMet: Representative-based metric learning for classification and one-shot object detection
 
ObjectNet: A bias-controlled dataset for object recognition
 
Moments in Time Dataset: one million videos for event understanding
 
Self-supervised Moving Vehicle Tracking with Stereo Sound
 
The sound of motions
 
TSM: Temporal Shift Module for Efficient Video Understanding
 
Graph Convolutional Networks for Temporal Action Localization
 
Why computer vision algorithms need new benchmarks
Tech Talks
The mind-bending confusion of ‘hammer on a bed’ shows computer vision is far from solved
The Verge
Big-Little-Video-Net: Work smarter, not harder, for video understanding
 
This object-recognition dataset stumped the world’s best computer vision models
MIT News
Cross-channel Communication Networks
 
Watch, Reason and Code: Learning to Represent Videos Using Program
 
This Technique Can Make It Easier for AI to Understand Videos
Wired
Powerful computer vision algorithms are now small enough to run on your phone
MIT Technology Review
Faster video recognition for the smartphone era
MIT News
MIT-IBM developed a faster way to train video recognition AI
Engadget
Facial Image-to-Video Translation by a Hidden Affine Transformation
 
Beyond RNNs: Positional Self-Attention with Co-Attention for Video Question Answering
 
The Neuro-Symbolic Concept Learner: Interpreting Scenes, Words, and Sentences From Natural Supervision
 
Revisiting RCNN: On Awakening the Classification Power of Faster RCNN
 
Weakly Supervised Dense Event Captioning in Videos
 
StNet: Local and Global Spatial-Temporal Modeling for Action Recognition
 
Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding
 
Machine learning system tackles speech and object recognition, all at once
MIT News
Delta-encoder: an effective sample synthesis method for few-shot object recognition
 
Jointly Optimize Data Augmentation and Network Training: Adversarial Data Augmentation in Human Pose Estimation
 
Dialog-based Interactive Image Retrieval
 
Learning to Separate Object Sounds by Watching Unlabeled Video
 
Artificial intelligence in action
MIT News
BlockDrop: Dynamic Inference Paths in Residual Networks