Computer Vision

All Work

Adversarial T-shirt! Evading Person Detectors in A Physical World
 
An Image Enhancing Pattern-based Sparsity for Real-time Inference on Mobile Devices
 
AdaFuse: Adaptive Temporal Fusion Network for Efficient Action Recognition
 
VA-RED^2: Video Adaptive Redundancy Reduction
 
NASTransfer: Analyzing Architecture Transferability in Large Scale Neural Architecture Search
 
Non-Adversarial Video Synthesis with Learned Priors
 
Camera On-boarding for Person Re-identification using Hypothesis Transfer Learning
 
AR-Net: Adaptive Frame Resolution for Efficient Action Recognition
 
Semi-Supervised Action Recognition with Temporal Contrastive Learning
 
A Broader Study of Cross-Domain Few-Shot Learning
 
OnlineAugment: Online Data Augmentation with Less Domain Knowledge
 
TAFSSL: Task-Adaptive Feature Sub-Space Learning for few-shot classification
 
Domain2Vec: Domain Embedding for Unsupervised Domain Adaptation
 
StarNet: towards weakly supervised few-shot detection and explainable few-shot classification
 
We Have So Much in Common: Modeling Semantic Relational Set Abstractions in Videos
 
Fashion IQ: A New Dataset towards Retrieving Images by Natural Language Feedback
 
Wasserstein Style Transfer
 
GAN Compression: Efficient Architectures for Interactive Conditional GANs
 
RT3D: Achieving Real-Time Execution of 3D Convolutional Neural Networks on Mobile Devices
 
Deep Analysis of CNN-based Spatio-temporal Representations for Action Recognition
 
On Fast Adversarial Robustness Adaptation in Model-Agnostic Meta-Learning
 
Found a Reason for me? Weakly-supervised Grounded Visual Question Answering using Capsules
 
The Lottery Tickets Hypothesis for Supervised and Self-supervised Pre-training in Computer Vision Models
 
Why Do These Match? Explaining the Behavior of Image Similarity Models
 
Searching Efficient 3D Architectures with Sparse Point-Voxel Convolution
 
Fine-grained Angular Contrastive Learning with Coarse Labels
 
PlasticineLab: A Soft-Body Manipulation Benchmark with Differentiable Physics
 
Grounding Physical Concepts of Objects and Events Through Dynamic Visual Reasoning
 
RSPNet: Relative Speed Perception for Unsupervised Video Representation Learning
 
MVFNet: Multi-View Fusion Network for Efficient Video Recognition
 
Black-box Explanation of Object Detectors via Saliency Maps
 
Anycost GANs for Interactive Image Synthesis and Editing
 
AGENT: A Benchmark for Core Psychological Reasoning
 
Facebook’s New AI Teaches Itself to See With Less Human Help
Wired
Is neuroscience the key to protecting AI from adversarial attacks?
TechTalks
Neuroscientists find a way to make object-recognition models perform better
MIT News
AdaShare: Learning What To Share For Efficient Deep Multi-Task Learning
 
RetrieveGAN AI tool combines scene fragments to create new images
VentureBeat
Research Highlights: ExBERT
InsideBIGDATA
Here’s what’s stopping AI from reaching human-like understanding
TheNextWeb
Identifying a melody by studying a musician’s body language
MIT News
Undergraduates develop next-generation intelligence tools
MIT News
CLEVRER: The first video dataset for neuro-symbolic reasoning
 
Reasoning about Human-Object Interactions through Dual Attention Networks
 
LaSO: Label-Set Operations networks for multi-label few-shot learning
 
SpotTune: Transfer Learning through Adaptive Fine-tuning
 
RepMet: Representative-based metric learning for classification and one-shot object detection
 
ObjectNet: A bias-controlled dataset for object recognition
 
Moments in Time Dataset: one million videos for event understanding
 
Self-supervised Moving Vehicle Tracking with Stereo Sound
 
The sound of motions
 
TSM: Temporal Shift Module for Efficient Video Understanding
 
Graph Convolutional Networks for Temporal Action Localization
 
Why computer vision algorithms need new benchmarks
TechTalks
The mind-bending confusion of ‘hammer on a bed’ shows computer vision is far from solved
The Verge
Big-Little-Video-Net: Work smarter, not harder, for video understanding
 
This object-recognition dataset stumped the world’s best computer vision models
MIT News
Cross-channel Communication Networks
 
Watch, Reason and Code: Learning to Represent Videos Using Program
 
This Technique Can Make It Easier for AI to Understand Videos
Wired
Powerful computer vision algorithms are now small enough to run on your phone
MIT Technology Review
Faster video recognition for the smartphone era
MIT News
MIT-IBM developed a faster way to train video recognition AI
Engadget
Beyond RNNs: Positional Self-Attention with Co-Attention for Video Question Answering
 
The Neuro-Symbolic Concept Learner: Interpreting Scenes, Words, and Sentences From Natural Supervision
 
Revisiting RCNN: On Awakening the Classification Power of Faster RCNN
 
Weakly Supervised Dense Event Captioning in Videos
 
StNet: Local and Global Spatial-Temporal Modeling for Action Recognition
 
Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding
 
Machine learning system tackles speech and object recognition, all at once
MIT News
Delta-encoder: an effective sample synthesis method for few-shot object recognition
 
Jointly Optimize Data Augmentation and Network Training: Adversarial Data Augmentation in Human Pose Estimation
 
Dialog-based Interactive Image Retrieval
 
Learning to Separate Object Sounds by Watching Unlabeled Video
 
Artificial intelligence in action
MIT News
BlockDrop: Dynamic Inference Paths in Residual Networks