Computer Vision
Participatory AI highlights paths to sustainability
MIT ILP
MIT researchers advance automated interpretability in AI models
MIT News
Understanding the visual knowledge of language models
MIT News
Researchers use large language models to help robots navigate
MIT News
Looking for a specific action in a video? This AI-based method can find it for you
MIT News
Next Steps for AI: Creating 3D Understanding from 2D Images
MIT ILP
Reasoning and reliability in AI
MIT News
Image recognition accuracy: An unseen challenge confounding today’s AI
MIT News
A computer scientist pushes the boundaries of geometry
MIT News
Helping computer vision and language models understand what they see
MIT News
Computer vision system marries image recognition and generation
MIT News
A better way to match 3D volumes
MIT News
Creating space for the evolution of generative and trustworthy AI
MIT-IBM Watson AI Lab
S3-NeRF: Neural Reflectance Field from Shading and Shadow under a Single Viewpoint
Bringing Image Scene Structure to Video via Frame-Clip Consistency of Object Tokens
A simpler path to better computer vision
MIT News
In machine learning, synthetic data can offer real performance improvements
MIT News
Using sound to model the world
MIT News
Student-powered machine learning
MIT News
Artificial intelligence system learns concepts shared across video, audio, and text
MIT News
Will transformers take over artificial intelligence?
Quanta Magazine
Putting AI in IoT chips? It’s a question of memory
Tech Monitor
Unlocking new doors to artificial intelligence
MIT News
Seven from MIT named 2022 Sloan Research Fellows
MIT News
TinyML is bringing neural networks to small microcontrollers
TechTalks
AI Researchers Fight Noise by Turning to Biology
Quanta Magazine
Tiny machine learning design alleviates a bottleneck in memory usage on internet-of-things devices
MIT News
Machines that see the world more like humans do
MIT News
Multi-Moments in Time: Learning and Interpreting Models for Multi-Action Video Understanding
The Algonauts Project 2021 Challenge: How the Human Brain Makes Sense of a World in Motion
These locations may look eerily familiar, but none actually exist
Fast Company
AI Learns to Predict Human Behavior from Videos
Columbia Engineering
Deep Analysis of CNN-based Spatio-temporal Representations for Action Recognition
An Image Enhancing Pattern-based Sparsity for Real-time Inference on Mobile Devices
NASTransfer: Analyzing Architecture Transferability in Large Scale Neural Architecture Search
Camera On-boarding for Person Re-identification using Hypothesis Transfer Learning
StarNet: towards weakly supervised few-shot detection and explainable few-shot classification
Fashion IQ: A New Dataset towards Retrieving Images by Natural Language Feedback
RT3D: Achieving Real-Time Execution of 3D Convolutional Neural Networks on Mobile Devices
Found a Reason for me? Weakly-supervised Grounded Visual Question Answering using Capsules
The Lottery Tickets Hypothesis for Supervised and Self-supervised Pre-training in Computer Vision Models
Grounding Physical Concepts of Objects and Events Through Dynamic Visual Reasoning
RSPNet: Relative Speed Perception for Unsupervised Video Representation Learning
Facebook’s New AI Teaches Itself to See With Less Human Help
Wired
Is neuroscience the key to protecting AI from adversarial attacks?
TechTalks
Neuroscientists find a way to make object-recognition models perform better
MIT News
AdaShare: Learning What To Share For Efficient Deep Multi-Task Learning
We Have So Much in Common: Modeling Semantic Relational Set Abstractions in Videos
RetrieveGAN AI tool combines scene fragments to create new images
VentureBeat
Research Highlights: ExBERT
InsideBIGDATA
Here’s what’s stopping AI from reaching human-like understanding
TheNextWeb
Identifying a melody by studying a musician’s body language
MIT News
Relationship Matters: Relation Guided Knowledge Transfer for Incremental Learning of Object Detectors
Undergraduates develop next-generation intelligence tools
MIT News
LaSO: Label-Set Operations networks for multi-label few-shot learning
RepMet: Representative-based metric learning for classification and one-shot object detection
Why computer vision algorithms need new benchmarks
TechTalks
The mind-bending confusion of ‘hammer on a bed’ shows computer vision is far from solved
The Verge
This object-recognition dataset stumped the world’s best computer vision models
MIT News
This Technique Can Make It Easier for AI to Understand Videos
Wired
Powerful computer vision algorithms are now small enough to run on your phone
MIT Technology Review
Faster video recognition for the smartphone era
MIT News
MIT-IBM developed a faster way to train video recognition AI
Engadget
The Algonauts Project: A Platform for Communication between the Sciences of Biological and Artificial Intelligence
Beyond RNNs: Positional Self-Attention with Co-Attention for Video Question Answering
The Neuro-Symbolic Concept Learner: Interpreting Scenes, Words, and Sentences From Natural Supervision
Moments in Time Dataset: one million videos for event understanding
Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding
Machine learning system tackles speech and object recognition, all at once
MIT News
Delta-encoder: an effective sample synthesis method for few-shot object recognition
Jointly Optimize Data Augmentation and Network Training: Adversarial Data Augmentation in Human Pose Estimation
Artificial intelligence in action
MIT News