Self-supervised Moving Vehicle Tracking with Stereo Sound
Authors
- Chuang Gan
- Hang Zhao
- Peihao Chen
- David Cox
- Antonio Torralba
Published on
10/25/2019
Humans are able to localize objects in the environment using both visual and auditory cues, integrating information from multiple modalities into a common reference frame. We introduce a system that can leverage unlabeled audio-visual data to learn to localize objects (moving vehicles) in a visual reference frame, purely using stereo sound at inference time. Since it is labor-intensive to manually annotate the correspondences between audio and object bounding boxes, we achieve this goal by using the co-occurrence of visual and audio streams in unlabeled videos as a form of self-supervision, without resorting to the collection of ground-truth annotations. In particular, we propose a framework that consists of a vision “teacher” network and a stereo-sound “student” network. During training, knowledge embodied in a well-established visual vehicle detection model is transferred to the audio domain using unlabeled videos as a bridge. At test time, the stereo-sound student network can work independently to perform object localization using just stereo audio and camera meta-data, without any visual input. Experimental results on a newly collected Auditory Vehicle Tracking dataset verify that our proposed approach outperforms several baseline approaches. We also demonstrate that our cross-modal auditory localization approach can assist in the visual localization of moving vehicles under poor lighting conditions.
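The abstract describes transferring knowledge from a vision "teacher" detector to a stereo-sound "student" network, with unlabeled video as the bridge. The following is an illustrative sketch only, not the authors' implementation. One common way to set up this kind of cross-modal distillation is to render the teacher's bounding boxes as a heatmap target and penalize the student's predicted heatmap with a mean-squared error. The Gaussian encoding, the sigma choice (a quarter of the box size), and the function names `boxes_to_heatmap` and `distillation_loss` are hypothetical assumptions.

```python
import math
from typing import List, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max) in heatmap pixel coordinates.
Box = Tuple[float, float, float, float]


def boxes_to_heatmap(boxes: List[Box], height: int, width: int) -> List[List[float]]:
    """Render teacher detections as a Gaussian heatmap (assumed target encoding)."""
    heat = [[0.0] * width for _ in range(height)]
    for x0, y0, x1, y1 in boxes:
        cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
        # Assumption: spread is a quarter of the box size; guard against zero-size boxes.
        sx, sy = max((x1 - x0) / 4, 1e-6), max((y1 - y0) / 4, 1e-6)
        for y in range(height):
            for x in range(width):
                g = math.exp(-((x - cx) ** 2 / (2 * sx ** 2) + (y - cy) ** 2 / (2 * sy ** 2)))
                heat[y][x] = max(heat[y][x], g)  # overlapping vehicles: keep the max response
    return heat


def distillation_loss(student: List[List[float]], teacher: List[List[float]]) -> float:
    """Mean squared error between the student's heatmap and the teacher-derived target."""
    n = len(teacher) * len(teacher[0])
    return sum(
        (s - t) ** 2 for row_s, row_t in zip(student, teacher) for s, t in zip(row_s, row_t)
    ) / n


if __name__ == "__main__":
    target = boxes_to_heatmap([(2, 2, 6, 6)], height=8, width=8)
    untrained_student = [[0.0] * 8 for _ in range(8)]
    print(distillation_loss(untrained_student, target))  # positive until the student matches
```

In such a setup the teacher only runs during training, so at test time the student would produce the heatmap from stereo audio (and, per the abstract, camera meta-data) with no visual input.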
Please cite our work using the BibTeX below.
@article{Gan_2019,
title={Self-Supervised Moving Vehicle Tracking With Stereo Sound},
ISBN={9781728148038},
url={http://dx.doi.org/10.1109/ICCV.2019.00715},
DOI={10.1109/iccv.2019.00715},
journal={2019 IEEE/CVF International Conference on Computer Vision (ICCV)},
publisher={IEEE},
author={Gan, Chuang and Zhao, Hang and Chen, Peihao and Cox, David and Torralba, Antonio},
year={2019},
month={Oct}
}