Semi-Supervised Action Recognition with Temporal Contrastive Learning

CVPR

Published on

02/04/2021


Learning to recognize actions from only a handful of labeled videos is a challenging problem due to the scarcity of tediously collected activity labels. We approach this problem by learning a two-pathway temporal contrastive model using unlabeled videos at two different speeds. Specifically, we propose to maximize the similarity between encoded representations of the same video played at two different speeds, while minimizing the similarity between different videos played at different speeds. In this way, we leverage the rich supervisory information, in terms of 'time', that is present in an otherwise unsupervised pool of videos. With this simple yet surprisingly effective strategy of manipulating the playback rates of unlabeled videos, we considerably outperform video extensions of sophisticated state-of-the-art semi-supervised image recognition methods across multiple datasets and network architectures. Interestingly, our approach also benefits from out-of-domain unlabeled videos, demonstrating its robustness and generalizability. We also perform rigorous ablations and analysis to validate our approach.
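The core idea, pulling together the two-speed encodings of the same video while pushing apart encodings of different videos, can be sketched as an InfoNCE-style contrastive objective. This is a minimal illustration under assumed conventions (function name, normalization, and temperature are our own choices), not the paper's exact loss:

```python
import numpy as np

def temporal_contrastive_loss(z_slow, z_fast, temperature=0.5):
    """Hypothetical InfoNCE-style sketch: each video's slow- and fast-speed
    embeddings form a positive pair; all other videos act as negatives.

    z_slow, z_fast: (N, D) arrays of encoded clip representations,
    where row i of each array comes from the same underlying video."""
    # L2-normalize so dot products are cosine similarities
    z_slow = z_slow / np.linalg.norm(z_slow, axis=1, keepdims=True)
    z_fast = z_fast / np.linalg.norm(z_fast, axis=1, keepdims=True)
    sim = z_slow @ z_fast.T / temperature          # (N, N) similarity matrix
    # Positives lie on the diagonal (same video, two playback rates)
    logits = sim - sim.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))
```

Minimizing this loss drives the diagonal (same-video, cross-speed) similarities up relative to the off-diagonal (cross-video) ones, which is the behavior the abstract describes.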

This paper has been published at CVPR 2021

Please cite our work using the BibTeX below.

@misc{singh2021semisupervised,
      title={Semi-Supervised Action Recognition with Temporal Contrastive Learning}, 
      author={Ankit Singh and Omprakash Chakraborty and Ashutosh Varshney and Rameswar Panda and Rogerio Feris and Kate Saenko and Abir Das},
      year={2021},
      eprint={2102.02751},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}