StNet: Local and Global Spatial-Temporal Modeling for Action Recognition
Authors
- Dongliang He
- Zhichao Zhou
- Chuang Gan
- Fu Li
- Xiao Liu
- Yandong Li
- Limin Wang
- Shilei Wen
Published on
11/05/2018
Despite the success of deep learning for static image understanding, it remains unclear which network architectures are most effective for spatial-temporal modeling in videos. In this paper, in contrast to existing CNN+RNN or pure 3D-convolution-based approaches, we explore a novel spatial-temporal network (StNet) architecture for both local and global spatial-temporal modeling in videos. In particular, StNet stacks N successive video frames into a super-image with 3N channels and applies 2D convolution to super-images to capture local spatial-temporal relationships. To model global spatial-temporal relationships, we apply temporal convolution to the local spatial-temporal feature maps. Specifically, a novel temporal Xception block is proposed in StNet; it employs separate channel-wise and temporal-wise convolutions over the feature sequence of a video. Extensive experiments on the Kinetics dataset demonstrate that our framework outperforms several state-of-the-art approaches in action recognition and strikes a satisfying trade-off between recognition accuracy and model complexity. We further demonstrate the generalization performance of the learned video representations on the UCF101 dataset.
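The two ideas in the abstract can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation. The function names `make_super_images` and `temporal_separable_conv`, the nested-list data layout, the zero padding, and the order of the two convolution steps are all assumptions made for illustration. The real StNet applies a 2D CNN to the super-images, and its temporal Xception block contains details the abstract does not describe.

```python
def make_super_images(frames, n):
    """Stack every n consecutive frames along the channel axis.

    frames: list of T frames, each a list of C channel planes
            (a plane is an H x W nested list).
    Returns T // n super-images, each with C * n channel planes,
    so RGB input (C = 3) yields 3N-channel super-images.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    usable = len(frames) - len(frames) % n  # assumption: drop a trailing partial group
    return [
        [plane for frame in frames[start:start + n] for plane in frame]
        for start in range(0, usable, n)
    ]


def temporal_separable_conv(features, temporal_kernels, channel_mix):
    """Hypothetical separable convolution over a feature sequence.

    features:         list of T feature vectors, each of length C.
    temporal_kernels: C kernels of odd length K; kernel c filters channel c
                      along time only (zero padding at the ends).
    channel_mix:      C_out x C matrix applied independently at each time step.
    """
    t_len, c_in = len(features), len(features[0])
    pad = len(temporal_kernels[0]) // 2

    # Step 1: temporal-wise filtering, one 1D kernel per channel.
    filtered = [[0.0] * c_in for _ in range(t_len)]
    for c in range(c_in):
        for t in range(t_len):
            acc = 0.0
            for k, weight in enumerate(temporal_kernels[c]):
                src = t + k - pad
                if 0 <= src < t_len:
                    acc += weight * features[src][c]
            filtered[t][c] = acc

    # Step 2: channel-wise mixing, no interaction across time steps.
    return [
        [sum(w * x for w, x in zip(row, step)) for row in channel_mix]
        for step in filtered
    ]


# Usage: 4 RGB frames with 1 x 1 planes, grouped with N = 2.
frames = [[[[t * 10 + c]] for c in range(3)] for t in range(4)]
super_images = make_super_images(frames, 2)  # 2 super-images, 6 channels each

# Usage: a 3-step sequence with 2 channels, mixed down to 1 output channel.
sequence = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
mixed = temporal_separable_conv(sequence, [[1, 1, 1], [0, 1, 0]], [[1, 1]])
```

Splitting the operation this way keeps the parameter count at roughly C·K + C·C_out instead of C·K·C_out for a full temporal convolution. This kind of saving is consistent with the accuracy/complexity trade-off the abstract mentions, though the sketch makes no claim about the actual model size.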
Please cite our work using the BibTeX below.
@article{DBLP:journals/corr/abs-1811-01549,
author = {Dongliang He and
Zhichao Zhou and
Chuang Gan and
Fu Li and
Xiao Liu and
Yandong Li and
Limin Wang and
Shilei Wen},
title = {StNet: Local and Global Spatial-Temporal Modeling for Action Recognition},
journal = {CoRR},
volume = {abs/1811.01549},
year = {2018},
url = {http://arxiv.org/abs/1811.01549},
archivePrefix = {arXiv},
eprint = {1811.01549},
timestamp = {Mon, 16 Mar 2020 17:55:52 +0100},
biburl = {https://dblp.org/rec/journals/corr/abs-1811-01549.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}