Action Classification
227 papers with code • 24 benchmarks • 30 datasets
Image source: The Kinetics Human Action Video Dataset
Libraries
Use these libraries to find Action Classification models and implementations
Datasets
Most implemented papers
High Quality Monocular Depth Estimation via Transfer Learning
Accurate depth estimation from images is a fundamental task in many applications including scene understanding and reconstruction.
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
The paucity of videos in current action classification datasets (UCF-101 and HMDB-51) has made it difficult to identify good video architectures, as most methods obtain similar performance on existing small-scale benchmarks.
Non-local Neural Networks
Both convolutional and recurrent operations are building blocks that process one local neighborhood at a time.
Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks with Octave Convolution
Just as natural images convey information at different spatial frequencies, the output feature maps of a convolution layer can be seen as a mixture of information at different frequencies.
A Closer Look at Spatiotemporal Convolutions for Action Recognition
In this paper we discuss several forms of spatiotemporal convolutions for video analysis and study their effects on action recognition.
Temporal Segment Networks: Towards Good Practices for Deep Action Recognition
Our second contribution is a study of a series of good practices for learning ConvNets on video data with the help of the temporal segment network.
Swin Transformer V2: Scaling Up Capacity and Resolution
Three main techniques are proposed: 1) a residual-post-norm method combined with cosine attention to improve training stability; 2) a log-spaced continuous position bias method to effectively transfer models pre-trained using low-resolution images to downstream tasks with high-resolution inputs; 3) a self-supervised pre-training method, SimMIM, to reduce the need for vast amounts of labeled images.
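As a rough illustration of the first two techniques, the sketch below computes a scaled cosine attention logit and a log-spaced relative coordinate in plain Python. The function names, the fixed temperature `tau=0.1`, and the scalar `bias` argument are assumptions made for this example. They are not the paper's code, where the temperature is learned and the bias comes from a small network applied to the log-spaced coordinates.

```python
import math


def log_spaced(delta: float) -> float:
    """Map a relative coordinate to log space: sign(delta) * log(1 + |delta|)."""
    return math.copysign(math.log1p(abs(delta)), delta)


def cosine_attention_logit(q, k, tau=0.1, bias=0.0):
    """Cosine similarity of query q and key k, divided by a temperature,
    plus a position bias. tau and bias are illustrative constants here."""
    dot = sum(a * b for a, b in zip(q, k))
    norm_q = math.sqrt(sum(a * a for a in q))
    norm_k = math.sqrt(sum(b * b for b in k))
    return dot / (norm_q * norm_k) / tau + bias


# Growing a window from 8 to 16 roughly doubles the largest relative offset
# (7 -> 15), but the log-spaced coordinate grows far less.
small, large = log_spaced(7), log_spaced(15)
print(f"linear ratio: {15 / 7:.2f}, log-spaced ratio: {large / small:.2f}")

# The logit depends only on the angle between q and k, not on their magnitudes.
print(cosine_attention_logit([1.0, 0.0], [1.0, 0.0]))
print(cosine_attention_logit([100.0, 0.0], [1.0, 0.0]))
```

Because the cosine logit ignores vector magnitude, very large activations cannot dominate the attention map, and the compressed coordinate range means less extrapolation when the window size changes between pre-training and fine-tuning.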
SlowFast Networks for Video Recognition
We present SlowFast networks for video recognition.
Video Swin Transformer
The vision community is witnessing a modeling shift from CNNs to Transformers, where pure Transformer architectures have attained top accuracy on the major video recognition benchmarks.
TSM: Temporal Shift Module for Efficient Video Understanding
The explosive growth in video streaming gives rise to challenges in performing video understanding at high accuracy and low computation cost.