Video Understanding

295 papers with code • 0 benchmarks • 42 datasets

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.
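Localisation in time is commonly scored with the temporal intersection-over-union (IoU) between a predicted segment and an annotated one. The sketch below is an illustration we are adding, not a metric defined on this page. `temporal_iou` and the segment times are hypothetical.

```python
def temporal_iou(pred, truth):
    """Intersection-over-union of two time intervals given as (start, end)."""
    start_p, end_p = pred
    start_t, end_t = truth
    # Length of the overlapping part (0 if the intervals are disjoint).
    overlap = max(0.0, min(end_p, end_t) - max(start_p, start_t))
    union = (end_p - start_p) + (end_t - start_t) - overlap
    return overlap / union if union > 0 else 0.0


# Hypothetical predicted action segment vs. annotated segment, in seconds.
print(temporal_iou((2.0, 6.0), (4.0, 8.0)))  # 2 s overlap / 6 s union
```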

Source: Action Detection from a Robot-Car Perspective

Benchmarks

These leaderboards are used to track progress in Video Understanding.

You can find evaluation results in the subtasks. You can also submit evaluation metrics for this task.

Libraries

Use these libraries to find Video Understanding models and implementations.

open-mmlab/mmaction2 • 7 papers • 3,888 GitHub stars

towhee-io/towhee • 4 papers • 2,991 GitHub stars

google-research/scenic • 2 papers • 2,995 GitHub stars

MIT-HAN-LAB/temporal-shift-module • 2 papers • 2,019 GitHub stars

Datasets

Subtasks

Most implemented papers

Video Swin Transformer

SwinTransformer/Video-Swin-Transformer • CVPR 2022

The vision community is witnessing a modeling shift from CNNs to Transformers, where pure Transformer architectures have attained top accuracy on the major video recognition benchmarks.

TSM: Temporal Shift Module for Efficient Video Understanding

MIT-HAN-LAB/temporal-shift-module • ICCV 2019

The explosive growth in video streaming gives rise to challenges in performing video understanding at high accuracy and low computation cost.
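TSM's central idea, as we understand it, is to shift a fraction of the feature channels one step forward or backward along the time axis, so that neighbouring frames exchange information without extra multiply-adds. A minimal plain-Python sketch, assuming features stored as T frames of C scalar channel values and zero padding at the clip boundaries. `temporal_shift` is a hypothetical name, and the default `fold_div=8` is our recollection of the paper's setting, so treat it as an assumption rather than the reference code's API.

```python
def temporal_shift(x, fold_div=8):
    """Shift a fraction of channels one step along the time axis.

    x: list of T frames, each a list of C per-channel values.
    Channels [0, fold) take their value from the next frame (t + 1),
    channels [fold, 2 * fold) from the previous frame (t - 1),
    and the remaining channels are left unchanged. Positions whose
    source frame falls outside the clip are zero-padded.
    """
    T, C = len(x), len(x[0])
    fold = C // fold_div
    out = [[0.0] * C for _ in range(T)]
    for t in range(T):
        for c in range(C):
            if c < fold:
                out[t][c] = x[t + 1][c] if t + 1 < T else 0.0
            elif c < 2 * fold:
                out[t][c] = x[t - 1][c] if t >= 1 else 0.0
            else:
                out[t][c] = x[t][c]
    return out


# Toy clip: 3 frames x 8 channels, value = 10 * frame + channel.
frames = [[10 * t + c for c in range(8)] for t in range(3)]
for row in temporal_shift(frames, fold_div=4):  # fold = 2 channels each way
    print(row)
```

With `fold_div=4`, frame 1's first two channels now hold frame 2's values and its next two hold frame 0's, so a following per-frame 2-D convolution sees information from neighbouring frames.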

Is Space-Time Attention All You Need for Video Understanding?

facebookresearch/TimeSformer • 9 Feb 2021

We present a convolution-free approach to video classification built exclusively on self-attention over space and time.
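To make "self-attention over space and time" concrete, the sketch below flattens a clip of T frames, each cut into N patch vectors, into one sequence of T·N tokens and lets every token attend to every other one. This joint form is a simplification chosen for illustration: there are no learned query/key/value projections and only one head, and `space_time_self_attention` is a hypothetical name. As far as we know the paper also studies cheaper factorised attention schemes, which are not shown here.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def space_time_self_attention(clip):
    """Joint self-attention over all patches of all frames.

    clip: list of T frames, each a list of N patch vectors of dimension d.
    Simplification: queries, keys and values are the patch vectors
    themselves (no learned projections, one head).
    Returns T * N output vectors in frame-major order.
    """
    tokens = [patch for frame in clip for patch in frame]  # T * N tokens
    d = len(tokens[0])
    outputs = []
    for q in tokens:
        # Scaled dot-product score against every token in space and time.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return outputs


# Toy clip: T = 2 frames, N = 2 patches per frame, d = 2.
clip = [[[1.0, 0.0], [0.0, 1.0]],
        [[1.0, 1.0], [0.5, 0.5]]]
for vec in space_time_self_attention(clip):
    print([round(v, 3) for v in vec])
```

Because every token attends to all T·N tokens, the cost grows quadratically in the number of frames times patches per frame, which is why factorised schemes are attractive for longer clips.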

AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions

tensorflow/models • CVPR 2018

The AVA dataset densely annotates 80 atomic visual actions in 430 15-minute video clips, where actions are localized in space and time, resulting in 1.58M action labels with multiple labels per person occurring frequently.
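As a rough sense of AVA's scale, the figures quoted above can be combined as follows. The 1.58M label count is a rounded figure, so the derived averages are approximate.

```python
clips = 430
minutes_per_clip = 15
labels = 1.58e6  # rounded figure quoted in the abstract

total_minutes = clips * minutes_per_clip     # 6,450 minutes of video
total_hours = total_minutes / 60             # 107.5 hours
labels_per_clip = labels / clips             # about 3,674 labels per clip
labels_per_minute = labels / total_minutes   # about 245 labels per minute

print(total_hours, round(labels_per_clip), round(labels_per_minute))
```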

SoccerNet 2022 Challenges Results

soccernet/sn-calibration • 5 Oct 2022

The SoccerNet 2022 challenges were the second annual video understanding challenges organized by the SoccerNet team.

ECO: Efficient Convolutional Network for Online Video Understanding

mzolfaghari/ECO-efficient-video-understanding • ECCV 2018

In this paper, we introduce a network architecture that takes long-term content into account and enables fast per-video processing at the same time.

Learnable pooling with Context Gating for video classification

antoine77340/Youtube-8M-WILLOW • 21 Jun 2017

In particular, we evaluate our method on the large-scale multi-modal Youtube-8M v2 dataset and outperform all other methods in the Youtube 8M Large-Scale Video Understanding challenge.
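Context Gating, as we read the paper, reweights each feature dimension with a learned sigmoid gate, Y = σ(WX + b) ∘ X, where ∘ is element-wise multiplication. The plain-Python sketch below uses hand-picked weights for illustration (in a trained model W and b are learned), and `context_gating` is a hypothetical name.

```python
import math


def context_gating(x, W, b):
    """Context Gating: y = sigmoid(W x + b) * x (element-wise product).

    x: input feature vector of length n.
    W: n x n weight matrix given as a list of rows; b: bias vector of length n.
    In a trained model W and b are learned; here they are supplied by hand.
    """
    y = []
    for row, bias, xi in zip(W, b, x):
        z = sum(w * xj for w, xj in zip(row, x)) + bias
        gate = 1.0 / (1.0 + math.exp(-z))  # sigmoid gate in (0, 1)
        y.append(gate * xi)
    return y


x = [2.0, -1.0]
W = [[0.0, 0.0], [0.0, 0.0]]
# A large positive bias keeps the first feature; a large negative one suppresses the second.
print(context_gating(x, W, [100.0, -100.0]))
```

Because each gate lies in (0, 1), the layer can only attenuate features, and since the gate depends on the whole input vector, one feature's weight can be conditioned on the others.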

Representation Flow for Action Recognition

piergiaj/representation-flow-cvpr19 • CVPR 2019

Our representation flow layer is a fully-differentiable layer designed to capture the 'flow' of any representation channel within a convolutional neural network for action recognition.

Video Instance Segmentation

Epiphqny/VisTR • ICCV 2019

The goal of this new task is simultaneous detection, segmentation and tracking of instances in videos.

CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval

ArrowLuo/CLIP4Clip • 18 Apr 2021

In this paper, we propose a CLIP4Clip model to transfer the knowledge of the CLIP model to video-language retrieval in an end-to-end manner.
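A simple way to turn per-frame CLIP embeddings into a video-level retrieval score is to mean-pool the frame embeddings and compare the result with the text embedding by cosine similarity. To our understanding, this parameter-free variant is one of several similarity calculators the paper compares. The sketch assumes embeddings are already computed: the toy 2-D vectors stand in for CLIP encoder outputs, and all function names are hypothetical.

```python
import math


def mean_pool(frame_embeddings):
    """Average a list of per-frame embedding vectors into one video vector."""
    n = len(frame_embeddings)
    return [sum(column) / n for column in zip(*frame_embeddings)]


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def rank_videos(text_embedding, videos):
    """Return video ids ordered from most to least similar to the text."""
    scores = {vid: cosine(text_embedding, mean_pool(frames)) for vid, frames in videos.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy 2-D vectors standing in for CLIP text and frame embeddings.
query = [1.0, 0.0]
videos = {
    "clip_a": [[1.0, 0.1], [0.9, -0.1]],
    "clip_b": [[0.0, 1.0], [0.1, 0.9]],
}
print(rank_videos(query, videos))  # ['clip_a', 'clip_b']
```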
