Temporal Localization
55 papers with code • 0 benchmarks • 3 datasets
Most implemented papers
TALL: Temporal Activity Localization via Language Query
For evaluation, we adopt the TACoS dataset and build a new dataset for this task, Charades-STA, by adding sentence temporal annotations to Charades.
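The core idea in TALL is to score each sliding-window clip against the sentence query and regress boundary offsets. Below is a minimal sketch of that cross-modal alignment step; the module name, feature dimensions, and fusion choice are illustrative assumptions, not the authors' released code.

```python
# Hedged sketch (not the authors' code) of clip-sentence alignment in the
# spirit of TALL: fuse clip and sentence features, then predict an alignment
# score plus start/end offsets for each sliding window.
import torch
import torch.nn as nn

class ClipSentenceAligner(nn.Module):  # hypothetical name
    def __init__(self, vis_dim=4096, txt_dim=4800, hid=1024):
        super().__init__()
        self.vis_proj = nn.Linear(vis_dim, hid)  # project clip features
        self.txt_proj = nn.Linear(txt_dim, hid)  # project sentence embedding
        self.head = nn.Linear(2 * hid, 3)        # [alignment score, d_start, d_end]

    def forward(self, clips, sent):
        # clips: (num_windows, vis_dim) sliding-window features of one video
        # sent:  (txt_dim,) embedding of the language query
        v = torch.relu(self.vis_proj(clips))
        s = torch.relu(self.txt_proj(sent)).expand_as(v)
        fused = torch.cat([v * s, v + s], dim=-1)  # multiplicative + additive fusion
        out = self.head(fused)
        score, offsets = out[:, 0], out[:, 1:]     # rank windows, refine boundaries
        return score, offsets
```

At inference, windows are ranked by the alignment score and their boundaries shifted by the regressed offsets, which is what lets a coarse sliding-window proposal end up tightly localized.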
Weakly Supervised Action Localization by Sparse Temporal Pooling Network
We propose a weakly supervised temporal action localization algorithm on untrimmed videos using convolutional neural networks.
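The sparse temporal pooling idea can be summarized in a few lines: per-segment attention weights pool frame features into a video-level representation trained with only video-level labels, and an L1 penalty pushes the attention toward a few high-scoring segments, which then serve as the localization. The sketch below is a simplified reconstruction under those assumptions; names and dimensions are placeholders, not the paper's implementation.

```python
# Hedged sketch of sparse temporal pooling (not the authors' release):
# attention-weighted pooling + video-level classification + sparsity loss.
import torch
import torch.nn as nn

class SparseTemporalPooling(nn.Module):  # illustrative name
    def __init__(self, feat_dim=1024, num_classes=20):
        super().__init__()
        self.attention = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(),
            nn.Linear(256, 1), nn.Sigmoid(),  # one weight per temporal segment
        )
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, segments):
        # segments: (T, feat_dim) features of an untrimmed video
        att = self.attention(segments)             # (T, 1) temporal attention
        video_feat = (att * segments).sum(0) / (att.sum() + 1e-6)
        logits = self.classifier(video_feat)       # trained with video-level labels only
        sparsity = att.abs().mean()                # add lambda * sparsity to the loss
        return logits, att.squeeze(1), sparsity
```

Because only video-level labels supervise the classifier, the attention weights are the part of the model that ends up carrying the temporal localization signal.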
MAC: Mining Activity Concepts for Language-based Temporal Localization
Previous methods address the problem by considering features from video sliding windows and language queries and learning a subspace to encode their correlation, an approach that ignores rich semantic cues about activities in both videos and queries.
Asynchronous Temporal Fields for Action Recognition
Actions are more than just movements and trajectories: we cook to eat and we hold a cup to drink from it.
HACS: Human Action Clips and Segments Dataset for Recognition and Temporal Localization
This paper presents a new large-scale dataset for recognition and temporal localization of human actions collected from Web videos.
Audio-Visual Event Localization in Unconstrained Videos
In this paper, we introduce a novel problem of audio-visual event localization in unconstrained videos.
Technical Report of the Video Event Reconstruction and Analysis (VERA) System -- Shooter Localization, Models, Interface, and Beyond
Among other uses, VERA enables the localization of a shooter from just a few videos that include the sound of gunshots.
Finding Moments in Video Collections Using Natural Language
We evaluate our approach on two recently proposed datasets for temporal localization of moments in video with natural language (DiDeMo and Charades-STA) extended to our video corpus moment retrieval setting.
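Video corpus moment retrieval amounts to ranking candidate moments from all videos jointly rather than within a single video. A minimal sketch of that retrieval step follows; the shared-embedding-space formulation and all function names here are assumptions for illustration, not the paper's model.

```python
# Hedged sketch of corpus-level moment retrieval: embed every candidate
# moment from every video and the query into one space, then rank globally.
import torch

def retrieve_moments(query_emb, moment_embs, moment_index, top_k=5):
    # query_emb:    (d,) embedded natural-language query
    # moment_embs:  (N, d) embeddings of candidate moments across the corpus
    # moment_index: list of N (video_id, start, end) tuples
    q = query_emb / query_emb.norm()
    m = moment_embs / moment_embs.norm(dim=1, keepdim=True)
    scores = m @ q                                 # cosine similarity to the query
    best = torch.topk(scores, top_k).indices
    return [(moment_index[i], scores[i].item()) for i in best.tolist()]
```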
Egocentric Video-Language Pretraining
Video-Language Pretraining (VLP), which aims to learn transferable representation to advance a wide range of video-text downstream tasks, has recently received increasing attention.
Temporal Localization of Fine-Grained Actions in Videos by Domain Transfer from Web Images
To solve this problem, we propose a simple yet effective method that takes weak video labels and noisy image labels as input, and generates localized action frames as output.
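One simple reading of this domain-transfer setup is that an image model trained on (noisy) web images of an action scores each frame of a weakly labeled video, and confident frames are kept as the localized output. The sketch below illustrates only that filtering step under stated assumptions; the classifier, threshold, and function name are hypothetical, not the authors' pipeline.

```python
# Hedged sketch of frame localization by image-to-video domain transfer:
# score frames with a web-image-trained classifier, keep confident ones.
import torch

def localize_frames(frame_feats, image_classifier, action_id, thresh=0.5):
    # frame_feats: (T, d) features for T video frames
    # image_classifier: nn.Module trained on web images, (T, d) -> (T, C) logits
    with torch.no_grad():
        probs = torch.softmax(image_classifier(frame_feats), dim=1)
    keep = probs[:, action_id] > thresh   # frames that look like the action
    return keep.nonzero(as_tuple=True)[0] # indices of localized action frames
```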