Supervised Video Summarization
7 papers with code • 2 benchmarks • 3 datasets
Supervised video summarization rely on datasets with human-labeled ground-truth annotations (either in the form of video summaries, as in the case of the SumMe dataset, or in the form of frame-level importance scores, as in the case of the TVSum dataset), based on which they try to discover the underlying criterion for video frame/fragment selection and video summarization.
Source: Video Summarization Using Deep Neural Networks: A Survey
Most implemented papers
Deep Reinforcement Learning for Unsupervised Video Summarization with Diversity-Representativeness Reward
Video summarization aims to facilitate large-scale video browsing by producing short, concise summaries that are diverse and representative of original videos.
Test-Time Training with Self-Supervision for Generalization under Distribution Shifts
In this paper, we propose Test-Time Training, a general approach for improving the performance of predictive models when training and test data come from different distributions.
Supervised Video Summarization via Multiple Feature Sets with Parallel Attention
The proposed architecture utilizes an attention mechanism before fusing motion features and features representing the (static) visual content, i. e., derived from an image classification model.
Discriminative Feature Learning for Unsupervised Video Summarization
The proposed variance loss allows a network to predict output scores for each frame with high discrepancy which enables effective feature learning and significantly improves model performance.
DSNet: A Flexible Detect-to-Summarize Network for Video Summarization
In this paper, we propose a Detect-to-Summarize network (DSNet) framework for supervised video summarization.
Combining Global and Local Attention with Positional Encoding for Video Summarization
This paper presents a new method for supervised video summarization.
Align and Attend: Multimodal Summarization with Dual Contrastive Losses
The goal of multimodal summarization is to extract the most important information from different modalities to form output summaries.