Temporal Sentence Grounding
11 papers with code • 1 benchmark • 1 dataset
Temporal sentence grounding (TSG) aims to locate a specific moment in an untrimmed video given a natural language query. Different levels of supervision are used for this task: 1) weak supervision: only the video-level action category set; 2) semi-weak supervision: the video-level action category set plus action annotations at a few timestamps; 3) full supervision: action category and action interval annotations for all actions in the untrimmed video.
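As a minimal, paper-agnostic sketch of the fully supervised setting, a prediction is a temporal interval (start, end) in seconds, and it is typically scored against the annotated interval with temporal IoU (as in the common Recall@K, IoU=m metrics). The query text and timestamps below are made-up examples.

```python
# Toy TSG evaluation: compare a predicted interval to the ground-truth interval.
def temporal_iou(pred, gt):
    """Temporal intersection-over-union between two (start, end) intervals."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = max(pred[1], gt[1]) - min(pred[0], gt[0])
    return inter / union if union > 0 else 0.0

# Example: query "the person opens the fridge" grounded in a 30 s video.
print(temporal_iou(pred=(4.2, 9.8), gt=(5.0, 10.0)))  # ~0.83
```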
Most implemented papers
Negative Sample Matters: A Renaissance of Metric Learning for Temporal Grounding
Viewing temporal grounding as a metric-learning problem, we present a Mutual Matching Network (MMN) that directly models the similarity between language queries and video moments in a joint embedding space.
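A minimal sketch of this metric-learning view: moment and query features are projected into a shared space and scored by cosine similarity. The feature dimensions and projection layers below are illustrative assumptions, not MMN's actual architecture.

```python
import torch
import torch.nn.functional as F

moment_feats = torch.randn(32, 512)   # 32 candidate moments, 512-d video features (assumed)
query_feats = torch.randn(4, 300)     # 4 sentence queries, 300-d text features (assumed)

proj_v = torch.nn.Linear(512, 256)    # video -> joint embedding space
proj_q = torch.nn.Linear(300, 256)    # text  -> joint embedding space

v = F.normalize(proj_v(moment_feats), dim=-1)
q = F.normalize(proj_q(query_feats), dim=-1)

sim = q @ v.t()                       # (4, 32) query-moment similarity matrix
best_moment = sim.argmax(dim=-1)      # highest-scoring moment per query
```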
Semantic Conditioned Dynamic Modulation for Temporal Sentence Grounding in Videos
Temporal sentence grounding in videos aims to detect and localize one target video segment, which semantically corresponds to a given sentence.
Uncovering Hidden Challenges in Query-Based Video Moment Retrieval
In this paper, we present a series of experiments assessing how well the benchmark results reflect the true progress in solving the moment retrieval task.
Context-aware Biaffine Localizing Network for Temporal Sentence Grounding
This paper addresses the problem of temporal sentence grounding (TSG), which aims to identify the temporal boundary of a specific segment from an untrimmed video by a sentence query.
Weakly Supervised Temporal Sentence Grounding With Gaussian-Based Contrastive Proposal Learning
Existing weakly supervised methods train their models to distinguish positive visual-language pairs from negatives randomly collected from other videos, ignoring the highly confusing video segments within the same video.
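A hedged sketch of the contrast described here: scoring a positive moment against negatives drawn from the same video (hard negatives) rather than only from other videos. The InfoNCE-style loss and temperature are generic choices for illustration, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

query = F.normalize(torch.randn(1, 256), dim=-1)        # sentence embedding (assumed size)
pos = F.normalize(torch.randn(1, 256), dim=-1)          # matched proposal embedding
intra_negs = F.normalize(torch.randn(8, 256), dim=-1)   # other segments of the same video

# Positive is placed first, so the target class index is 0.
logits = query @ torch.cat([pos, intra_negs]).t() / 0.07
loss = F.cross_entropy(logits, torch.zeros(1, dtype=torch.long))
```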
D3G: Exploring Gaussian Prior for Temporal Sentence Grounding with Glance Annotation
Under this setup, we propose a Dynamic Gaussian prior based Grounding framework with Glance annotation (D3G), which consists of a Semantic Alignment Group Contrastive Learning module (SA-GCL) and a Dynamic Gaussian prior Adjustment module (DGA).
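A minimal illustration of the glance-annotation setup this framework builds on: only a single timestamp inside the target moment is labeled, and a Gaussian prior centered on that glance weights nearby clips as likely foreground. The clip count and `sigma` below are assumed hyperparameters, not D3G's actual values.

```python
import numpy as np

num_clips = 64          # video divided into 64 uniform clips (assumed)
glance_clip = 22        # the single annotated timestamp falls in clip 22
sigma = 5.0             # prior width (assumed)

t = np.arange(num_clips)
prior = np.exp(-0.5 * ((t - glance_clip) / sigma) ** 2)
prior /= prior.sum()    # normalized prior over clips, peaked at the glance
```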
Temporal Sentence Grounding in Streaming Videos
The goal of TSGSV is to evaluate the relevance between a video stream and a given sentence query.
Learning Temporal Sentence Grounding From Narrated EgoVideos
Compared to traditional benchmarks on which this task is evaluated, narrated egocentric video datasets offer finer-grained sentences to ground in notably longer videos.
BAM-DETR: Boundary-Aligned Moment Detection Transformer for Temporal Sentence Grounding in Videos
Center-based moment formulations, however, suffer from center misalignment caused by the inherent ambiguity of moment centers, leading to inaccurate predictions.
Gaussian Mixture Proposals with Pull-Push Learning Scheme to Capture Diverse Events for Weakly Supervised Temporal Video Grounding
In weakly supervised temporal video grounding, previous methods use predetermined single-Gaussian proposals, which lack the ability to express the diverse events described by the sentence query.
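A hedged sketch of a Gaussian-mixture proposal: instead of one Gaussian, a proposal is a weighted sum of several Gaussians over the temporal axis, so it can cover multiple sub-events mentioned in the query. Component counts and parameters here are illustrative, not the paper's learned values.

```python
import numpy as np

def gaussian_mixture_mask(num_clips, centers, widths, weights):
    """Soft temporal mask over clips formed by a mixture of Gaussians."""
    t = np.arange(num_clips)[None, :]                      # (1, T)
    c = np.asarray(centers)[:, None]                       # (K, 1)
    s = np.asarray(widths)[:, None]
    w = np.asarray(weights)[:, None]
    mask = (w * np.exp(-0.5 * ((t - c) / s) ** 2)).sum(0)  # (T,)
    return mask / mask.max()

# Two sub-events, e.g. "picks up the cup and then drinks".
mask = gaussian_mixture_mask(64, centers=[20, 40], widths=[4, 6], weights=[0.5, 0.5])
```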