Action Anticipation
34 papers with code • 6 benchmarks • 8 datasets
Next action anticipation is defined as observing frames 1, ..., T of a video and predicting the action that begins after a gap of T_a seconds. Note that the anticipated action is a new action starting T_a seconds after the last observed frame, and it is not visible in the observed frames. Here, T_a = 1 second.
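The definition above fixes which frames a model is allowed to see relative to the start of the action it must predict. A minimal sketch of that sampling rule, with hypothetical names and parameters (the gap T_a, observation length, and frame rate are illustrative defaults, not tied to any specific benchmark code):

```python
# Illustrative sketch of the anticipation setup (function and parameter
# names are hypothetical): given the start time of a labeled action,
# the model may only observe a segment ending t_a seconds earlier.

def observed_frame_range(action_start_s, t_a=1.0, obs_len_s=2.0, fps=30):
    """Return (first_frame, last_frame) the model is allowed to observe.

    action_start_s : time (s) at which the action to anticipate begins
    t_a            : anticipation gap in seconds (T_a = 1 s in this task)
    obs_len_s      : length of the observed segment, in seconds
    fps            : video frame rate
    """
    obs_end_s = action_start_s - t_a            # observation stops t_a s early
    obs_start_s = max(0.0, obs_end_s - obs_len_s)
    first = int(round(obs_start_s * fps))
    last = int(round(obs_end_s * fps)) - 1      # last visible frame, inclusive
    return first, last

# An action starting at t = 10 s, with a 1 s gap and 2 s of observation,
# yields frames covering the interval 7 s .. 9 s:
print(observed_frame_range(10.0))  # -> (210, 269)
```

Any anticipation model then consumes only this observed range; the frames between the end of the observation and the action start (the gap) are hidden at both training and test time.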
Most implemented papers
Rescaling Egocentric Vision
This paper introduces the pipeline to extend the largest dataset in egocentric vision, EPIC-KITCHENS.
Scaling Egocentric Vision: The EPIC-KITCHENS Dataset
First-person vision is gaining interest as it offers a unique viewpoint on people's interaction with objects, their attention, and even intention.
What Would You Expect? Anticipating Egocentric Actions with Rolling-Unrolling LSTMs and Modality Attention
Our method is ranked first in the public leaderboard of the EPIC-Kitchens egocentric action anticipation challenge 2019.
HalluciNet-ing Spatiotemporal Representations Using a 2D-CNN
The hallucination task is treated as an auxiliary task, which can be used with any other action related task in a multitask learning setting.
Rolling-Unrolling LSTMs for Action Anticipation from First-Person Video
The experiments show that the proposed architecture is state-of-the-art in the domain of egocentric videos, achieving top performances in the 2019 EPIC-Kitchens egocentric action anticipation challenge.
Temporal Aggregate Representations for Long-Range Video Understanding
Future prediction, especially in long-range videos, requires reasoning from current and past observations.
Encouraging LSTMs to Anticipate Actions Very Early
In contrast to the widely studied problem of recognizing an action given a complete sequence, action anticipation aims to identify the action from only partially available videos.
RED: Reinforced Encoder-Decoder Networks for Action Anticipation
RED takes multiple history representations as input and learns to anticipate a sequence of future representations.
Forecasting Human-Object Interaction: Joint Prediction of Motor Attention and Actions in First Person Video
Motivated by this, we adopt intentional hand movement as a future representation and propose a novel deep network that jointly models and predicts the egocentric hand motion, interaction hotspots and future action.
Pedestrian Action Anticipation using Contextual Feature Fusion in Stacked RNNs
To this end, we propose a solution for the problem of pedestrian action anticipation at the point of crossing.