Egocentric Activity Recognition
14 papers with code • 2 benchmarks • 4 datasets
Most implemented papers
Long-Term Feature Banks for Detailed Video Understanding
To understand the world, we humans constantly need to relate the present to the past, and put events in context.
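As a rough illustration of the feature-bank idea (a minimal PyTorch sketch with hypothetical names, not the paper's actual model): features from the current short clip attend over a bank of features precomputed across the entire video.

```python
import torch
import torch.nn as nn

class FeatureBankAttention(nn.Module):
    """Toy long-term feature bank operator: short-term clip features
    attend over features covering the whole video (hypothetical,
    simplified version of the paper's feature bank operator)."""
    def __init__(self, dim):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def forward(self, short_term, bank):
        # short_term: (B, S, dim) features from the current few seconds
        # bank:       (B, L, dim) features spanning the full video, L >> S
        fused, _ = self.attn(query=short_term, key=bank, value=bank)
        return short_term + fused  # residual connection

B, S, L, dim = 2, 4, 128, 256
out = FeatureBankAttention(dim)(torch.randn(B, S, dim), torch.randn(B, L, dim))
print(out.shape)  # torch.Size([2, 4, 256])
```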
Large-scale weakly-supervised pre-training for video action recognition
Frame-based models perform quite well on action recognition; is pre-training for good image features sufficient, or is pre-training for spatio-temporal features valuable for optimal transfer learning?
What Would You Expect? Anticipating Egocentric Actions with Rolling-Unrolling LSTMs and Modality Attention
Our method is ranked first on the public leaderboard of the EPIC-Kitchens egocentric action anticipation challenge 2019.
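A minimal sketch of the rolling-unrolling scheme (hypothetical, single-modality; the full RU-LSTM fuses several modalities with attention): a "rolling" LSTM encodes the observed frames, and an "unrolling" LSTM initialised from its state steps into the future to score the anticipated action.

```python
import torch
import torch.nn as nn

class RollingUnrolling(nn.Module):
    """Toy single-modality rolling-unrolling anticipator (a simplified
    sketch; the paper also uses attention-based modality fusion)."""
    def __init__(self, feat_dim, hidden, n_classes, unroll_steps=4):
        super().__init__()
        self.rolling = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.unrolling = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.cls = nn.Linear(hidden, n_classes)
        self.unroll_steps = unroll_steps

    def forward(self, feats):
        # feats: (B, T, feat_dim) features of the observed video
        _, state = self.rolling(feats)           # summarise the past
        # Unroll into the future by feeding the last observed feature.
        last = feats[:, -1:, :].repeat(1, self.unroll_steps, 1)
        out, _ = self.unrolling(last, state)     # hypothesise what comes next
        return self.cls(out[:, -1])              # anticipated action scores

logits = RollingUnrolling(1024, 512, 10)(torch.randn(2, 8, 1024))
print(logits.shape)  # torch.Size([2, 10])
```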
Learning Video Representations from Large Language Models
We introduce LaViLa, a new approach to learning video-language representations by leveraging Large Language Models (LLMs).
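The dual-encoder contrastive objective underlying this kind of video-language pretraining can be sketched generically (this is the standard symmetric InfoNCE recipe, not LaViLa's full pipeline, which additionally uses an LLM to generate the paired narrations):

```python
import torch
import torch.nn.functional as F

def video_text_contrastive(video_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired video/text embeddings
    (generic recipe; in LaViLa the text side comes from LLM narrations).
    video_emb, text_emb: (B, D) embeddings from the two encoders."""
    v = F.normalize(video_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    logits = v @ t.T / temperature                    # (B, B) similarity matrix
    targets = torch.arange(len(v), device=v.device)   # matched pairs on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.T, targets))

loss = video_text_contrastive(torch.randn(8, 256), torch.randn(8, 256))
print(loss.item())
```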
First-Person Hand Action Benchmark with RGB-D Videos and 3D Hand Pose Annotations
Our dataset and experiments can be of interest to the 3D hand pose estimation, 6D object pose, and robotics communities, as well as to action recognition.
A Correlation Based Feature Representation for First-Person Activity Recognition
The per-frame (per-segment) extracted features are treated as a set of time series, and inter- and intra-time-series relations are employed to form the video descriptor.
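A toy version of this representation (a hedged sketch; the paper's descriptor is more elaborate): treat each feature dimension as a time series over frames and use the pairwise correlations between those series as the video descriptor.

```python
import numpy as np

def correlation_descriptor(frame_feats):
    """Build a video descriptor from intra-video time-series relations.
    frame_feats: (T, D) array of per-frame features; each of the D
    dimensions is treated as a length-T time series."""
    corr = np.corrcoef(frame_feats.T)   # (D, D) Pearson correlations
    corr = np.nan_to_num(corr)          # guard against constant series
    iu = np.triu_indices_from(corr, k=1)
    return corr[iu]                     # vectorised upper triangle

desc = correlation_descriptor(np.random.randn(30, 64))
print(desc.shape)  # (2016,) = 64 * 63 / 2
```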
Attention is All We Need: Nailing Down Object-centric Attention for Egocentric Activity Recognition
Our model is built on the observation that egocentric activities are highly characterized by the objects and their locations in the video.
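One way to picture object-centric attention (a simplified sketch with hypothetical components, not the paper's exact architecture): derive a spatial attention map from per-location object evidence and use it to pool the frame's feature grid.

```python
import torch
import torch.nn as nn

class ObjectCentricPooling(nn.Module):
    """Toy spatial attention driven by object evidence (a hedged
    sketch; the paper's attention is grounded in object classes)."""
    def __init__(self, channels, n_objects):
        super().__init__()
        self.obj_head = nn.Conv2d(channels, n_objects, kernel_size=1)

    def forward(self, fmap):
        # fmap: (B, C, H, W) convolutional feature map of one frame
        b, _, h, w = fmap.shape
        obj_scores = self.obj_head(fmap)                  # (B, O, H, W)
        cue = obj_scores.max(dim=1, keepdim=True).values  # strongest object response
        attn = torch.softmax(cue.view(b, 1, h * w), dim=-1).view(b, 1, h, w)
        return (fmap * attn).sum(dim=(2, 3))              # attended (B, C) descriptor

feat = ObjectCentricPooling(512, n_objects=20)(torch.randn(2, 512, 7, 7))
print(feat.shape)  # torch.Size([2, 512])
```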
LSTA: Long Short-Term Attention for Egocentric Action Recognition
Egocentric activity recognition is one of the most challenging tasks in video analysis.
EPIC-Fusion: Audio-Visual Temporal Binding for Egocentric Action Recognition
We focus on multi-modal fusion for egocentric action recognition and propose a novel architecture for multi-modal temporal binding, i.e., the combination of modalities within a range of temporal offsets.
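The binding-window idea can be sketched as follows (a hypothetical simplification of TBN, which trains its backbones end-to-end): during training, each modality's feature is drawn from an independent temporal offset inside the window before fusion and classification.

```python
import torch
import torch.nn as nn

class TemporalBinding(nn.Module):
    """Toy temporal binding: each modality's feature is sampled at a
    random offset within a binding window, then fused (a simplified
    sketch of the TBN idea, not the paper's architecture)."""
    def __init__(self, dims, n_classes, window=3):
        super().__init__()
        self.window = window
        self.cls = nn.Linear(sum(dims), n_classes)

    def forward(self, streams):
        # streams: list of (B, T, dim_m) per-modality feature sequences
        fused = []
        for feats in streams:
            t = torch.randint(0, self.window, (1,)).item()  # modality-specific offset
            fused.append(feats[:, t])                       # (B, dim_m)
        return self.cls(torch.cat(fused, dim=-1))           # bound prediction

rgb, flow, audio = (torch.randn(2, 8, d) for d in (1024, 1024, 128))
logits = TemporalBinding([1024, 1024, 128], n_classes=10)([rgb, flow, audio])
print(logits.shape)  # torch.Size([2, 10])
```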
Integrating Human Gaze into Attention for Egocentric Activity Recognition
We model the distribution of gaze fixations using a variational method.
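One common way to couple predicted attention with recorded gaze (a hedged sketch; the paper's variational formulation is more involved): treat both as spatial probability distributions and penalise their KL divergence alongside the recognition loss.

```python
import torch
import torch.nn.functional as F

def gaze_attention_loss(attn_logits, gaze_map, eps=1e-8):
    """KL(gaze || attention) between two spatial distributions.
    attn_logits: (B, H, W) unnormalised predicted attention
    gaze_map:    (B, H, W) non-negative gaze fixation density"""
    b = attn_logits.size(0)
    log_attn = F.log_softmax(attn_logits.view(b, -1), dim=-1)
    gaze = gaze_map.view(b, -1)
    gaze = gaze / (gaze.sum(dim=-1, keepdim=True) + eps)  # normalise to a distribution
    return F.kl_div(log_attn, gaze, reduction='batchmean')

loss = gaze_attention_loss(torch.randn(2, 7, 7), torch.rand(2, 7, 7))
print(loss.item())
```

In practice such a term would be added to the cross-entropy recognition loss with a weighting coefficient (a hypothetical training setup, not the paper's exact objective).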