AVA is a project that provides audiovisual annotations of video for improving our understanding of human activity. Each of the video clips has been exhaustively annotated by human annotators, and together they represent a rich variety of scenes, recording conditions, and expressions of human activity. There are annotations for atomic visual actions (AVA Actions), active speakers (AVA ActiveSpeaker), and speech activity (AVA Speech).
94 PAPERS • 7 BENCHMARKS
MPIIGaze is a dataset for appearance-based gaze estimation in the wild. It contains 213,659 images collected from 15 participants during natural everyday laptop use over more than three months, and it exhibits large variability in appearance and illumination.
75 PAPERS • 1 BENCHMARK
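Benchmarks on appearance-based datasets such as MPIIGaze typically report the angular error between predicted and ground-truth 3D gaze directions. A minimal sketch of that metric in NumPy (the function name and input layout are illustrative, not part of any dataset toolkit):

    import numpy as np

    def angular_error_deg(pred, gt):
        # pred, gt: (N, 3) arrays of predicted / ground-truth gaze vectors
        pred = pred / np.linalg.norm(pred, axis=1, keepdims=True)
        gt = gt / np.linalg.norm(gt, axis=1, keepdims=True)
        cos = np.clip(np.sum(pred * gt, axis=1), -1.0, 1.0)
        return np.degrees(np.arccos(cos)).mean()  # mean angular error in degrees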
Understanding where people are looking is an informative social cue. In this work, we present Gaze360, a large-scale gaze-tracking dataset and method for robust 3D gaze estimation in unconstrained images. Our dataset consists of 238 subjects in indoor and outdoor environments with labelled 3D gaze across a wide range of head poses and distances. It is the largest publicly available dataset of its kind by both subject count and variety, made possible by a simple and efficient collection method. Our proposed 3D gaze model extends existing models to include temporal information and to directly output an estimate of gaze uncertainty. We demonstrate the benefits of our model via an ablation study, and show its generalization performance via a cross-dataset evaluation against other recent gaze benchmark datasets. We furthermore propose a simple self-supervised approach to improve cross-dataset domain adaptation. Finally, we demonstrate an application of our model for estimating customer attention.
51 PAPERS • 1 BENCHMARK
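Gaze360's labels are 3D gaze directions; many estimation pipelines convert such unit vectors to spherical (yaw, pitch) angles before regression. A hedged sketch of the usual conversion, assuming a camera-style frame with x right, y down, z forward (conventions differ between datasets):

    import numpy as np

    def vector_to_yaw_pitch(g):
        # g: 3D gaze direction; returns (yaw, pitch) in radians
        g = np.asarray(g, float)
        g = g / np.linalg.norm(g)
        yaw = np.arctan2(g[0], g[2])   # horizontal angle about the vertical axis
        pitch = -np.arcsin(g[1])       # vertical angle; negated because y points down
        return yaw, pitch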
From scientific research to commercial applications, eye tracking is an important tool across many domains. Despite its range of applications, eye tracking has yet to become a pervasive technology. We believe that we can put the power of eye tracking in everyone's palm by building eye tracking software that works on commodity hardware such as mobile phones and tablets, without the need for additional sensors or devices. We tackle this problem by introducing GazeCapture, the first large-scale dataset for eye tracking, containing data from over 1,450 people and almost 2.5M frames. Using GazeCapture, we train iTracker, a convolutional neural network for eye tracking, which achieves a significant reduction in error over previous approaches while running in real time (10-15 fps) on a modern mobile device. Our model achieves a prediction error of 1.7 cm and 2.5 cm without calibration on mobile phones and tablets, respectively. With calibration, this is reduced to 1.3 cm and 2.1 cm.
50 PAPERS • 1 BENCHMARK
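The errors quoted for GazeCapture are Euclidean distances, in centimetres, between predicted and true gaze points on the device screen plane. A minimal sketch of that metric (array names and shapes are assumptions, not the iTracker code):

    import numpy as np

    def mean_euclidean_error_cm(pred_xy_cm, true_xy_cm):
        # pred_xy_cm, true_xy_cm: (N, 2) on-screen gaze points in centimetres
        return np.linalg.norm(pred_xy_cm - true_xy_cm, axis=1).mean()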
The EYEDIAP dataset is a dataset for gaze estimation from remote RGB and RGB-D (standard vision and depth) cameras. The recording methodology was designed by systematically including, and isolating, most of the variables that affect remote gaze estimation algorithms.
47 PAPERS • 2 BENCHMARKS
Consists of over one million high-resolution images of varying gaze under extreme head poses. The dataset was collected from 110 participants using a custom hardware setup with 18 digital SLR cameras, adjustable illumination conditions, and a calibrated system for recording ground-truth gaze targets.
42 PAPERS • 1 BENCHMARK
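With a calibrated capture rig like the one described, ground-truth gaze is usually derived geometrically: the gaze direction is the unit vector from the 3D eye centre to the 3D position of the fixated target. A simple sketch of that step (names are illustrative):

    import numpy as np

    def gaze_direction(eye_xyz, target_xyz):
        # Unit gaze direction from the eye centre to the gaze target,
        # with both points expressed in the same coordinate frame.
        d = np.asarray(target_xyz, float) - np.asarray(eye_xyz, float)
        return d / np.linalg.norm(d)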
Presents a diverse eye-gaze dataset.
26 PAPERS • 1 BENCHMARK
OpenEDS (Open Eye Dataset) is a large-scale dataset of eye images captured using a virtual-reality (VR) head-mounted display fitted with two synchronized eye-facing cameras at a frame rate of 200 Hz under controlled illumination. The dataset is compiled from video of the eye region collected from 152 individual participants and is divided into four subsets: (i) 12,759 images with pixel-level annotations for the key eye regions (iris, pupil, and sclera); (ii) 252,690 unlabelled eye images; (iii) 91,200 frames from randomly selected video sequences of 1.5 seconds in duration; and (iv) 143 pairs of left and right point clouds compiled from corneal topography of the eye region, collected from 143 of the 152 participants in the study.
24 PAPERS • 1 BENCHMARK
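Pixel-level annotations such as OpenEDS's iris/pupil/sclera masks are usually evaluated with per-class intersection-over-union (IoU). A minimal sketch, assuming integer label maps with 0 = background, 1 = sclera, 2 = iris, 3 = pupil (the label indices are an assumption):

    import numpy as np

    def per_class_iou(pred, gt, num_classes=4):
        # pred, gt: integer label maps of identical shape
        ious = []
        for c in range(num_classes):
            inter = np.logical_and(pred == c, gt == c).sum()
            union = np.logical_or(pred == c, gt == c).sum()
            ious.append(inter / union if union > 0 else float('nan'))
        return ious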
A dataset for building models that detect people Looking At Each Other (LAEO) in video sequences.
5 PAPERS • NO BENCHMARKS YET
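Given per-person head positions and gaze directions, a LAEO label can be approximated geometrically: each person's gaze must point towards the other person's head within some angular tolerance. A hedged sketch of that check (the 15-degree threshold and input format are assumptions, not the dataset's annotation protocol):

    import numpy as np

    def looking_at_each_other(head_a, gaze_a, head_b, gaze_b, thresh_deg=15.0):
        # True if A's gaze points at B's head and B's gaze points at A's head.
        def angle_deg(v1, v2):
            v1 = np.asarray(v1, float); v1 = v1 / np.linalg.norm(v1)
            v2 = np.asarray(v2, float); v2 = v2 / np.linalg.norm(v2)
            return np.degrees(np.arccos(np.clip(np.dot(v1, v2), -1.0, 1.0)))
        a_to_b = np.asarray(head_b, float) - np.asarray(head_a, float)
        return (angle_deg(gaze_a, a_to_b) < thresh_deg and
                angle_deg(gaze_b, -a_to_b) < thresh_deg)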
A dataset for addressing the problem of detecting people Looking At Each Other (LAEO) in video sequences.
4 PAPERS • NO BENCHMARKS YET
These images were generated using the UnityEyes simulator, after incorporating essential elements of eyeball physiology and modeling binocular vision dynamics. The images are annotated with head pose and gaze direction information, as well as 2D and 3D landmarks of the eye's most important features. Additionally, the images are divided into eight classes denoting the gaze direction of a driver's eyes (TopLeft, TopRight, TopCenter, MiddleLeft, MiddleRight, BottomLeft, BottomRight, BottomCenter). This dataset was used to train a DNN model for estimating gaze direction. It contains 61,063 training images, 132,630 testing images, and an additional 72,000 images for improvement.
3 PAPERS • NO BENCHMARKS YET
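One simple way to obtain such coarse direction classes from a continuous gaze estimate is to threshold yaw and pitch into a 3x3 grid (with no central class among the eight listed). A hedged sketch of such a mapping; the 10-degree threshold is purely illustrative and not the rule used to build the dataset:

    def gaze_class(yaw_deg, pitch_deg, t=10.0):
        # Map continuous (yaw, pitch) in degrees to one of the eight coarse classes.
        # Positive yaw = right, positive pitch = up; threshold t is illustrative.
        col = "Left" if yaw_deg < -t else ("Right" if yaw_deg > t else "Center")
        row = "Bottom" if pitch_deg < -t else ("Top" if pitch_deg > t else "Middle")
        if row == "Middle" and col == "Center":
            return None  # straight ahead has no class among the eight listed
        return row + col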
OpenEDS2020 is a dataset of eye-image sequences captured at a frame rate of 100 Hz under controlled illumination, using a virtual-reality head-mounted display fitted with two synchronized eye-facing cameras. The dataset, which is anonymized to remove any personally identifiable information on participants, consists of 80 participants of varied appearance performing several gaze-elicited tasks, and is divided into two subsets: 1) the Gaze Prediction Dataset, with up to 66,560 sequences containing 550,400 eye images and respective gaze vectors, created to foster research in spatio-temporal gaze estimation and prediction approaches; and 2) the Eye Segmentation Dataset, consisting of 200 sequences sampled at 5 Hz, with up to 29,500 images, of which 5% carry a semantic segmentation label, devised to encourage the use of temporal information to propagate labels to contiguous frames.
2 PAPERS • NO BENCHMARKS YET
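A typical setup for the Gaze Prediction subset is to take a short window of past samples from a 100 Hz sequence and predict the gaze vector a few frames into the future. A minimal sketch of building such input/target pairs from one sequence (the window and horizon lengths are assumptions):

    import numpy as np

    def make_prediction_pairs(gaze_seq, window=50, horizon=5):
        # gaze_seq: (T, 3) gaze vectors at 100 Hz.
        # Pairs 0.5 s of history with the gaze vector 50 ms ahead.
        xs, ys = [], []
        for t in range(window, len(gaze_seq) - horizon):
            xs.append(gaze_seq[t - window:t])
            ys.append(gaze_seq[t + horizon])
        return np.stack(xs), np.stack(ys)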
The EyeInfo Dataset is an open-source eye-tracking dataset created by Fabricio Batista Narcizo, a research scientist at the IT University of Copenhagen (ITU) and GN Audio A/S (Jabra), Denmark. The dataset was introduced in the paper "High-Accuracy Gaze Estimation for Interpolation-Based Eye-Tracking Methods" (DOI: 10.3390/vision5030041). It contains high-speed monocular eye-tracking data from an off-the-shelf remote eye tracker using active illumination. Each user's data is accompanied by a text file with annotations of eye features, environment, viewed targets, and facial features. The dataset follows the principles of the General Data Protection Regulation (GDPR).
1 PAPER • NO BENCHMARKS YET
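Interpolation-based eye tracking, the setting targeted by the paper above, typically fits a low-order polynomial that maps pupil-centre-minus-glint vectors to on-screen coordinates from a handful of calibration targets. A hedged sketch using NumPy least squares (the second-order polynomial and the names are common choices, not taken from the dataset's tooling):

    import numpy as np

    def fit_polynomial_mapping(pg, screen):
        # pg: (N, 2) pupil-glint vectors; screen: (N, 2) calibration target positions.
        def design(v):
            x, y = v[:, 0], v[:, 1]
            return np.stack([np.ones_like(x), x, y, x * y, x**2, y**2], axis=1)
        coeffs, *_ = np.linalg.lstsq(design(pg), screen, rcond=None)
        return coeffs, design

    def estimate_gaze(coeffs, design, pg):
        # Map new pupil-glint vectors to screen coordinates.
        return design(pg) @ coeffs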
We introduce a new dataset of annotated surveillance videos of freely moving people taken from a distance in both indoor and outdoor scenes. The videos are captured with multiple cameras placed in eight different daily environments. People in the videos undergo large pose variations and are frequently occluded by various environmental factors. Most importantly, their eyes are usually not clearly visible, as is often the case in surveillance videos. This is the first rigorously annotated dataset of 3D gaze directions of freely moving people captured from afar.
LISA Gaze is a dataset for driver gaze estimation comprising 11 long drives performed by 10 subjects in two different cars.
This is a synthetic dataset containing full images (rather than only cropped faces) that provides ground-truth 3D gaze directions for multiple people in one image.
1 PAPER • 1 BENCHMARK