WLASL is a large-scale video dataset for Word-Level American Sign Language (ASL) recognition, featuring 2,000 common distinct ASL words.
50 PAPERS • 3 BENCHMARKS
Over a period of three years (2009-2011), the daily news and weather forecast airings of the German public TV station PHOENIX featuring sign language interpretation were recorded, and the weather forecasts of a subset of 386 editions were transcribed using gloss notation. Furthermore, the original German speech was transcribed using automatic speech recognition with manual cleaning. As such, this corpus makes it possible to train end-to-end sign language translation systems from sign language video input to spoken language.
42 PAPERS • 2 BENCHMARKS
CSL-Daily (Chinese Sign Language Corpus) is a large-scale continuous sign language translation (SLT) dataset. It provides both spoken-language translations and gloss-level annotations. The topics revolve around people's daily lives (e.g., travel, shopping, medical care), the most likely SLT application scenario.
36 PAPERS • 4 BENCHMARKS
How2Sign is a multimodal and multiview continuous American Sign Language (ASL) dataset consisting of a parallel corpus of more than 80 hours of sign language videos and a set of corresponding modalities including speech, English transcripts, and depth. A three-hour subset was additionally recorded in the Panoptic Studio, enabling detailed 3D pose estimation.
28 PAPERS • 3 BENCHMARKS
MS-ASL is a real-life, large-scale sign language dataset comprising over 25,000 annotated videos.
26 PAPERS • 1 BENCHMARK
The Ankara University Turkish Sign Language Dataset (AUTSL) is a large-scale multimodal dataset of isolated Turkish Sign Language videos. It contains 226 signs performed by 43 different signers, for 38,336 video samples in total. The samples were recorded with a Microsoft Kinect v2 in RGB, depth, and skeleton formats, with videos provided at a resolution of 512×512. The skeleton data contains the spatial (x, y) coordinates of 25 joints on the signer's body, aligned with the 512×512 frames (a preprocessing sketch follows this entry).
21 PAPERS • 1 BENCHMARK
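Since AUTSL's skeleton stream stores raw (x, y) pixel coordinates aligned to the 512×512 frames, a common preprocessing step is to normalize them to [0, 1]. Below is a minimal sketch assuming the skeletons have already been parsed into a (frames, 25, 2) NumPy array; the function name and array layout are illustrative, not the official release format.

```python
import numpy as np

FRAME_SIZE = 512  # AUTSL videos and skeleton coordinates are aligned to 512x512


def normalize_skeleton(joints_xy: np.ndarray) -> np.ndarray:
    """Map 25-joint (x, y) pixel coordinates to the [0, 1] range.

    joints_xy: array of shape (frames, 25, 2) in 512x512 pixel space
    (the shape is an assumption about how the skeleton files are parsed).
    """
    return joints_xy.astype(np.float32) / FRAME_SIZE


# Hypothetical usage: 40 frames of 25 joints with random pixel coordinates
sample = np.random.randint(0, FRAME_SIZE, size=(40, 25, 2))
normalized = normalize_skeleton(sample)
assert normalized.min() >= 0.0 and normalized.max() <= 1.0
```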
The signing is recorded by a stationary color camera placed in front of the sign language interpreters, who wear dark clothes in front of an artificial grey background with a color transition. All videos are recorded at 25 frames per second with a frame size of 210 by 260 pixels, and each frame shows only the interpreter box.
17 PAPERS • 1 BENCHMARK
BosphorusSign22k is a benchmark dataset for vision-based, user-independent isolated Sign Language Recognition (SLR). The dataset is based on the BosphorusSign corpus (Camgoz et al., 2016c), which was collected with the purpose of helping both the linguistic and computer science communities. It contains isolated videos of Turkish Sign Language glosses from three different domains: health, finance, and commonly used everyday signs. Videos in this dataset were performed by six native signers, which makes the dataset valuable for user-independent sign language studies.
6 PAPERS • NO BENCHMARKS YET
The Greek Sign Language (GSL) dataset is a large-scale RGB+D dataset suitable for Sign Language Recognition (SLR) and Sign Language Translation (SLT). The video captures were conducted with an Intel RealSense D435 RGB+D camera at a rate of 30 fps, with both the RGB and depth streams acquired at the same spatial resolution of 848×480 pixels. To increase variability in the videos, the camera position and orientation were slightly altered between subsequent recordings. Seven different signers perform five individual, commonly encountered scenarios in different public services. The average length of each scenario is twenty sentences.
5 PAPERS • NO BENCHMARKS YET
Content4All is a collection of six open research datasets aimed at automatic sign language translation research.
3 PAPERS • NO BENCHMARKS YET
The sign database for Argentinian Sign Language (LSA), created with the goal of producing a dictionary for LSA and training an automatic sign recognizer, includes 3,200 videos in which 10 non-expert subjects executed 5 repetitions of 64 different signs. The signs were selected from among the most commonly used in the LSA lexicon, including both verbs and nouns.
3 PAPERS • 1 BENCHMARK
The Rendered Handpose Dataset (RHD) contains 41,258 training and 2,728 testing samples. Each sample provides an RGB image, a depth map, segmentation masks, and 21 keypoints per hand with image-space and camera-space coordinates (a loading sketch follows this entry).
2 PAPERS • NO BENCHMARKS YET
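A minimal sketch of reading the RHD annotations, assuming the layout of the published release: a per-split pickle file holding a dict of per-sample annotation dicts with 'uv_vis', 'xyz', and 'K' keys, which follows the loading example distributed with the dataset. Verify the path and keys against your local copy.

```python
import pickle

# Path assumes the directory structure of the published release archive.
with open('RHD_published_v2/training/anno_training.pickle', 'rb') as f:
    annotations = pickle.load(f)  # dict: sample id -> annotation dict

anno = annotations[0]
uv_vis = anno['uv_vis']  # (42, 3): u, v image coordinates + visibility flag
xyz = anno['xyz']        # (42, 3): keypoint coordinates in camera space
K = anno['K']            # (3, 3): camera intrinsic matrix
print(uv_vis.shape, xyz.shape, K.shape)
```

The 42 rows cover 21 keypoints for each of the two hands; projecting `xyz` through `K` should recover the `uv` coordinates, which is a useful sanity check on a fresh download.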
YouTube-ASL is a large-scale, open-domain corpus of American Sign Language (ASL) videos and accompanying English captions drawn from YouTube. With ~1000 hours of videos and >2500 unique signers, YouTube-ASL is ~3x as large and has ~10x as many unique signers as the largest prior ASL dataset.
An artificial corpus built using grammatical dependency rules, created due to the lack of resources for sign language.
1 PAPER • 1 BENCHMARK
Sign Language Datasets for French Belgian Sign Language. This dataset builds upon the work of Belgian linguists from the University of Namur, who over eight years collected and annotated 50 hours of videos depicting sign language conversation. 100 signers were recorded, making it one of the most representative sign language corpora. The annotations have been sanitized and enriched with metadata to construct two easy-to-use datasets for sign language recognition: one for continuous sign language recognition and the other for isolated sign recognition.
1 PAPER • NO BENCHMARKS YET
We introduce Slovo, a large-scale video dataset for Russian Sign Language recognition. The Slovo dataset is about 16 GB in size and contains 20,400 RGB videos of 1,000 sign language gestures from 194 signers, with 20 samples per class. The dataset is split into training and test sets by signer (user_id), so no signer appears in both (a split sketch follows this entry): the training set includes 15,300 videos and the test set 5,100. The total video recording time is ~9.2 hours. About 35% of the videos are recorded in HD format, and 65% are in FullHD resolution. The average video length per gesture is 50 frames.
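Because the split is signer-disjoint, a comparable split can be reproduced with a grouped shuffle over user_id. Below is a minimal sketch using scikit-learn; the record layout and field names are assumptions for illustration, not the official annotation schema.

```python
from sklearn.model_selection import GroupShuffleSplit

# Hypothetical annotation records: one row per video with its signer id.
records = [
    {'video': 'a.mp4', 'label': 'hello', 'user_id': 1},
    {'video': 'b.mp4', 'label': 'hello', 'user_id': 2},
    {'video': 'c.mp4', 'label': 'thanks', 'user_id': 1},
    {'video': 'd.mp4', 'label': 'thanks', 'user_id': 3},
]
groups = [r['user_id'] for r in records]

# 75/25 split with no signer in both sets (Slovo: 15,300 / 5,100 videos).
splitter = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
train_idx, test_idx = next(splitter.split(records, groups=groups))

# Sanity check: the signer sets of the two splits are disjoint.
assert not {groups[i] for i in train_idx} & {groups[i] for i in test_idx}
```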
The ASL-Phono introduces a novel linguistics-based representation that describes the signs in the ASLLVD dataset in terms of a set of attributes of American Sign Language phonology.
0 PAPERS • NO BENCHMARKS YET
The ASL-Skeleton3D introduces a representation based on mapping the coordinates of the signers in the ASLLVD dataset into three-dimensional space. This enables more accurate observation of the body parts and the articulation of signs, allowing researchers to better understand the language and explore other approaches in the SLR field.
Extremely important: The ASLLVD video data are subject to Terms of Use: http://www.bu.edu/asllrp/signbank-terms.pdf. By downloading these video files, you are agreeing to respect these conditions. In particular, NO FURTHER REDISTRIBUTION OF THESE VIDEO FILES is allowed.