WLASL is a large video dataset for Word-Level American Sign Language (ASL) recognition, covering 2,000 common ASL words.
50 PAPERS • 3 BENCHMARKS
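As a sketch of how an isolated, word-level dataset like this is typically consumed: the official WLASL release ships a JSON annotation file mapping each gloss to its video instances. The file name and field names below are assumptions based on that public release, so treat this as illustrative rather than definitive.

    import json
    from collections import defaultdict

    # Assumed annotation file from the official WLASL repository: a list of
    # {"gloss": ..., "instances": [...]} records (field names are assumptions).
    with open("WLASL_v0.3.json") as f:
        entries = json.load(f)

    # Group (video_id, gloss) pairs by their train/val/test split.
    splits = defaultdict(list)
    for entry in entries:
        for inst in entry["instances"]:
            splits[inst["split"]].append((inst["video_id"], entry["gloss"]))

    print({split: len(items) for split, items in splits.items()})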
RWTH-PHOENIX-Weather 2014T: over a period of three years (2009–2011), the daily news and weather forecast airings of the German public TV station PHOENIX featuring sign language interpretation were recorded, and the weather forecasts of a subset of 386 editions were transcribed using gloss notation. Furthermore, the original German speech was transcribed using automatic speech recognition with manual cleaning. As such, this corpus allows training end-to-end sign language translation systems from sign language video input to spoken language.
42 PAPERS • 2 BENCHMARKS
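For a sense of how such a parallel corpus is laid out: the public PHOENIX-2014T release distributes its annotations as '|'-separated CSV files that pair each video segment with a gloss sequence and a German translation. The file name and column names below are assumptions based on that release; a minimal reading sketch:

    import csv

    # Assumed file and column names from the public PHOENIX-2014T release.
    with open("PHOENIX-2014-T.train.corpus.csv", newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f, delimiter="|"):
            frames = row["video"]          # path pattern for the segment's frames
            glosses = row["orth"].split()  # sign-gloss sequence (intermediate supervision)
            german = row["translation"]    # spoken-language target sentence
            print(glosses, "->", german)
            break  # just show the first segment

This pairing of gloss-level and sentence-level targets is what makes the corpus usable for both sign language recognition and end-to-end translation.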
CSL-Daily (Chinese Sign Language Corpus) is a large-scale continuous SLT dataset. It provides both spoken language translations and gloss-level annotations. The topic revolves around people's daily lives (e.g., travel, shopping, medical care), the most likely SLT application scenario.
36 PAPERS • 4 BENCHMARKS
How2Sign is a multimodal and multiview continuous American Sign Language (ASL) dataset consisting of a parallel corpus of more than 80 hours of sign language videos and a set of corresponding modalities, including speech, English transcripts, and depth. A three-hour subset was additionally recorded in the Panoptic Studio, enabling detailed 3D pose estimation.
28 PAPERS • 3 BENCHMARKS
MS-ASL is a real-life, large-scale sign language dataset comprising over 25,000 annotated videos.
26 PAPERS • 1 BENCHMARK
OpenASL is a large-scale American Sign Language (ASL)-to-English dataset collected from online video sites (e.g., YouTube). It contains 288 hours of ASL videos in multiple domains from over 200 signers.
9 PAPERS • NO BENCHMARKS YET
BosphorusSign22k is a benchmark dataset for vision-based, user-independent isolated Sign Language Recognition (SLR). It is based on the BosphorusSign corpus (Camgoz et al., 2016c), which was collected to serve both the linguistics and computer science communities. It contains isolated videos of Turkish Sign Language glosses from three domains: health, finance, and commonly used everyday signs. The videos were performed by six native signers, which makes the dataset valuable for user-independent sign language studies.
6 PAPERS • NO BENCHMARKS YET
Content4All is a collection of six open research datasets aimed at automatic sign language translation research.
3 PAPERS • NO BENCHMARKS YET
An artificial corpus built using grammatical dependency rules, created to address the lack of resources for sign language.
1 PAPER • 1 BENCHMARK
Sign languages are the primary means of communication for a large number of people worldwide. Recently, the availability of sign language translation datasets has facilitated the incorporation of sign language research into the NLP community. Although a wide variety of research focuses on improving translation systems for sign language, the lack of ample annotated resources hinders data-driven natural language processing research. ISLTranslate is a translation dataset for continuous Indian Sign Language (ISL) consisting of 30k ISL-English sentence pairs. To the best of the authors' knowledge, it is the first and largest translation dataset for continuous ISL with corresponding English transcripts. The authors provide a detailed analysis of the dataset, examine the distribution of words and phrases it covers, and benchmark existing end-to-end sign-language-to-spoken-language translation systems on it.
1 PAPER • NO BENCHMARKS YET
LSA-T is the first continuous Argentinian Sign Language (LSA) dataset. It contains 14,880 sentence-level videos of LSA extracted from the CN Sordos YouTube channel, with labels and keypoint annotations for each signer. Videos are full HD (1920x1080) at 30 FPS.
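A hedged sketch of consuming per-signer keypoint annotations like LSA-T's; the file layout and field names here are hypothetical, not the dataset's documented schema:

    import json

    # Hypothetical layout: one JSON file of per-frame pose keypoints per clip,
    # each frame holding [x, y, confidence] triples for the annotated signer.
    with open("clip_0001.keypoints.json") as f:
        frames = json.load(f)

    # Flatten each frame's triples into a fixed-length feature vector,
    # a common input representation for pose-based sign language models.
    features = [
        [value for keypoint in frame["keypoints"] for value in keypoint]
        for frame in frames
    ]
    print(len(features), "frames,", len(features[0]), "values per frame")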
Sign Language Datasets for French Belgian Sign Language: this dataset is built upon the work of Belgian linguists from the University of Namur, who collected and annotated 50 hours of videos of sign language conversation over eight years. 100 signers were recorded, making it one of the most representative sign language corpora. The annotations have been sanitized and enriched with metadata to construct two easy-to-use datasets for sign language recognition: one for continuous sign language recognition and the other for isolated sign recognition.
Separate training, development, and test data are provided. The training data is available immediately; the development and test data will be released in several stages, starting with the development sources only.
1 PAPER • 2 BENCHMARKS