The Sentences Involving Compositional Knowledge (SICK) dataset is a benchmark for compositional distributional semantics. It includes a large number of sentence pairs rich in lexical, syntactic, and semantic phenomena. Each pair is annotated along two dimensions: relatedness and entailment. The relatedness score ranges from 1 to 5, and Pearson's r is used for evaluation; the entailment relation is categorical, with the labels entailment, contradiction, and neutral. There are 4,439 pairs in the train split, 495 in the trial split used for development, and 4,906 in the test split. The sentences were generated from image and video caption datasets and then paired automatically.
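Relatedness evaluation on SICK is a correlation between gold 1–5 scores and system scores. A minimal sketch of the metric, using hypothetical gold and predicted values (the actual evaluation scripts and data loading are not shown here):

```python
import math

def pearson_r(x, y):
    # Pearson correlation: covariance of x and y normalized by
    # the product of their standard deviations.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

gold = [4.5, 3.2, 1.0, 5.0, 2.7]  # hypothetical gold relatedness scores (1-5)
pred = [4.1, 3.0, 1.5, 4.8, 2.9]  # hypothetical system predictions
print(round(pearson_r(gold, pred), 3))
```

A perfectly calibrated system would score r = 1.0; scores are typically reported on the 4,906-pair test split.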
322 PAPERS • 4 BENCHMARKS
The STS Benchmark comprises a selection of the English datasets used in the Semantic Textual Similarity (STS) tasks organized in the context of SemEval between 2012 and 2017. The selection includes text from image captions, news headlines, and user forums.
44 PAPERS • 7 BENCHMARKS
The BIOSSES dataset comprises a total of 100 sentence pairs, all selected from the "TAC2 Biomedical Summarization Track Training Data Set".
35 PAPERS • 3 BENCHMARKS
A publicly available dataset of naturally occurring factual claims for automatic claim verification. It was collected from 26 English fact-checking websites, paired with textual sources and rich metadata, and labelled for veracity by expert human journalists.
19 PAPERS • NO BENCHMARKS YET
CHIP Semantic Textual Similarity (CHIP-STS) is a dataset for sentence similarity in the non-i.i.d. (non-independent and identically distributed) setting. The task is transfer learning across disease types on Chinese disease question-and-answer data: given question pairs related to five different diseases (the disease types in the training and test sets are disjoint), systems must determine whether the semantics of the two sentences are similar.
7 PAPERS • 1 BENCHMARK
A benchmark dataset of 960 Chinese wOrd Similarity pairs, in which all words consist of two morphemes, span three Part-of-Speech (POS) tags, and carry human-annotated similarity (rather than relatedness) scores.
3 PAPERS • NO BENCHMARKS YET
SV-Ident comprises 4,248 sentences from social science publications in English and German. The data is the official data for the Shared Task: “Survey Variable Identification in Social Science Publications” (SV-Ident) 2022. Sentences are labeled with variables that are mentioned either explicitly or implicitly.
3 PAPERS • 2 BENCHMARKS
Spoken versions of the Semantic Textual Similarity dataset for testing sentence-level semantic embeddings. The dataset contains thousands of sentence pairs annotated by humans for semantic similarity. The spoken sentences can be used with sentence-embedding models to test whether a model captures sentence semantics. All sentences are available in 6 synthetic WaveNet voices, and a subset (5%) in 4 real voices recorded in a sound-attenuated booth. Code to train a visually grounded spoken sentence embedding model and evaluation code are available at https://github.com/DannyMerkx/speech2image/tree/Interspeech21
This dataset contains Japanese word-similarity data, including rare words. It was constructed following the Stanford Rare Word Similarity Dataset: ten annotators rated word pairs on an 11-level similarity scale.
2 PAPERS • NO BENCHMARKS YET
NSURL-2019 Shared Task 8: Semantic Question Similarity in Arabic
A dataset of co-referent name-string pairs along with their similarities.
1 PAPER • NO BENCHMARKS YET
Phrase in Context (PiC) is a curated benchmark for phrase understanding and semantic search, consisting of three tasks of increasing difficulty: Phrase Similarity (PS), Phrase Retrieval (PR), and Phrase Sense Disambiguation (PSD). The datasets were annotated by 13 linguistic experts on Upwork and verified by two groups: ~1,000 AMT crowdworkers and a separate set of 5 linguistic experts. The PiC benchmark is distributed under CC BY-NC 4.0.