The English Penn Treebank (PTB) corpus, and in particular the section of the corpus corresponding to the articles of Wall Street Journal (WSJ), is one of the most known and used corpus for the evaluation of models for sequence labelling. The task consists of annotating each word with its Part-of-Speech tag. In the most common split of this corpus, sections from 0 to 18 are used for training (38 219 sentences, 912 344 tokens), sections from 19 to 21 are used for validation (5 527 sentences, 131 768 tokens), and sections from 22 to 24 are used for testing (5 462 sentences, 129 654 tokens). The corpus is also commonly used for character-level and word-level Language Modelling.
977 PAPERS • 10 BENCHMARKS
The New York Times Annotated Corpus contains over 1.8 million articles written and published by the New York Times between January 1, 1987 and June 19, 2007 with article metadata provided by the New York Times Newsroom, the New York Times Indexing Service and the online production staff at nytimes.com. The corpus includes:
264 PAPERS • 7 BENCHMARKS
QA-SRL was proposed as an open schema for semantic roles, in which the relation between an argument and a predicate is expressed as a natural-language question containing the predicate (“Where was someone educated?”) whose answer is the argument (“Princeton”). The authors collected about 19,000 question-answer pairs from 3,200 sentences.
41 PAPERS • NO BENCHMARKS YET
CaRB [Bhardwaj et al., 2019] is developed by re-annotating the dev and test splits of OIE2016 via crowd-sourcing. Besides improving annotation quality, CaRB also provides a new matching scorer. CaRB scorer uses token level match and it matches relation with relation, arguments with arguments.
34 PAPERS • 1 BENCHMARK
OIE2016 is the first large-scale OpenIE benchmark. It is created by automatic conversion from QA-SRL [He et al., 2015], a semantic role labeling dataset. The sentences are from news (e.g., WSJ) and encyclopedia (e.g., WIKI) domains. Since there are no restrictions on the elements of OpenIE extractions, partial-matching criteria instead of exact-matching is typically used. Hence, the evaluation script can tolerate the extractions that are slightly different from the gold annotation.
30 PAPERS • 1 BENCHMARK
The GenericsKB contains 3.4M+ generic sentences about the world, i.e., sentences expressing general truths such as "Dogs bark," and "Trees remove carbon dioxide from the atmosphere." Generics are potentially useful as a knowledge source for AI systems requiring general world knowledge. The GenericsKB is the first large-scale resource containing naturally occurring generic sentences (as opposed to extracted or crowdsourced triples), and is rich in high-quality, general, semantically complete statements. Generics were primarily extracted from three large text sources, namely the Waterloo Corpus, selected parts of Simple Wikipedia, and the ARC Corpus. A filtered, high-quality subset is also available in GenericsKB-Best, containing 1,020,868 sentences.
24 PAPERS • NO BENCHMARKS YET
LSOIE is a large-scale OpenIE data converted from QA-SRL 2.0 in two domains, i.e., Wikipedia and Science. It is 20 times larger than the next largest human-annotated OpenIE data, and thus is reliable for fair evaluation. LSOIE provides n-ary OpenIE annotations and gold tuples are in the 〈ARG0, Relation, ARG1, . . . , ARGn〉 format. The dataset has two subsets ... namely LSOIE-wiki and LSOIE-sci, for comprehensive evaluation. LSOIE-wiki has 24,251 sentences and LSOIE-sci has 47,919 sentences.
8 PAPERS • 2 BENCHMARKS
OPIEC is an Open Information Extraction (OIE) corpus, constructed from the entire English Wikipedia. It containing more than 341M triples. Each triple from the corpus is composed of rich meta-data: each token from the subj / obj / rel along with NLP annotations (POS tag, NER tag, ...), provenance sentence (along with its dependency parse, sentence order relative to the article), original (golden) links contained in the Wikipedia articles, space / time.
8 PAPERS • NO BENCHMARKS YET
We manually performed the task of Open Information Extraction on 5 short documents, elaborating tentative guidelines for the task, and resulting in a ground truth reference of 347 tuples. [section 1]
8 PAPERS • 1 BENCHMARK
This dataset is a new knowledge-base (KB) of hasPart relationships, extracted from a large corpus of generic statements. Complementary to other resources available, it is the first which is all three of: accurate (90% precision), salient (covers relationships a person may mention), and has high coverage of common terms (approximated as within a 10 year old’s vocabulary), as well as having several times more hasPart entries than in the popular ontologies ConceptNet and WordNet. In addition, it contains information about quantifiers, argument modifiers, and links the entities to appropriate concepts in Wikipedia and WordNet.
7 PAPERS • NO BENCHMARKS YET
BenchIE: a benchmark and evaluation framework for comprehensive evaluation of OIE systems for English, Chinese and German. In contrast to existing OIE benchmarks, BenchIE takes into account informational equivalence of extractions: our gold standard consists of fact synsets, clusters in which we exhaustively list all surface forms of the same fact.
3 PAPERS • 1 BENCHMARK
We manually annotate 800 sentences from 80 documents in two domains (Healthcare and Transportation) to form a DocOIE dataset for evaluation.
1 PAPER • 2 BENCHMARKS
The TupleInf Open IE dataset contains Open IE tuples extracted from 263K sentences that were used by the solver in “Answering Complex Questions Using Open Information Extraction” (referred as Tuple KB, T). These sentences were collected from a large Web corpus using training questions from 4th and 8th grade as queries. This dataset contains 156K sentences collected for 4th grade questions and 107K sentences for 8th grade questions. Each sentence is followed by the Open IE v4 tuples using their simple format.
1 PAPER • NO BENCHMARKS YET