SNLI (Stanford Natural Language Inference)

Introduced by Bowman et al. in “A large annotated corpus for learning natural language inference”.

The SNLI (Stanford Natural Language Inference) dataset consists of 570k sentence pairs manually labeled for entailment, contradiction, and neutral relations. Premises are image captions from Flickr30k. Hypotheses were written by crowdsourced annotators who were shown a premise and asked to produce an entailing, a contradicting, and a neutral sentence. Annotators were instructed to judge the relation between the two sentences assuming that both describe the same event. Each pair is labeled “entailment”, “neutral”, “contradiction”, or “-”, where “-” indicates that the annotators could not reach agreement.
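Because pairs labeled “-” have no consensus label, they are commonly dropped before training or evaluation. The sketch below shows one way to do this. It assumes SNLI-style JSON Lines records with `gold_label`, `sentence1` (premise) and `sentence2` (hypothesis) fields. The label-to-index mapping and the toy sentences are hypothetical and were made up for illustration. They are not taken from the dataset.

```python
import json
from typing import Iterable, Iterator, Tuple

# Hypothetical label-to-index mapping; any consistent mapping works.
LABEL_TO_ID = {"entailment": 0, "neutral": 1, "contradiction": 2}


def iter_labeled_pairs(lines: Iterable[str]) -> Iterator[Tuple[str, str, int]]:
    """Yield (premise, hypothesis, label_id) from SNLI-style JSON Lines.

    Assumes each record has "gold_label", "sentence1" and "sentence2" keys.
    Pairs whose gold label is "-" (no annotator agreement) are skipped.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        record = json.loads(line)
        label = record["gold_label"]
        if label == "-":
            continue  # no consensus label, so exclude the pair
        yield record["sentence1"], record["sentence2"], LABEL_TO_ID[label]


# Toy records (invented, not from SNLI) in the assumed format.
sample = [
    '{"gold_label": "entailment", "sentence1": "A man plays a guitar.", "sentence2": "A person plays an instrument."}',
    '{"gold_label": "-", "sentence1": "A dog runs.", "sentence2": "The dog is happy."}',
    '{"gold_label": "contradiction", "sentence1": "A woman is sleeping.", "sentence2": "A woman is running a marathon."}',
]

pairs = list(iter_labeled_pairs(sample))
print(len(pairs))  # 2: the "-" pair was dropped
```

On a real split, the same generator could be fed an open file handle, e.g. `iter_labeled_pairs(open(path))`. Here `path` stands for wherever the JSON Lines file was saved, and the field names should be checked against the release you download.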

Source: Breaking NLI Systems with Sentences that Require Simple Lexical Inferences
