The SNLI dataset (Stanford Natural Language Inference) consists of 570k sentence-pairs manually labeled as entailment, contradiction, and neutral. Premises are image captions from Flickr30k, while hypotheses were generated by crowd-sourced annotators who were shown a premise and asked to generate entailing, contradicting, and neutral sentences. Annotators were instructed to judge the relation between sentences given that they describe the same event. Each pair is labeled as “entailment”, “neutral”, “contradiction” or “-”, where “-” indicates that an agreement could not be reached.
Source: Breaking NLI Systemswith Sentences that Require Simple Lexical InferencesPaper | Code | Results | Date | Stars |
---|