The IMDb Movie Reviews dataset is a binary sentiment analysis dataset consisting of 50,000 reviews from the Internet Movie Database (IMDb) labeled as positive or negative. The dataset contains an even number of positive and negative reviews. Only highly polarizing reviews are considered. A negative review has a score ≤ 4 out of 10, and a positive review has a score ≥ 7 out of 10. No more than 30 reviews are included per movie. The dataset contains additional unlabeled data.
1,577 PAPERS • 11 BENCHMARKS
The MPQA Opinion Corpus contains 535 news articles from a wide variety of news sources manually annotated for opinions and other private states (i.e., beliefs, emotions, sentiments, speculations, etc.).
301 PAPERS • 3 BENCHMARKS
For understanding multimodal language used in expressing humor.
33 PAPERS • NO BENCHMARKS YET
The Norwegian Review Corpus (NoReC) was created for the purpose of training and evaluating models for document-level sentiment analysis. More than 43,000 full-text reviews have been collected from major Norwegian news sources and cover a range of different domains, including literature, movies, video games, restaurants, music and theater, in addition to product reviews across a range of categories. Each review is labeled with a manually assigned score of 1–6, as provided by the rating of the original author.
11 PAPERS • NO BENCHMARKS YET
XED is a multilingual fine-grained emotion dataset. The dataset consists of human-annotated Finnish (25k) and English sentences (30k), as well as projected annotations for 30 additional languages, providing new resources for many low-resource languages.
3 PAPERS • NO BENCHMARKS YET
Youtbean is a dataset created from closed captions of YouTube product review videos. It can be used for aspect extraction and sentiment classification.
FinnSentiment introduces a 27,000 sentence dataset (in Finnish) annotated independently with sentiment polarity by three native annotators.
2 PAPERS • NO BENCHMARKS YET
Conversational Stance Detection (CSD) is a dataset with annotations of stances and the structures of conversation threads. It consists of 500 conversation threads (including 500 posts and 5376 comments) from six major social media platforms in Hong Kong.
1 PAPER • NO BENCHMARKS YET
This dataset was used in the paper 'Template-based Abstractive Microblog Opinion Summarisation' (to be published at TACL, 2022). The data is structured as follows: each file represents a cluster of tweets which contains the tweet IDs and a summary of the tweets written by journalists. The gold standard summary follows a template structure and depending on its opinion content, it contains a main story, majority opinion (if any) and/or minority opinions (if any).
UIT-ViSFD is a Vietnamese Smartphone Feedback Dataset as a new benchmark corpus built based on strict annotation schemes for evaluating aspect-based sentiment analysis, consisting of 11,122 human-annotated comments for mobile e-commerce, which is freely available for research purposes.