The Infant Health and Development Program (IHDP) is a randomized controlled study designed to evaluate the effect of home visits from specialist doctors on the cognitive test scores of premature infants. The dataset was first used for benchmarking treatment effect estimation algorithms in Hill [35], where selection bias is induced by removing non-random subsets of the treated individuals to create an observational dataset, and the outcomes are generated using the original covariates and treatments. It contains 747 subjects and 25 variables.
165 PAPERS • 1 BENCHMARK
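The selection-bias step described for IHDP can be illustrated with a toy sketch. This is not Hill's actual procedure or data: the single binary `covariate`, the coin-flip treatment, and the rule "drop treated units with covariate equal to 1" are all hypothetical choices made only to show how removing a non-random subset of the treated group turns a randomized sample into an observational one.

```python
import random


def make_randomized_sample(n, seed=0):
    """Build a toy randomized sample (hypothetical data, not IHDP).

    Each unit gets a binary covariate and a coin-flip treatment assignment.
    """
    rng = random.Random(seed)
    return [
        {"covariate": rng.randint(0, 1), "treated": rng.randint(0, 1)}
        for _ in range(n)
    ]


def induce_selection_bias(units):
    """Drop treated units whose covariate is 1.

    This removes a non-random subset of the treated group; the rule itself
    is an assumption chosen for illustration.
    """
    return [u for u in units if not (u["treated"] == 1 and u["covariate"] == 1)]


def treated_rate(units, covariate_value):
    """Fraction of units with the given covariate value that are treated."""
    group = [u for u in units if u["covariate"] == covariate_value]
    return sum(u["treated"] for u in group) / len(group)


randomized = make_randomized_sample(1000)
observational = induce_selection_bias(randomized)

for name, sample in [("randomized", randomized), ("observational", observational)]:
    print(name, len(sample), treated_rate(sample, 0), treated_rate(sample, 1))
```

In the randomized sample the treated fraction is roughly the same for both covariate values; after the removal step, treatment depends on the covariate, which is the kind of imbalance that treatment effect estimators are benchmarked against.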
The Jobs dataset by LaLonde [36] is a widely used benchmark in the causal inference community, where the treatment is job training and the outcomes are income and employment status after training. The dataset includes 8 covariates such as age, education, and previous earnings. Our goal is to predict unemployment, using the feature set of Dehejia and Wahba [37]. Following Shalit et al. [8], we combined the LaLonde experimental sample (297 treated, 425 control) with the PSID comparison group (2490 control).
46 PAPERS • 1 BENCHMARK
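As a quick check of the sample sizes given for the Jobs dataset, the sketch below builds placeholder rows for each group and counts the combined sample. Only the three group sizes come from the description; the `source` labels and the row layout are hypothetical, and no real covariates or outcomes are included.

```python
from collections import Counter

# Group sizes as stated in the Jobs description; the labels are illustrative.
SAMPLES = [
    ("lalonde_experimental", "treated", 297),
    ("lalonde_experimental", "control", 425),
    ("psid_comparison", "control", 2490),
]


def combine_samples(samples):
    """Stack the groups into one list of placeholder rows."""
    rows = []
    for source, arm, count in samples:
        rows.extend(
            {"source": source, "treated": arm == "treated"} for _ in range(count)
        )
    return rows


combined = combine_samples(SAMPLES)
arm_counts = Counter(row["treated"] for row in combined)

# Total units, treated units, control units.
print(len(combined), arm_counts[True], arm_counts[False])
```

The combined sample has 3212 units, of which 297 are treated and 2915 are controls, so the treated group is a small minority once the PSID comparison group is added.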
A large-scale English paraphrase dataset that surpasses prior work in both quantity and quality.
17 PAPERS • NO BENCHMARKS YET
A human-curated Chinese Reading Comprehension dataset on Opinion (ReCO). The questions in ReCO are opinion-based queries issued to a commercial search engine. The passages are provided by crowdworkers, who extract supporting snippets from the retrieved documents.
9 PAPERS • NO BENCHMARKS YET
Taking Advice from ChatGPT is a laboratory study of how student participants incorporate advice generated by ChatGPT. In a survey conducted through the Experimental Social Science Laboratory, 118 students answered 2,828 questions on topics from the MMLU benchmark. The rich dataset includes questions/choices, advice characteristics, participant answers, and participant background. It can be used to explore algorithm aversion, advice-taking, ChatGPT usage, and more.
1 PAPER • NO BENCHMARKS YET
A maintained database tracks ICLR submissions and reviews, augmented with author profiles and higher-level textual features.
NAIST COVID is a multilingual dataset of social media posts related to COVID-19, consisting of microblogs in English and Japanese from Twitter and those in Chinese from Weibo. The data cover microblogs from January 20, 2020, to March 24, 2020.
The NTPairs dataset consists of pairs of news articles and their corresponding tweets, published by eight media outlets in 2018. The eight outlets were selected for diversity: they employ different editing styles for news sharing and differ in publishing channels and political leaning.