A large corpus of 81.1M English-language academic papers spanning many academic disciplines. It provides rich metadata, paper abstracts, and resolved bibliographic references, as well as structured full text for 8.1M open access papers. The full text is annotated with automatically detected inline mentions of citations, figures, and tables, each linked to its corresponding paper object. The corpus aggregates papers from hundreds of academic publishers and digital archives into a unified source, creating the largest publicly available collection of machine-readable academic text to date.
135 PAPERS • 2 BENCHMARKS
The Microsoft Academic Graph is a heterogeneous graph containing scientific publication records, citation relationships between those publications, as well as authors, institutions, journals, conferences, and fields of study.
116 PAPERS • 1 BENCHMARK
ACL Anthology Reference Corpus (ACL ARC) is a collection of 10,920 academic papers from the ACL Anthology. ACL ARC is cleaned to remove:
12 PAPERS • 4 BENCHMARKS
A scholarly data set with the full text of publications, annotated in-text citations, and links to metadata.
8 PAPERS • NO BENCHMARKS YET
A data set containing citations, citation contexts, and papers.
6 PAPERS • 1 BENCHMARK
These images were generated using the UnityEyes simulator, extended to include essential elements of eyeball physiology and to model binocular vision dynamics. The images are annotated with head pose and gaze direction, as well as 2D and 3D landmarks of the eye's most important features. Additionally, the images are split into two classes denoting the status of the eye (Open for open eyes, Closed for closed eyes). This dataset was used to train a DNN model for detecting the drowsiness status of a driver. The dataset contains 1,704 training images, 4,232 testing images, and an additional 4,103 images for improvements.
4 PAPERS • NO BENCHMARKS YET
FullTextPeerRead is a dataset created by Jeong et al. for context-aware citation recommendation. It contains the context sentences surrounding cited references together with paper metadata, which makes it a well-organized dataset for context-aware paper recommendation.
1 PAPER • 1 BENCHMARK
Internet Archive Scholar Reference Dataset.
1 PAPER • NO BENCHMARKS YET
A newly proposed dataset for local citation recommendation, consisting of 3.2 million local citation sentences along with the title and abstract of both the citing and the cited papers. Titles and abstracts are available in the database for around 1.66 million papers.