The CIFAR-10 dataset (Canadian Institute for Advanced Research, 10 classes) is a subset of the Tiny Images dataset and consists of 60000 32x32 color images. The images are labelled with one of 10 mutually exclusive classes: airplane, automobile (but not truck or pickup truck), bird, cat, deer, dog, frog, horse, ship, and truck (but not pickup truck). There are 6000 images per class with 5000 training and 1000 testing images per class.
14,087 PAPERS • 98 BENCHMARKS
The MNIST database (Modified National Institute of Standards and Technology database) is a large collection of handwritten digits. It has a training set of 60,000 examples, and a test set of 10,000 examples. It is a subset of a larger NIST Special Database 3 (digits written by employees of the United States Census Bureau) and Special Database 1 (digits written by high school students) which contain monochrome images of handwritten digits. The digits have been size-normalized and centered in a fixed-size image. The original black and white (bilevel) images from NIST were size normalized to fit in a 20x20 pixel box while preserving their aspect ratio. The resulting images contain grey levels as a result of the anti-aliasing technique used by the normalization algorithm. the images were centered in a 28x28 image by computing the center of mass of the pixels, and translating the image so as to position this point at the center of the 28x28 field.
6,980 PAPERS • 52 BENCHMARKS
The PubMed dataset consists of 19717 scientific publications from PubMed database pertaining to diabetes classified into one of three classes. The citation network consists of 44338 links. Each publication in the dataset is described by a TF/IDF weighted word vector from a dictionary which consists of 500 unique words.
1,063 PAPERS • 24 BENCHMARKS
The Open Graph Benchmark (OGB) is a collection of realistic, large-scale, and diverse benchmark datasets for machine learning on graphs. OGB datasets are automatically downloaded, processed, and split using the OGB Data Loader. The model performance can be evaluated using the OGB Evaluator in a unified manner. OGB is a community-driven initiative in active development.
808 PAPERS • 16 BENCHMARKS
The Reddit dataset is a graph dataset from Reddit posts made in the month of September, 2014. The node label in this case is the community, or “subreddit”, that a post belongs to. 50 large communities have been sampled to build a post-to-post graph, connecting posts if the same user comments on both. In total this dataset contains 232,965 posts with an average degree of 492. The first 20 days are used for training and the remaining days for testing (with 30% used for validation). For features, off-the-shelf 300-dimensional GloVe CommonCrawl word vectors are used.
587 PAPERS • 13 BENCHMARKS
The Cora dataset consists of 2708 scientific publications classified into one of seven classes. The citation network consists of 5429 links. Each publication in the dataset is described by a 0/1-valued word vector indicating the absence/presence of the corresponding word from the dictionary. The dictionary consists of 1433 unique words.
492 PAPERS • 20 BENCHMARKS
PROTEINS is a dataset of proteins that are classified as enzymes or non-enzymes. Nodes represent the amino acids and two nodes are connected by an edge if they are less than 6 Angstroms apart.
322 PAPERS • 1 BENCHMARK
The CiteSeer dataset consists of 3312 scientific publications classified into one of six classes. The citation network consists of 4732 links. Each publication in the dataset is described by a 0/1-valued word vector indicating the absence/presence of the corresponding word from the dictionary. The dictionary consists of 3703 unique words.
314 PAPERS • 14 BENCHMARKS
IMDB-BINARY is a movie collaboration dataset that consists of the ego-networks of 1,000 actors/actresses who played roles in movies in IMDB. In each graph, nodes represent actors/actress, and there is an edge between them if they appear in the same movie. These graphs are derived from the Action and Romance genres.
283 PAPERS • 2 BENCHMARKS
In particular, MUTAG is a collection of nitroaromatic compounds and the goal is to predict their mutagenicity on Salmonella typhimurium. Input graphs are used to represent chemical compounds, where vertices stand for atoms and are labeled by the atom type (represented by one-hot encoding), while edges between vertices represent bonds between the corresponding atoms. It includes 188 samples of chemical compounds with 7 discrete node labels.
248 PAPERS • 3 BENCHMARKS
COLLAB is a scientific collaboration dataset. A graph corresponds to a researcher’s ego network, i.e., the researcher and its collaborators are nodes and an edge indicates collaboration between two researchers. A researcher’s ego network has three possible labels, i.e., High Energy Physics, Condensed Matter Physics, and Astro Physics, which are the fields that the researcher belongs to. The dataset has 5,000 graphs and each graph has label 0, 1, or 2.
230 PAPERS • 2 BENCHMARKS
The NCI1 dataset comes from the cheminformatics domain, where each input graph is used as representation of a chemical compound: each vertex stands for an atom of the molecule, and edges between vertices represent bonds between atoms. This dataset is relative to anti-cancer screens where the chemicals are assessed as positive or negative to cell lung cancer. Each vertex has an input label representing the corresponding atom type, encoded by a one-hot-encoding scheme into a vector of 0/1 elements.
229 PAPERS • 2 BENCHMARKS
IMDB-MULTI is a relational dataset that consists of a network of 1000 actors or actresses who played roles in movies in IMDB. A node represents an actor or actress, and an edge connects two nodes when they appear in the same movie. In IMDB-MULTI, the edges are collected from three different genres: Comedy, Romance and Sci-Fi.
226 PAPERS • 2 BENCHMARKS
ENZYMES is a dataset of 600 protein tertiary structures obtained from the BRENDA enzyme database. The ENZYMES dataset contains 6 enzymes.
173 PAPERS • 1 BENCHMARK
REDDIT-BINARY consists of graphs corresponding to online discussions on Reddit. In each graph, nodes represent users, and there is an edge between them if at least one of them respond to the other’s comment. There are four popular subreddits, namely, IAmA, AskReddit, TrollXChromosomes, and atheism. IAmA and AskReddit are two question/answer based subreddits, and TrollXChromosomes and atheism are two discussion-based subreddits. A graph is labeled according to whether it belongs to a question/answer-based community or a discussion-based community.
137 PAPERS • 2 BENCHMARKS
PTC is a collection of 344 chemical compounds represented as graphs which report the carcinogenicity for rats. There are 19 node labels for each node.
101 PAPERS • 1 BENCHMARK
Reddit-5K is a relational dataset extracted from Reddit.
74 PAPERS • 1 BENCHMARK
Tudataset: A collection of benchmark datasets for learning with graphs
68 PAPERS • 1 BENCHMARK
AIDS is a graph dataset. It consists of 2000 graphs representing molecular compounds which are constructed from the AIDS Antiviral Screen Database of Active Compounds. It contains 4395 chemical compounds, of which 423 belong to class CA, 1081 to CM, and the remaining compounds to CI.
52 PAPERS • 1 BENCHMARK
The Long Range Graph Benchmark (LRGB) is a collection of 5 graph learning datasets that arguably require long-range reasoning to achieve strong performance in a given task. The 5 datasets in this benchmark can be used to prototype new models that can capture long range dependencies in graphs.
41 PAPERS • 5 BENCHMARKS
It's a synthetic dataset, which contains 1000 graphs divided into two classes according to the motif they contain: either a “house” or a five-node cycle.
39 PAPERS • 1 BENCHMARK
CSL is a synthetic dataset introduced in Murphy et al. (2019) to test the expressivity of GNNs. In particular, graphs are isomorphic if they have the same degree and the task is to classify non-isomorphic graphs.
29 PAPERS • 2 BENCHMARKS
The Tox21 data set comprises 12,060 training samples and 647 test samples that represent chemical compounds. There are 801 "dense features" that represent chemical descriptors, such as molecular weight, solubility or surface area, and 272,776 "sparse features" that represent chemical substructures (ECFP10, DFS6, DFS8; stored in Matrix Market Format ). Machine learning methods can either use sparse or dense data or combine them. For each sample there are 12 binary labels that represent the outcome (active/inactive) of 12 different toxicological experiments. Note that the label matrix contains many missing values (NAs). The original data source and Tox21 challenge site is https://tripod.nih.gov/tox21/challenge/.
25 PAPERS • 5 BENCHMARKS
Reddit12k contains 11929 graphs each corresponding to an online discussion thread where nodes represent users, and an edge represents the fact that one of the two users responded to the comment of the other user. There is 1 of 11 graph labels associated with each of these 11929 discussion graphs, representing the category of the community.
24 PAPERS • NO BENCHMARKS YET
The DIGITS dataset consists of 1797 8×8 grayscale images (1439 for training and 360 for testing) of handwritten digits.
18 PAPERS • 2 BENCHMARKS
The BACE dataset focuses on inhibitors of human beta-secretase 1 (BACE-1). It includes both quantitative (IC50 values) and qualitative (binary labels) binding results. The dataset comprises small molecule inhibitors across a wide range of affinities, spanning three orders of magnitude (from nanomolar to micromolar IC50 values). Specifically, it provides: 154 BACE inhibitors for affinity prediction. 20 BACE inhibitors for pose prediction. 34 BACE inhibitors for free energy prediction.
17 PAPERS • 4 BENCHMARKS
The BBBP dataset comes from a study focused on modeling and predicting the permeability of the blood-brain barrier. The BBBP dataset contains binary labels indicating whether a compound can penetrate the blood-brain barrier (BBB) or not. Researchers use this dataset to develop and evaluate machine learning methods for predicting BBB permeability. It’s a critical task because understanding which compounds can cross the BBB is essential for drug discovery and designing therapeutics for neurological conditions.
17 PAPERS • 5 BENCHMARKS
SIDER contains information on marketed medicines and their recorded adverse drug reactions. The information is extracted from public documents and package inserts. The available information include side effect frequency, drug and side effect classifications as well as links to further information, for example drug–target relations.
16 PAPERS • 4 BENCHMARKS
The HIV dataset was introduced by the Drug Therapeutics Program (DTP) AIDS Antiviral Screen, which tested the ability to inhibit HIV replication for over 40,000 compounds. Screening results were evaluated and placed into three categories: confirmed inactive (CI), confirmed active (CA), and confirmed moderately active (CM).
15 PAPERS • 5 BENCHMARKS
The ClinTox dataset compares drugs approved by the FDA and drugs that have failed clinical trials for toxicity reasons. The dataset includes two classification tasks for 1491 drug compounds with known chemical structures: (1) clinical trial toxicity (or absence of toxicity) and (2) FDA approval status.
14 PAPERS • 3 BENCHMARKS
The LINUX dataset consists of 48,747 Program Dependence Graphs (PDG) generated from the Linux kernel. Each graph represents a function, where a node represents one statement and an edge represents the dependency between the two statements
13 PAPERS • NO BENCHMARKS YET
MalNet is a large public graph database, representing a large-scale ontology of software function call graphs. MalNet contains over 1.2 million graphs, averaging over 17k nodes and 39k edges per graph, across a hierarchy of 47 types and 696 families.
13 PAPERS • 4 BENCHMARKS
ToxCast is an initiative by the U.S. Environmental Protection Agency (EPA) aimed at predicting the potential toxicity of various chemical compounds. It involves high-throughput screening assays that evaluate thousands of chemicals across multiple biological endpoints. These endpoints cover a wide range of effects, including cell cycle disruptions, interactions with steroid receptors, and cytotoxicity.
12 PAPERS • 4 BENCHMARKS
The Maximum Unbiased Validation (MUV) dataset is a benchmark dataset selected from PubChem BioAssay. It was created by applying a refined nearest-neighbor analysis. The MUV dataset is specifically designed for the validation of virtual screening techniques.
9 PAPERS • 3 BENCHMARKS
Mutagenicity is a chemical compound dataset of drugs, which can be categorized into two classes: mutagen and non-mutagen.
9 PAPERS • 1 BENCHMARK
For benchmarking, please refer to its variant UPFD-POL and UPFD-GOS.
7 PAPERS • 2 BENCHMARKS
These data are the results of a chemical analysis of wines grown in the same region in Italy but derived from three different cultivars. The analysis determined the quantities of 13 constituents found in each of the three types of wines.
7 PAPERS • 5 BENCHMARKS
The Gossipcop variant of the UPFD dataset for benchmarking.
3 PAPERS • 1 BENCHMARK
2 PAPERS • 1 BENCHMARK
The PolitiFact variant of the UPFD dataset for benchmarking.
Synthetic graph classification datasets with the task of recognizing the connectivity of same-colored nodes in 4 graphs of varying topology.
1 PAPER • NO BENCHMARKS YET
The sampled 2-hop subgraphs centered on Exchange accounts on the Ethereum Interaction graph.
The sampled 2-hop subgraphs centered on ICO-wallet accounts on the Ethereum Interaction graph.
The sampled 2-hop subgraphs centered on Mining accounts on the Ethereum Interaction graph.
This dataset accompanies the paper `Learning the mechanisms of network growth' by the same authors. The dataset contains 6733 networks of size 20,000 each generated in accordance to different combination of three mechanisms: fitness, aging and preferential attachment. The goal is to use machine learning to identify the combination of mechanisms that was used to create the network. The dataset includes static features from the literature and two version of our newly developed dynamic features. net
1 PAPER • 1 BENCHMARK
The AIDS Antiviral Screen dataset is a dataset of screens checking tens of thousands of compounds for evidence of anti-HIV activity. The available screen results are chemical graph-structured data of these various compounds.
0 PAPER • NO BENCHMARKS YET