The BotNet dataset is a set of topological botnet detection datasets for graph neural networks.
12 PAPERS • NO BENCHMARKS YET
Mindboggle is a large publicly available dataset of manually labeled brain MRI. It consists of 101 subjects collected from different sites, with cortical meshes varying from 102K to 185K vertices. Each brain surface contains 25 or 31 manually labeled parcels.
6 PAPERS • NO BENCHMARKS YET
Question Answering (QA) is a widely-used framework for developing and evaluating an intelligent machine. In this light, QA on Electronic Health Records (EHR), namely EHR QA, can work as a crucial milestone toward developing an intelligent agent in healthcare. EHR data are typically stored in a relational database, which can also be converted to a directed acyclic graph, allowing two approaches for EHR QA: Table-based QA and Knowledge Graph-based QA.
2 PAPERS • NO BENCHMARKS YET
The ETHZ Shape dataset contains images of five diverse shape-based classes, collected from Flickr and Google Images. The main challenges it offers are clutter, intra-class shape variability, and scale changes. The authors deliberately selected several images where the object comprises only a rather small portion of the image, and made an effort to include objects appearing at a wide range of scales. The objects are mostly unoccluded and are all taken from approximately the same viewpoint (the side).
1 PAPER • NO BENCHMARKS YET
This dataset contains synthetic particle decays simulated using the PhaseSpace library. All simulated decay topologies have a common root particle of mass 100 (arbitrary units). Intermediate particles are selected at random with replacement from the following masses: [90, 80, 70, 50, 25, 20, 10]. Final state particles, which make up the leaf nodes of generated topologies, are drawn with replacement from the following masses: [1, 2, 3, 5, 12]. For each intermediate particle (including the root), we limit the minimum number of children to two and the maximum to five. The dataset contains the resulting simulated decays, with information about the detected particles (leaves) to be used as input and Lowest Common Ancestor Generations (LCAGs) to be used as training targets.
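The relationship between a decay topology and its LCAG targets can be illustrated with a small sketch. This is not the dataset's generation code: it does not call PhaseSpace, it does not check kinematic validity (for example, that child masses sum to less than the parent mass), and the names `random_topology`, `generations`, `lcag_matrix`, `max_depth`, and `p_intermediate` are hypothetical. It also assumes that a node's "generation" is its height above the leaves (leaves are generation 0), which may differ from the exact convention used by the dataset authors.

```python
import random

ROOT_MASS = 100
INTERMEDIATE_MASSES = [90, 80, 70, 50, 25, 20, 10]
LEAF_MASSES = [1, 2, 3, 5, 12]


def random_topology(rng, max_depth=3, p_intermediate=0.3):
    """Sample a decay tree; node 0 is the root.

    max_depth and p_intermediate are illustrative knobs, not values
    taken from the dataset description.
    """
    parent, mass, children = {0: None}, {0: ROOT_MASS}, {0: []}
    frontier = [(0, 1)]  # (node, depth at which its children sit)
    next_id = 1
    while frontier:
        node, child_depth = frontier.pop()
        # Each decaying particle gets between two and five children.
        for _ in range(rng.randint(2, 5)):
            child, next_id = next_id, next_id + 1
            parent[child] = node
            children[node].append(child)
            children[child] = []
            if child_depth < max_depth and rng.random() < p_intermediate:
                # Intermediate particle: decays further.
                mass[child] = rng.choice(INTERMEDIATE_MASSES)
                frontier.append((child, child_depth + 1))
            else:
                # Final state particle: a leaf of the topology.
                mass[child] = rng.choice(LEAF_MASSES)
    return parent, children, mass


def generations(children, root=0):
    """Assumed convention: leaves are 0, a parent is 1 + its tallest child."""
    gen = {}

    def visit(node):
        kids = children[node]
        gen[node] = 0 if not kids else 1 + max(visit(k) for k in kids)
        return gen[node]

    visit(root)
    return gen


def lcag_matrix(parent, children, root=0):
    """Return (leaf ids, matrix of LCA generations for every leaf pair)."""
    gen = generations(children, root)
    leaves = sorted(n for n in children if not children[n])

    def ancestors(node):
        path = []
        while node is not None:
            path.append(node)
            node = parent[node]
        return path  # node itself first, root last

    matrix = [[0] * len(leaves) for _ in leaves]
    for i, a in enumerate(leaves):
        anc_a = set(ancestors(a))
        for j, b in enumerate(leaves):
            if i != j:
                # First ancestor of b that is also an ancestor of a is the LCA.
                lca = next(n for n in ancestors(b) if n in anc_a)
                matrix[i][j] = gen[lca]
    return leaves, matrix


parent, children, mass = random_topology(random.Random(0))
leaves, targets = lcag_matrix(parent, children)
```

Here `leaves` indexes the rows and columns of `targets`, so a model would see only per-leaf features and be trained to predict the pairwise matrix.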
The dataset covers the 2022-23 NBA regular season from 2022-10-18 to 2023-01-20, comprising 691 games over 92 game days. There are 582 active players across the 30 teams. In addition to 7 basic statistics, we collected 3 tracking statistics and 3 advanced statistics. We use tracking statistics to more accurately reflect players' movements on the court, and advanced statistics to better represent a player's effectiveness and contribution to the game. Together, these two types of data give us a better understanding of factors that are not visible on the scoreboard.
OCB contains two graph datasets, Ckt-Bench-101 and Ckt-Bench-301, for representation learning over analog circuits. Both contain directed acyclic graphs (DAGs) that represent analog circuits and provide their corresponding graph-level properties: DC gain (Gain), bandwidth (BW), phase margin (PM), and Figure of Merit (FoM), which characterize circuit performance.
The analysis of building models for usable area, building safety, and energy efficiency requires accurate classification data of spaces and space elements. To reduce input model preparation effort and errors, automated classification of spaces and space elements is desirable. Although existing space function classifiers use space adjacency or connectivity graphs as input, the application of Graph Deep Learning (GDL) to space layout element classification has not been extensively researched due to the lack of suitable datasets. To bridge this gap, we introduce a dataset named SAGC-A68, which comprises access graphs automatically generated from 68 digital 3D models of space layouts of apartment buildings designed or built between 1952 and 2019 in 13 countries. Each access graph contains nodes representing spaces and space elements and edges representing the connection between them. Nodes are uniquely identified and characterized by 16 features including “Position X”, “Position Y”, “Posit