Kvasir-SEG is an open-access dataset of gastrointestinal polyp images and corresponding segmentation masks, manually annotated by a medical doctor and then verified by an experienced gastroenterologist.
140 PAPERS • 3 BENCHMARKS
The BLUE benchmark consists of five different biomedicine text-mining tasks with ten corpora. These tasks cover a diverse range of text genres (biomedical literature and clinical notes), dataset sizes, and degrees of difficulty and, more importantly, highlight common biomedicine text-mining challenges.
122 PAPERS • NO BENCHMARKS YET
The goal of the Automated Cardiac Diagnosis Challenge (ACDC) challenge is to:
39 PAPERS • 5 BENCHMARKS
BioGRID is a biomedical interaction repository with data compiled through comprehensive curation efforts. The current index is version 4.2.192 and searches 75,868 publications for 1,997,840 protein and genetic interactions, 29,093 chemical interactions and 959,750 post translational modifications from major model organism species.
33 PAPERS • 2 BENCHMARKS
BLURB is a collection of resources for biomedical natural language processing. In general domains such as newswire and the Web, comprehensive benchmarks and leaderboards such as GLUE have greatly accelerated progress in open-domain NLP. In biomedicine, however, such resources are ostensibly scarce. In the past, there have been a plethora of shared tasks in biomedical NLP, such as BioCreative, BioNLP Shared Tasks, SemEval, and BioASQ, to name just a few. These efforts have played a significant role in fueling interest and progress by the research community, but they typically focus on individual tasks. The advent of neural language models such as BERTs provides a unifying foundation to leverage transfer learning from unlabeled text to support a wide range of NLP applications. To accelerate progress in biomedical pretraining strategies and task-specific methods, it is thus imperative to create a broad-coverage benchmark encompassing diverse biomedical tasks.
32 PAPERS • 2 BENCHMARKS
Contains hundreds of frontal view X-rays and is the largest public resource for COVID-19 image and prognostic data, making it a necessary resource to develop and evaluate tools to aid in the treatment of COVID-19.
31 PAPERS • NO BENCHMARKS YET
ATOM3D is a unified collection of datasets concerning the three-dimensional structure of biomolecules, including proteins, small molecules, and nucleic acids. These datasets are specifically designed to provide a benchmark for machine learning methods which operate on 3D molecular structure, and represent a variety of important structural, functional, and engineering tasks. All datasets are provided in a standardized format along with a Python package containing processing code, utilities, models, and dataloaders for common machine learning frameworks such as PyTorch. ATOM3D is designed to be a living database, where datasets are updated and tasks are added as the field progresses.
22 PAPERS • NO BENCHMARKS YET
Dataset contains 33,010 molecule-description pairs split into 80\%/10\%/10\% train/val/test splits. The goal of the task is to retrieve the relevant molecule for a natural language description. It is defined as follows:
22 PAPERS • 4 BENCHMARKS
The MS-CXR dataset provides 1162 image–sentence pairs of bounding boxes and corresponding phrases, collected across eight different cardiopulmonary radiological findings, with an approximately equal number of pairs for each finding. This dataset complements the existing MIMIC-CXR v.2 dataset and comprises: 1. Reviewed and edited bounding boxes and phrases (1026 pairs of bounding box/sentence); and 2. Manual bounding box labels from scratch (136 pairs of bounding box/sentence).e
ChemProt consists of 1,820 PubMed abstracts with chemical-protein interactions annotated by domain experts and was used in the BioCreative VI text mining chemical-protein interactions shared task.
16 PAPERS • 1 BENCHMARK
The National Institutes of Health’s Clinical Center has made a large-scale dataset of CT images publicly available to help the scientific community improve detection accuracy of lesions. While most publicly available medical image datasets have less than a thousand lesions, this dataset, named DeepLesion, has over 32,000 annotated lesions (220GB) identified on CT images. DeepLesion, a dataset with 32,735 lesions in 32,120 CT slices from 10,594 studies of 4,427 unique patients. There are a variety of lesion types in this dataset, such as lung nodules, liver tumors, enlarged lymph nodes, and so on. It has the potential to be used in various medical image applications
SICAPv2 is a database containing prostate histology whole slide images with both annotations of global Gleason scores and path-level Gleason grades.
15 PAPERS • NO BENCHMARKS YET
Indian Diabetic Retinopathy Image Dataset (IDRiD) dataset consists of typical diabetic retinopathy lesions and normal retinal structures annotated at a pixel level. This dataset also provides information on the disease severity of diabetic retinopathy and diabetic macular edema for each image. This dataset is perfect for the development and evaluation of image analysis algorithms for early detection of diabetic retinopathy.
14 PAPERS • 3 BENCHMARKS
The LIVECell (Label-free In Vitro image Examples of Cells) dataset is a large-scale microscopic image dataset for instance-segmentation of individual cells in 2D cell cultures.
14 PAPERS • 1 BENCHMARK
BioLAMA is a benchmark comprised of 49K biomedical factual knowledge triples for probing biomedical Language Models. It is used to assess the capabilities of Language Models for being valid biomedical knowledge bases.
13 PAPERS • 1 BENCHMARK
Consists of annotated frames containing GI procedure tools such as snares, balloons and biopsy forceps, etc. Beside of the images, the dataset includes ground truth masks and bounding boxes and has been verified by two expert GI endoscopists.
13 PAPERS • 3 BENCHMARKS
The Respiratory Sound database was originally compiled to support the scientific challenge organized at Int. Conf. on Biomedical Health Informatics - ICBHI 2017.
11 PAPERS • 1 BENCHMARK
The evaluation of human epidermal growth factor receptor 2 (HER2) expression is essential to formulate a precise treatment for breast cancer. The routine evaluation of HER2 is conducted with immunohistochemical techniques (IHC), which is very expensive. Therefore, we propose a breast cancer immunohistochemical (BCI) benchmark attempting to synthesize IHC data directly with the paired hematoxylin and eosin (HE) stained images. The dataset contains 4870 registered image pairs, covering a variety of HER2 expression levels (0, 1+, 2+, 3+).
10 PAPERS • 1 BENCHMARK
HyperKvasir dataset contains 110,079 images and 374 videos where it captures anatomical landmarks and pathological and normal findings. A total of around 1 million images and video frames altogether.
10 PAPERS • 2 BENCHMARKS
Diabetic retinopathy is the leading cause of blindness in the working-age population of the developed world. It is estimated to affect over 93 million people.
CHAOS challenge aims the segmentation of abdominal organs (liver, kidneys and spleen) from CT and MRI data. ONsite section of the CHAOS was held in The IEEE International Symposium on Biomedical Imaging (ISBI) on April 11, 2019, Venice, ITALY. Online submissions are still welcome!
8 PAPERS • NO BENCHMARKS YET
Prediction of Finger Flexion IV Brain-Computer Interface Data Competition The goal of this dataset is to predict the flexion of individual fingers from signals recorded from the surface of the brain (electrocorticography (ECoG)). This data set contains brain signals from three subjects, as well as the time courses of the flexion of each of five fingers. The task in this competition is to use the provided flexion information in order to predict finger flexion for a provided test set. The performance of the classifier will be evaluated by calculating the average correlation coefficient r between actual and predicted finger flexion.
7 PAPERS • 1 BENCHMARK
The NuCLS dataset contains over 220,000 labeled nuclei from breast cancer images from TCGA. These nuclei were annotated through the collaborative effort of pathologists, pathology residents, and medical students using the Digital Slide Archive. These data can be used in several ways to develop and validate algorithms for nuclear detection, classification, and segmentation, or as a resource to develop and evaluate methods for interrater analysis.
7 PAPERS • NO BENCHMARKS YET
UI-PRMD is a data set of movements related to common exercises performed by patients in physical therapy and rehabilitation programs. The data set consists of 10 rehabilitation exercises. A sample of 10 healthy individuals repeated each exercise 10 times in front of two sensory systems for motion capturing: a Vicon optical tracker, and a Kinect camera. The data is presented as positions and angles of the body joints in the skeletal models provided by the Vicon and Kinect mocap systems.
7 PAPERS • 2 BENCHMARKS
Accurate lesion segmentation is critical in stroke rehabilitation research for the quantification of lesion burden and accurate image processing. Current automated lesion segmentation methods for T1-weighted (T1w) MRIs, commonly used in rehabilitation research, lack accuracy and reliability. Manual segmentation remains the gold standard, but it is time-consuming, subjective, and requires significant neuroanatomical expertise. However, many methods developed with ATLAS v1.2 report low accuracy, are not publicly accessible or are improperly validated, limiting their utility to the field. Here we present ATLAS v2.0 (N=1271), a larger dataset of T1w stroke MRIs and manually segmented lesion masks that includes training (public. n=655), test (masks hidden, n=300), and generalizability (completely hidden, n=316) data. Algorithm development using this larger sample should lead to more robust solutions, and the hidden test and generalizability datasets allow for unbiased performance evaluation
6 PAPERS • 1 BENCHMARK
EHR-RelB is a benchmark dataset for biomedical concept relatedness, consisting of 3630 concept pairs sampled from electronic health records (EHRs). EHR-RelA is a smaller dataset of 111 concept pairs, which are mainly unrelated.
6 PAPERS • NO BENCHMARKS YET
This paper introduces FrenchMedMCQA, the first publicly available Multiple-Choice Question Answering (MCQA) dataset in French for medical domain. It is composed of 3,105 questions taken from real exams of the French medical specialization diploma in pharmacy, mixing single and multiple answers. Each instance of the dataset contains an identifier, a question, five possible answers and their manual correction(s). We also propose first baseline models to automatically process this MCQA task in order to report on the current performances and to highlight the difficulty of the task. A detailed analysis of the results showed that it is necessary to have representations adapted to the medical domain or to the MCQA task: in our case, English specialized models yielded better results than generic French ones, even though FrenchMedMCQA is in French. Corpus, models and tools are available online.
The complete blood count (CBC) dataset contains 360 blood smear images along with their annotation files splitting into Training, Testing, and Validation sets. The training folder contains 300 images with annotations. The testing and validation folder both contain 60 images with annotations. We have done some modifications over the original dataset to prepare this CBC dataset where some of the image annotation files contain very low red blood cells (RBCs) than actual and one annotation file does not include any RBC at all although the cell smear image contains RBCs. So, we clear up all the fallacious files and split the dataset into three parts. Among the 360 smear images, 300 blood cell images with annotations are used as the training set first, and then the rest of the 60 images with annotations are used as the testing set. Due to the shortage of data, a subset of the training set is used to prepare the validation set which contains 60 images with annotations.
5 PAPERS • NO BENCHMARKS YET
Data Description The training data contains twelve-lead ECGs. The validation and test data contains twelve-lead, six-lead, four-lead, three-lead, and two-lead ECGs:
5 PAPERS • 2 BENCHMARKS
The 2017 PhysioNet/CinC Challenge aims to encourage the development of algorithms to classify, from a single short ECG lead recording (between 30 s and 60 s in length), whether the recording shows normal sinus rhythm, atrial fibrillation (AF), an alternative rhythm, or is too noisy to be classified.
4 PAPERS • NO BENCHMARKS YET
MICCAI Challenge on Circuit Reconstruction from Electron Microscopy Images.
4 PAPERS • 1 BENCHMARK
The Kvasir-SEG dataset includes 196 polyps smaller than 10 mm classified as Paris class 1 sessile or Paris class IIa. We have selected it with the help of expert gastroenterologists. We have released this dataset separately as a subset of Kvasir-SEG. We call this subset Kvasir-Sessile.
The LIMUC dataset is the largest publicly available labeled ulcerative colitis dataset that compromises 11276 images from 564 patients and 1043 colonoscopy procedures. Three experienced gastroenterologists were involved in the annotation process, and all images are labeled according to the Mayo endoscopic score (MES).
Abstract Lobachevsky University Electrocardiography Database (LUDB) is an ECG signal database with marked boundaries and peaks of P, T waves and QRS complexes. The database consists of 200 10-second 12-lead ECG signal records representing different morphologies of the ECG signal. The ECGs were collected from healthy volunteers and patients of the Nizhny Novgorod City Hospital No 5 in 2017–2018. The patients had various cardiovascular diseases while some of them had pacemakers. The boundaries of P, T waves and QRS complexes were manually annotated by cardiologists for all 200 records. Also, each record is annotated with the corresponding diagnosis. The database can be used for educational purposes as well as for training and testing algorithms for ECG delineation, i.e. for automatic detection of boundaries and peaks of P, T waves and QRS complexes.
Plain Language Adaptation of Biomedical Abstracts (PLABA) is a dataset designed for automatic adaptation that is both document- and sentence-aligned. The dataset contains 750 adapted abstracts, totaling 7643 sentence pairs.
Data The data for this Challenge are from multiple sources: CPSC Database and CPSC-Extra Database INCART Database PTB and PTB-XL Database The Georgia 12-lead ECG Challenge (G12EC) Database Undisclosed Database The first source is the public (CPSC Database) and unused data (CPSC-Extra Database) from the China Physiological Signal Challenge in 2018 (CPSC2018), held during the 7th International Conference on Biomedical Engineering and Biotechnology in Nanjing, China. The unused data from the CPSC2018 is NOT the test data from the CPSC2018. The test data of the CPSC2018 is included in the final private database that has been sequestered. This training set consists of two sets of 6,877 (male: 3,699; female: 3,178) and 3,453 (male: 1,843; female: 1,610) of 12-ECG recordings lasting from 6 seconds to 60 seconds. Each recording was sampled at 500 Hz.
This data collection consists of images acquired during chemoradiotherapy of 20 locally-advanced, non-small cell lung cancer patients. The images include four-dimensional (4D) fan beam (4D-FBCT) and 4D cone beam CT (4D-CBCT). All patients underwent concurrent radiochemotherapy to a total dose of 64.8-70 Gy using daily 1.8 or 2 Gy fractions. scription of the dataset.
The eSports Sensors dataset contains sensor data collected from 10 players in 22 matches in League of Legends. The sensor data collected includes:
4 PAPERS • 2 BENCHMARKS
CoVERT is a fact-checked corpus of tweets with a focus on the domain of biomedicine and COVID-19-related (mis)information. The corpus consists of 300 tweets, each annotated with medical named entities and relations. Employs a novel crowdsourcing methodology to annotate all tweets with fact-checking labels and supporting evidence, which crowdworkers search for online. This methodology results in moderate inter-annotator agreement.
3 PAPERS • NO BENCHMARKS YET
How and where proteins interface with one another can ultimately impact the proteins' functions along with a range of other biological processes. As such, precise computational methods for protein interface prediction (PIP) come highly sought after as they could yield significant advances in drug discovery and design as well as protein function analysis. However, the traditional benchmark dataset for this task, Docking Benchmark 5 (DB5), contains only a paltry 230 complexes for training, validating, and testing different machine learning algorithms. In this work, we expand on a dataset recently introduced for this task, the Database of Interacting Protein Structures (DIPS), to present DIPS-Plus, an enhanced, feature-rich dataset of 42,112 complexes for geometric deep learning of protein interfaces. The previous version of DIPS contains only the Cartesian coordinates and types of the atoms comprising a given protein complex, whereas DIPS-Plus now includes a plethora of new residue-level
The dataset contains a Video capsule endoscopy dataset for polyp segmentation.
3 PAPERS • 1 BENCHMARK
The “Medico automatic polyp segmentation challenge” aims to develop computer-aided diagnosis systems for automatic polyp segmentation to detect all types of polyps (for example, irregular polyp, smaller or flat polyps) with high efficiency and accuracy. The main goal of the challenge is to benchmark semantic segmentation algorithms on a publicly available dataset, emphasizing robustness, speed, and generalization.
PETRAW data set was composed of 150 sequences of peg transfer training sessions. The objective of the peg transfer session is to transfer 6 blocks from the left to the right and back. Each block must be extracted from a peg with one hand, transferred to the other hand, and inserted in a peg at the other side of the board. All cases were acquired by a non-medical expert on the LTSI Laboratory from the University of Rennes. The data set was divided into a training data set composed of 90 cases and a test data set composed of 60 cases. A case was composed of kinematic data, a video, semantic segmentation of each frame, and workflow annotation.
3 PAPERS • 6 BENCHMARKS
Overview This database of simulated arterial pulse waves is designed to be representative of a sample of pulse waves measured from healthy adults. It contains pulse waves for 4,374 virtual subjects, aged from 25-75 years old (in 10 year increments). The database contains a baseline set of pulse waves for each of the six age groups, created using cardiovascular properties (such as heart rate and arterial stiffness) which are representative of healthy subjects at each age group. It also contains 728 further virtual subjects at each age group, in which each of the cardiovascular properties are varied within normal ranges. This allows for extensive in silico analyses of haemodynamics and the performance of pulse wave analysis algorithms.
Phee is a dataset for pharmacovigilance comprising over 5000 annotated events from medical case reports and biomedical literature. It is designed for biomedical event extraction tasks.
PubMedCite is a domain-specific dataset with about 192K biomedical scientific papers and a large citation graph preserving 917K citation relationships between them. It is characterized by preserving the salient contents extracted from full texts of references, and the weighted correlation between the salient.
Full-text chemical identification and indexing in PubMed articles.
2 PAPERS • 3 BENCHMARKS
A challenge that consists of three tasks, each targeting a different requirement for in-clinic use. The first task involves classifying images from the GI tract into 23 distinct classes. The second task focuses on efficiant classification measured by the amount of time spent processing each image. The last task relates to automatcially segmenting polyps.
2 PAPERS • 1 BENCHMARK