VQA-RAD consists of 3,515 question–answer pairs on 315 radiology images.
70 PAPERS • 1 BENCHMARK
SLAKE is an English–Chinese bilingual dataset consisting of 642 images and 14,028 question–answer pairs for training and testing Med-VQA systems.
30 PAPERS • 1 BENCHMARK
PathVQA consists of 32,799 open-ended questions on 4,998 pathology images; each question is manually checked for correctness.
29 PAPERS • 1 BENCHMARK
PMC-VQA is a large-scale medical visual question-answering dataset containing 227k VQA pairs on 149k images that cover a variety of modalities and diseases. The question–answer pairs are generated from PMC-OA.
25 PAPERS • 3 BENCHMARKS
The English subset of the SLAKE dataset, comprising 642 images and more than 7,000 question–answer pairs.
7 PAPERS • 1 BENCHMARK
Recent accelerations in multi-modal applications have been made possible by the plethora of image and text data available online. However, the scarcity of similar data in the medical field, specifically in histopathology, has halted similar progress. To enable similar representation learning for histopathology, we turn to YouTube, an untapped resource offering 1,087 hours of valuable educational histopathology videos from expert clinicians. From YouTube, we curate Quilt: a large-scale vision-language dataset consisting of 768,826 image–text pairs. Quilt was automatically curated using a mixture of models, including large language models, handcrafted algorithms, human knowledge databases, and automatic speech recognition. By comparison, the most comprehensive datasets curated for histopathology amass only around 200K samples. We combine Quilt with datasets from other sources, including Twitter, research papers, and the internet in general, to create an even larger dataset.
4 PAPERS • NO BENCHMARKS YET
OVQA contains 19,020 medical visual question–answer pairs generated from 2,001 medical images collected from 2,212 orthopedics EMRs.
3 PAPERS • 1 BENCHMARK
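One way to compare the datasets above is by annotation density, i.e. how many question–answer pairs each image carries. The sketch below uses only the counts quoted in the listing (PMC-VQA's rounded "227k / 149k" figures are taken at face value); the dictionary and its layout are our own, not part of any dataset release.

```python
# QA-pair and image counts as quoted in the listing above.
# Format: dataset name -> (qa_pairs, images)
datasets = {
    "VQA-RAD": (3_515, 315),
    "SLAKE": (14_028, 642),
    "PathVQA": (32_799, 4_998),
    "PMC-VQA": (227_000, 149_000),  # rounded "227k / 149k" figures
    "OVQA": (19_020, 2_001),
}

# Annotation density: QA pairs per image for each dataset.
for name, (pairs, images) in datasets.items():
    print(f"{name:>8}: {pairs / images:.1f} QA pairs per image")
```

The contrast is large: SLAKE packs over twenty questions onto each image, while PMC-VQA averages fewer than two, which matters when judging how much per-image supervision each benchmark provides.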