DOTA is a large-scale dataset for object detection in aerial images, intended for developing and evaluating object detectors in this domain. The images are collected from different sensors and platforms. Image sizes range from 800 × 800 to 20,000 × 20,000 pixels, and each image contains objects exhibiting a wide variety of scales, orientations, and shapes. The instances in DOTA images are annotated by experts in aerial image interpretation with arbitrary (8 d.o.f.) quadrilaterals. DOTA will continue to be updated, growing in size and scope to reflect evolving real-world conditions. It currently has three versions.
248 PAPERS • 4 BENCHMARKS
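The 8-d.o.f. quadrilateral annotation mentioned above can be sketched as follows. This is a minimal illustration, assuming the commonly used per-line DOTA label format `x1 y1 x2 y2 x3 y3 x4 y4 category difficult`; check the release you download, as field order can differ across versions, and the function names here are hypothetical.

```python
# Hypothetical helpers for DOTA-style quadrilateral annotations.
# Assumed line format: "x1 y1 x2 y2 x3 y3 x4 y4 category difficult".

def parse_dota_line(line):
    """Parse one annotation line into (quad, category, difficult)."""
    parts = line.split()
    # Four (x, y) corner points, 8 degrees of freedom in total.
    quad = [(float(parts[i]), float(parts[i + 1])) for i in range(0, 8, 2)]
    return quad, parts[8], int(parts[9])

def quad_to_hbb(quad):
    """Collapse an arbitrary quadrilateral to a horizontal bounding box."""
    xs = [p[0] for p in quad]
    ys = [p[1] for p in quad]
    return min(xs), min(ys), max(xs), max(ys)

quad, category, difficult = parse_dota_line("10 10 50 12 48 40 8 38 plane 0")
```

Collapsing to a horizontal box is lossy for rotated objects, which is precisely why DOTA annotates full quadrilaterals.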
iSAID contains 655,451 object instances for 15 categories across 2,806 high-resolution images. The images of iSAID are the same as those of the DOTA-v1.0 dataset: most are collected from Google Earth, some are taken by the JL-1 satellite, and the rest are taken by the GF-2 satellite of the China Centre for Resources Satellite Data and Application.
57 PAPERS • 3 BENCHMARKS
The evaluation of object detection models is usually performed by optimizing a single metric, e.g. mAP, on a fixed set of datasets, e.g. Microsoft COCO and Pascal VOC. Due to image retrieval and annotation costs, these datasets consist largely of images found on the web and do not represent many real-life domains that are being modelled in practice, e.g. satellite, microscopic, and gaming imagery, making it difficult to assess the degree of generalization learned by the model.
4 PAPERS • 1 BENCHMARK
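The mAP metric referenced above is built on intersection-over-union (IoU) between predicted and ground-truth boxes: a prediction counts as a true positive only if its IoU with a ground-truth box exceeds a threshold (e.g. 0.5). A minimal sketch of that core computation, with boxes as `(x1, y1, x2, y2)` tuples (function name hypothetical):

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Clamp to zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Two 2x2 boxes overlapping in a 1x1 square: IoU = 1 / (4 + 4 - 1) = 1/7.
score = iou((0, 0, 2, 2), (1, 1, 3, 3))
```

Full mAP additionally ranks predictions by confidence and averages precision over recall levels and classes, but IoU thresholding is the step that every variant shares.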
The Small Object Detection for Spotting Birds (SOD4SB) dataset consists of 39,070 images containing 137,121 bird instances. The SOD4SB dataset covers a wide variety of small bird types and a variety of scenes.
4 PAPERS • 2 BENCHMARKS
The Aircraft Context Dataset is a composition of two inter-compatible, large-scale, and versatile image datasets focusing on manned aircraft and UAVs, intended for training and evaluating classification, detection, and segmentation models in aerial domains. Additionally, a set of relevant meta-parameters can be used to quantify dataset variability as well as the impact of environmental conditions on model performance.
3 PAPERS • NO BENCHMARKS YET
SODA-A is a large-scale benchmark specialized for the small object detection task in aerial scenes; it has 800,203 instances with oriented-rectangle box annotations across 9 classes, in 2,510 high-resolution images extracted from Google Earth.
SODA-D is a large-scale dataset tailored for small object detection in driving scenarios, built on top of the MVD dataset and the authors' own data; the former is dedicated to pixel-level understanding of street scenes, while the latter was mainly captured by onboard cameras and mobile phones. With 24,704 well-chosen, high-quality images of driving scenarios, SODA-D comprises 277,596 instances of 9 categories with horizontal bounding boxes.
2 PAPERS • 1 BENCHMARK
The Apron Dataset focuses on training and evaluating classification and detection models for airport-apron logistics. In addition to bounding boxes and object categories, the dataset is enriched with meta-parameters to quantify the models' robustness against environmental influences.
1 PAPER • NO BENCHMARKS YET
A dataset for flying honeybee detection introduced in "A Method for Detection of Small Moving Objects in UAV Videos".
1 PAPER • 1 BENCHMARK
A dataset of blood cell photos.
USC-GRAD-STDdb comprises 115 video segments containing more than 25,000 annotated frames at HD 720p resolution (≈1280×720), with small objects of interest ranging from 16 (≈4×4) to 256 (≈16×16) pixels in area. Video lengths range from 150 to 500 frames. The size of every object is determined by its bounding box, so accurate annotation is of utmost importance for reliable performance metrics; naturally, the smaller the object, the harder the annotation. The annotation was carried out with the ViTBAT tool, fitting the boxes as tightly as possible to the objects of interest in each video frame. In total, more than 56,000 ground-truth labels have been generated.
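Since USC-GRAD-STDdb defines object size by bounding-box pixel area (16 to 256 px²), filtering annotations to that small-object range is straightforward. A minimal sketch, assuming boxes stored as `(x1, y1, x2, y2)` in pixels; the helper names and default thresholds mirror the ranges quoted above but are otherwise hypothetical:

```python
def bbox_area(box):
    """Pixel area of an axis-aligned box given as (x1, y1, x2, y2)."""
    return max(0, box[2] - box[0]) * max(0, box[3] - box[1])

def is_small_object(box, lo=16, hi=256):
    """True if the box area falls in the dataset's small-object range.

    Defaults follow USC-GRAD-STDdb: 16 (~4x4) to 256 (~16x16) pixels.
    """
    return lo <= bbox_area(box) <= hi

# A 4x4 box (area 16) and a 16x16 box (area 256) sit at the range limits.
tiny, limit = (0, 0, 4, 4), (0, 0, 16, 16)
```

The same area check is a common preprocessing step when adapting generic detection datasets to small-object benchmarks.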