The ShanghaiTech dataset is a large-scale crowd counting dataset. It consists of 1198 annotated crowd images. The dataset is divided into two parts, Part-A containing 482 images and Part-B containing 716 images. Part-A is split into train and test subsets consisting of 300 and 182 images, respectively. Part-B is split into train and test subsets consisting of 400 and 316 images, respectively. Each person in a crowd image is annotated with one point close to the center of the head. In total, the dataset consists of 330,165 annotated people. Images from Part-A were collected from the Internet, while images from Part-B were collected on the busy streets of Shanghai.
258 PAPERS • 5 BENCHMARKS
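As a quick sanity check on the ShanghaiTech figures quoted above, the following minimal sketch recomputes the part sizes from the train/test splits and derives the mean number of annotated people per image. The numbers are copied from the description; the variable and function names are illustrative only and do not come from any official dataset tooling.

```python
# Split sizes copied from the ShanghaiTech description above.
SPLITS = {
    "Part-A": {"train": 300, "test": 182},
    "Part-B": {"train": 400, "test": 316},
}
TOTAL_IMAGES = 1198    # stated total number of images
TOTAL_PEOPLE = 330_165  # stated total number of annotated people


def part_sizes(splits):
    """Return the number of images in each part (train + test)."""
    return {part: sum(subsets.values()) for part, subsets in splits.items()}


sizes = part_sizes(SPLITS)
mean_people_per_image = TOTAL_PEOPLE / TOTAL_IMAGES

print(sizes)
print(f"mean people per image: {mean_people_per_image:.1f}")
```

The per-part sums agree with the stated 482 and 716 images, and the mean is an average over both parts combined; the description does not give separate annotation totals for Part-A and Part-B.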
The UCF-QNRF dataset is a crowd counting dataset with large diversity in both scenes and background types. It consists of 1535 high-resolution images from Flickr, Web Search and Hajj footage. The number of people (i.e., the count) varies from 50 to 12,000 across images.
169 PAPERS • 1 BENCHMARK
JHU-CROWD++ is a large-scale unconstrained crowd counting dataset with 4,372 images and 1.51 million annotations. The dataset was collected under a variety of scenarios and environmental conditions. In addition, it provides a comparatively richer set of annotations, such as dots, approximate bounding boxes and blur levels.
46 PAPERS • 1 BENCHMARK
UCF-CC-50 is a dataset for crowd counting and consists of images of extremely dense crowds. It has 50 images with 63,974 head center annotations in total. The head counts range between 94 and 4,543 per image. The small dataset size and large variance make this a very challenging counting dataset.
29 PAPERS • NO BENCHMARKS YET
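Several of the datasets listed here annotate each person with a single head-center point, so the count for an image is simply the number of points. The sketch below shows one common way such points can be turned into a per-pixel point map whose sum equals the count. It is an illustration under stated assumptions, not the loading code of any particular dataset: the `(x, y)` pixel format, the function name `points_to_point_map` and the sample coordinates are all hypothetical.

```python
def points_to_point_map(points, height, width):
    """Build a height x width grid with one unit of mass per annotated head.

    `points` is assumed to be an iterable of (x, y) pixel coordinates;
    actual annotation file formats differ between datasets.
    """
    grid = [[0.0] * width for _ in range(height)]
    for x, y in points:
        # Round to the nearest pixel and clamp to the image bounds so that
        # annotations lying on the border are not dropped.
        col = min(max(int(round(x)), 0), width - 1)
        row = min(max(int(round(y)), 0), height - 1)
        grid[row][col] += 1.0
    return grid


# Hypothetical head annotations for a 40 x 20 pixel image.
heads = [(10.2, 5.7), (30.0, 12.4), (30.1, 12.6)]
point_map = points_to_point_map(heads, height=20, width=40)

# The total mass of the map equals the number of annotated heads.
count = sum(sum(row) for row in point_map)
print(count)
```

In practice such a point map is often smoothed into a density map before training a counting model, but the smoothing choice is method-specific and not described in the listing above.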
NWPU-Crowd consists of 5,109 images with a total of 2,133,375 heads annotated with points and boxes. Compared with other real-world datasets, it contains various illumination scenes and has the largest density range (0~20,033).
28 PAPERS • NO BENCHMARKS YET
JHU-CROWD is a crowd counting dataset that contains 4,250 images with 1.11 million annotations. The dataset was collected under a variety of scenarios and environmental conditions. Specifically, it includes several images with weather-based degradations and illumination variations, in addition to many distractor images, making it a very challenging dataset. Additionally, the dataset provides rich annotations at both image level and head level.
21 PAPERS • NO BENCHMARKS YET
The Fudan-ShanghaiTech dataset (FDST) is a dataset for video crowd counting. It contains 15K frames with about 394K annotated heads captured from 13 different scenes.
17 PAPERS • NO BENCHMARKS YET
7 PAPERS • 2 BENCHMARKS
The DLR-ACD dataset is a collection of aerial images for crowd counting and density estimation, as well as for person localization at mass events. It contains 33 large aerial images acquired through 16 different flight campaigns at various mass events and over urban scenes involving crowds, such as sport events, city centers, open-air fairs and festivals.
6 PAPERS • 1 BENCHMARK
Includes 4405 images with 111251 heads annotated.
6 PAPERS • NO BENCHMARKS YET
WWW Crowd provides 10,000 videos with over 8 million frames from 8,257 diverse scenes, therefore offering a comprehensive dataset for the area of crowd understanding.
5 PAPERS • NO BENCHMARKS YET
TUB CrowdFlow is a synthetic dataset that contains 10 sequences showing 5 scenes. Each scene is rendered twice: once with a static point of view and once with a dynamic camera to simulate drone/UAV-based surveillance. The scenes are rendered using Unreal Engine at HD resolution (1280x720) at 25 fps, which is typical for current commercial CCTV surveillance systems. The total number of frames is 3200.
4 PAPERS • NO BENCHMARKS YET
DroneCrowd is a benchmark for object detection, tracking and counting algorithms in drone-captured videos. It is a large-scale drone-captured dataset formed by 112 video clips with 33,600 HD frames in various scenarios. Notably, it has annotations for 20,800 person trajectories with 4.8 million heads and several video-level attributes.
BEV Crowd-Counting dataset extended from CityUHK-X
2 PAPERS • NO BENCHMARKS YET
Multi Task Crowd is a new 100 image dataset fully annotated for crowd counting, violent behaviour detection and density level classification.
RSOC is a large-scale object counting dataset with remote sensing images, which contains four important geographic objects: buildings, crowded ships in harbors, large-vehicles and small-vehicles in parking lots.
SmartCity consists of 50 images in total, collected from ten city scenes including office entrance, sidewalk, atrium, shopping mall, etc. Unlike existing crowd counting datasets, which contain images of hundreds or thousands of pedestrians and are taken almost entirely outdoors, SmartCity has few pedestrians per image and consists of both outdoor and indoor scenes: the average number of pedestrians is only 7.4, with a minimum of 1 and a maximum of 14.
A large synthetic multi-camera crowd counting dataset with a large number of scenes and camera views to capture many possible variations, which avoids the difficulty of collecting and annotating such a large real dataset.
1 PAPER • NO BENCHMARKS YET
This dataset is an extremely challenging set of over 3000 original crowd images captured and crowdsourced from over 300 urban and rural areas, where each image is manually reviewed and verified by computer vision professionals at Datacluster Labs.
0 PAPER • NO BENCHMARKS YET