Market-1501 is a large-scale public benchmark dataset for person re-identification. It contains 1501 identities which are captured by six different cameras, and 32,668 pedestrian image bounding-boxes obtained using the Deformable Part Models pedestrian detector. Each person has 3.6 images on average at each viewpoint. The dataset is split into two parts: 750 identities are utilized for training and the remaining 751 identities are used for testing. In the official testing protocol 3,368 query images are selected as probe set to find the correct match across 19,732 reference gallery images.
812 PAPERS • 9 BENCHMARKS
The CUHK03 consists of 14,097 images of 1,467 different identities, where 6 campus cameras were deployed for image collection and each identity is captured by 2 campus cameras. This dataset provides two types of annotations, one by manually labelled bounding boxes and the other by bounding boxes produced by an automatic detector. The dataset also provides 20 random train/test splits in which 100 identities are selected for testing and the rest for training
398 PAPERS • 8 BENCHMARKS
The DukeMTMC-reID (Duke Multi-Tracking Multi-Camera ReIDentification) dataset is a subset of the DukeMTMC for image-based person re-ID. The dataset is created from high-resolution videos from 8 different cameras. It is one of the largest pedestrian image datasets wherein images are cropped by hand-drawn bounding boxes. The dataset consists 16,522 training images of 702 identities, 2,228 query images of the other 702 identities and 17,661 gallery images.
326 PAPERS • 6 BENCHMARKS
MSMT17 is a multi-scene multi-time person re-identification dataset. The dataset consists of 180 hours of videos, captured by 12 outdoor cameras, 3 indoor cameras, and during 12 time slots. The videos cover a long period of time and present complex lighting variations, and it contains a large number of annotated identities, i.e., 4,101 identities and 126,441 bounding boxes.
237 PAPERS • 6 BENCHMARKS
MARS (Motion Analysis and Re-identification Set) is a large scale video based person reidentification dataset, an extension of the Market-1501 dataset. It has been collected from six near-synchronized cameras. It consists of 1,261 different pedestrians, who are captured by at least 2 cameras. The variations in poses, colors and illuminations of pedestrians, as well as the poor image quality, make it very difficult to yield high matching accuracy. Moreover, the dataset contains 3,248 distractors in order to make it more realistic. Deformable Part Model and GMMCP tracker were used to automatically generate the tracklets (mostly 25-50 frames long).
169 PAPERS • 2 BENCHMARKS
The Viewpoint Invariant Pedestrian Recognition (VIPeR) dataset includes 632 people and two outdoor cameras under different viewpoints and light conditions. Each person has one image per camera and each image has been scaled to be 128×48 pixels. It provides the pose angle of each person as 0° (front), 45°, 90° (right), 135°, and 180° (back).
132 PAPERS • NO BENCHMARKS YET
The CUKL-SYSY dataset is a large scale benchmark for person search, containing 18,184 images and 8,432 identities. Different from previous re-id benchmarks, matching query persons with manually cropped pedestrians, this dataset is much closer to real application scenarios by searching person from whole images in the gallery.
92 PAPERS • 2 BENCHMARKS
The SYSU-MM01 is a dataset collected for the Visible-Infrared Re-identification problem. The images in the dataset were obtained from 491 different persons by recording them using 4 RGB and 2 infrared cameras. Within the dataset, the persons are divided into 3 fixed splits to create training, validation and test sets. In the training set, there are 20284 RGB and 9929 infrared images of 296 persons. The validation set contains 1974 RGB and 1980 infrared images of 99 persons. The testing set consists of the images of 96 persons where 3803 infrared images are used as query and 301 randomly selected RGB images are used as gallery.
89 PAPERS • 2 BENCHMARKS
PRW is a large-scale dataset for end-to-end pedestrian detection and person recognition in raw video frames. PRW is introduced to evaluate Person Re-identification in the Wild, using videos acquired through six synchronized cameras. It contains 932 identities and 11,816 frames in which pedestrians are annotated with their bounding box positions and identities.
69 PAPERS • NO BENCHMARKS YET
ETH is a dataset for pedestrian detection. The testing set contains 1,804 images in three video clips. The dataset is captured from a stereo rig mounted on car, with a resolution of 640 x 480 (bayered), and a framerate of 13--14 FPS.
59 PAPERS • 5 BENCHMARKS
Occluded REID is an occluded person dataset captured by mobile cameras, consisting of 2,000 images of 200 occluded persons (see Fig. (c)). Each identity has 5 full-body person images and 5 occluded person images with different types of occlusion.
59 PAPERS • 1 BENCHMARK
RegDB is used for Visible-Infrared Re-ID which handles the cross-modality matching between the daytime visible and night-time infrared images. The dataset contains images of 412 people. It includes 10 color and 10 thermal images for each person.
52 PAPERS • 2 BENCHMARKS
The DukeMTMC-VideoReID (Duke Multi-Tracking Multi-Camera Video-based ReIDentification) dataset is a subset of the DukeMTMC for video-based person re-ID. The dataset is created from high-resolution videos from 8 different cameras. It is one of the largest pedestrian video datasets wherein images are cropped by hand-drawn bounding boxes. The dataset consists 4832 tracklets of 1812 identities in total, and each tracklet has 168 frames on average.
48 PAPERS • 2 BENCHMARKS
A novel large-scale corpus of manual annotations for the SoccerNet video dataset, along with open challenges to encourage more research in soccer understanding and broadcast production.
45 PAPERS • 7 BENCHMARKS
CityFlow is a city-scale traffic camera dataset consisting of more than 3 hours of synchronized HD videos from 40 cameras across 10 intersections, with the longest distance between two simultaneous cameras being 2.5 km. The dataset contains more than 200K annotated bounding boxes covering a wide range of scenes, viewing angles, vehicle models, and urban traffic flow conditions.
43 PAPERS • 2 BENCHMARKS
Partial REID is a specially designed partial person reidentification dataset that includes 600 images from 60 people, with 5 full-body images and 5 occluded images per person. These images were collected on a university campus by 6 cameras from different viewpoints, backgrounds and different types of occlusion. The examples of partial persons in the Partial REID dataset are shown in the Figure.
39 PAPERS • NO BENCHMARKS YET
UAV-Human is a large dataset for human behavior understanding with UAVs. It contains 67,428 multi-modal video sequences and 119 subjects for action recognition, 22,476 frames for pose estimation, 41,290 frames and 1,144 identities for person re-identification, and 22,263 frames for attribute recognition. The dataset was collected by a flying UAV in multiple urban and rural districts in both daytime and nighttime over three months, hence covering extensive diversities w.r.t subjects, backgrounds, illuminations, weathers, occlusions, camera motions, and UAV flying attitudes. This dataset can be used for UAV-based human behavior understanding, including action recognition, pose estimation, re-identification, and attribute recognition.
38 PAPERS • 5 BENCHMARKS
Veri-Wild is the largest vehicle re-identification dataset (as of CVPR 2019). The dataset is captured from a large CCTV surveillance system consisting of 174 cameras across one month (30× 24h) under unconstrained scenarios. This dataset comprises 416,314 vehicle images of 40,671 identities. Evaluation on this dataset is split across three subsets: small, medium and large; comprising 3000, 5000 and 10,000 identities respectively (in probe and gallery sets).
38 PAPERS • 3 BENCHMARKS
RSTPReid contains 20505 images of 4,101 persons from 15 cameras. Each person has 5 corresponding images taken by different cameras with complex both indoor and outdoor scene transformations and backgrounds in various periods of time, which makes RSTPReid much more challenging and more adaptable to real scenarios. Each image is annotated with 2 textual descriptions. For data division, 3701 (index < 18505), 200 (18505 <= index < 19505) and 200 (index >= 19505) identities are utilized for training, validation and testing, respectively (Marked by item 'split' in the JSON file). Each sentence is no shorter than 23 words.
33 PAPERS • 1 BENCHMARK
This dataset consists of 33698 images from 221 identities. Each person in Cameras A and B is wearing the same clothes, but the images are captured in different rooms. For Camera C, the person wears different clothes, and the images are captured in a different day.
31 PAPERS • 2 BENCHMARKS
LTCC contains 17,119 person images of 152 identities, and each identity is captured by at least two cameras. The dataset can be divided into two subsets: one cloth-change set where 91 persons appear with 416 different sets of outfits in 14,783 images, and one cloth-consistent subset containing the remaining 61 identities with 2,336 images without outfit changes. On average, there are 5 different clothes for each cloth-changing person, with the number of outfit changes ranging from 2 to 14.
26 PAPERS • 2 BENCHMARKS
Occluded-DukeMTMC contains 15,618 training images, 17,661 gallery images, and 2,210 occluded query images. The experiment results on Occluded-DukeMTMC will demonstrate the superiority of our method in Occluded Person Re-ID problems, let alone that our method does not need any manually cropping procedure as pre-process.
26 PAPERS • 1 BENCHMARK
Market-1501-C is an evaluation set that consists of algorithmically generated corruptions applied to the Market-1501 test-set. These corruptions consist of Noise: Gaussian, shot, impulse, and speckle; Blur: defocus, frosted glass, motion, zoom, and Gaussian; Weather: snow, frost, fog, brightness, spatter, and rain; Digital: contrast, elastic, pixel, JPEG compression, and saturate. Each corruption has five severity levels, resulting in 100 distinct corruptions.
22 PAPERS • 1 BENCHMARK
The iLIDS-VID dataset is a person re-identification dataset which involves 300 different pedestrians observed across two disjoint camera views in public open space. It comprises 600 image sequences of 300 distinct individuals, with one pair of image sequences from two camera views for each person. Each image sequence has variable length ranging from 23 to 192 image frames, with an average number of 73. The iLIDS-VID dataset is very challenging due to clothing similarities among people, lighting and viewpoint variations across camera views, cluttered background and random occlusions.
19 PAPERS • 2 BENCHMARKS
PRID 2011 is a person reidentification dataset that provides multiple person trajectories recorded from two different static surveillance cameras, monitoring crosswalks and sidewalks. The dataset shows a clean background, and the people in the dataset are rarely occluded. In the dataset, 200 people appear in both views. Among the 200 people, 178 people have more than 20 appearances
18 PAPERS • 2 BENCHMARKS
A novel dataset facilitating multimodal and Synergetic sociAL Scene Analysis.
15 PAPERS • NO BENCHMARKS YET
SYSU-30k contains 30k categories of persons, which is about 20 times larger than CUHK03 (1.3k categories) and Market1501 (1.5k categories), and 30 times larger than ImageNet (1k categories). SYSU-30k contains 29,606,918 images. Moreover, SYSU-30k provides not only a large platform for the weakly supervised ReID problem but also a more challenging test set that is consistent with the realistic setting for standard evaluation.
14 PAPERS • 2 BENCHMARKS
Partial iLIDS is a dataset for occluded person person re-identification. It contains a total of 476 images of 119 people captured by 4 non-overlapping cameras. Some images contain people occluded by other individuals or luggage.
12 PAPERS • NO BENCHMARKS YET
One large-scale database for Text-to-Image Person Re-identification, i.e., Text-based Person Retrieval.
11 PAPERS • 2 BENCHMARKS
P-DukeMTMC-reID is a modified version based on DukeMTMC-reID dataset. There are 12,927 images (665 identifies) in training set, 2,163 images (634 identities) for querying and 9,053 images in the gallery set.
11 PAPERS • 1 BENCHMARK
CUHK03-C is an evaluation set that consists of algorithmically generated corruptions applied to the CUHK03 test-set. These corruptions consist of Noise: Gaussian, shot, impulse, and speckle; Blur: defocus, frosted glass, motion, zoom, and Gaussian; Weather: snow, frost, fog, brightness, spatter, and rain; Digital: contrast, elastic, pixel, JPEG compression, and saturate. Each corruption has five severity levels, resulting in 100 distinct corruptions.
9 PAPERS • 1 BENCHMARK
The Airport dataset is a dataset for person re-identification which consists of 39,902 images and 9,651 identities across six cameras.
8 PAPERS • NO BENCHMARKS YET
Labeled Pedestrian in the Wild (LPW) is a pedestrian detection dataset that contains 2,731 pedestrians in three different scenes where each annotated identity is captured by from 2 to 4 cameras. The LPW features a notable scale of 7,694 tracklets with over 590,000 images as well as the cleanliness of its tracklets. It distinguishes from existing datasets in three aspects: large scale with cleanliness, automatically detected bounding boxes and far more crowded scenes with greater age span. This dataset provides a more realistic and challenging benchmark, which facilitates the further exploration of more powerful algorithms.
SenseReID is a person re-identification dataset for evaluating ReID models. It is captured from real surveillance cameras and the person bounding boxes are obtained from state-of-the-art detection algorithm. The dataset contains 1,717 identities in total.
7 PAPERS • 1 BENCHMARK
iQIYI-VID dataset, which comprises video clips from iQIYI variety shows, films, and television dramas. The whole dataset contains 500,000 videos clips of 5,000 celebrities. The length of each video is 1~30 seconds.
6 PAPERS • NO BENCHMARKS YET
LReID is a benchmark for lifelong person reidentification. It has been built using existing datasets, and it consists of two subsets: LReID-Seen and LReID-Unseen.
5 PAPERS • NO BENCHMARKS YET
MSMT17-C is an evaluation set that consists of algorithmically generated corruptions applied to the MSMT17 test-set. These corruptions consist of Noise: Gaussian, shot, impulse, and speckle; Blur: defocus, frosted glass, motion, zoom, and Gaussian; Weather: snow, frost, fog, brightness, spatter, and rain; Digital: contrast, elastic, pixel, JPEG compression, and saturate. Each corruption has five severity levels, resulting in 100 distinct corruptions.
5 PAPERS • 1 BENCHMARK
Provides consistent ID annotations across multiple days, making it suitable for the extremely challenging problem of person search, i.e., where no clothing information can be reliably used. Apart this feature, the P-DESTRE annotations enable the research on UAV-based pedestrian detection, tracking, re-identification and soft biometric solutions.
The ClonedPerson dataset is a large-scale synthetic person re-identification dataset introduced in the paper "Cloning Outfits from Real-World Images to 3D Characters for Generalizable Person Re-Identification" in CVPR 2022. It is generated by MakeHuman and Unity3D. Characters in this dataset use an automatic approach to directly clone the whole outfits from real-world person images to virtual 3D characters, such that any virtual person thus created will appear very similar to its real-world counterpart. The dataset contains 887,766 synthesized person images of 5,621 identities.
4 PAPERS • 4 BENCHMARKS
This dataset contains 114 individuals including 1824 images captured from two disjoint camera views. For each person, eight images are captured from eight different orientations under one camera view and are normalized to 128x48 pixels. This dataset is also split into two parts randomly. One contains 57 individuals for training, and the other contains 57 individuals for testing.
4 PAPERS • 1 BENCHMARK
RegDB-C is an evaluation set that consists of algorithmically generated corruptions applied to the RegDB test-set (color images). These corruptions consist of Noise: Gaussian, shot, impulse, and speckle; Blur: defocus, frosted glass, motion, zoom, and Gaussian; Weather: snow, frost, fog, brightness, spatter, and rain; Digital: contrast, elastic, pixel, JPEG compression, and saturate. Each corruption has five severity levels, resulting in 100 distinct corruptions.
SYSU-MM01-C is an evaluation set that consists of algorithmically generated corruptions applied to the SYSU-MM01 test-set. These corruptions consist of Noise: Gaussian, shot, impulse, and speckle; Blur: defocus, frosted glass, motion, zoom, and Gaussian; Weather: snow, frost, fog, brightness, spatter, and rain; Digital: contrast, elastic, pixel, JPEG compression, and saturate. Each corruption has five severity levels, resulting in 100 distinct corruptions.
The eSports Sensors dataset contains sensor data collected from 10 players in 22 matches in League of Legends. The sensor data collected includes:
4 PAPERS • 2 BENCHMARKS
The images in DukeMTMC-attribute dataset comes from Duke University. There are 1812 identities and 34183 annotated bounding boxes in the DukeMTMC-attribute dataset. This dataset contains 702 identities for training and 1110 identities for testing, corresponding to 16522 and 17661 images respectively. The attributes are annotated in the identity level, every image in this dataset is annotated with 23 attributes.
3 PAPERS • NO BENCHMARKS YET
Multi-view Extended Videos with Identities dataset (MEVID) is a dataset for large-scale, video person re-identification (ReID) in the wild. It spans an extensive indoor and outdoor environment across nine unique dates in a 73-day window, various camera viewpoints, and entity clothing changes. Specifically, it contains labels of the identities of 158 unique people wearing 598 outfits taken from 8, 092 tracklets, average length of about 590 frames, seen in 33 camera views from the very large-scale MEVA person activities dataset.
The Market1501-Attributes dataset is built from the Market1501 dataset. Market1501 Attribute is an augmentation of this dataset with 28 hand annotated attributes, such as gender, age, sleeve length, flags for items carried as well as upper clothes colors and lower clothes colors.
The OpeReid dataset is a person re-identification dataset that consists of 7,413 images of 200 persons.
Person re-ID matches persons across multiple non-overlapping cameras. Despite the increasing deployment of airborne platforms in surveillance, current existing person re-ID benchmarks' focus is on ground-ground matching and very limited efforts on aerial-aerial matching. We propose a new benchmark dataset - AG-ReID, which performs person re-ID matching in a new setting: across aerial and ground cameras. Our dataset contains 21,983 images of 388 identities and 15 soft attributes for each identity. The data was collected by a UAV flying at altitudes between 15 to 45 meters and a ground-based CCTV camera on a university campus. Our dataset presents a novel elevated-viewpoint challenge for person re-ID due to the significant difference in person appearance across these cameras.
2 PAPERS • 1 BENCHMARK