29 dataset results for Image-to-Image Translation

Cityscapes

Cityscapes is a large-scale database that focuses on semantic understanding of urban street scenes. It provides semantic, instance-wise, and dense pixel annotations for 30 classes grouped into 8 categories (flat surfaces, humans, vehicles, constructions, objects, nature, sky, and void). The dataset consists of around 5000 finely annotated images and 20000 coarsely annotated ones. Data was captured in 50 cities over several months, at different times of day, and in good weather conditions. It was originally recorded as video, and frames were manually selected to have the following features: a large number of dynamic objects, varying scene layout, and varying background.

3,323 PAPERS • 54 BENCHMARKS

KITTI

KITTI (Karlsruhe Institute of Technology and Toyota Technological Institute) is one of the most popular datasets for use in mobile robotics and autonomous driving. It consists of hours of traffic scenarios recorded with a variety of sensor modalities, including high-resolution RGB, grayscale stereo cameras, and a 3D laser scanner. Despite its popularity, the dataset itself does not contain ground truth for semantic segmentation. However, various researchers have manually annotated parts of the dataset to fit their needs. Álvarez et al. generated ground truth for 323 images from the road detection challenge with three classes: road, vertical, and sky. Zhang et al. annotated 252 (140 for training and 112 for testing) acquisitions – RGB and Velodyne scans – from the tracking challenge for ten object categories: building, sky, road, vegetation, sidewalk, car, pedestrian, cyclist, sign/pole, and fence. Ros et al. labeled 170 training images and 46 testing images (from the visual odome

3,219 PAPERS • 141 BENCHMARKS

ADE20K

The ADE20K semantic segmentation dataset contains more than 20K scene-centric images exhaustively annotated with pixel-level object and object-part labels. There are 150 semantic categories in total, covering stuff classes such as sky, road, and grass, and discrete objects such as person, car, and bed.

995 PAPERS • 25 BENCHMARKS

CelebA-HQ

The CelebA-HQ dataset is a high-quality version of CelebA that consists of 30,000 images at 1024×1024 resolution.

808 PAPERS • 13 BENCHMARKS

SYNTHIA (SYNTHetic Collection of Imagery and Annotations)

The SYNTHIA dataset is a synthetic dataset that consists of 9400 multi-viewpoint photo-realistic frames rendered from a virtual city, with pixel-level semantic annotations for 13 classes. Each frame has a resolution of 1280 × 960.

500 PAPERS • 10 BENCHMARKS

GTA5 (Grand Theft Auto 5)

The GTA5 dataset contains 24966 synthetic images with pixel-level semantic annotation. The images were rendered using the open-world video game Grand Theft Auto 5 and are all from the car perspective in the streets of American-style virtual cities. There are 19 semantic classes, which are compatible with those of the Cityscapes dataset.

379 PAPERS • 7 BENCHMARKS

DeepFashion

DeepFashion is a dataset containing around 800K diverse fashion images with rich annotations (46 categories, 1,000 descriptive attributes, bounding boxes, and landmark information), ranging from well-posed product images to real-world-like consumer photos.

362 PAPERS • 6 BENCHMARKS

Perceptual Similarity

Perceptual Similarity is a dataset of human perceptual similarity judgments.

331 PAPERS • NO BENCHMARKS YET

COCO-Stuff (Common Objects in COntext-stuff)

The Common Objects in COntext-stuff (COCO-stuff) dataset supports scene understanding tasks such as semantic segmentation, object detection, and image captioning. It is constructed by annotating the original COCO dataset, which annotated things while neglecting stuff. COCO-stuff contains 164k images spanning 172 categories: 80 thing classes, 91 stuff classes, and 1 unlabeled class.

265 PAPERS • 20 BENCHMARKS

AFHQ (Animal Faces-HQ)

Animal Faces-HQ (AFHQ) is a dataset of animal faces consisting of 15,000 high-quality images at 512 × 512 resolution. The dataset includes three domains (cat, dog, and wildlife), each providing 5000 images. With three domains and diverse images of various breeds (≥ eight) per domain, AFHQ sets a more challenging image-to-image translation problem. All images are vertically and horizontally aligned to have the eyes at the center. Low-quality images were discarded manually.

264 PAPERS • 6 BENCHMARKS

Foggy Cityscapes

Foggy Cityscapes is a synthetic foggy dataset that simulates fog on real scenes. Each foggy image is rendered from a clear image and its depth map from Cityscapes, so the annotations and data split in Foggy Cityscapes are inherited from Cityscapes.

207 PAPERS • 6 BENCHMARKS

CelebAMask-HQ

CelebAMask-HQ is a large-scale face image dataset of 30,000 high-resolution face images selected from the CelebA dataset by following CelebA-HQ. Each image has a segmentation mask of facial attributes corresponding to CelebA.

141 PAPERS • 4 BENCHMARKS

RaFD (Radboud Faces Database)

The Radboud Faces Database (RaFD) is a set of pictures of 67 models (adults and children, males and females) displaying 8 emotional expressions.

77 PAPERS • 2 BENCHMARKS

LLVIP (A Visible-infrared Paired Dataset for Low-light Vision)

LLVIP is a visible-infrared paired dataset for low-light vision. It contains 30976 images (15488 pairs) covering 24 dark scenes and 2 daytime scenes. It supports image-to-image translation (visible to infrared, or infrared to visible), visible and infrared image fusion, low-light pedestrian detection, and infrared pedestrian detection. The original image and video pairs (before registration) of LLVIP are also released.

49 PAPERS • 6 BENCHMARKS

Synscapes

Synscapes is a synthetic dataset for street scene parsing created using photorealistic rendering techniques. Its authors report state-of-the-art results for training and validation, as well as new types of analysis.

42 PAPERS • 1 BENCHMARK

People Snapshot Dataset

The People Snapshot Dataset enables detailed reconstruction of a clothed human body model from a single monocular RGB video, without requiring a pre-scanned template or manually clicked points.

33 PAPERS • NO BENCHMARKS YET

UT Zappos50K

UT Zappos50K is a large shoe dataset consisting of 50,025 catalog images collected from Zappos.com. The images are divided into 4 major categories — shoes, sandals, slippers, and boots — followed by functional types and individual brands. The shoes are centered on a white background and pictured in the same orientation for convenient analysis.

30 PAPERS • 1 BENCHMARK

IXI (IXI Brain Development Dataset)

IXI Dataset is a collection of 600 MR brain images from normal, healthy subjects. The MR image acquisition protocol for each subject includes:

20 PAPERS • 4 BENCHMARKS

VIDIT (Virtual Image Dataset for Illumination Transfer)

VIDIT is a reference evaluation benchmark intended to push forward the development of illumination manipulation methods. VIDIT includes 390 different Unreal Engine scenes, each captured with 40 illumination settings, resulting in 15,600 images. The illumination settings are all the combinations of 5 color temperatures (2500K, 3500K, 4500K, 5500K, and 6500K) and 8 light directions (N, NE, E, SE, S, SW, W, NW). The original image resolution is 1024×1024.

20 PAPERS • 1 BENCHMARK

LaMem

LaMem is an annotated image memorability dataset with 60,000 labeled images from a diverse array of sources.

16 PAPERS • NO BENCHMARKS YET

BCI (Breast Cancer Immunohistochemical Image Generation)

The evaluation of human epidermal growth factor receptor 2 (HER2) expression is essential to formulate a precise treatment for breast cancer. The routine evaluation of HER2 is conducted with immunohistochemical (IHC) techniques, which are very expensive. Therefore, we propose a breast cancer immunohistochemical (BCI) benchmark that attempts to synthesize IHC data directly from the paired hematoxylin and eosin (HE) stained images. The dataset contains 4870 registered image pairs, covering a variety of HER2 expression levels (0, 1+, 2+, 3+).

10 PAPERS • 1 BENCHMARK

selfie2anime

The selfie dataset contains 46,836 selfie images annotated with 36 different attributes. We only use photos of females as training data and test data. The size of the training dataset is 3400, and that of the test dataset is 100, with an image size of 256 x 256. For the anime dataset, we first retrieved 69,926 animation character images from Anime-Planet. Among those images, 27,023 face images were extracted using an anime-face detector. After selecting only female character images and manually removing monochrome images, we collected two datasets of female anime face images, with sizes of 3400 and 100 for training and test data respectively, the same sizes as the selfie dataset. Finally, all anime face images were resized to 256 x 256 by applying a CNN-based image super-resolution algorithm.

10 PAPERS • 1 BENCHMARK

SEN12MS-CR-TS

SEN12MS-CR-TS is a multi-modal and multi-temporal data set for cloud removal. It contains time series of paired and co-registered Sentinel-1 data and cloudy as well as cloud-free Sentinel-2 data from the European Space Agency's Copernicus mission. Each time series contains 30 cloudy and clear observations regularly sampled throughout the year 2018. The multi-temporal data set is readily pre-processed and backward-compatible with SEN12MS-CR.

7 PAPERS • 1 BENCHMARK

FFHQ-Aging

FFHQ-Aging is a dataset of human faces designed for benchmarking age transformation algorithms as well as many other possible vision tasks. It is an extension of the NVIDIA FFHQ dataset: on top of the 70,000 original FFHQ images, it also contains the following information for each image:

* Gender information (male/female with confidence score)
* Age group information (10 classes with confidence score)
* Head pose (pitch, roll & yaw)
* Glasses type (none, normal or dark)
* Eye occlusion score (0-100, different score for each eye)
* Full semantic map (19 classes, based on CelebAMask-HQ labels)

6 PAPERS • NO BENCHMARKS YET

BCNB (Early Breast Cancer Core-Needle Biopsy WSI)

Breast cancer (BC) has become the greatest threat to women’s health worldwide. Clinically, identification of axillary lymph node (ALN) metastasis and other tumor clinical characteristics such as ER, PR, and so on, are important for evaluating the prognosis and guiding the treatment for BC patients.

3 PAPERS • NO BENCHMARKS YET

Mila Simulated Floods

The Mila Simulated Floods Dataset is a 1.5 square km virtual world built with the Unity3D game engine, including urban, suburban, and rural areas.

2 PAPERS • 1 BENCHMARK

OADAT (Experimental and Synthetic Clinical Optoacoustic Data for Standardized Image Processing)

OADAT provides experimental and synthetic (simulated) optoacoustic (OA) raw signals and reconstructed image-domain datasets, rendered with different experimental parameters and tomographic acquisition geometries.

2 PAPERS • NO BENCHMARKS YET

LISA Gaze Dataset

LISA Gaze is a dataset for driver gaze estimation comprising 11 long drives, driven by 10 subjects in two different cars.

1 PAPER • NO BENCHMARKS YET

UDA-CH (Unsupervised Domain Adaptation on Cultural Heritage)

UDA-CH contains 16 objects covering a variety of artworks found in a museum, such as sculptures, paintings, and books. The dataset was collected inside the cultural site “Galleria Regionale di Palazzo Bellomo” in Siracusa, Italy.

1 PAPER • 1 BENCHMARK
