UCF-CC-50 is a dataset for crowd counting and consists of images of extremely dense crowds. It has 50 images with 63,974 head center annotations in total. The head counts range between 94 and 4,543 per image. The small dataset size and large variance make this a very challenging counting dataset.
29 PAPERS • NO BENCHMARKS YET
A novel egocentric dataset collected from social mobile manipulator JackRabbot. The dataset includes 64 minutes of annotated multimodal sensor data including stereo cylindrical 360 degrees RGB video at 15 fps, 3D point clouds from two Velodyne 16 Lidars, line 3D point clouds from two Sick Lidars, audio signal, RGB-D video at 30 fps, 360 degrees spherical image from a fisheye camera and encoder values from the robot's wheels.
26 PAPERS • NO BENCHMARKS YET
TinyPerson is a benchmark for tiny object detection in a long distance and with massive backgrounds. The images in TinyPerson are collected from the Internet. First, videos with a high resolution are collected from different websites. Second, images from the video are sampled every 50 frames. Then images with a certain repetition (homogeneity) are deleted, and the resulting images are annotated with 72,651 objects with bounding boxes by hand.
19 PAPERS • NO BENCHMARKS YET
MIAP is a dataset created by obtaining a new set of annotations on a subset of the Open Images dataset, containing bounding boxes and attributes for all of the people visible in those images, as the original Open Images dataset annotations are not exhaustive, with bounding boxes and attribute labels for only a subset of the classes in each image.
14 PAPERS • NO BENCHMARKS YET
PANDA is the first gigaPixel-level humAN-centric viDeo dAtaset, for large-scale, long-term, and multi-object visual analysis. The videos in PANDA were captured by a gigapixel camera and cover real-world scenes with both wide field-of-view (~1 square kilometer area) and high-resolution details (~gigapixel-level/frame). The scenes may contain 4k head counts with over 100x scale variation. PANDA provides enriched and hierarchical ground-truth annotations, including 15,974.6k bounding boxes, 111.8k fine-grained attribute labels, 12.7k trajectories, 2.2k groups and 2.9k interactions.
10 PAPERS • NO BENCHMARKS YET
PTB-TIR is a Thermal InfraRed (TIR) pedestrian tracking benchmark, which provides 60 TIR sequences with mannuly annoations. The benchmark is used to fair evaluate TIR trackers.
5 PAPERS • NO BENCHMARKS YET
A large-scale video portrait dataset that contains 291 videos from 23 conference scenes with 14K frames. This dataset contains various teleconferencing scenes, various actions of the participants, interference of passers-by and illumination change.
3 PAPERS • NO BENCHMARKS YET
In recent years, person detection and human pose estimation have made great strides, helped by large-scale labeled datasets. However, these datasets had no guarantees or analysis of human activities, poses, or context diversity. Additionally, privacy, legal, safety, and ethical concerns may limit the ability to collect more human data. An emerging alternative to real-world data that alleviates some of these issues is synthetic data. However, creation of synthetic data generators is incredibly challenging and prevents researchers from exploring their usefulness. Therefore, we release a human-centric synthetic data generator PeopleSansPeople which contains simulation-ready 3D human assets, a parameterized lighting and camera system, and generates 2D and 3D bounding box, instance and semantic segmentation, and COCO pose labels. Using PeopleSansPeople, we performed benchmark synthetic data training using a Detectron2 Keypoint R-CNN variant [1]. We found that pre-training a network using sy
2 PAPERS • NO BENCHMARKS YET
We learn high fidelity human depths by leveraging a collection of social media dance videos scraped from the TikTok mobile social networking application. It is by far one of the most popular video sharing applications across generations, which include short videos (10-15 seconds) of diverse dance challenges as shown above. We manually find more than 300 dance videos that capture a single person performing dance moves from TikTok dance challenge compilations for each month, variety, type of dances, which are moderate movements that do not generate excessive motion blur. For each video, we extract RGB images at 30 frame per second, resulting in more than 100K images. We segmented these images using Removebg application, and computed the UV coordinates from DensePose.
A dataset to encourage research in these environments. It consists of labeled stereo video of people in orange and apple orchards taken from two perception platforms (a tractor and a pickup truck), along with vehicle position data from RTK GPS.
1 PAPER • NO BENCHMARKS YET
The SoccerNet Game State Reconstruction task is a novel high level computer vision task that is specific to sports analytics. It aims at recognizing the state of a sport game, i.e., identifying and localizing all sports individuals (players, referees, ..) on the field based on a raw input videos. SoccerNet-GSR is composed of 200 video sequences of 30 seconds, annotated with 9.37 million line points for pitch localization and camera calibration, as well as over 2.36 million athlete positions on the pitch with their respective role, team, and jersey number.
1 PAPER • 1 BENCHMARK
This dataset is an extremely challenging set of over 3000+ original Crowd images captured and crowdsourced from over 300+ urban and rural areas, where each image is manually reviewed and verified by computer vision professionals at Datacluster Labs.
0 PAPER • NO BENCHMARKS YET
This dataset consists of 600+ items of faces with different emotions and mixed races that are ready to use for optimizing the accuracy of computer vision models. Human age range from 20 to 60 years old, balance in gender, no occlusion, with head direction (<45 degree up-down and left-right). All of the contents is sourced from PIXTA's stock library of 100M+ Asian-featured images and videos. PIXTA is the largest platform of visual materials in the Asia Pacific region offering fully-managed services, high quality contents and data, and powerful tools for businesses & organizations to enable their creative and machine learning projects. For more details, please refer to the link: https://www.pixta.ai/ Or send your inquiries to contact@pixta.ai