🔔 Share your dataset with the ML community!

Filter by Modality (clear)

Filter by Task

Filter by Language

53 dataset results for 3d meshes

ShapeNetCore is a subset of the full ShapeNet dataset with single clean 3D models and manually verified category and alignment annotations. It covers 55 common object categories with about 51,300 unique 3D models. The 12 object categories of PASCAL 3D+, a popular computer vision 3D benchmark dataset, are all covered by ShapeNetCore.

156 PAPERS • 1 BENCHMARK

COMA

CoMA contains 17,794 meshes of the human face in various expressions

75 PAPERS • 1 BENCHMARK

Scan2CAD

Scan2CAD is an alignment dataset based on 1506 ScanNet scans with 97607 annotated keypoints pairs between 14225 (3049 unique) CAD models from ShapeNet and their counterpart objects in the scans. The top 3 annotated model classes are chairs, tables and cabinets which arises due to the nature of indoor scenes in ScanNet. The number of objects aligned per scene ranges from 1 to 40 with an average of 9.3.

65 PAPERS • 1 BENCHMARK

Hypersim

For many fundamental scene understanding tasks, it is difficult or impossible to obtain per-pixel ground truth labels from real images. Hypersim is a photorealistic synthetic dataset for holistic indoor scene understanding. It contains 77,400 images of 461 indoor scenes with detailed per-pixel labels and corresponding ground truth geometry.

59 PAPERS • 1 BENCHMARK

BEAT (Body-Expression-Audio-Text)

BEAT has i) 76 hours, high-quality, multi-modal data captured from 30 speakers talking with eight different emotions and in four different languages, ii) 32 millions frame-level emotion and semantic relevance annotations. Our statistical analysis on BEAT demonstrates the correlation of conversational gestures with \textit{facial expressions}, \textit{emotions}, and \textit{semantics}, in addition to the known correlation with \textit{audio}, \textit{text}, and \textit{speaker identity}. Based on this observation, we propose a baseline model, \textbf{Ca}scaded \textbf{M}otion \textbf{N}etwork \textbf{(CaMN)}, which consists of above six modalities modeled in a cascaded architecture for gesture synthesis. To evaluate the semantic relevancy, we introduce a metric, Semantic Relevance Gesture Recall (\textbf{SRGR}). Qualitative and quantitative experiments demonstrate metrics' validness, ground truth data quality, and baseline's state-of-the-art performance. To the best of our knowledge,

37 PAPERS • 1 BENCHMARK

Gait3D

Gait3D is a large-scale 3D representation-based gait recognition dataset. It contains 4,000 subjects and over 25,000 sequences extracted from 39 cameras in an unconstrained indoor scene.

35 PAPERS • 2 BENCHMARKS

12 Scenes

Dataset containing RGB-D data of 4 large scenes, comprising a total of 12 rooms, for the purpose of RGB and RGB-D camera relocalization. The RGB-D data was captured using a Structure.io depth sensor coupled with an iPad color camera. Each room was scanned multiple times, with the multiple sequences run through a global bundle adjustment in order to obtain globally aligned camera poses though all sequences of the same scene.

32 PAPERS • NO BENCHMARKS YET

OmniObject3D

OmniObject3D is a large vocabulary 3D object dataset with massive high-quality real-scanned 3D objects. OmniObject3D has several appealing properties:

30 PAPERS • NO BENCHMARKS YET

NoW Benchmark

The goal of this benchmark is to introduce a standard evaluation metric to measure the accuracy and robustness of 3D face reconstruction methods under variations in viewing angle, lighting, and common occlusions.

26 PAPERS • 1 BENCHMARK

ScanNet200

The ScanNet200 benchmark studies 200-class 3D semantic segmentation - an order of magnitude more class categories than previous 3D scene understanding benchmarks. The source of scene data is identical to ScanNet, but parses a larger vocabulary for semantic and instance segmentation

22 PAPERS • 3 BENCHMARKS

REALY (Region-aware benchmark based on the LYHM)

The REALY benchmark aims to introduce a region-aware evaluation pipeline to measure the fine-grained normalized mean square error (NMSE) of 3D face reconstruction methods from under-controlled image sets.

21 PAPERS • 2 BENCHMARKS

SHREC'19 (SHREC'19 track Matching Humans with Different Connectivity)

Shape matching plays an important role in geometry processing and shape analysis. In the last decades, much research has been devoted to improve the quality of matching between surfaces. This huge effort is motivated by several applications such as object retrieval, animation and information transfer just to name a few. Shape matching is usually divided into two main categories: rigid and non rigid matching. In both cases, the standard evaluation is usually performed on shapes that share the same connectivity, in other words, shapes represented by the same mesh. This is mainly due to the availability of a “natural” ground truth that is given for these shapes. Indeed, in most cases the consistent connectivity directly induces a ground truth correspondence between vertices. However, this standard practice obviously does not allow to estimate the robustness of a method with respect to different connectivity. With this track, we propose a benchmark to evaluate the performance of point-to-p

18 PAPERS • 1 BENCHMARK

SSP-3D (Sports Shape and Pose 3D)

SSP-3D is an evaluation dataset consisting of 311 images of sportspersons in tight-fitted clothes, with a variety of body shapes and poses. The images were collected from the Sports-1M dataset. SSP-3D is intended for use as a benchmark for body shape prediction methods. Pseudo-ground-truth 3D shape labels (using the SMPL body model) were obtained via multi-frame optimisation with shape consistency between frames, as described here.

14 PAPERS • 1 BENCHMARK

HM3DSem

The Habitat-Matterport 3D Semantics Dataset (HM3DSem) is the largest-ever dataset of 3D real-world and indoor spaces with densely annotated semantics that is available to the academic community. HM3DSem v0.2 consists of 142,646 object instance annotations across 216 3D-spaces from HM3D and 3,100 rooms within those spaces. The HM3D scenes are annotated with the 142,646 raw object names, which are mapped to 40 Matterport categories. On average, each scene in HM3DSem v0.2 consists of 661 objects from 106 categories. This dataset is the result of 14,200+ hours of human effort for annotation and verification by 20+ annotators.

10 PAPERS • NO BENCHMARKS YET

3D AffordanceNet

3D AffordanceNet is a dataset of 23k shapes for visual affordance. It consists of 56,307 well-defined affordance information annotations for 22,949 shapes covering 18 affordance classes and 23 semantic object categories.

9 PAPERS • 1 BENCHMARK

ARCTIC

ARCTIC (Articulated Objects in Free-form Hand Interaction)

ARCTIC is a dataset of free-form interactions of hands and articulated objects. ARCTIC has 1.2M images paired with accurate 3D meshes for both hands and for objects that move and deform over time. The dataset also provides hand-object contact information.

9 PAPERS • NO BENCHMARKS YET

BuildingNet

BuildingNet is a large-scale dataset of 3D building models whose exteriors are consistently labeled. The dataset consists on 513K annotated mesh primitives, grouped into 292K semantic part components across 2K building models. The dataset covers several building categories, such as houses, churches, skyscrapers, town halls, libraries, and castles.

9 PAPERS • 1 BENCHMARK

BEAT2 (BEAT-SMPLX-FLAME)

We propose EMAGE, a framework to generate full-body human gestures from audio and masked gestures, encompassing facial, local body, hands, and global movements. To achieve this, we first introduce BEAT2 (BEAT-SMPLX-FLAME), a new mesh-level holistic co-speech dataset. BEAT2 combines MoShed SMPLX body with FLAME head parameters and further refines the modeling of head, neck, and finger movements, offering a community-standardized, high-quality 3D motion captured dataset. EMAGE leverages masked body gesture priors during training to boost inference performance. It involves a Masked Audio Gesture Transformer, facilitating joint training on audio-to-gesture generation and masked gesture reconstruction to effectively encode audio and body gesture hints. Encoded body hints from masked gestures are then separately employed to generate facial and body movements. Moreover, EMAGE adaptively merges speech features from the audio's rhythm and content and utilizes four compositional VQ-VAEs to enh

8 PAPERS • 2 BENCHMARKS

Breaking Bad

Breaking Bad is a large-scale dataset of fractured objects. The dataset contains around 10k meshes from PartNet and Thingi10k. For each mesh, 20 fracture modes are pre-computed and then simulate 80 fractures from them, resulting in a total of 1M breakdown patterns. This dataset serves as a benchmark that enables the study of fractured object reassembly and presents new challenges for geometric shape understanding.

8 PAPERS • NO BENCHMARKS YET

CAPE (Clothed Auto Person Encoding)

The CAPE dataset is a 3D dynamic dataset of clothed humans, featuring:

7 PAPERS • 1 BENCHMARK

Hi4D

Hi4D contains 4D textured scans of 20 subject pairs, 100 sequences, and a total of more than 11K frames. Hi4D contains rich interaction centric annotations in 2D and 3D alongside accurately registered parametric body models.

7 PAPERS • NO BENCHMARKS YET

MMBody

The MMBody dataset provides human body data with motion capture, GT mesh, Kinect RGBD, and millimeter wave sensor data. See homepage for more details.

6 PAPERS • NO BENCHMARKS YET

Biwi 3D Audiovisual Corpus of Affective Communication - B3D(AC)^2 (BIWI 3D)

BIWI 3D corpus comprises a total of 1109 sentences uttered by 14 native English speakers (6 males and 8 females). A real time 3D scanner and a professional microphone were used to capture the facial movements and the speech of the speakers. The dense dynamic face scans were acquired at 25 frames per second and the RMS error in the 3D reconstruction is about 0.5 mm. In order to ease automatic speech segmentation, we carried out the recordings in a anechoic room, with walls covered by sound wave-absorbing materials.

5 PAPERS • 1 BENCHMARK

AMA (Articulated Mesh Animation)

Articulated Mesh Animation (AMA) is a real-world dataset containing 10 mesh sequences depicting 3 different humans performing various actions

4 PAPERS • NO BENCHMARKS YET

Building3D

Building3D is an urban-scale dataset consisting of more than 160 thousands buildings along with corresponding point clouds, mesh and wireframe models, covering 16 cities in Estonia about 998 Km2. Besides mesh models and real-world LiDAR point clouds, it also includes wireframe models.

4 PAPERS • NO BENCHMARKS YET

P2S (Points2Surf)

We introduced this dataset in Points2Surf, a method that turns point clouds into meshes.

4 PAPERS • NO BENCHMARKS YET

DAD-3DHeads

DAD-3DHeads (DAD-3DHeads dataset)

DAD-3DHeads dataset consists of 44,898 images collected from various sources (37,840 in the training set, 4,312 in the validation set, and 2,746 in the test set).

3 PAPERS • NO BENCHMARKS YET

HANDAL (HANDAL: A Dataset of Real-World Manipulable Object Categories with Pose Annotations, Affordances, and Reconstructions)

We present the HANDAL dataset for category-level object pose estimation and affordance prediction. Unlike previous datasets, ours is focused on robotics-ready manipulable objects that are of the proper size and shape for functional grasping by robot manipulators, such as pliers, utensils, and screwdrivers. Our annotation process is streamlined, requiring only a single off-the-shelf camera and semi-automated processing, allowing us to produce high-quality 3D annotations without crowd-sourcing. The dataset consists of 308k annotated image frames from 2.2k videos of 212 real-world objects in 17 categories. We focus on hardware and kitchen tool objects to facilitate research in practical scenarios in which a robot manipulator needs to interact with the environment beyond simple pushing or indiscriminate grasping. We outline the usefulness of our dataset for 6-DoF category-level pose+scale estimation and related tasks. We also provide 3D reconstructed meshes of all objects, and we outline s

3 PAPERS • NO BENCHMARKS YET

HOPE-Image (Household Objects for Pose Estimation)

The NVIDIA HOPE datasets consist of RGBD images and video sequences with labeled 6-DoF poses for 28 toy grocery objects. The toy grocery objects are readily available for purchase and have ideal size and weight for robotic manipulation. 3D textured meshes for generating synthetic training data are provided.

3 PAPERS • NO BENCHMARKS YET

HOPE-Video (Household Objects for Pose Estimation)

The HOPE-Video dataset contains 10 video sequences (2038 frames) with 5-20 objects on a tabletop scene captured by a robot arm-mounted RealSense D415 RGBD camera. In each sequence, the camera is moved to capture multiple views of a set of objects in the robotic workspace. First COLMAP was applied to refine the camera poses (keyframes at 6~fps) provided by forward kinematics and RGB calibration from RealSense to Baxter's wrist camera. 3D dense point cloud was then generated via CascadeStereo (included for each sequence in 'scene.ply'). Ground truth poses for the HOPE objects models in the world coordinate system were annotated manually using the CascadeStereo point clouds. The following are provided for each frame:

2 PAPERS • NO BENCHMARKS YET

ObjectFolder Real

The ObjectFolder Real dataset contains multisensory data collected from 100 real-world household objects. The visual data for each object include three high-quality 3D meshes of different resolutions and an HD video recording of the object rotating in a lightbox; The acoustic data for each object include impact sound recordings recorded at 30–50 points of the object, each of which is 6s long and is accompanied by the coordinate of the striking location on the object mesh, ground-truth contact force profile, and the accompanying video for the impact. The tactile data for each object include tactile readings at the same 30–50 points of the object, with each tactile reading as a video of the tactile RGB images that record the entire gel deformation process and is accompanied by two videos of the contact process from an in-hand camera and a third-view camera.

2 PAPERS • NO BENCHMARKS YET

PsyMo (PsyMo: A Dataset for Estimating Self-Reported Psychological Traits from Gait)

Psychological trait estimation from external factors such as movement and appearance is a challenging and long-standing problem in psychology, and is principally based on the psychological theory of embodiment. To date, attempts to tackle this problem have utilized private small-scale datasets with intrusive body-attached sensors. Potential applications of an automated system for psychological trait estimation include estimation of occupational fatigue and psychology, and marketing and advertisement. In this work, we propose PsyMo (Psychological traits from Motion), a novel, multi-purpose and multi-modal dataset for exploring psychological cues manifested in walking patterns. We gathered walking sequences from 312 subjects in 7 different walking variations and 6 camera angles. In conjunction with walking sequences, participants filled in 6 psychological questionnaires, totalling 17 psychometric attributes related to personality, self-esteem, fatigue, aggressiveness and mental health. W

2 PAPERS • NO BENCHMARKS YET

Teeth3DS (Teeth3DS: a benchmark for teeth segmentation and labeling from intra-oral 3D scans)

Teeth3DS is the first public benchmark which has been created in the frame of the 3DTeethSeg 2022 MICCAI challenge to boost the research field and inspire the 3D vision research community to work on intra-oral 3D scans analysis such as teeth identification, segmentation, labeling, 3D modeling, and 3D reconstruction. Teeth3DS is made of 1800 intra-oral scans (23999 annotated teeth) collected from 900 patients covering the upper and lower jaws separately, acquired and validated by orthodontists/dental surgeons with more than 5 years of professional experience.

2 PAPERS • NO BENCHMARKS YET

ANIM (ANimals in Motion)

It comprises synthetic mesh sequences from Deformation Transfer for Triangle Meshes.

1 PAPER • 1 BENCHMARK

CMU Panoptic Dataset 2.0

The field of biomechanics is at a turning point, with marker-based motion capture set to be replaced by portable and inexpensive hardware, rapidly improving markerless tracking algorithms, and open datasets that will turn these new technologies into field-wide team projects. To expedite progress in this direction, we have collected the CMU Panoptic Dataset 2.0, which contains 86 subjects captured with 140 VGA cameras, 31 HD cameras, and 15 IMUs, performing on average 6.5 min of activities, including range of motion activities and tasks of daily living.

1 PAPER • NO BENCHMARKS YET

DermSynth3D

DermSynth3D (3DBodyTex.DermSynth3D)

A dataset of 100K synthetic images of skin lesions, ground-truth (GT) segmentations of lesions and healthy skin, GT segmentations of seven body parts (head, torso, hips, legs, feet, arms and hands), and GT binary masks of non-skin regions in the texture maps of 215 scans from the 3DBodyTex.v1 dataset [2], [3] created using the framework described in [1]. The dataset is primarily intended to enable the development of skin lesion analysis methods. Synthetic image creation consisted of two main steps. First, skin lesions from the Fitzpatrick 17k dataset were blended onto skin regions of high-resolution three-dimensional human scans from the 3DBodyTex dataset [2], [3]. Second, two-dimensional renders of the modified scans were generated.

1 PAPER • NO BENCHMARKS YET

DrivAerNet

DrivAerNet (A Parametric Car Dataset for Data-driven Aerodynamic Design and Graph-Based Drag Prediction)

DrivAerNet is a large-scale, high-fidelity CFD dataset of 3D industry-standard car shapes designed for data-driven aerodynamic design. It comprises 4000 high-quality 3D car meshes and their corresponding aerodynamic performance coefficients, alongside full 3D flow field information.

1 PAPER • NO BENCHMARKS YET

Fast Linking Numbers of Loopy Structures Dataset

1 PAPER • NO BENCHMARKS YET

KITTI-6DoF (KITTI-Six Degrees Of Freedom)

KITTI-6DoF is a dataset that contains annotations for the 6DoF estimation task for 5 object categories on 7,481 frames.

1 PAPER • NO BENCHMARKS YET

PLAD (Point Line and Depth dataset)

PLAD is a dataset where sparse depth is provided by line-based visual SLAM to verify StructMDC.

1 PAPER • 1 BENCHMARK

PaintNet

PaintNet is a dataset for learning robotic spray painting of free-form 3D objects. PaintNet includes more than 800 object meshes and the associated painting strokes collected in a real industrial setting.

1 PAPER • NO BENCHMARKS YET

RTB (Robot Tracking Benchmark)

The Robot Tracking Benchmark (RTB) is a synthetic dataset that facilitates the quantitative evaluation of 3D tracking algorithms for multi-body objects. It was created using the procedural rendering pipeline BlenderProc. The dataset contains photo-realistic sequences with HDRi lighting and physically-based materials. Perfect ground truth annotations for camera and robot trajectories are provided in the BOP format. Many physical effects, such as motion blur, rolling shutter, and camera shaking, are accurately modeled to reflect real-world conditions. For each frame, four depth qualities exist to simulate sensors with different characteristics. While the first quality provides perfect ground truth, the second considers measurements with the distance-dependent noise characteristics of the Azure Kinect time-of-flight sensor. Finally, for the third and fourth quality, two stereo RGB images with and without a pattern from a simulated dot projector were rendered. Depth images were then recons

1 PAPER • 1 BENCHMARK

Robot@Home2 (Robot@Home2, a robotic dataset of home environments)

Robot@Home2, is an enhanced version aimed at improving usability and functionality for developing and testing mobile robotics and computer vision algorithms. Robot@Home2 consists of three main components. Firstly, a relational database that states the contextual information and data links, compatible with Standard Query Language. Secondly,a Python package for managing the database, including downloading, querying, and interfacing functions. Finally, learning resources in the form of Jupyter notebooks, runnable locally or on the Google Colab platform, enabling users to explore the dataset without local installations. These freely available tools are expected to enhance the ease of exploiting the Robot@Home dataset and accelerate research in computer vision and robotics.

1 PAPER • NO BENCHMARKS YET

SELTO Dataset

A Benchmark Dataset for Deep Learning-based Methods for 3D Topology Optimization.

1 PAPER • NO BENCHMARKS YET

TDMD

TDMD contains eight reference DCM objects with six typical distortions. Using processed video sequences (PVS) derived from the DCM, the authors conducted a large-scale subjective experiment that resulted in 303 distorted DCM samples with mean opinion scores, making the TDMD the largest available DCM database to our knowledge.

1 PAPER • NO BENCHMARKS YET

The RBO Dataset of Articulated Objects and Interactions

The RBO dataset of articulated objects and interactions is a collection of 358 RGB-D video sequences (67:18 minutes) of humans manipulating 14 articulated objects under varying conditions (light, perspective, background, interaction). All sequences are annotated with ground truth of the poses of the rigid parts and the kinematic state of the articulated object (joint states) obtained with a motion capture system. We also provide complete kinematic models of these objects (kinematic structure and three-dimensional textured shape models). In 78 sequences the contact wrenches during the manipulation are also provided.

1 PAPER • NO BENCHMARKS YET

TransProteus

The dataset contains procedurally generated images of transparent vessels containing liquid and objects . The data for each image includes segmentation maps, 3d depth maps, and normal maps of of the liquid or object inside the transparent vessel, and the vessel. In addition, the properties of the materials inside the containers are given(color/transparency/roughness/metalness). In addition, a natural image benchmark for the 3d/depth estimation of objects inside transparent containers is supplied. 3d models of the objects (GTLF) are also supplied.

1 PAPER • 1 BENCHMARK

Voxceleb-3D

A dataset for voice and 3D face structure study. It contains about 1.4K identities with their 3D face models and voice data. 3D face models are fitted from VGGFace using BFM 3D models, and voice data are processed from Voxceleb

1 PAPER • 1 BENCHMARK

Datasets

53 dataset results for 3d meshes