CARLA (CAR Learning to Act) is an open simulator for urban driving, developed as an open-source layer over Unreal Engine 4. It provides sensors in the form of RGB cameras (with customizable positions), ground-truth depth maps, ground-truth semantic segmentation maps with 12 semantic classes designed for driving (road, lane marking, traffic sign, sidewalk, and so on), bounding boxes for dynamic objects in the environment, and measurements of the agent itself (vehicle location and orientation).
1,055 PAPERS • 3 BENCHMARKS
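A minimal sketch of attaching an RGB camera to a vehicle through the CARLA Python API, assuming a 0.9.x release and a simulator already listening on localhost:2000:

```python
import carla

# Connect to a CARLA server assumed to be running on the default port.
client = carla.Client("localhost", 2000)
client.set_timeout(10.0)
world = client.get_world()

# Spawn a vehicle at the first predefined spawn point.
blueprints = world.get_blueprint_library()
vehicle_bp = blueprints.filter("vehicle.*")[0]
spawn_point = world.get_map().get_spawn_points()[0]
vehicle = world.spawn_actor(vehicle_bp, spawn_point)

# Attach an RGB camera at a customizable position relative to the vehicle.
camera_bp = blueprints.find("sensor.camera.rgb")
camera_tf = carla.Transform(carla.Location(x=1.5, z=2.4))
camera = world.spawn_actor(camera_bp, camera_tf, attach_to=vehicle)

# Save every received frame to disk (image.frame exists in recent 0.9.x;
# very old releases called it frame_number).
camera.listen(lambda image: image.save_to_disk("out/%06d.png" % image.frame))
```

Swapping `sensor.camera.rgb` for `sensor.camera.depth` or `sensor.camera.semantic_segmentation` yields the ground-truth pseudo-sensors described above.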
AirSim is a simulator for drones, cars, and more, built on Unreal Engine. It is open-source and cross-platform, and supports software-in-the-loop simulation with popular flight controllers such as PX4 and ArduPilot, as well as hardware-in-the-loop with PX4, for physically and visually realistic simulations. It is developed as an Unreal plugin that can simply be dropped into any Unreal environment; an experimental Unity plugin also exists.
240 PAPERS • NO BENCHMARKS YET
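A minimal sketch of flying the default multirotor through AirSim's Python client, assuming a running AirSim instance with default settings:

```python
import airsim

# Connect to the simulator's RPC server (localhost by default).
client = airsim.MultirotorClient()
client.confirmConnection()
client.enableApiControl(True)
client.armDisarm(True)

# Take off, fly to a position (NED coordinates, so negative z is up), land.
client.takeoffAsync().join()
client.moveToPositionAsync(10, 0, -5, velocity=5).join()
client.landAsync().join()

# Grab a PNG-encoded RGB image from camera "0" (the default front camera).
png_bytes = client.simGetImage("0", airsim.ImageType.Scene)
```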
AI2-THOR is an interactive environment for embodied AI. It contains four types of scenes (kitchen, living room, bedroom, and bathroom), with 30 rooms per scene type, each unique in its furniture placement and item types. There are over 2,000 unique objects for AI agents to interact with.
186 PAPERS • 1 BENCHMARK
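A minimal sketch of stepping an agent and opening an object through the ai2thor Python package; scene names and metadata keys follow the AI2-THOR documentation:

```python
from ai2thor.controller import Controller

# Launch a kitchen scene; FloorPlan1-30 are kitchens, 201-230 living rooms,
# 301-330 bedrooms, 401-430 bathrooms.
controller = Controller(scene="FloorPlan1")

# Step the environment with a navigation action and inspect the result.
event = controller.step(action="MoveAhead")
print(event.metadata["agent"]["position"])

# Interact with an object by its id, taken from the event metadata.
# (The action can fail if the object is not visible/reachable from here.)
openable = next(o for o in event.metadata["objects"] if o["openable"])
controller.step(action="OpenObject", objectId=openable["objectId"])
```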
TORCS (The Open Racing Car Simulator) is a driving simulator. It is capable of simulating the essential elements of vehicular dynamics such as mass, rotational inertia, collision, the mechanics of suspensions, links, and differentials, friction, and aerodynamics. Physics simulation is simplified and carried out through Euler integration of differential equations at a temporal discretization of 0.002 seconds. The rendering pipeline is lightweight, based on OpenGL, and can be turned off for faster training. TORCS offers a large variety of tracks and cars as free assets. It also provides a number of programmed robot cars with different levels of performance that can be used to benchmark human players and software driving agents. TORCS was built with the goal of developing artificial intelligence for vehicular control and has been used extensively by the machine learning community since its inception.
89 PAPERS • NO BENCHMARKS YET
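TORCS itself is written in C++, but the Euler scheme it uses is easy to illustrate. The sketch below integrates a toy one-dimensional vehicle model at the same 0.002 s timestep; the point-mass model and its parameters are illustrative, not TORCS's actual dynamics:

```python
# Illustrative Euler integration of a 1D point-mass "vehicle" at TORCS's
# 0.002 s timestep; the model and parameters are made up for this sketch.
DT = 0.002          # temporal discretization used by TORCS
MASS = 1150.0       # kg (illustrative)
DRAG = 0.35         # lumped aerodynamic drag term (illustrative)

def euler_step(x, v, throttle_force):
    """Advance position x and velocity v by one Euler step."""
    accel = (throttle_force - DRAG * v * abs(v)) / MASS
    return x + v * DT, v + accel * DT

x, v = 0.0, 0.0
for _ in range(int(5.0 / DT)):   # simulate 5 seconds of driving
    x, v = euler_step(x, v, throttle_force=4000.0)
print(f"after 5 s: x = {x:.1f} m, v = {v:.1f} m/s")
```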
RLBench is an ambitious large-scale benchmark and learning environment designed to facilitate research in a number of vision-guided manipulation areas, including reinforcement learning, imitation learning, multi-task learning, geometric computer vision, and, in particular, few-shot learning.
87 PAPERS • 3 BENCHMARKS
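A minimal sketch of launching a task; the module paths follow the original RLBench release and have shifted in later versions, so treat the imports as an assumption:

```python
import numpy as np
from rlbench.environment import Environment
from rlbench.action_modes import ActionMode, ArmActionMode
from rlbench.observation_config import ObservationConfig
from rlbench.tasks import ReachTarget

# Collect all observations (cameras, proprioception, etc.).
obs_config = ObservationConfig()
obs_config.set_all(True)

# Joint-velocity control of the arm; headless avoids opening a window.
action_mode = ActionMode(ArmActionMode.ABS_JOINT_VELOCITY)
env = Environment(action_mode, obs_config=obs_config, headless=True)
env.launch()

task = env.get_task(ReachTarget)
descriptions, obs = task.reset()     # natural-language goals + first obs
action = np.zeros(env.action_size)   # arm joint velocities + gripper
obs, reward, terminate = task.step(action)
env.shutdown()
```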
The INTERACTION dataset contains naturalistic motions of various traffic participants in a variety of highly interactive driving scenarios from different countries. The dataset can serve many behavior-related research areas, such as motion prediction, imitation learning, and behavior analysis and modeling.
71 PAPERS • 1 BENCHMARK
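The tracks are distributed as CSV files; the pandas sketch below groups one recording into per-agent trajectories (the file name and column names such as track_id and timestamp_ms are assumptions based on the published format):

```python
import pandas as pd

# Hypothetical path to one recorded track file from the dataset.
tracks = pd.read_csv("vehicle_tracks_000.csv")

# One trajectory per traffic participant (assumed columns:
# track_id, timestamp_ms, x, y).
for track_id, traj in tracks.groupby("track_id"):
    traj = traj.sort_values("timestamp_ms")
    xy = traj[["x", "y"]].to_numpy()
    print(track_id, len(xy), "positions")
```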
ManiSkill2 is the next generation of the SAPIEN ManiSkill benchmark, built to address critical pain points that researchers often encounter when using benchmarks for generalizable manipulation skills. It includes 20 manipulation task families with 2,000+ object models and 4M+ demonstration frames, covering stationary/mobile-base, single/dual-arm, and rigid/soft-body manipulation tasks with 2D/3D input data simulated by fully dynamic engines.
22 PAPERS • NO BENCHMARKS YET
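A minimal sketch of instantiating one of the task families through the Gym interface, assuming a Gymnasium-based mani_skill2 release; the environment id and mode strings follow the ManiSkill2 docs and should be treated as assumptions:

```python
import gymnasium as gym
import mani_skill2.envs  # registers the ManiSkill2 environments

# Environment id, obs_mode, and control_mode are assumed from the docs.
env = gym.make("PickCube-v0", obs_mode="rgbd",
               control_mode="pd_ee_delta_pose")
obs, info = env.reset(seed=0)

done = False
while not done:
    action = env.action_space.sample()   # random policy as a placeholder
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
env.close()
```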
The Collaborative Drawing game (CoDraw) dataset contains ~10K dialogs consisting of ~138K messages exchanged between human players in the CoDraw game. The game involves two players: a Teller and a Drawer. The Teller sees an abstract scene containing multiple clip art pieces in a semantically meaningful configuration, while the Drawer tries to reconstruct the scene on an empty canvas using available clip art pieces. The two players communicate with each other using natural language.
12 PAPERS • NO BENCHMARKS YET
CHALET is a 3D house simulator with support for navigation and manipulation. Unlike existing systems, CHALET supports both a wide range of object manipulations and complex environment layouts consisting of multiple rooms. The range of object manipulations includes the ability to pick up and place objects, toggle the state of objects like taps or televisions, open or close containers, and insert or remove objects from these containers. In addition, the simulator comes with 58 rooms that can be combined to create houses, including 10 default house layouts. CHALET is therefore suitable for setting up challenging environments for various AI tasks that require complex language understanding and planning, such as navigation, manipulation, instruction following, and interactive question answering.
9 PAPERS • NO BENCHMARKS YET
CubiCasa5K is a large-scale floorplan image dataset containing 5,000 samples annotated with over 80 floorplan object categories. The annotations are dense and versatile, using polygons to separate the different objects.
8 PAPERS • NO BENCHMARKS YET
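The polygon annotations are stored as SVG markup; the standard-library sketch below pulls polygon vertices out of one annotation file (the file name is hypothetical and this is not the official CubiCasa5K loader):

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation file; CubiCasa5K stores polygons in SVG markup.
tree = ET.parse("model.svg")

for poly in tree.getroot().iter("{http://www.w3.org/2000/svg}polygon"):
    # "points" is the standard SVG attribute: "x1,y1 x2,y2 ..."
    points = [tuple(map(float, p.split(",")))
              for p in poly.attrib["points"].split()]
    print(len(points), "vertices")
```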
The Atari Grand Challenge dataset is a large dataset of human Atari 2600 replays. It consists of replays for 5 different games:

* Space Invaders (445 episodes, 2M frames)
* Q*bert (659 episodes, 1.6M frames)
* Ms. Pac-Man (384 episodes, 1.7M frames)
* Video Pinball (211 episodes, 1.5M frames)
* Montezuma's Revenge (668 episodes, 2.7M frames)
7 PAPERS • NO BENCHMARKS YET
Atari-HEAD is a dataset of human actions and eye movements recorded while playing Atari video games. For every game frame, the corresponding image frame, the human keystroke action, the reaction time for that action, the gaze positions, and the immediate reward returned by the environment were recorded. Gaze data was recorded using an EyeLink 1000 eye tracker at 1000 Hz. The human subjects are amateur players who are familiar with the games; they were only allowed to play for 15 minutes at a time and were required to rest for at least 15 minutes before the next trial. Data was collected from 4 subjects, 16 games, 175 15-minute trials, and a total of 2.97 million frames/demonstrations.
Simitate is a hybrid benchmarking suite targeting the evaluation of approaches to imitation learning. It consists of a dataset containing 1,938 sequences in which humans perform daily activities in a realistic environment. The dataset is tightly coupled with an integration into a simulator. RGB and depth streams with a resolution of 960×540 at 30 Hz are provided, along with accurate 6-DOF ground-truth poses for the demonstrator's hand and the object at 120 Hz. The 3D model of the environment and labelled object images are also provided alongside the dataset.
5 PAPERS • NO BENCHMARKS YET
MineRL is an imitation learning dataset with over 60 million frames of recorded human player data. The dataset includes a set of tasks that highlight many of the hardest problems in modern reinforcement learning: sparse rewards and hierarchical policies.
3 PAPERS • NO BENCHMARKS YET
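A minimal sketch of iterating over the recorded demonstrations with the minerl data API; the environment id and data_dir are assumptions, and the dataset must be downloaded beforehand:

```python
import minerl

# Assumes the dataset has already been downloaded into data_dir
# (e.g. via minerl.data.download).
data = minerl.data.make("MineRLObtainDiamond-v0", data_dir="data")

# Iterate over batched (state, action, reward, next_state, done) tuples
# drawn from recorded human play.
for state, action, reward, next_state, done in data.batch_iter(
        batch_size=32, seq_len=64, num_epochs=1):
    print(reward.shape)   # (batch_size, seq_len)
    break
```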
This dataset contains a large set (~3.2 million) of high-quality expert trajectories generated from a geometrically consistent hybrid planner in a wide variety of environments (~575,000 environments). We created this dataset to explore the capabilities of neural networks to learn complex robotic motion by mimicking a traditional planner.
StarData is a StarCraft: Brood War replay dataset, with 65,646 games. The full dataset after compression is 365 GB, 1535 million frames, and 496 million player actions. The entire frame data was dumped out at 8 frames per second.
2 PAPERS • NO BENCHMARKS YET
A benchmark suite of continuous control tasks, including classic tasks like cart-pole swing-up, tasks with very high state and action dimensionality such as 3D humanoid locomotion, tasks with partial observations, and tasks with hierarchical structure.
1 PAPER • NO BENCHMARKS YET
Texygen is a benchmarking platform to support research on open-domain text generation models. Texygen not only implements a majority of text generation models, but also covers a set of metrics that evaluate the diversity, quality, and consistency of the generated texts. The Texygen platform can help standardize research on text generation and facilitate the sharing of fine-tuned open-source implementations among researchers. As a consequence, it helps improve the reproducibility and reliability of future research work in text generation.
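One of the diversity metrics Texygen popularized is Self-BLEU, which scores each generated sentence with BLEU against all other generations, so lower means more diverse. The NLTK sketch below is a simplified version; Texygen's own implementation differs in details such as tokenization and smoothing:

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def self_bleu(generations, weights=(0.25, 0.25, 0.25, 0.25)):
    """Mean BLEU of each generated sentence against all the others.

    Lower Self-BLEU means more diverse generations. This is a simplified
    sketch of the metric popularized by Texygen, not its exact code.
    """
    smooth = SmoothingFunction().method1
    scores = []
    for i, hyp in enumerate(generations):
        refs = generations[:i] + generations[i + 1:]
        scores.append(sentence_bleu(refs, hyp, weights=weights,
                                    smoothing_function=smooth))
    return sum(scores) / len(scores)

samples = [s.split() for s in [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "a bird flew over the house",
]]
print(self_bleu(samples))
```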