Instance Segmentation
964 papers with code • 25 benchmarks • 82 datasets
Instance Segmentation is a computer vision task that involves identifying and separating individual objects within an image, including detecting the boundaries of each object and assigning a unique label to each object. The goal of instance segmentation is to produce a pixel-wise segmentation map of the image, where each pixel is assigned to a specific object instance.
Image Credit: Deep Occlusion-Aware Instance Segmentation with Overlapping BiLayers, CVPR'21
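As a rough illustration of the pixel-wise map described above, the sketch below stores one integer id per pixel and splits it into one binary mask per instance. The array contents, the convention that 0 means background, and the helper name `split_instances` are assumptions made for this example, not part of any particular benchmark or library.

```python
import numpy as np

# Hypothetical 4x5 instance map: 0 is background (an assumed convention),
# and each positive integer marks the pixels of one object instance.
instance_map = np.array([
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 2],
    [0, 0, 0, 2, 2],
    [3, 3, 0, 2, 2],
])


def split_instances(instance_map):
    """Return {instance_id: boolean mask} for every non-background id."""
    ids = np.unique(instance_map)
    return {int(i): instance_map == i for i in ids if i != 0}


masks = split_instances(instance_map)

# Pixel count per instance, computed from the binary masks.
areas = {i: int(m.sum()) for i, m in masks.items()}
```

Because every pixel holds exactly one id, the resulting masks cannot overlap. Methods that predict overlapping or amodal masks would need a different representation, such as a stack of independent binary masks.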
Subtasks
- Referring Expression Segmentation
- 3D Instance Segmentation
- Real-time Instance Segmentation
- Unsupervised Object Segmentation
- Amodal Instance Segmentation
- Box-supervised Instance Segmentation
- Image-level Supervised Instance Segmentation
- Unseen Object Instance Segmentation
- 3D Semantic Instance Segmentation
- Open-World Instance Segmentation
- Human Instance Segmentation
- One-Shot Instance Segmentation
- Semi-Supervised Person Instance Segmentation
- Point-Supervised Instance Segmentation
- Solar Cell Segmentation
Most implemented papers
Mask R-CNN
Our approach efficiently detects objects in an image while simultaneously generating a high-quality segmentation mask for each instance.
MMDetection: Open MMLab Detection Toolbox and Benchmark
In this paper, we introduce the various features of this toolbox.
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
This paper presents a new vision Transformer, called Swin Transformer, that capably serves as a general-purpose backbone for computer vision.
YOLACT: Real-time Instance Segmentation
Then we produce instance masks by linearly combining the prototypes with the mask coefficients.
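The linear combination mentioned above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: it assumes k prototype masks stored as an (H, W, k) array and one k-dimensional coefficient vector per detected instance, followed by a sigmoid. The function name `assemble_masks`, the shapes, and the random inputs are chosen for the example, and any later cropping or post-processing is left out.

```python
import numpy as np


def assemble_masks(prototypes, coefficients):
    """Combine prototype masks into per-instance masks.

    prototypes:   (H, W, k) array of k prototype masks.
    coefficients: (n, k) array, one coefficient vector per instance.
    Returns an (H, W, n) array of soft masks with values in (0, 1).
    """
    # Each instance mask is a weighted sum of the k prototypes.
    logits = prototypes @ coefficients.T  # shape (H, W, n)
    # Sigmoid squashes the weighted sums into (0, 1).
    return 1.0 / (1.0 + np.exp(-logits))


# Random stand-ins for network outputs (illustrative values only).
rng = np.random.default_rng(0)
prototypes = rng.standard_normal((8, 8, 4))   # k = 4 prototypes
coefficients = rng.standard_normal((3, 4))    # n = 3 instances

soft_masks = assemble_masks(prototypes, coefficients)
binary_masks = soft_masks > 0.5               # assumed threshold
```

Since the whole step is a single matrix product, its cost grows only mildly with the number of instances, which fits the real-time focus of the paper's title.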
Deep High-Resolution Representation Learning for Visual Recognition
High-resolution representations are essential for position-sensitive vision problems, such as human pose estimation, semantic segmentation, and object detection.
Deep High-Resolution Representation Learning for Human Pose Estimation
We start from a high-resolution subnetwork as the first stage, gradually add high-to-low resolution subnetworks one by one to form more stages, and connect the multi-resolution subnetworks in parallel.
Microsoft COCO: Common Objects in Context
We present a new dataset with the goal of advancing the state-of-the-art in object recognition by placing the question of object recognition in the context of the broader question of scene understanding.
YOLACT++: Better Real-time Instance Segmentation
Then we produce instance masks by linearly combining the prototypes with the mask coefficients.
ResNeSt: Split-Attention Networks
It is well known that featuremap attention and multi-path representation are important for visual recognition.
Res2Net: A New Multi-scale Backbone Architecture
We evaluate the Res2Net block on all these models and demonstrate consistent performance gains over baseline models on widely-used datasets, e.g., CIFAR-100 and ImageNet.