Open Vocabulary Semantic Segmentation
37 papers with code • 9 benchmarks • 4 datasets
Open-vocabulary semantic segmentation models aim to assign a semantic label to each pixel in an image, where the candidate labels come from an arbitrary set of open-vocabulary text descriptions.
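As a rough illustration of the task definition, the sketch below assumes that some model has already produced an embedding for every pixel, and that each candidate text label has been encoded into the same embedding space (as CLIP-style vision-language models do). Each pixel then takes the label whose text embedding has the highest cosine similarity. The function name `label_pixels` and the toy 2-D embeddings are hypothetical, chosen only to show that the label set is supplied at inference time.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def label_pixels(
    pixel_embeddings: List[List[Sequence[float]]],  # H x W grid of D-dim vectors
    text_embeddings: List[Sequence[float]],  # one D-dim vector per label
    labels: List[str],
) -> List[List[str]]:
    """Assign each pixel the label whose text embedding is most similar."""
    result = []
    for row in pixel_embeddings:
        out_row = []
        for pix in row:
            scores = [cosine(pix, t) for t in text_embeddings]
            best = max(range(len(labels)), key=lambda k: scores[k])
            out_row.append(labels[best])
        result.append(out_row)
    return result


# Hypothetical toy data: the label set is chosen at inference time.
labels = ["cat", "grass"]
text_emb = [[1.0, 0.0], [0.0, 1.0]]
pixels = [[[0.9, 0.1], [0.2, 0.8]],
          [[0.7, 0.3], [0.1, 0.9]]]
print(label_pixels(pixels, text_emb, labels))
# prints: [['cat', 'grass'], ['cat', 'grass']]
```

Swapping in a different `labels` list (with matching text embeddings) changes the output vocabulary without retraining anything, which is the property the term "open-vocabulary" refers to.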
Most implemented papers
Side Adapter Network for Open-Vocabulary Semantic Segmentation
A side network with two branches is attached to a frozen CLIP model: one branch predicts mask proposals, and the other predicts an attention bias that is applied inside the CLIP model to recognize the class of each mask.
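The general mechanism of an additive attention bias can be sketched as follows, assuming (as is standard for attention biases) that the bias is added to the scaled dot-product logits before the softmax. This is not SAN's implementation: how SAN computes its bias and which CLIP layers and tokens it is injected into are not shown here, and the function name `biased_attention_weights` and all numbers are hypothetical. The point is only that a large negative bias on a key (for example, an image patch outside a mask) effectively removes that key from the attention.

```python
import math
from typing import List, Sequence


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax (subtracts the max logit first)."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def biased_attention_weights(
    query: Sequence[float],
    keys: List[Sequence[float]],
    bias: Sequence[float],
) -> List[float]:
    """Scaled dot-product attention weights with an additive per-key bias."""
    d = len(query)
    logits = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d) + b
        for key, b in zip(keys, bias)
    ]
    return softmax(logits)


keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 1.0]
# Without a bias, the query attends to all three keys.
print(biased_attention_weights(query, keys, [0.0, 0.0, 0.0]))
# A large negative bias on the second key suppresses its contribution.
print(biased_attention_weights(query, keys, [0.0, -1e4, 0.0]))
```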
CAT-Seg: Cost Aggregation for Open-Vocabulary Semantic Segmentation
Open-vocabulary semantic segmentation presents the challenge of labeling each pixel within an image based on a wide range of text descriptions.
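A cost-aggregation approach starts from a cost volume: one similarity score per pixel per candidate text description. The sketch below builds such a volume from hypothetical pixel and text embeddings and then smooths each label's cost map over a 3x3 neighbourhood. The box filter is a swapped-in placeholder, assumed purely for illustration; it is not CAT-Seg's aggregation module, which is learned. The names `cost_volume` and `box_aggregate` are hypothetical.

```python
import math
from typing import List, Sequence

# H x W x K: one similarity score per pixel per candidate label.
CostVolume = List[List[List[float]]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 0.0 if na == 0.0 or nb == 0.0 else dot / (na * nb)


def cost_volume(
    pixel_embeddings: List[List[Sequence[float]]],
    text_embeddings: List[Sequence[float]],
) -> CostVolume:
    """Cosine similarity between every pixel embedding and every text embedding."""
    return [[[cosine(p, t) for t in text_embeddings] for p in row]
            for row in pixel_embeddings]


def box_aggregate(cost: CostVolume) -> CostVolume:
    """Average each label's cost over a 3x3 neighbourhood (clipped at borders).

    The fixed box filter is a hypothetical stand-in for learned aggregation.
    """
    h, w, k = len(cost), len(cost[0]), len(cost[0][0])
    out: CostVolume = []
    for i in range(h):
        row = []
        for j in range(w):
            cell = []
            for c in range(k):
                vals = [cost[y][x][c]
                        for y in range(max(0, i - 1), min(h, i + 2))
                        for x in range(max(0, j - 1), min(w, j + 2))]
                cell.append(sum(vals) / len(vals))
            row.append(cell)
        out.append(row)
    return out


def argmax(scores: Sequence[float]) -> int:
    return max(range(len(scores)), key=lambda k: scores[k])


# 1 x 3 toy image whose middle pixel is ambiguous.
text_emb = [[1.0, 0.0], [0.0, 1.0]]  # label 0 = "cat", label 1 = "grass"
pixels = [[[1.0, 0.0], [0.4, 0.6], [1.0, 0.0]]]
raw = cost_volume(pixels, text_emb)
agg = box_aggregate(raw)
print(argmax(raw[0][1]), argmax(agg[0][1]))  # prints: 1 0
```

Reading labels off the raw costs assigns the middle pixel to label 1, while the aggregated costs let its neighbours pull it to label 0; the design idea is that context in the cost volume, not just each isolated similarity, informs the final label.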
A Simple Baseline for Open-Vocabulary Semantic Segmentation with Pre-trained Vision-language Model
However, semantic segmentation and the CLIP model operate at different visual granularities: semantic segmentation processes pixels, while CLIP operates on whole images.
CLIP Surgery for Better Explainability with Enhancement in Open-Vocabulary Tasks
Contrastive Language-Image Pre-training (CLIP) is a powerful multimodal large vision model that has demonstrated significant benefits for downstream tasks, including many zero-shot learning and text-guided vision tasks.
Panoptic Vision-Language Feature Fields
In this paper, we propose what is, to the best of our knowledge, the first algorithm for open-vocabulary panoptic segmentation in 3D scenes.
Pay Attention to Your Neighbours: Training-Free Open-Vocabulary Semantic Segmentation
However, existing approaches often rely on impractical supervised pre-training or access to additional pre-trained networks.
Decoupling Zero-Shot Semantic Segmentation
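It decouples zero-shot semantic segmentation into two sub-tasks: 1) a class-agnostic grouping task that groups pixels into segments, and 2) a zero-shot classification task on segments.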
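Segment-level classification can be sketched as follows, assuming the grouping step has already produced binary masks and that pixel and text embeddings share one space. Each segment's embedding is taken as the mean of its pixel embeddings (mask average pooling, a common simplification swapped in here, not necessarily what this paper does), and the whole segment receives one label. The names `mask_pool` and `classify_segments` and the toy data are hypothetical.

```python
import math
from typing import List, Sequence

Mask = List[List[bool]]  # H x W, True where the pixel belongs to the segment


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 0.0 if na == 0.0 or nb == 0.0 else dot / (na * nb)


def mask_pool(pixel_embeddings: List[List[Sequence[float]]], mask: Mask) -> List[float]:
    """Average the embeddings of the pixels inside the mask."""
    dim = len(pixel_embeddings[0][0])
    total = [0.0] * dim
    count = 0
    for emb_row, mask_row in zip(pixel_embeddings, mask):
        for emb, inside in zip(emb_row, mask_row):
            if inside:
                total = [t + e for t, e in zip(total, emb)]
                count += 1
    if count == 0:
        raise ValueError("empty mask")
    return [t / count for t in total]


def classify_segments(
    pixel_embeddings: List[List[Sequence[float]]],
    masks: List[Mask],
    text_embeddings: List[Sequence[float]],
    labels: List[str],
) -> List[str]:
    """Step 2 of the decoupled pipeline: one label per (given) segment."""
    out = []
    for mask in masks:
        seg = mask_pool(pixel_embeddings, mask)
        scores = [cosine(seg, t) for t in text_embeddings]
        out.append(labels[max(range(len(labels)), key=lambda k: scores[k])])
    return out


labels = ["cat", "grass"]
text_emb = [[1.0, 0.0], [0.0, 1.0]]
pixels = [[[0.9, 0.1], [0.4, 0.6], [0.0, 1.0]]]
masks = [[[True, True, False]], [[False, False, True]]]
print(classify_segments(pixels, masks, text_emb, labels))  # prints: ['cat', 'grass']
```

Classified on its own, the middle pixel would lean toward "grass"; classified as part of its segment, it is labeled "cat", which is the kind of segment-level consistency the decoupled formulation targets.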
Open-Vocabulary Universal Image Segmentation with MaskCLIP
In this paper, we tackle an emerging computer vision task, open-vocabulary universal image segmentation, which aims to perform semantic/instance/panoptic segmentation (background semantic labeling + foreground instance segmentation) for arbitrary categories given as text descriptions at inference time.
Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP
Pre-trained CLIP does not perform well on masked images, which makes it a bottleneck. To address this, we propose to fine-tune CLIP on a collection of masked image regions and their corresponding text descriptions.
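One way to produce a masked image region for such (region, text) training pairs is sketched below: pixels outside the mask are replaced with a constant fill colour and the result is cropped to the mask's bounding box. The fill value, the cropping step, and the function name `masked_region` are assumptions made for illustration, not details taken from the paper.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # RGB


def masked_region(
    image: List[List[Pixel]],
    mask: List[List[bool]],
    fill: Pixel = (0, 0, 0),
) -> List[List[Pixel]]:
    """Blank out pixels outside the mask, then crop to the mask's bounding box."""
    rows = [i for i, r in enumerate(mask) if any(r)]
    if not rows:
        raise ValueError("empty mask")
    cols = [j for j in range(len(mask[0])) if any(r[j] for r in mask)]
    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]
    return [
        [image[i][j] if mask[i][j] else fill for j in range(left, right + 1)]
        for i in range(top, bottom + 1)
    ]


# Hypothetical 3 x 3 image: a red object on a dark background.
image = [[(10, 10, 10), (200, 0, 0), (200, 0, 0)],
         [(10, 10, 10), (200, 0, 0), (10, 10, 10)],
         [(10, 10, 10), (10, 10, 10), (10, 10, 10)]]
mask = [[False, True, True],
        [False, True, False],
        [False, False, False]]
print(masked_region(image, mask))
# prints: [[(200, 0, 0), (200, 0, 0)], [(200, 0, 0), (0, 0, 0)]]
```

Each returned region would then be paired with its text description (for instance, the class name of the masked object) to form one fine-tuning example.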
Open-vocabulary Semantic Segmentation with Frozen Vision-Language Models
When trained at a sufficient scale, self-supervised learning has exhibited a notable ability to solve a wide range of visual or language understanding tasks.