Open Vocabulary Object Detection

56 papers with code • 4 benchmarks • 6 datasets

Open-vocabulary detection (OVD) aims to generalize beyond the limited number of base classes labeled during the training phase. The goal is to detect novel classes defined by an unbounded (open) vocabulary at inference.
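
The idea of an open vocabulary at inference can be illustrated with a toy example. The sketch below assumes a detector that produces one embedding per candidate region and a text encoder that produces one embedding per class name, both in a shared space; the region is then assigned to the class whose text embedding is most similar. The function names, the temperature value, and the 3-dimensional embeddings are all hypothetical and chosen for illustration only. This is not the method of any specific paper listed on this page.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def classify_region(region_emb, text_embs, temperature=0.1):
    """Score one region embedding against every class-name embedding.

    Returns the best label and a softmax distribution over the vocabulary.
    The temperature is an assumed hyperparameter, not a value from any paper.
    """
    labels = list(text_embs)
    logits = [cosine(region_emb, text_embs[name]) / temperature for name in labels]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = {name: e / total for name, e in zip(labels, exps)}
    return max(probs, key=probs.get), probs


# Made-up embeddings standing in for text-encoder outputs.
vocabulary = {
    "cat": [1.0, 0.0, 0.0],    # base class (labeled during training)
    "dog": [0.0, 1.0, 0.0],    # base class (labeled during training)
    "zebra": [0.0, 0.0, 1.0],  # novel class, added only at inference
}

# Made-up embedding standing in for one detected region.
region = [0.1, 0.2, 0.9]

label, probs = classify_region(region, vocabulary)
print(label, probs)
```

Because the vocabulary is just a dictionary of text embeddings, a new class can be added at inference by embedding its name, with no change to the scoring code.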

Benchmarks

These leaderboards track progress in open-vocabulary object detection.

Dataset	Best Model
MSCOCO	Cooperative Foundational Models
LVIS v1.0	DITO
OpenImages-v4	Object-Centric-OVD
Objects365	Object-Centric-OVD

Libraries

Use these libraries to find open-vocabulary object detection models and implementations.

faceonlive/ai-research • 2 papers • 144

om-ai-lab/OmDet • 2 papers

Subtasks

Open Vocabulary Attribute Detection

Most implemented papers

Open-vocabulary Object Detection via Vision and Language Knowledge Distillation

tensorflow/tpu • ICLR 2022

On COCO, ViLD outperforms the previous state-of-the-art by 4.8 on novel AP and 11.4 on overall AP.

PointCLIP: Point Cloud Understanding by CLIP

zrrskywalker/pointclip • CVPR 2022

On top of that, we design an inter-view adapter to better extract the global feature and adaptively fuse the few-shot knowledge learned from 3D into CLIP pre-trained in 2D.

Simple Open-Vocabulary Object Detection with Vision Transformers

google-research/scenic • 12 May 2022

Combining simple architectures with large-scale pre-training has led to massive improvements in image classification.

Open Vocabulary Object Detection with Proposal Mining and Prediction Equalization

peixianchen/medet • 22 Jun 2022

Open-vocabulary object detection (OVD) aims to scale up vocabulary size to detect objects of novel categories beyond the training vocabulary.

PointCLIP V2: Prompting CLIP and GPT for Powerful 3D Open-world Learning

yangyangyang127/pointclip_v2 • ICCV 2023

In this paper, we first collaborate CLIP and GPT to be a unified 3D open-world learner, named as PointCLIP V2, which fully unleashes their potential for zero-shot 3D classification, segmentation, and detection.

Region-Aware Pretraining for Open-Vocabulary Object Detection with Vision Transformers

google-research/google-research • CVPR 2023

We present Region-aware Open-vocabulary Vision Transformers (RO-ViT) - a contrastive image-text pretraining recipe to bridge the gap between image-level pretraining and open-vocabulary object detection.

Described Object Detection: Liberating Object Detection with Flexible Expressions

charles-xie/awesome-described-object-detection • NeurIPS 2023

In this paper, we advance OVD and Referring Expression Comprehension (REC) to a more practical setting called Described Object Detection (DOD) by expanding category names to flexible language expressions for OVD and overcoming the limitation of REC only grounding the pre-existing object.
