Open Vocabulary Attribute Detection
11 papers with code • 2 benchmarks • 1 dataset
Open-Vocabulary Attribute Detection (OVAD) is a task that aims to detect and recognize an open set of objects and their associated attributes in an image. Both the objects and the attributes are specified by text queries at inference time; the classes evaluated at test time are not known during training.
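The core mechanic shared by most methods below is matching a region's image embedding against text-query embeddings in a joint vision-language space. The following is a minimal, self-contained sketch of that scoring step; the embeddings, attribute names, and threshold are toy placeholders standing in for the outputs of a real encoder such as CLIP, not the OVAD benchmark's actual code.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def detect_attributes(region_embedding, attribute_queries, threshold=0.5):
    """Score each attribute text query against one detected region's
    image embedding; keep attributes whose similarity clears a threshold.
    attribute_queries: list of (attribute_name, text_embedding) pairs,
    where the text embedding would come from a language encoder."""
    scored = [(name, cosine(region_embedding, emb))
              for name, emb in attribute_queries]
    return [(name, s) for name, s in scored if s > threshold]

# Toy 3-d embeddings standing in for encoder outputs.
region = [1.0, 0.0, 0.0]
queries = [("red",     [1.0, 0.0, 0.0]),   # aligned with the region
           ("striped", [0.0, 1.0, 0.0]),   # orthogonal to the region
           ("shiny",   [0.6, 0.8, 0.0])]
print(detect_attributes(region, queries))  # → [('red', 1.0), ('shiny', 0.6)]
```

Because the attribute vocabulary is just a list of text queries, new attributes can be added at inference time without retraining, which is what makes the setting "open-vocabulary".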
Libraries
Use these libraries to find Open Vocabulary Attribute Detection models and implementations
Most implemented papers
Learning Transferable Visual Models From Natural Language Supervision
State-of-the-art computer vision systems are trained to predict a fixed set of predetermined object categories.
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
The cost of vision-and-language pre-training has become increasingly prohibitive due to end-to-end training of large-scale models.
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Performance improvement has been largely achieved by scaling up the dataset with noisy image-text pairs collected from the web, which is a suboptimal source of supervision.
Align before Fuse: Vision and Language Representation Learning with Momentum Distillation
Most existing methods employ a transformer-based multimodal encoder to jointly model visual tokens (region-based image features) and word tokens.
Reproducible scaling laws for contrastive language-image learning
We investigate scaling laws for contrastive language-image pre-training (CLIP) with the public LAION dataset and the open-source OpenCLIP repository.
Open-Vocabulary Object Detection Using Captions
Weakly supervised and zero-shot learning techniques have been explored to scale object detectors to more categories with less supervision, but they have not been as successful and widely adopted as supervised models.
Multi-Grained Vision Language Pre-Training: Aligning Texts with Visual Concepts
Most existing methods in vision language pre-training rely on object-centric features extracted through object detection and make fine-grained alignments between the extracted features and texts.
Localized Vision-Language Matching for Open-vocabulary Object Detection
In this work, we propose an open-vocabulary object detection method that, based on image-caption pairs, learns to detect novel object classes along with a given set of known classes.
Bridging the Gap between Object and Image-level Representations for Open-Vocabulary Detection
Two popular forms of weak-supervision used in open-vocabulary detection (OVD) include pretrained CLIP model and image-level supervision.
Open-vocabulary Attribute Detection
The objective of the novel task and benchmark is to probe object-level attribute information learned by vision-language models.