Zero-Shot Transfer Image Classification
16 papers with code • 16 benchmarks • 8 datasets
Most implemented papers
Learning Transferable Visual Models From Natural Language Supervision
State-of-the-art computer vision systems are trained to predict a fixed set of predetermined object categories.
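In the zero-shot transfer setting, CLIP scores an image against a set of natural-language class prompts and predicts the highest-scoring one. A minimal sketch using the Hugging Face transformers implementation; the checkpoint name, image path, and prompts below are just examples:
```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Example checkpoint; any CLIP checkpoint with an image and a text tower works the same way.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("example.jpg")
prompts = ["a photo of a cat", "a photo of a dog", "a photo of a car"]

inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=-1)   # image-to-text similarities as probabilities
print(prompts[probs.argmax().item()])              # predicted class prompt
```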
CoCa: Contrastive Captioners are Image-Text Foundation Models
We apply a contrastive loss between unimodal image and text embeddings, in addition to a captioning loss on the outputs of the multimodal decoder, which predicts text tokens autoregressively.
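A minimal PyTorch sketch of that combined objective; the loss weights and temperature here are illustrative placeholders, not the paper's settings:
```python
import torch
import torch.nn.functional as F

def coca_loss(image_emb, text_emb, caption_logits, caption_tokens,
              temperature=0.07, contrastive_weight=1.0, caption_weight=2.0):
    """Contrastive loss on unimodal embeddings plus captioning loss on decoder outputs."""
    # Contrastive part: symmetric InfoNCE over normalized image/text embeddings.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)
    contrastive = (F.cross_entropy(logits, targets) +
                   F.cross_entropy(logits.t(), targets)) / 2
    # Captioning part: autoregressive cross-entropy on the multimodal decoder logits.
    caption = F.cross_entropy(caption_logits.flatten(0, 1), caption_tokens.flatten())
    return contrastive_weight * contrastive + caption_weight * caption
```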
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
In this paper, we leverage a noisy dataset of over one billion image alt-text pairs, obtained without the expensive filtering or post-processing steps used in the Conceptual Captions dataset.
LiT: Zero-Shot Transfer with Locked-image text Tuning
This paper presents contrastive-tuning, a simple method employing contrastive training to align image and text models while still taking advantage of their pre-training.
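The key step is freezing ("locking") the pretrained image tower so that contrastive training updates only the text tower. A minimal PyTorch sketch, assuming generic `image_encoder` / `text_encoder` modules:
```python
import torch

def lock_image_tower(image_encoder, text_encoder, lr=1e-4):
    """Set up locked-image contrastive tuning: only the text tower is trained.
    `image_encoder` / `text_encoder` are assumed to be pretrained torch.nn.Module instances."""
    for p in image_encoder.parameters():
        p.requires_grad = False          # lock the pretrained image tower
    image_encoder.eval()                 # keep frozen normalization/dropout behavior
    # The optimizer only sees the text tower (in practice, a learned temperature too).
    return torch.optim.AdamW(text_encoder.parameters(), lr=lr)
```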
EVA-CLIP: Improved Training Techniques for CLIP at Scale
Our approach incorporates new techniques for representation learning, optimization, and augmentation, enabling EVA-CLIP to achieve superior performance compared to previous CLIP models with the same number of parameters but significantly smaller training costs.
Your Diffusion Model is Secretly a Zero-Shot Classifier
Our generative approach to classification, which we call Diffusion Classifier, attains strong results on a variety of benchmarks and outperforms alternative methods of extracting knowledge from diffusion models.
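The core idea is to score each candidate class by how well the text-conditional diffusion model predicts the noise added to the image, then pick the class with the lowest denoising error. A rough sketch assuming a diffusers-style UNet, scheduler, and precomputed per-class text embeddings:
```python
import torch

@torch.no_grad()
def diffusion_classify(unet, scheduler, latents, class_embeds, n_trials=32):
    """Return the index of the class whose conditioning best predicts the added noise."""
    errors = []
    for cond in class_embeds:                      # one text embedding per candidate class
        err = 0.0
        for _ in range(n_trials):                  # Monte Carlo over timesteps and noise
            t = torch.randint(0, scheduler.config.num_train_timesteps, (1,),
                              device=latents.device)
            noise = torch.randn_like(latents)
            noisy = scheduler.add_noise(latents, noise, t)
            pred = unet(noisy, t, encoder_hidden_states=cond).sample
            err += torch.mean((pred - noise) ** 2).item()
        errors.append(err / n_trials)
    return int(torch.tensor(errors).argmin())      # lowest denoising error wins
```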
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
Progress in vision and vision-language foundation models, which are also critical elements of multi-modal AGI, has not kept pace with LLMs.
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
Scaling up contrastive language-image pretraining (CLIP) is critical for empowering both vision and multimodal models.
Florence: A New Foundation Model for Computer Vision
Computer vision foundation models, which are trained on diverse, large-scale datasets and can be adapted to a wide range of downstream tasks, are critical for solving real-world computer vision applications.
PaLI: A Jointly-Scaled Multilingual Language-Image Model
PaLI generates text based on visual and textual inputs, and with this interface performs many vision, language, and multimodal tasks in many languages.