Image Retrieval
666 papers with code • 54 benchmarks • 75 datasets
Image Retrieval is a fundamental and long-standing computer vision task that involves finding images similar to a given query in a large database. It is often treated as a form of fine-grained, instance-level classification. Beyond being integral to image recognition alongside classification and detection, it also holds substantial business value by helping users discover images that match their interests or requirements, guided by visual similarity or other parameters.
(Image credit: DELF)
Libraries
Use these libraries to find Image Retrieval models and implementations.
Datasets
Subtasks
Most implemented papers
Emerging Properties in Self-Supervised Vision Transformers
In this paper, we question if self-supervised learning provides new properties to Vision Transformer (ViT) that stand out compared to convolutional networks (convnets).
VGGFace2: A dataset for recognising faces across pose and age
The dataset was collected with three goals in mind: (i) to have both a large number of identities and also a large number of images for each identity; (ii) to cover a large range of pose, age and ethnicity; and (iii) to minimize the label noise.
NetVLAD: CNN architecture for weakly supervised place recognition
We tackle the problem of large scale visual place recognition, where the task is to quickly and accurately recognize the location of a given query photograph.
Fine-tuning CNN Image Retrieval with No Human Annotation
We show that both hard-positive and hard-negative examples, selected by exploiting the geometry and the camera positions available from the 3D models, enhance the performance of particular-object retrieval.
Large-Scale Image Retrieval with Attentive Deep Local Features
We propose an attentive local feature descriptor suitable for large-scale image retrieval, referred to as DELF (DEep Local Feature).
Circle Loss: A Unified Perspective of Pair Similarity Optimization
This paper provides a pair similarity optimization viewpoint on deep feature learning, aiming to maximize the within-class similarity $s_p$ and minimize the between-class similarity $s_n$.
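Circle Loss weights each similarity pair by how far it is from its optimum, so poorly optimized pairs receive larger gradients. Below is a minimal NumPy sketch of that unified loss under its standard relaxation; the margin and scale values are illustrative (the paper tunes them per task), and this is not the authors' implementation.

```python
import numpy as np

def circle_loss(sp, sn, m=0.25, gamma=32.0):
    """Circle loss over within-class similarities sp and between-class sn.

    Each similarity gets its own adaptive weight (ap / an), so pairs far
    from their optimum (1 + m for positives, -m for negatives) are
    penalized more strongly than pairs that are already well optimized.
    """
    ap = np.maximum(0.0, 1 + m - sp)  # adaptive weight for positive pairs
    an = np.maximum(0.0, sn + m)      # adaptive weight for negative pairs
    dp, dn = 1 - m, m                 # decision margins
    logit_p = -gamma * ap * (sp - dp)
    logit_n = gamma * an * (sn - dn)
    return np.log1p(np.exp(logit_n).sum() * np.exp(logit_p).sum())

# Well-separated pairs (high sp, low sn) yield a near-zero loss;
# confused pairs yield a large one.
print(circle_loss(np.array([0.9]), np.array([0.1])))
print(circle_loss(np.array([0.1]), np.array([0.9])))
```

The adaptive weights are what distinguish this from a plain softmax-style triplet objective, which applies the same scale to every pair regardless of its current similarity.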
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
The cost of vision-and-language pre-training has become increasingly prohibitive due to end-to-end training of large-scale models.
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
We present ViLBERT (short for Vision-and-Language BERT), a model for learning task-agnostic joint representations of image content and natural language.
DINOv2: Learning Robust Visual Features without Supervision
The recent breakthroughs in natural language processing for model pretraining on large quantities of data have opened the way for similar foundation models in computer vision.
VSE++: Improving Visual-Semantic Embeddings with Hard Negatives
We present a new technique for learning visual-semantic embeddings for cross-modal retrieval.
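The key idea in VSE++ is to replace the usual sum over all in-batch negatives with only the hardest negative, which the paper calls the max-of-hinges (MH) loss. A minimal NumPy sketch of that objective, assuming a batch of matched image-caption pairs whose similarities lie on the diagonal of a score matrix:

```python
import numpy as np

def vse_hard_negative_loss(sims, margin=0.2):
    """Max-of-hinges triplet loss in the style of VSE++.

    sims[i, j] is the similarity between image i and caption j; matched
    pairs lie on the diagonal. Only the hardest in-batch negative (the
    highest-scoring mismatch) contributes to each term.
    """
    n = sims.shape[0]
    pos = np.diag(sims)
    # Hinge against every wrong caption for each image, and vice versa.
    cost_c = np.maximum(0.0, margin + sims - pos[:, None])
    cost_i = np.maximum(0.0, margin + sims - pos[None, :])
    mask = np.eye(n, dtype=bool)
    cost_c[mask] = 0.0
    cost_i[mask] = 0.0
    # Keep only the hardest negative per image and per caption.
    return cost_c.max(axis=1).sum() + cost_i.max(axis=0).sum()

# Confidently matched pairs incur no loss.
print(vse_hard_negative_loss(np.eye(3)))
```

Focusing on the hardest negative keeps easy negatives, which are already correctly ranked, from diluting the gradient signal.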