Composed Image Retrieval (CoIR)
12 papers with code • 1 benchmark • 5 datasets
Composed Image Retrieval (CoIR) is the task of retrieving images from a large database using a query composed of multiple elements, such as text, images, and sketches. The goal is to develop algorithms that understand and combine these sources of information to accurately retrieve images matching the query, giving users a more expressive way to state what they are looking for.
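As a rough illustration of the task described above, the sketch below fuses a reference-image feature with a text feature and ranks a small gallery by cosine similarity. It is a toy under stated assumptions, not the method of any paper listed here: the three-dimensional vectors stand in for features that a real system would obtain from a pre-trained vision-and-language encoder, the additive fusion is just one simple choice, and the names compose_query, retrieve, and the gallery entries are hypothetical.

```python
import math


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def compose_query(image_feat, text_feat):
    """Hypothetical late fusion: add the unit-normalized image and text
    features, then re-normalize. Real systems typically learn this step."""
    img = normalize(image_feat)
    txt = normalize(text_feat)
    return normalize([a + b for a, b in zip(img, txt)])


def retrieve(query, gallery, top_k=3):
    """Rank gallery items (name -> feature vector) by cosine similarity."""
    q = normalize(query)
    scored = []
    for name, feat in gallery.items():
        score = sum(a * b for a, b in zip(q, normalize(feat)))
        scored.append((score, name))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:top_k]]


# Toy features (assumed, for illustration only).
reference_image = [1.0, 0.0, 0.0]    # stands in for "a dress"
modification_text = [0.0, 1.0, 0.0]  # stands in for "make it red"
gallery = {
    "same_dress": [1.0, 0.0, 0.0],
    "red_dress": [1.0, 1.0, 0.0],
    "unrelated": [0.0, 0.0, 1.0],
}

ranked = retrieve(compose_query(reference_image, modification_text), gallery)
print(ranked)
```

In this toy setup the item that matches both the reference image and the requested change ranks first, ahead of the unmodified reference. That ordering is the behavior a composed query is meant to produce.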
Most implemented papers
Image Retrieval on Real-life Images with Pre-trained Vision-and-Language Models
We demonstrate that with a relatively simple architecture, CIRPLANT outperforms existing methods on open-domain images, while matching state-of-the-art accuracy on the existing narrow datasets, such as fashion.
Zero-Shot Composed Image Retrieval with Textual Inversion
Composed Image Retrieval (CIR) aims to retrieve a target image based on a query composed of a reference image and a relative caption that describes the difference between the two images.
Effective Conditioned and Composed Image Retrieval Combining CLIP-Based Features
the visual content of the query image.
Conditioned and Composed Image Retrieval Combining and Partially Fine-Tuning CLIP-Based Features
The proposed method is based on an initial training stage where a simple combination of visual and textual features is used, to fine-tune the CLIP text encoder.
Candidate Set Re-ranking for Composed Image Retrieval with Dual Multi-modal Encoder
An alternative approach is to allow interactions between the query and every possible candidate, i.e., reference-text-candidate triplets, and pick the best from the entire set.
Composed Image Retrieval with Text Feedback via Multi-grained Uncertainty Regularization
The key idea underpinning the proposed method is to integrate fine- and coarse-grained retrieval as matching data points with small and large fluctuations, respectively.
Pic2Word: Mapping Pictures to Words for Zero-shot Composed Image Retrieval
Existing methods rely on supervised learning of CIR models using labeled triplets consisting of the query image, text specification, and the target image.
Data Roaming and Quality Assessment for Composed Image Retrieval
To address these shortcomings, we introduce the Large Scale Composed Image Retrieval (LaSCo) dataset, a new CoIR dataset which is ten times larger than existing ones.
CompoDiff: Versatile Composed Image Retrieval With Latent Diffusion
This paper proposes a novel diffusion-based model, CompoDiff, for solving zero-shot Composed Image Retrieval (ZS-CIR) with latent diffusion.
Bi-directional Training for Composed Image Retrieval via Text Prompt Learning
Composed image retrieval searches for a target image based on a multi-modal user query comprised of a reference image and modification text describing the desired changes.