Text-Based Person Retrieval
24 papers with code • 3 benchmarks • 3 datasets
Most implemented papers
Dual-Path Convolutional Image-Text Embeddings with Instance Loss
In this paper, we propose a new system to discriminatively embed the image and text to a shared visual-textual space.
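The shared visual-textual space idea can be sketched in a few lines: modality-specific projections map backbone features into one embedding space, where cosine similarity ranks gallery images against a text query. The dimensions, random features, and linear projections below are illustrative stand-ins, not the paper's actual dual-path architecture or its instance loss:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical backbone output dims; both modalities are projected
# into a shared d_shared-dimensional space.
d_img, d_txt, d_shared = 2048, 768, 512
W_img = rng.standard_normal((d_img, d_shared)) * 0.01
W_txt = rng.standard_normal((d_txt, d_shared)) * 0.01

def embed(x, W):
    z = x @ W
    return z / np.linalg.norm(z, axis=-1, keepdims=True)  # L2-normalize

img_feats = rng.standard_normal((4, d_img))  # 4 gallery image descriptors
txt_feats = rng.standard_normal((4, d_txt))  # 4 caption descriptors

# Cosine-similarity matrix: row i ranks gallery images for text query i.
sim = embed(txt_feats, W_txt) @ embed(img_feats, W_img).T
print(sim.shape)  # (4, 4)
```

In the paper the projections are learned end-to-end (with an instance loss treating each image-text pair as its own class); the fixed random matrices here only show the retrieval-time geometry.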
ViTAA: Visual-Textual Attributes Alignment in Person Search by Natural Language
Person search by natural language aims at retrieving a specific person in a large-scale image pool that matches the given textual descriptions.
Contextual Non-Local Alignment over Full-Scale Representation for Text-Based Person Search
A BERT with locality-constrained attention is proposed to obtain representations of descriptions at different scales.
Learning Granularity-Unified Representations for Text-to-Image Person Re-identification
In PGU, we adopt a set of shared and learnable prototypes as the queries to extract diverse and semantically aligned features for both modalities in the granularity-unified feature space, which further promotes the ReID performance.
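The prototype-as-query idea can be illustrated with a single cross-attention step: a set of prototypes shared by both modalities attends over image patches or word tokens, producing the same number of aligned features either way. Sizes and random inputs are hypothetical; the actual PGU module is learned and more elaborate:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 64, 6  # feature dim and number of shared prototypes (illustrative)
prototypes = rng.standard_normal((k, d))  # shared across both modalities

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def extract(tokens):
    # Cross-attention: prototypes act as queries over modality tokens,
    # yielding k granularity-unified features regardless of modality.
    attn = softmax(prototypes @ tokens.T / np.sqrt(d))  # (k, n_tokens)
    return attn @ tokens                                # (k, d)

img_tokens = rng.standard_normal((49, d))  # e.g. 7x7 visual patches
txt_tokens = rng.standard_normal((24, d))  # e.g. word embeddings
print(extract(img_tokens).shape, extract(txt_tokens).shape)  # (6, 64) (6, 64)
```

Because the prototypes are shared, the i-th output feature of an image and of a caption describe the same learned "slot", which is what makes the two modalities directly comparable.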
Person Search with Natural Language Description
Searching for persons in large-scale image databases with a natural language query has important applications in video surveillance.
Deep Cross-Modal Projection Learning for Image-Text Matching
The key point of image-text matching is how to accurately measure the similarity between visual and textual inputs.
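One way to measure that similarity, in the spirit of cross-modal projection matching, is to project each image embedding onto the normalized text embeddings and turn the projection scores into a matching distribution. The shapes and random embeddings below are illustrative, and this is only the matching-probability piece, not the full CMPM/CMPC training objective:

```python
import numpy as np

rng = np.random.default_rng(0)

def cmpm_matching_probs(img_emb, txt_emb):
    # Scalar projection of each image embedding onto each L2-normalized
    # text embedding, softmax-normalized into a matching distribution.
    txt_norm = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    proj = img_emb @ txt_norm.T  # (n_img, n_txt) projection scores
    e = np.exp(proj - proj.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

img = rng.standard_normal((3, 16))
txt = rng.standard_normal((3, 16))
p = cmpm_matching_probs(img, txt)
print(np.allclose(p.sum(axis=1), 1.0))  # each row is a distribution over texts
```

Training then pushes each row of `p` toward the true image-text correspondence (a KL-divergence objective in the paper).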
TIPCB: A Simple but Effective Part-based Convolutional Baseline for Text-based Person Search
Text-based person search is a sub-task in the field of image retrieval, which aims to retrieve target person images according to a given textual description.
Semantically Self-Aligned Network for Text-to-Image Part-aware Person Re-identification
We introduce a Compound Ranking (CR) loss that uses the textual descriptions of other images of the same identity as extra supervision, effectively reducing the intra-class variance in textual features.
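One plausible way to fold same-identity captions into a ranking loss is to give them a smaller hinge margin than true negatives, so they act as weak extra positives. The margins, the hinge form, and the toy similarity matrix below are assumptions for illustration; the paper's exact CR formulation differs:

```python
import numpy as np

def compound_ranking_loss(sim, ident, m_pos=0.2, m_weak=0.1):
    # sim[i, j]: cosine similarity between image i and caption j,
    # with matched image-caption pairs on the diagonal.
    # ident[i]: person identity of pair i. Captions of other images
    # with the same identity get a smaller margin (weak positives).
    # Margins m_pos/m_weak are hypothetical values.
    loss, n = 0.0, sim.shape[0]
    for i in range(n):
        for j in range(n):
            if j == i:
                continue
            margin = m_weak if ident[j] == ident[i] else m_pos
            # Hinge: the matched caption should beat caption j by `margin`.
            loss += max(0.0, margin - sim[i, i] + sim[i, j])
    return loss / (n * (n - 1))

sim = np.array([[0.9, 0.6, 0.1],
                [0.5, 0.8, 0.2],
                [0.0, 0.1, 0.7]])
ident = [0, 0, 1]  # images 0 and 1 share an identity
print(compound_ranking_loss(sim, ident))  # → 0.0, all margins satisfied
```

With well-separated similarities the loss vanishes; shrinking `sim[i, i]` or raising an off-diagonal entry makes the corresponding hinge terms positive.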
DSSL: Deep Surroundings-person Separation Learning for Text-based Person Retrieval
Many previous methods for text-based person retrieval are devoted to learning a latent common-space mapping, with the purpose of extracting modality-invariant features from both the visual and textual modalities.
Text-based Person Search in Full Images via Semantic-Driven Proposal Generation
Finding target persons in full scene images with a text-description query has important practical applications in intelligent video surveillance. However, unlike real-world scenarios, where bounding boxes are not available, existing text-based person retrieval methods focus mainly on cross-modal matching between the query text descriptions and a gallery of cropped pedestrian images.