Multi-label zero-shot learning
12 papers with code • 3 benchmarks • 2 datasets
The goal of the multi-label classification task is to predict the set of labels present in an image. As an extension of zero-shot learning (ZSL), multi-label zero-shot learning (ML-ZSL) aims to identify both seen and unseen labels in an image.
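The common recipe behind most ML-ZSL methods can be sketched in a few lines: project an image feature into a shared semantic space and score every label, seen or unseen, against its word embedding. A minimal illustration, where the embeddings, projection, and label set are all hypothetical:

```python
import numpy as np

# Minimal ML-ZSL sketch (all data hypothetical): an image feature is
# projected into a shared semantic space and compared against per-label
# word embeddings; unseen labels are scored the same way as seen ones.

rng = np.random.default_rng(0)
d_img, d_sem = 8, 4

labels = ["dog", "ball", "grass", "zebra"]           # "zebra" unseen at training time
label_emb = rng.normal(size=(len(labels), d_sem))    # stand-in word embeddings
label_emb /= np.linalg.norm(label_emb, axis=1, keepdims=True)

W = rng.normal(size=(d_sem, d_img))                  # projection (learned in practice)

def predict_labels(image_feat, threshold=0.0):
    z = W @ image_feat                               # project image into semantic space
    z /= np.linalg.norm(z)
    scores = label_emb @ z                           # cosine similarity per label
    return [lab for lab, s in zip(labels, scores) if s > threshold]

img = rng.normal(size=d_img)
preds = predict_labels(img)
print(preds)
```

The thresholded multi-label readout is what separates ML-ZSL from standard ZSL, which would instead take a single argmax over the label scores.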
Most implemented papers
Zero-Shot Learning by Convex Combination of Semantic Embeddings
In other cases the semantic embedding space is established by an independent natural language processing task, and then the image transformation into that space is learned in a second stage.
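The core of this paper's ConSE method fits in a few lines: a pretrained seen-class classifier's top-k probabilities weight a convex combination of the seen classes' semantic embeddings, and the result is matched to unseen-class embeddings by cosine similarity. A toy numeric sketch (embeddings and probabilities are illustrative, not from any real model):

```python
import numpy as np

# ConSE sketch: embed an image as a probability-weighted convex combination
# of seen-class embeddings, then pick the nearest unseen-class embedding.

seen_emb = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])   # seen class embeddings
unseen_emb = np.array([[0.9, 0.1], [0.1, 0.9]])             # unseen class embeddings
probs = np.array([0.7, 0.1, 0.2])                            # classifier output p(y|x)

k = 2
top = np.argsort(probs)[-k:]                                 # top-k seen classes
w = probs[top] / probs[top].sum()                            # renormalized weights
f = w @ seen_emb[top]                                        # convex combination

sims = (unseen_emb @ f) / (np.linalg.norm(unseen_emb, axis=1) * np.linalg.norm(f))
print(int(np.argmax(sims)))                                  # → 0
```

Because the combination stays inside the convex hull of the seen embeddings, no second-stage mapping has to be trained, which is the paper's contrast with the two-stage approach described above.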
Label-Embedding for Image Classification
Attributes act as intermediate representations that enable parameter sharing between classes, a must when training data is scarce.
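The parameter sharing works through a bilinear compatibility F(x, y) = θ(x)ᵀ W φ(y): only W is learned, and each class contributes just its attribute vector φ(y), so an unseen class needs no new parameters. A minimal sketch with made-up attributes and a random compatibility matrix:

```python
import numpy as np

# Attribute-based label-embedding sketch (hypothetical data): classes are
# described by shared attribute vectors, and one compatibility matrix W
# scores any class -- including one with no training images.

attrs = np.array([                        # phi(y): per-class attribute vectors
    [1, 0, 1],                            # class 0: e.g. striped, not furry, four-legged
    [0, 1, 1],                            # class 1
    [1, 1, 0],                            # class 2 (could be unseen)
], dtype=float)

rng = np.random.default_rng(1)
W = rng.normal(size=(5, 3))               # shared compatibility matrix (learned in practice)
x = rng.normal(size=5)                    # image feature theta(x)

scores = x @ W @ attrs.T                  # F(x, y) for every class at once
pred = int(np.argmax(scores))
print(pred)
```

Adding a new class is just adding a row to `attrs`, which is exactly the scarce-data parameter sharing the sentence above describes.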
Multi-Label Zero-Shot Learning with Structured Knowledge Graphs
In this paper, we propose a novel deep learning architecture for multi-label zero-shot learning (ML-ZSL), which is able to predict multiple unseen class labels for each input instance.
Zero-shot Learning for Audio-based Music Classification and Tagging
Audio-based music classification and tagging is typically based on categorical supervised learning with a fixed set of labels.
A Shared Multi-Attention Framework for Multi-Label Zero-Shot Learning
Therefore, instead of generating attentions for unseen labels, which have unknown behaviors and could focus on irrelevant regions for lack of any training sample, we let the unseen labels select among a set of shared attentions that are trained, through our novel loss, to be label-agnostic and to focus only on relevant foreground regions.
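The selection mechanism can be sketched simply: a small set of label-agnostic attention heads pools region features, and each label, seen or unseen, scores every pooled feature against its semantic embedding and keeps the best head. This toy version uses random tensors in place of a real backbone and learned heads:

```python
import numpy as np

# Shared multi-attention sketch (all tensors hypothetical): M shared heads
# attend over R region features; each of L labels "selects" a head by
# taking the maximum compatibility over the pooled features.

rng = np.random.default_rng(2)
R, D, M, L = 6, 4, 3, 5                    # regions, feature dim, heads, labels

regions = rng.normal(size=(R, D))          # region features from a backbone
head_q = rng.normal(size=(M, D))           # one query per shared attention head
label_emb = rng.normal(size=(L, D))        # label semantic embeddings

# Softmax attention over regions, one distribution per head.
logits = head_q @ regions.T                # (M, R)
att = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
pooled = att @ regions                     # (M, D): one attended feature per head

# Each label takes the maximum score over the shared heads.
scores = (label_emb @ pooled.T).max(axis=1)   # (L,)
print(scores.shape)
```

Because the heads are shared and label-agnostic, an unseen label reuses attentions shaped entirely by seen-label training, which is the point of the design.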
Interaction Compass: Multi-Label Zero-Shot Learning of Human-Object Interactions via Spatial Relations
We study the problem of multi-label zero-shot recognition in which labels are in the form of human-object interactions (combinations of actions on objects), each image may contain multiple interactions and some interactions do not have training images.
Generative Multi-Label Zero-Shot Learning
Nevertheless, computing reliable attention maps for unseen classes during inference in a multi-label setting is still a challenge.
Semantic Diversity Learning for Zero-Shot Multi-label Classification
We argue that using a single embedding vector to represent an image, as commonly practiced, is not sufficient to rank both relevant seen and unseen labels accurately.
Contrastive Language-Image Pre-training for the Italian Language
CLIP (Contrastive Language-Image Pre-training) is a very recent multi-modal model that jointly learns representations of images and texts.
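Zero-shot classification with a CLIP-style model reduces to cosine similarity between one image embedding and one text embedding per candidate prompt. A schematic with made-up encoder outputs (a real system would use the trained image and text encoders; the Italian prompts are illustrative):

```python
import numpy as np

# CLIP-style zero-shot classification sketch: the label whose text
# embedding is most similar to the image embedding wins. Embeddings
# below are invented stand-ins for encoder outputs.

img = np.array([0.8, 0.6, 0.0])                       # image encoder output
texts = {                                             # text encoder outputs per prompt
    "una foto di un gatto": np.array([0.9, 0.4, 0.1]),
    "una foto di un cane": np.array([0.1, 0.2, 0.9]),
}

img = img / np.linalg.norm(img)
best, best_sim = None, -1.0
for prompt, t in texts.items():
    sim = float(img @ (t / np.linalg.norm(t)))        # cosine similarity
    if sim > best_sim:
        best, best_sim = prompt, sim

print(best)                                           # → una foto di un gatto
```

Because the label set is just a list of prompts, this scheme extends naturally to the multi-label zero-shot setting: threshold the per-prompt similarities instead of taking the single best one.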
Discriminative Region-based Multi-Label Zero-Shot Learning
We note that the best existing multi-label ZSL method takes a shared approach towards attending to region features with a common set of attention maps for all the classes.