Controllable Image Captioning
6 papers with code • 0 benchmarks • 0 datasets
Generate image captions conditioned on control signals.
Most implemented papers
Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions
Current captioning approaches can describe images using black-box architectures whose behavior is hardly controllable and explainable from the exterior.
Length-Controllable Image Captioning
We verify the merit of the proposed length level embedding on three models: two state-of-the-art (SOTA) autoregressive models with different types of decoder, as well as our proposed non-autoregressive model, to show its generalization ability.
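As a rough illustration of what a length level embedding could look like, the sketch below buckets a desired caption length into a level and adds that level's vector to every token embedding before decoding. This is a minimal sketch under stated assumptions, not the paper's implementation: the bucket ranges, table sizes, random initialization, and the names `LEVEL_RANGES`, `length_to_level`, `make_table`, and `embed_with_length_level` are all hypothetical.

```python
import random

# Hypothetical length buckets (inclusive token-count ranges), for illustration only.
LEVEL_RANGES = [(1, 9), (10, 14), (15, 19), (20, 25)]


def length_to_level(num_tokens):
    """Map a desired caption length (in tokens) to a discrete length level."""
    for level, (low, high) in enumerate(LEVEL_RANGES):
        if low <= num_tokens <= high:
            return level
    raise ValueError(f"length {num_tokens} is outside all configured ranges")


def make_table(rows, dim, seed):
    """Build a small randomly initialized embedding table (stand-in for learned weights)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(rows)]


def embed_with_length_level(token_ids, level, token_table, level_table):
    """Add the chosen level's vector to each token embedding (element-wise sum)."""
    level_vec = level_table[level]
    return [
        [t + l for t, l in zip(token_table[tok], level_vec)]
        for tok in token_ids
    ]


# Usage: condition a 3-token prefix on a target length of 12 tokens.
token_table = make_table(rows=100, dim=8, seed=0)            # hypothetical vocabulary of 100
level_table = make_table(rows=len(LEVEL_RANGES), dim=8, seed=1)
level = length_to_level(12)
conditioned = embed_with_length_level([5, 17, 42], level, token_table, level_table)
```

In a trained model both tables would be learned jointly with the decoder; the point of the sketch is only that the control signal enters as one extra embedding summed into the input, so the same idea can be attached to different decoder types.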
Language-Driven Region Pointer Advancement for Controllable Image Captioning
A vital component of the Controllable Image Captioning architecture is the mechanism that decides the timing of attending to each region through the advancement of a region pointer.
Human-like Controllable Image Captioning with Verb-specific Semantic Roles
However, we argue that almost all existing objective control signals have overlooked two indispensable characteristics of an ideal control signal: 1) Event-compatible: all visual contents referred to in a single sentence should be compatible with the described activity.
CLID: Controlled-Length Image Descriptions with Limited Data
Controllable image captioning models generate human-like image descriptions, enabling some kind of control over the generated captions.
Caption Anything: Interactive Image Description with Diverse Multimodal Controls
Controllable image captioning is an emerging multimodal topic that aims to describe the image with natural language following human purpose, e.g., looking at the specified regions or telling in a particular text style.