Controllable Image Captioning

6 papers with code • 0 benchmarks • 0 datasets

Generate image captions conditioned on control signals, such as a target caption length, selected image regions, or a text style.

Benchmarks

These leaderboards are used to track progress in controllable image captioning.

No evaluation results yet. Help compare methods by submitting evaluation metrics.

Most implemented papers

Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions

aimagelab/show-control-and-tell • CVPR 2019

Current captioning approaches can describe images using black-box architectures whose behavior is hardly controllable and explainable from the exterior.

Length-Controllable Image Captioning

bearcatt/LaBERT • ECCV 2020

We verify the merit of the proposed length level embedding on three models: two state-of-the-art (SOTA) autoregressive models with different types of decoder, as well as our proposed non-autoregressive model, to show its generalization ability.
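As a rough illustration of what a length level embedding could look like, the sketch below maps a desired caption length to a bucket ("level") and adds that level's vector to every token embedding. This is not the LaBERT code: the bucket boundaries, the embedding size, the toy vocabulary, the random initialization, and the choice to sum the level vector with the token vectors are all assumptions made for the example.

```python
import random

# Hypothetical length buckets as inclusive word-count ranges (assumed, not from the source).
LENGTH_LEVELS = [(1, 9), (10, 14), (15, 19), (20, 25)]
EMBED_DIM = 8  # assumed embedding size for the toy example


def length_to_level(num_words):
    """Return the index of the length bucket that contains num_words."""
    for level, (low, high) in enumerate(LENGTH_LEVELS):
        if low <= num_words <= high:
            return level
    raise ValueError(f"no length level covers {num_words} words")


def make_table(num_rows, dim, rng):
    """Build a randomly initialized embedding table (stand-in for learned weights)."""
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(num_rows)]


rng = random.Random(0)
VOCAB = {"a": 0, "dog": 1, "runs": 2, "[MASK]": 3}  # toy vocabulary
token_table = make_table(len(VOCAB), EMBED_DIM, rng)
level_table = make_table(len(LENGTH_LEVELS), EMBED_DIM, rng)


def embed_tokens(tokens, level):
    """Add the same length-level vector to each token's embedding."""
    level_vec = level_table[level]
    return [
        [t + l for t, l in zip(token_table[VOCAB[tok]], level_vec)]
        for tok in tokens
    ]


# Request a caption of about 12 words: pick the level, then condition the inputs on it.
level = length_to_level(12)
conditioned = embed_tokens(["a", "dog", "runs"], level)
```

Because the level vector is shared by all positions, the decoder sees the requested length range at every step, which is one plausible way such a signal could be applied to both autoregressive and non-autoregressive decoders.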

Language-Driven Region Pointer Advancement for Controllable Image Captioning

AnnikaLindh/Controllable_Region_Pointer_Advancement • COLING 2020

A vital component of the Controllable Image Captioning architecture is the mechanism that decides the timing of attending to each region through the advancement of a region pointer.

Human-like Controllable Image Captioning with Verb-specific Semantic Roles

mad-red/VSR-guided-CIC • CVPR 2021

The authors argue that almost all existing objective control signals overlook two indispensable characteristics of an ideal control signal. The first is event compatibility: all visual contents referred to in a single sentence should be compatible with the described activity.

CLID: Controlled-Length Image Descriptions with Limited Data

eladhi/clid • 27 Nov 2022

Controllable image captioning models generate human-like image descriptions, enabling some kind of control over the generated captions.

Caption Anything: Interactive Image Description with Diverse Multimodal Controls

ttengwang/caption-anything • 4 May 2023

Controllable image captioning is an emerging multimodal topic that aims to describe the image with natural language following human purpose, e.g., looking at the specified regions or telling in a particular text style.
