Text-based Image Editing
21 papers with code • 1 benchmark • 2 datasets
Libraries
Use these libraries to find Text-based Image Editing models and implementations.

Most implemented papers
Prompt-to-Prompt Image Editing with Cross Attention Control
Editing is challenging for text-driven generative models: an editing technique must preserve most of the original image, yet in text-based models even a small modification of the prompt often leads to a completely different outcome.
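The core mechanism is easy to sketch: record the cross-attention maps produced while denoising with the source prompt, then inject them while denoising with the edited prompt so the spatial layout is preserved. Below is a minimal PyTorch sketch of that injection; the names (`source_maps`, `inject_steps`) are illustrative, not the authors' API.

```python
# Minimal sketch of cross-attention injection, the core of Prompt-to-Prompt.
import torch

def attention(q, k, v, injected_probs=None):
    # Standard scaled dot-product cross-attention; if `injected_probs`
    # (recorded from the source prompt) is given, reuse it instead of the
    # attention computed for the edited prompt.
    scale = q.shape[-1] ** -0.5
    probs = torch.softmax(q @ k.transpose(-2, -1) * scale, dim=-1)
    if injected_probs is not None:
        probs = injected_probs  # swap in source-prompt attention maps
    return probs @ v, probs

def edited_attention(q, k_edit, v_edit, step, source_maps, inject_steps):
    # For the first `inject_steps` denoising steps, replace the edited
    # prompt's cross-attention maps with the stored source maps so the
    # layout of the original image is preserved.
    reuse = source_maps[step] if step < inject_steps else None
    out, _ = attention(q, k_edit, v_edit, injected_probs=reuse)
    return out
```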
InstructPix2Pix: Learning to Follow Image Editing Instructions
We propose a method for editing images from human instructions: given an input image and a written instruction that tells the model what to do, our model follows these instructions to edit the image.
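The released model is usable through the diffusers library as `StableDiffusionInstructPix2PixPipeline` with the `timbrooks/instruct-pix2pix` weights. A hedged usage example; exact arguments may differ across diffusers versions:

```python
import torch
from diffusers import StableDiffusionInstructPix2PixPipeline
from PIL import Image

pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")

image = Image.open("input.jpg").convert("RGB")
edited = pipe(
    "make it look like a watercolor painting",  # the written instruction
    image=image,
    num_inference_steps=20,
    image_guidance_scale=1.5,  # how strongly to stay close to the input image
).images[0]
edited.save("edited.jpg")
```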
Null-text Inversion for Editing Real Images using Guided Diffusion Models
Our Null-text inversion, based on the publicly available Stable Diffusion model, is extensively evaluated on a variety of images and prompt editing, showing high-fidelity editing of real images.
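The idea is to DDIM-invert the image, then optimize the unconditional ("null") text embedding at each timestep so that classifier-free-guided sampling retraces the inversion trajectory. A simplified sketch under assumed interfaces: `eps_model(z, t, emb)` is a stand-in noise predictor and `ddim_step(z, eps, t)` one deterministic denoising update.

```python
import torch
import torch.nn.functional as F

def optimize_null_embeddings(eps_model, ddim_step, z_traj, cond_emb, null_emb,
                             guidance=7.5, iters=10, lr=1e-2):
    # z_traj: DDIM-inversion latents ordered from pure noise to the image.
    null_embs = []
    for t in range(len(z_traj) - 1):
        null_t = null_emb.clone().requires_grad_(True)
        opt = torch.optim.Adam([null_t], lr=lr)
        with torch.no_grad():
            eps_c = eps_model(z_traj[t], t, cond_emb)
        for _ in range(iters):
            eps_u = eps_model(z_traj[t], t, null_t)
            eps = eps_u + guidance * (eps_c - eps_u)  # classifier-free guidance
            # Pull the guided step back onto the inversion trajectory.
            loss = F.mse_loss(ddim_step(z_traj[t], eps, t), z_traj[t + 1])
            opt.zero_grad(); loss.backward(); opt.step()
        null_embs.append(null_t.detach())
    return null_embs
```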
Versatile Diffusion: Text, Images and Variations All in One Diffusion Model
In this work, we expand the existing single-flow diffusion pipeline into a multi-task multimodal network, dubbed Versatile Diffusion (VD), that handles multiple flows of text-to-image, image-to-text, and variations in one unified model.
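The diffusers library has shipped a `VersatileDiffusionPipeline` wrapping the `shi-labs/versatile-diffusion` weights; a hedged example of exercising two of the flows from one model (method names and availability may vary with the diffusers version):

```python
import torch
from diffusers import VersatileDiffusionPipeline
from PIL import Image

pipe = VersatileDiffusionPipeline.from_pretrained(
    "shi-labs/versatile-diffusion", torch_dtype=torch.float16
).to("cuda")

# One unified model, multiple flows:
text2img = pipe.text_to_image("a red barn in a snowy field").images[0]
variation = pipe.image_variation(Image.open("input.jpg").convert("RGB")).images[0]
```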
Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation
Large-scale text-to-image generative models have been a revolutionary breakthrough in the evolution of generative AI, allowing us to synthesize diverse images that convey highly complex visual concepts.
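The paper's key move is to inject spatial features saved from the source image's denoising pass into the denoising pass of the translated image. A rough sketch using PyTorch forward hooks and a diffusers-style U-Net call; `layers` and `saved_feats` are illustrative names for the hooked residual blocks and their recorded features:

```python
import torch

def make_injection_hook(saved_feat):
    # Forward hook that discards the module's own output and substitutes the
    # feature recorded from the source image at the same layer and timestep.
    def hook(module, inputs, output):
        return saved_feat.to(output.device, output.dtype)
    return hook

def denoise_with_injection(unet, layers, saved_feats, z, t, text_emb):
    handles = [layer.register_forward_hook(make_injection_hook(f))
               for layer, f in zip(layers, saved_feats)]
    try:
        eps = unet(z, t, encoder_hidden_states=text_emb).sample
    finally:
        for h in handles:
            h.remove()  # always detach hooks, even if the forward pass fails
    return eps
```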
MasaCtrl: Tuning-Free Mutual Self-Attention Control for Consistent Image Synthesis and Editing
Despite the success in large-scale text-to-image generation and text-conditioned image editing, existing methods still struggle to produce consistent generation and editing results.
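MasaCtrl's mutual self-attention is compact enough to sketch directly: the edited branch keeps its own queries but attends to keys and values taken from the source branch, so appearance stays consistent while layout can change. A minimal PyTorch sketch with illustrative names:

```python
import torch

def mutual_self_attention(q_edit, k_edit, v_edit, k_src, v_src, use_source=True):
    # Queries come from the edited image; keys/values are borrowed from the
    # source image (typically only after some warm-up steps/layers).
    scale = q_edit.shape[-1] ** -0.5
    k = k_src if use_source else k_edit
    v = v_src if use_source else v_edit
    probs = torch.softmax(q_edit @ k.transpose(-2, -1) * scale, dim=-1)
    return probs @ v
```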
EDICT: Exact Diffusion Inversion via Coupled Transformations
EDICT enables mathematically exact inversion of real and model-generated images by maintaining two coupled noise vectors which are used to invert each other in an alternating fashion.
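Because each update is affine in one variable given the other, the step can be undone exactly. A sketch of one EDICT denoising step and its inverse, assuming DDIM-style coefficients `a_t`, `b_t`, a noise predictor `eps(z, t)`, and mixing coefficient `p` (the paper uses values near 0.93); names are illustrative:

```python
def edict_step(x, y, eps, t, a_t, b_t, p):
    x_i = a_t * x + b_t * eps(y, t)    # update x using y's noise prediction
    y_i = a_t * y + b_t * eps(x_i, t)  # update y using the new x
    x_new = p * x_i + (1 - p) * y_i    # averaging (mixing) layer
    y_new = p * y_i + (1 - p) * x_new
    return x_new, y_new

def edict_step_inverse(x_new, y_new, eps, t, a_t, b_t, p):
    # Undo each affine operation exactly, in reverse order.
    y_i = (y_new - (1 - p) * x_new) / p
    x_i = (x_new - (1 - p) * y_i) / p
    y = (y_i - b_t * eps(x_i, t)) / a_t
    x = (x_i - b_t * eps(y, t)) / a_t
    return x, y
```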
Zero-shot Image-to-Image Translation
Applying these text-to-image models directly to editing real images remains challenging for two reasons: it is hard to write a prompt that captures every visual detail of the input, and even a good prompt can introduce unwanted changes in regions that should stay fixed.
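One ingredient of the method is an edit direction computed without per-image prompts: embed many sentences describing the source and target concepts and take the difference of the mean CLIP text embeddings. A hedged sketch with the transformers CLIP text encoder; the sentence lists are illustrative and would normally be much larger:

```python
import torch
from transformers import CLIPTextModel, CLIPTokenizer

tok = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
enc = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

def mean_embedding(sentences):
    # Average the token embeddings over all sentences for one concept.
    inputs = tok(sentences, padding=True, return_tensors="pt")
    with torch.no_grad():
        return enc(**inputs).last_hidden_state.mean(dim=(0, 1))

cat_sents = ["a photo of a cat", "a cat sitting on a sofa"]
dog_sents = ["a photo of a dog", "a dog sitting on a sofa"]
edit_direction = mean_embedding(dog_sents) - mean_embedding(cat_sents)
```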
Erasing Concepts from Diffusion Models
We propose a fine-tuning method that can erase a visual concept from a pre-trained diffusion model, given only the name of the style and using negative guidance as a teacher.
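The training target can be sketched in a few lines: a frozen copy of the pretrained noise predictor, steered *away* from the concept via negative guidance, acts as the teacher for the fine-tuned student. A sketch with illustrative names; `eta` controls erasure strength:

```python
import torch
import torch.nn.functional as F

def esd_loss(student, frozen, z_t, t, concept_emb, null_emb, eta=1.0):
    with torch.no_grad():
        eps_uncond = frozen(z_t, t, null_emb)
        eps_concept = frozen(z_t, t, concept_emb)
        # Teacher: guide away from the concept (negative guidance).
        target = eps_uncond - eta * (eps_concept - eps_uncond)
    return F.mse_loss(student(z_t, t, concept_emb), target)
```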
LANCE: Stress-testing Visual Models by Generating Language-guided Counterfactual Images
We propose an automated algorithm to stress-test a trained visual model by generating language-guided counterfactual test images (LANCE).
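At a high level the loop is: caption the image, perturb the caption with a language model, render a counterfactual with a text-based image editor, and check whether the model under test changes its prediction. A sketch with every helper passed in as a hypothetical placeholder, not the authors' implementation:

```python
def stress_test(model, images, caption, perturb_caption, edit_image):
    # caption: image -> text; perturb_caption: text -> list of edited texts;
    # edit_image: (image, old_caption, new_caption) -> counterfactual image.
    failures = []
    for img in images:
        cap = caption(img)
        for new_cap in perturb_caption(cap):
            counterfactual = edit_image(img, cap, new_cap)
            if model(counterfactual) != model(img):
                failures.append((img, new_cap))  # prediction flipped
    return failures
```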