Visual Prompting
32 papers with code • 0 benchmarks • 0 datasets
Visual Prompting is the task of adapting computer vision models with prompts, inspired by the success of text prompting in NLP. In this approach, a handful of visual prompts steer a frozen, pre-trained model toward a new task, which can turn an unlabeled dataset into a deployed model quickly and cut development time for both individual projects and enterprise solutions.
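A common instantiation is to learn a small pixel-space prompt, such as a frame of learnable pixels around each image, while the pre-trained model stays frozen. Below is a minimal, hypothetical PyTorch sketch; the `PadPrompter` class and all sizes are illustrative, not taken from any particular paper's released code.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18, ResNet18_Weights

class PadPrompter(nn.Module):
    """Learnable frame of pixels added around the image (illustrative)."""
    def __init__(self, image_size=224, pad=16):
        super().__init__()
        self.inner = image_size - 2 * pad
        # Four learnable strips forming a frame; the centre stays zero.
        self.top = nn.Parameter(torch.zeros(1, 3, pad, image_size))
        self.bottom = nn.Parameter(torch.zeros(1, 3, pad, image_size))
        self.left = nn.Parameter(torch.zeros(1, 3, self.inner, pad))
        self.right = nn.Parameter(torch.zeros(1, 3, self.inner, pad))

    def forward(self, x):
        centre = torch.zeros(1, 3, self.inner, self.inner, device=x.device)
        middle = torch.cat([self.left, centre, self.right], dim=3)
        prompt = torch.cat([self.top, middle, self.bottom], dim=2)
        return x + prompt  # prompt is broadcast over the batch

# The backbone stays frozen; only the prompt parameters are trained.
model = resnet18(weights=ResNet18_Weights.DEFAULT).eval()
for p in model.parameters():
    p.requires_grad_(False)

prompter = PadPrompter()
optimizer = torch.optim.Adam(prompter.parameters(), lr=1e-3)

x = torch.randn(8, 3, 224, 224)   # stand-in image batch
y = torch.randint(0, 1000, (8,))  # stand-in labels
loss = nn.functional.cross_entropy(model(prompter(x)), y)
loss.backward()
optimizer.step()
```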
Most implemented papers
Segment Anything
We introduce the Segment Anything (SA) project: a new task, model, and dataset for image segmentation.
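As a concrete usage sketch, SAM can be prompted with points or boxes through the official `segment-anything` package; the call pattern below follows the repository's documented `SamPredictor` interface, and the checkpoint filename is a placeholder for a downloaded checkpoint.

```python
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

# Load a SAM backbone from the registry (checkpoint path is a placeholder).
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")
predictor = SamPredictor(sam)

image = np.zeros((480, 640, 3), dtype=np.uint8)  # stand-in for an RGB image (HxWx3)
predictor.set_image(image)

# A single foreground point prompt: (x, y) with label 1 = foreground.
masks, scores, logits = predictor.predict(
    point_coords=np.array([[320, 240]]),
    point_labels=np.array([1]),
    multimask_output=True,  # return several candidate masks
)
```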
Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V
We present Set-of-Mark (SoM), a new visual prompting method, to unleash the visual grounding abilities of large multimodal models (LMMs), such as GPT-4V.
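The mechanical core of SoM is an overlay step: candidate regions (e.g. from an off-the-shelf segmentation model) are tagged with numbered marks before the image is sent to the LMM, so the model can ground answers by mark number. A generic Pillow sketch of that overlay step; `overlay_marks` and the centroid inputs are hypothetical helpers, not the paper's code.

```python
from PIL import Image, ImageDraw, ImageFont

def overlay_marks(image: Image.Image, centroids: list[tuple[int, int]]) -> Image.Image:
    """Draw a numbered mark at each region centroid (illustrative helper)."""
    marked = image.copy()
    draw = ImageDraw.Draw(marked)
    font = ImageFont.load_default()
    for idx, (cx, cy) in enumerate(centroids, start=1):
        r = 12  # mark radius in pixels
        draw.ellipse([cx - r, cy - r, cx + r, cy + r], fill="white", outline="black")
        draw.text((cx - 4, cy - 6), str(idx), fill="black", font=font)
    return marked

# Centroids would normally come from a segmentation model such as SAM.
img = Image.new("RGB", (640, 480), "gray")
marked = overlay_marks(img, [(100, 120), (400, 300)])
# `marked` is then sent to the LMM with a query like "What is object 2?".
```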
Visual In-Context Prompting
In-context prompting in large language models (LLMs) has become a prevalent approach to improve zero-shot capabilities, but this idea is less explored in the vision domain.
Visual Prompting for Adversarial Robustness
In this work, we leverage visual prompting (VP) to improve adversarial robustness of a fixed, pre-trained model at testing time.
Explicit Visual Prompting for Universal Foreground Segmentations
We take inspiration from the widely-used pre-training and then prompt tuning protocols in NLP and propose a new visual prompting model, named Explicit Visual Prompting (EVP).
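Among the explicit features EVP derives prompts from are the high-frequency components of the input image. A generic sketch of extracting such a component with an FFT mask follows; the function name and mask ratio are illustrative, not the paper's exact recipe.

```python
import torch

def high_frequency_component(img: torch.Tensor, mask_ratio: float = 0.25) -> torch.Tensor:
    """Zero out a low-frequency block around the spectrum centre, keep the rest."""
    h, w = img.shape[-2:]
    freq = torch.fft.fftshift(torch.fft.fft2(img), dim=(-2, -1))
    ch, cw = h // 2, w // 2
    rh, rw = int(h * mask_ratio / 2), int(w * mask_ratio / 2)
    freq[..., ch - rh:ch + rh, cw - rw:cw + rw] = 0  # remove low frequencies
    return torch.fft.ifft2(torch.fft.ifftshift(freq, dim=(-2, -1))).real

x = torch.randn(1, 3, 224, 224)
hfc = high_frequency_component(x)  # fed to a small prompt network in EVP-style setups
```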
Exploring Visual Prompts for Adapting Large-Scale Models
The surprising effectiveness of visual prompting provides a new perspective on adapting pre-trained models in vision.
Visual Prompting via Image Inpainting
How does one adapt a pre-trained visual model to novel downstream tasks without task-specific finetuning or any model modification?
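The paper answers this by posing adaptation as inpainting a grid-structured visual prompt: an input-output example pair, the query image, and a blank cell that an inpainting model fills with the prediction. A minimal sketch of assembling such a grid; the 2x2 layout and cell size are illustrative.

```python
import numpy as np

def make_prompt_grid(example_in, example_out, query, cell=224):
    """2x2 grid: [example input | example output] over [query | blank to inpaint]."""
    grid = np.zeros((2 * cell, 2 * cell, 3), dtype=np.uint8)
    grid[:cell, :cell] = example_in    # top-left: support input
    grid[:cell, cell:] = example_out   # top-right: support output
    grid[cell:, :cell] = query         # bottom-left: query image
    # Bottom-right stays blank; the inpainting model predicts it as the answer.
    mask = np.zeros((2 * cell, 2 * cell), dtype=bool)
    mask[cell:, cell:] = True          # region for the model to fill
    return grid, mask

a = np.full((224, 224, 3), 50, dtype=np.uint8)
b = np.full((224, 224, 3), 200, dtype=np.uint8)
q = np.full((224, 224, 3), 80, dtype=np.uint8)
grid, mask = make_prompt_grid(a, b, q)
```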
Understanding and Improving Visual Prompting: A Label-Mapping Perspective
We show that when reprogramming an ImageNet-pretrained ResNet-18 to 13 target tasks, our method outperforms baselines by a substantial margin, e.g., 7.9% and 6.7% accuracy improvements in transfer learning to the target Flowers102 and CIFAR100 datasets.
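Here visual prompting is viewed as model reprogramming, so the frozen source model's outputs must be mapped to target labels, and the paper's point is that the choice of this label mapping drives accuracy. A toy sketch of applying a fixed source-to-target map to logits; the index values are stand-ins, and the paper derives the map rather than fixing it by hand.

```python
import torch

# Hypothetical mapping: each target class is tied to one source (ImageNet) class index.
target_to_source = torch.tensor([985, 309, 947])  # 3 target classes <- 3 of 1000 source classes

def remap_logits(source_logits: torch.Tensor) -> torch.Tensor:
    """Select the source logits assigned to each target class."""
    return source_logits[:, target_to_source]

source_logits = torch.randn(4, 1000)  # frozen ImageNet model outputs on prompted images
target_logits = remap_logits(source_logits)
pred = target_logits.argmax(dim=1)    # predictions in the target label space
```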
Unleashing the Power of Visual Prompting At the Pixel Level
This paper presents a simple and effective visual prompting method for adapting pre-trained models to downstream recognition tasks.
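One design consistent with pixel-level prompting is to shrink the image and learn the surrounding border outright, rather than adding a small perturbation on top, which gives the prompt more pixels to use. A hypothetical sketch; the class name and sizes are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ShrinkPadPrompt(nn.Module):
    """Resize the image into the centre of a learnable canvas (illustrative)."""
    def __init__(self, image_size=224, pad=32):
        super().__init__()
        self.inner = image_size - 2 * pad
        self.pad = pad
        self.canvas = nn.Parameter(torch.zeros(1, 3, image_size, image_size))

    def forward(self, x):
        small = F.interpolate(x, size=(self.inner, self.inner),
                              mode="bilinear", align_corners=False)
        out = self.canvas.expand(x.size(0), -1, -1, -1).clone()
        out[:, :, self.pad:self.pad + self.inner, self.pad:self.pad + self.inner] = small
        return out

prompt = ShrinkPadPrompt()
x = torch.randn(8, 3, 224, 224)
out = prompt(x)  # fed to a frozen classifier, as in the pad-prompt example above
```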
Text-Visual Prompting for Efficient 2D Temporal Video Grounding
In this paper, we study the problem of temporal video grounding (TVG), which aims to predict the starting/ending time points of moments described by a text sentence within a long untrimmed video.