Visual Prompt Tuning
Visual Prompt Tuning (VPT) introduces only a small number of task-specific learnable parameters into the input space while keeping the entire pre-trained Transformer backbone frozen during downstream training. In practice, these additional parameters are simply prepended to the input sequence of each Transformer layer and learned together with a linear head during fine-tuning. VPT is especially effective in the low-data regime and maintains its advantage across data scales. It also remains competitive across a range of Transformer scales and designs (ViT-Base/Large/Huge, Swin). Taken together, these results suggest that VPT is one of the most effective ways of adapting ever-growing vision backbones.
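For concreteness, here is a minimal PyTorch sketch of the deep variant of this idea: fresh learnable prompt tokens are prepended at every Transformer layer of a frozen ViT-style backbone, and only the prompts and a linear head are trained. The backbone attribute names (`patch_embed`, `cls_token`, `blocks`) are assumptions about a timm-style ViT rather than the official VPT implementation, and position embeddings are omitted for brevity.

```python
import torch
import torch.nn as nn

class VPTDeep(nn.Module):
    """Minimal sketch of deep Visual Prompt Tuning (VPT-Deep).

    Assumes a ViT-style backbone exposing `patch_embed`, `cls_token`,
    and a list of Transformer `blocks` (timm-style names, an assumption
    rather than the official VPT code).
    """

    def __init__(self, backbone, num_prompts=10, embed_dim=768, num_classes=100):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad = False  # freeze the entire pre-trained backbone

        self.num_prompts = num_prompts
        num_layers = len(backbone.blocks)
        # One set of learnable prompt tokens per Transformer layer.
        self.prompts = nn.Parameter(
            torch.randn(num_layers, num_prompts, embed_dim) * 0.02
        )
        # The linear head is the only other trainable component.
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x):
        B = x.shape[0]
        tokens = self.backbone.patch_embed(x)            # (B, N, D) patch tokens
        cls = self.backbone.cls_token.expand(B, -1, -1)  # (B, 1, D)
        tokens = torch.cat([cls, tokens], dim=1)

        for i, block in enumerate(self.backbone.blocks):
            prompt = self.prompts[i].expand(B, -1, -1)   # (B, P, D)
            if i == 0:
                # Prepend the first layer's prompts after the [CLS] token.
                tokens = torch.cat([tokens[:, :1], prompt, tokens[:, 1:]], dim=1)
            else:
                # Replace the previous layer's prompt outputs with fresh prompts.
                tokens = torch.cat(
                    [tokens[:, :1], prompt, tokens[:, 1 + self.num_prompts:]], dim=1
                )
            tokens = block(tokens)

        return self.head(tokens[:, 0])                   # classify from [CLS]
```

Because gradients flow only into the prompts and the head, the trainable parameter count stays far below full fine-tuning, which is what makes the method attractive in the low-data regime described above.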
Most implemented papers
Visual Prompt Tuning
The current modus operandi in adapting pre-trained models involves updating all the backbone parameters, i.e., full fine-tuning.
Instance-aware Dynamic Prompt Tuning for Pre-trained Point Cloud Models
To conquer this limitation, we propose a novel Instance-aware Dynamic Prompt Tuning (IDPT) strategy for pre-trained point cloud models.
Understanding Zero-Shot Adversarial Robustness for Large-Scale Models
We apply this training loss to two adaptation methods, model fine-tuning and visual prompt tuning.
Dual Modality Prompt Tuning for Vision-Language Pre-Trained Model
To make the final image feature concentrate more on the target visual concept, our DPT further proposes a Class-Aware Visual Prompt Tuning (CAVPT) scheme, in which the class-aware visual prompt is generated dynamically by performing cross-attention between text prompt features and image patch token embeddings, encoding both downstream task-related information and visual instance information.
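A hedged sketch of the cross-attention step this excerpt describes: text prompt features act as queries over image patch tokens, so each generated prompt mixes class semantics with instance-specific visual evidence. The class name, dimensions, and layer choice below are illustrative assumptions, not the paper's exact architecture.

```python
import torch.nn as nn

class ClassAwarePromptGenerator(nn.Module):
    """Generates class-aware visual prompts via cross-attention
    (illustrative sketch of the CAVPT idea, not the paper's code)."""

    def __init__(self, dim=512, num_heads=8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, text_feats, patch_tokens):
        # text_feats:   (B, C, D) text prompt features, one per class (queries)
        # patch_tokens: (B, N, D) image patch token embeddings (keys/values)
        prompts, _ = self.cross_attn(
            query=text_feats, key=patch_tokens, value=patch_tokens
        )
        return prompts  # (B, C, D) class-aware visual prompts
```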
Visual Prompt Tuning for Generative Transfer Learning
We base our framework on state-of-the-art generative vision transformers that represent an image as a sequence of visual tokens fed to autoregressive or non-autoregressive transformers.
Unified Vision and Language Prompt Learning
Prompt tuning, a parameter- and data-efficient transfer learning paradigm that tunes only a small number of parameters in a model's input space, has become a trend in the vision community since the emergence of large vision-language models like CLIP.
Multitask Vision-Language Prompt Tuning
Specifically, (i) we demonstrate the effectiveness of learning a single transferable prompt from multiple source tasks to initialize the prompt for each target task; and (ii) we show that many target tasks can benefit from sharing prompt vectors with each other and can thus be learned jointly via multitask prompt tuning.
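The two-stage scheme in this excerpt can be sketched in a few lines: learn one prompt jointly across the source tasks, then warm-start each target task's prompt from it. Shapes, task names, and the elided training loops below are illustrative assumptions.

```python
import torch
import torch.nn as nn

P, D = 10, 768  # prompt length and embedding dim (illustrative)

# Stage 1: learn one prompt jointly across all source tasks.
shared_prompt = nn.Parameter(torch.randn(P, D) * 0.02)
# ... optimize `shared_prompt` on the mixture of source tasks ...

# Stage 2: warm-start each target task's prompt from the shared one.
target_prompts = {
    task: nn.Parameter(shared_prompt.detach().clone())
    for task in ["cifar100", "flowers102", "dtd"]  # hypothetical targets
}
# Each `target_prompts[task]` is then fine-tuned on its own task.
```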
Improving Visual Prompt Tuning for Self-supervised Vision Transformers
Visual Prompt Tuning (VPT) is an effective tuning method for adapting pretrained Vision Transformers (ViTs) to downstream tasks.
TransTIC: Transferring Transformer-based Image Compression from Human Perception to Machine Perception
This work aims for transferring a Transformer-based image compression codec from human perception to machine perception without fine-tuning the codec.
Uncertainty-aware State Space Transformer for Egocentric 3D Hand Trajectory Forecasting
In this paper, we set up an egocentric 3D hand trajectory forecasting task that aims to predict hand trajectories in a 3D space from early observed RGB videos in a first-person view.