1 intro
code: https://github.com/ChengHan111/E2VPT
- task: parameter-efficient learning
- method: effective and efficient visual prompt tuning (E^2VPT)
three types of existing parameter-efficient learning methods:
- partial tuning: fine-tune part of the backbone, e.g., the classification head or the last layers
- extra module: insert learnable bias terms or additional adapters
- prompt tuning: add learnable prompt tokens without changing or fine-tuning the backbone (a minimal sketch follows after the figure)
![](https://img.haomeiwen.com/i9933353/bfea5c5823ba5a5d.png)
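A minimal sketch of the third category (prompt tuning), assuming a toy PyTorch backbone; the names `ToyBackbone`, `PromptTunedModel`, and `num_prompts` are illustrative, not from the paper. The point is that the pretrained encoder is frozen and only the prompt tokens plus a small task head receive gradients.

```python
# Minimal prompt-tuning sketch: freeze the backbone, train only prompts + head.
import torch
import torch.nn as nn

class ToyBackbone(nn.Module):
    """Stand-in for a pretrained ViT encoder (kept frozen during tuning)."""
    def __init__(self, dim=192, depth=4, heads=3):
        super().__init__()
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)

    def forward(self, tokens):              # tokens: (B, N, dim)
        return self.encoder(tokens)

class PromptTunedModel(nn.Module):
    def __init__(self, backbone, dim=192, num_prompts=8, num_classes=10):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():     # freeze the backbone
            p.requires_grad = False
        # learnable prompt tokens prepended to the patch-token sequence
        self.prompts = nn.Parameter(torch.randn(1, num_prompts, dim) * 0.02)
        self.head = nn.Linear(dim, num_classes)  # task-specific head

    def forward(self, patch_tokens):         # patch_tokens: (B, N, dim)
        B = patch_tokens.size(0)
        prompts = self.prompts.expand(B, -1, -1)
        x = torch.cat([prompts, patch_tokens], dim=1)
        x = self.backbone(x)
        return self.head(x[:, 0])             # read out from the first prompt token

model = PromptTunedModel(ToyBackbone())
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable params: {trainable} / {total}")
```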
limitations of existing work:
1) existing methods leave the key-value computation at the core of the Transformer untouched;
2) existing methods are still not aggressive enough in reducing the tuning cost
2 this paper
main idea:
1) prompt: besides the usual visual prompt tokens, add learnable tokens as key-value prompts inside self-attention
2) prune: reduce the number of learnable parameters by pruning unnecessary prompts
![](https://img.haomeiwen.com/i9933353/42e712aca9d1b40c.png)
- the paper's approach: efficiently tune both the visual prompts and the key-value prompts (a rough sketch of both ideas follows below);
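A rough sketch of the two ideas, not the released E^2VPT code (see the repo link above for that); toy dimensions and names such as `KVPromptAttention`, `k_prompt`, and `prompt_mask` are assumptions. It shows (1) learnable key-value prompts appended only to K and V inside self-attention, so the query sequence and output length are unchanged, and (2) a learnable importance score used to drop redundant visual prompts.

```python
# Sketch of key-value prompts in self-attention + prompt pruning (illustrative only).
import torch
import torch.nn as nn

class KVPromptAttention(nn.Module):
    def __init__(self, dim=192, heads=3, num_kv_prompts=4):
        super().__init__()
        self.heads, self.dk = heads, dim // heads
        self.qkv = nn.Linear(dim, 3 * dim)
        self.proj = nn.Linear(dim, dim)
        # extra learnable tokens concatenated to K and V only
        self.k_prompt = nn.Parameter(torch.randn(1, num_kv_prompts, dim) * 0.02)
        self.v_prompt = nn.Parameter(torch.randn(1, num_kv_prompts, dim) * 0.02)

    def forward(self, x):                       # x: (B, N, dim)
        B, N, D = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        k = torch.cat([self.k_prompt.expand(B, -1, -1), k], dim=1)
        v = torch.cat([self.v_prompt.expand(B, -1, -1), v], dim=1)
        def split(t):                           # (B, L, D) -> (B, heads, L, dk)
            return t.view(B, -1, self.heads, self.dk).transpose(1, 2)
        q, k, v = split(q), split(k), split(v)
        attn = (q @ k.transpose(-2, -1)) / self.dk ** 0.5
        out = attn.softmax(dim=-1) @ v          # (B, heads, N, dk)
        out = out.transpose(1, 2).reshape(B, N, D)
        return self.proj(out)

# Pruning idea: score each visual prompt with a learnable mask and keep only
# the highest-scoring ones (a hard keep/drop decision for the final model).
num_vis_prompts, dim = 8, 192
visual_prompts = nn.Parameter(torch.randn(1, num_vis_prompts, dim) * 0.02)
prompt_mask = nn.Parameter(torch.rand(num_vis_prompts))   # learned importance scores
keep_idx = prompt_mask.detach().topk(num_vis_prompts // 2).indices
pruned_prompts = visual_prompts[:, keep_idx, :]
print("kept prompts:", pruned_prompts.shape[1], "of", num_vis_prompts)
```

In this sketch only the prompt parameters and the mask would be trained while the backbone weights stay frozen, which is what keeps the number of tunable parameters small.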
compared baselines & experiments
![](https://img.haomeiwen.com/i9933353/c47db96f7f8d8d82.png)