LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention

Author: Valar_Morghulis | Published 2023-03-29 10:17

LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention

Mar 2023

Renrui Zhang, Jiaming Han, Aojun Zhou, Xiangfei Hu, Shilin Yan, Pan Lu, Hongsheng Li, Peng Gao, Yu Qiao

[Shanghai Artificial Intelligence Laboratory, CUHK MMLab, University of California, Los Angeles]

https://arxiv.org/abs/2303.16199

https://github.com/ZrrSkywalker/LLaMA-Adapter

We present LLaMA-Adapter, a lightweight adaption method to efficiently fine-tune LLaMA into an instruction-following model. Using 52K self-instruct demonstrations, LLaMA-Adapter only introduces 1.2M learnable parameters upon the frozen LLaMA 7B model, and costs less than one hour for fine-tuning on 8 A100 GPUs. Specifically, we adopt a set of learnable adaption prompts, and prepend them to the input text tokens at higher transformer layers. Then, a zero-init attention mechanism with zero gating is proposed, which adaptively injects the new instructional cues into LLaMA, while effectively preserving its pre-trained knowledge. With efficient training, LLaMA-Adapter generates high-quality responses, comparable to Alpaca with fully fine-tuned 7B parameters. Furthermore, our approach can be simply extended to multi-modal input, e.g., images, for image-conditioned LLaMA, which achieves superior reasoning capacity on ScienceQA.
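To make the mechanism in the abstract more concrete, below is a minimal PyTorch-style sketch of attention over learnable adaption prompts with a zero-initialized gate. The class name, tensor shapes, tanh-activated gate, and the omission of the causal mask are illustrative assumptions for readability, not the authors' exact implementation; see the GitHub repository above for the official code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ZeroInitPromptAttention(nn.Module):
    """Sketch of zero-init attention: learnable adaption prompts are added as
    extra keys/values, and their attention weights are scaled by a gate that
    starts at zero, so training begins from the frozen model's behavior."""

    def __init__(self, dim: int, n_heads: int, prompt_len: int):
        super().__init__()
        self.n_heads = n_heads
        self.head_dim = dim // n_heads
        # In the real model these projections come from the frozen LLaMA
        # layer; plain linear layers are used here for illustration.
        self.wq = nn.Linear(dim, dim, bias=False)
        self.wk = nn.Linear(dim, dim, bias=False)
        self.wv = nn.Linear(dim, dim, bias=False)
        self.wo = nn.Linear(dim, dim, bias=False)
        # Learnable adaption prompt tokens (hypothetical shape [prompt_len, dim]).
        self.adaption_prompt = nn.Parameter(torch.randn(prompt_len, dim) * 0.02)
        # Zero-initialized gating factor, one scalar per head.
        self.gate = nn.Parameter(torch.zeros(1, n_heads, 1, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        bsz, seqlen, dim = x.shape

        def split_heads(t: torch.Tensor, length: int) -> torch.Tensor:
            return t.view(bsz, length, self.n_heads, self.head_dim).transpose(1, 2)

        q = split_heads(self.wq(x), seqlen)
        k = split_heads(self.wk(x), seqlen)
        v = split_heads(self.wv(x), seqlen)

        # Project the adaption prompts and broadcast them over the batch.
        p = self.adaption_prompt.unsqueeze(0).expand(bsz, -1, -1)
        pk = split_heads(self.wk(p), p.shape[1])
        pv = split_heads(self.wv(p), p.shape[1])

        scale = self.head_dim ** -0.5
        scores_tok = (q @ k.transpose(-2, -1)) * scale   # ordinary token scores
        scores_prm = (q @ pk.transpose(-2, -1)) * scale  # adaption prompt scores

        # Softmax each branch separately; the prompt branch is multiplied by a
        # gate that is zero at initialization, so new instructional cues are
        # injected gradually without disturbing pre-trained knowledge.
        attn_tok = F.softmax(scores_tok, dim=-1)
        attn_prm = torch.tanh(self.gate) * F.softmax(scores_prm, dim=-1)

        out = attn_tok @ v + attn_prm @ pv
        out = out.transpose(1, 2).reshape(bsz, seqlen, dim)
        return self.wo(out)
```

In this sketch only the adaption prompts and the gate would be trainable; the projection weights stand in for the frozen LLaMA parameters, which is how the method keeps the learnable parameter count around 1.2M.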
