LiT: Zero-Shot Transfer with a Locked Image Model and a Tuned Text Model

Author: Valar_Morghulis | Published: 2023-02-27 09:15

LiT: Zero-Shot Transfer with Locked-image text Tuning

Nov 2021

CVPR 2022

Xiaohua Zhai, Xiao Wang, Basil Mustafa, Andreas Steiner, Daniel Keysers, Alexander Kolesnikov, Lucas Beyer

[Google Research, Brain Team, Zurich]

https://arxiv.org/abs/2111.07991

https://openaccess.thecvf.com/content/CVPR2022/html/Zhai_LiT_Zero-Shot_Transfer_With_Locked-Image_Text_Tuning_CVPR_2022_paper.html

https://github.com/google-research/vision_transformer#lit-models

This paper presents contrastive-tuning, a simple method employing contrastive training to align image and text models while still taking advantage of their pre-training. In our empirical study we find that locked pre-trained image models with unlocked text models work best. We call this instance of contrastive-tuning "Locked-image Tuning" (LiT), which just teaches a text model to read out good representations from a pre-trained image model for new tasks. A LiT model gains the capability of zero-shot transfer to new vision tasks, such as image classification or retrieval. The proposed LiT is widely applicable; it works reliably with multiple pre-training methods (supervised and unsupervised) and across diverse architectures (ResNet, Vision Transformers and MLP-Mixer) using three different image-text datasets. With the transformer-based pre-trained ViT-g/14 model, the LiT model achieves 85.2% zero-shot transfer accuracy on the ImageNet test set, and 82.5% on the challenging out-of-distribution ObjectNet test set.
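The core recipe in the abstract (a frozen image model, a trainable text model, and a contrastive objective) can be illustrated with a minimal pure-Python sketch. It rests on several assumptions that are not from the paper. Both towers are toy linear maps standing in for the deep pre-trained networks. The objective is assumed to be a CLIP-style symmetric contrastive (InfoNCE) loss on L2-normalized embeddings with a temperature `tau`. All names (`contrastive_loss_and_grad`, `zero_shot_classify`, `w_img`, `w_txt`) are hypothetical. The point it shows is that only the text tower receives gradient updates, while the image embeddings stay fixed throughout.

```python
import math
import random

# Toy stand-ins (hypothetical): both towers are single linear maps here; in LiT
# they are deep pre-trained networks. Only w_txt is ever updated.

def matvec(m, x):
    return [sum(a * b for a, b in zip(row, x)) for row in m]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def normalize(x):
    """Return (unit vector, original L2 norm)."""
    n = math.sqrt(dot(x, x)) or 1.0
    return [a / n for a in x], n

def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]

def contrastive_loss_and_grad(img_emb, txt_feats, w_txt, tau):
    """Assumed symmetric InfoNCE loss and its gradient w.r.t. w_txt only.

    img_emb holds fixed unit vectors from the locked image tower, so no
    gradient is computed for the image side.
    """
    n = len(img_emb)
    txt = [normalize(matvec(w_txt, t)) for t in txt_feats]  # (unit vector, norm)
    v = [unit for unit, _ in txt]
    s = [[dot(img_emb[i], v[j]) / tau for j in range(n)] for i in range(n)]
    rows = [softmax(s[i]) for i in range(n)]                         # image -> text
    cols = [softmax([s[i][j] for i in range(n)]) for j in range(n)]  # text -> image
    loss = -sum(math.log(rows[i][i]) + math.log(cols[i][i]) for i in range(n)) / (2 * n)

    grad_w = [[0.0] * len(w_txt[0]) for _ in w_txt]
    for j in range(n):
        # dL/ds_ij = (P_ij + Q_ji - 2*[i == j]) / (2n), and ds_ij/dv_j = z_i / tau
        g_v = [0.0] * len(v[j])
        for i in range(n):
            d = (rows[i][j] + cols[j][i] - 2.0 * (i == j)) / (2 * n * tau)
            g_v = [g + d * z for g, z in zip(g_v, img_emb[i])]
        # Back-propagate through the L2 normalization: (g - (g.v) v) / ||u||
        gv = dot(g_v, v[j])
        g_u = [(g - gv * vj) / txt[j][1] for g, vj in zip(g_v, v[j])]
        for r, g_ur in enumerate(g_u):
            for c, t_c in enumerate(txt_feats[j]):
                grad_w[r][c] += g_ur * t_c
    return loss, grad_w

def zero_shot_classify(img_vec, class_text_feats, w_txt):
    """Return the index of the class prompt whose text embedding is most similar."""
    sims = [dot(img_vec, normalize(matvec(w_txt, t))[0]) for t in class_text_feats]
    return max(range(len(sims)), key=sims.__getitem__)

random.seed(0)
d, k_img, k_txt, n = 4, 6, 5, 4  # embedding dim, image/text feature dims, batch size
w_img = [[random.gauss(0, 1) for _ in range(k_img)] for _ in range(d)]  # locked
w_txt = [[random.gauss(0, 1) for _ in range(k_txt)] for _ in range(d)]  # trainable
images = [[random.gauss(0, 1) for _ in range(k_img)] for _ in range(n)]
texts = [[random.gauss(0, 1) for _ in range(k_txt)] for _ in range(n)]  # paired "captions"

# The image tower is frozen, so its embeddings are computed once, outside the loop.
img_emb = [normalize(matvec(w_img, x))[0] for x in images]

tau, lr = 0.1, 0.01
first_loss, _ = contrastive_loss_and_grad(img_emb, texts, w_txt, tau)
loss = first_loss
for _ in range(500):
    loss, grad = contrastive_loss_and_grad(img_emb, texts, w_txt, tau)
    w_txt = [[w - lr * gw for w, gw in zip(w_row, g_row)] for w_row, g_row in zip(w_txt, grad)]
print(f"contrastive loss: {first_loss:.3f} -> {loss:.3f}")

# Zero-shot use: the captions double as class prompts only to keep the toy small.
preds = [zero_shot_classify(z, texts, w_txt) for z in img_emb]
print("predicted class per image:", preds)
```

Because the image tower never changes, its embeddings can be computed once and no gradient flows through it. Zero-shot classification then reduces to embedding one text prompt per class with the tuned text tower and picking the prompt most similar to the image embedding.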