DreamBooth: Fine-Tuning Text-to-Image Diffusion Models for Subject-Driven Generation

Author: Valar_Morghulis | Posted 2023-02-10 11:01

DreamBooth

DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation

25 Aug 2022

Authors: Nataniel Ruiz, Yuanzhen Li, Varun Jampani, Yael Pritch, Michael Rubinstein, Kfir Aberman

Paper: https://arxiv.org/abs/2208.12242

https://paperswithcode.com/paper/dreambooth-fine-tuning-text-to-image

Open-source implementation (2.3k stars): https://github.com/XavierXiao/Dreambooth-Stable-Diffusion

Large text-to-image models achieved a remarkable leap in the evolution of AI, enabling high-quality and diverse synthesis of images from a given text prompt. However, these models lack the ability to mimic the appearance of subjects in a given reference set and synthesize novel renditions of them in different contexts. In this work, we present a new approach for "personalization" of text-to-image diffusion models (specializing them to users' needs). Given as input just a few images of a subject, we fine-tune a pretrained text-to-image model (Imagen, although our method is not limited to a specific model) such that it learns to bind a unique identifier with that specific subject. Once the subject is embedded in the output domain of the model, the unique identifier can then be used to synthesize fully-novel photorealistic images of the subject contextualized in different scenes. By leveraging the semantic prior embedded in the model with a new autogenous class-specific prior preservation loss, our technique enables synthesizing the subject in diverse scenes, poses, views, and lighting conditions that do not appear in the reference images. We apply our technique to several previously-unassailable tasks, including subject recontextualization, text-guided view synthesis, appearance modification, and artistic rendering (all while preserving the subject's key features). Project page: https://dreambooth.github.io/

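Large text-to-image models represent a remarkable leap in AI, producing high-quality, diverse images from a text prompt. However, they cannot reproduce the appearance of subjects from a given reference set or render those subjects anew in different contexts. This work presents a new approach to "personalizing" text-to-image diffusion models, that is, specializing them to a user's needs. Given only a few images of a subject, a pretrained text-to-image model (Imagen, although the method is not tied to a specific model) is fine-tuned so that it learns to bind a unique identifier to that subject. Once the subject is embedded in the model's output domain, the identifier can be used to synthesize entirely new photorealistic images of the subject in different scenes. By combining the semantic prior already embedded in the model with a new autogenous class-specific prior preservation loss, the technique can render the subject in scenes, poses, viewpoints, and lighting conditions that never appear in the reference images. The authors apply it to several tasks that were previously out of reach: subject recontextualization, text-guided view synthesis, appearance modification, and artistic rendering, all while preserving the subject's key features. Project page: https://dreambooth.github.io/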
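
The two ingredients named in the abstract, the unique identifier and the prior preservation loss, can be illustrated with a small sketch. This is a simplified toy version, not the authors' code. Following the paper's description, the subject images are paired with a prompt of the form "a [identifier] [class noun]", while images that the frozen pretrained model generates from the plain class prompt serve as targets for the prior preservation term. The identifier "zwx", the function names, the flat lists standing in for images, and the omission of per-timestep weighting are all assumptions made for illustration.

```python
def build_prompts(identifier, class_noun):
    """Return (instance_prompt, class_prompt) for one subject.

    `identifier` is a hypothetical rare token chosen by the user;
    `class_noun` is the coarse class of the subject, e.g. "dog".
    """
    instance_prompt = f"a {identifier} {class_noun}"  # used with the subject photos
    class_prompt = f"a {class_noun}"                  # used to generate prior images
    return instance_prompt, class_prompt


def mse(pred, target):
    """Mean squared error between two equal-length sequences of floats."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def dreambooth_loss(pred_subject, target_subject,
                    pred_prior, target_prior, prior_weight=1.0):
    """Toy combined objective: subject reconstruction + weighted prior term.

    pred_subject / target_subject: model output vs. a real subject image,
        conditioned on the instance prompt.
    pred_prior / target_prior: model output vs. an image generated by the
        frozen pretrained model, conditioned on the class prompt.
    prior_weight: relative weight of the prior preservation term
        (an illustrative default, not a recommended value).
    """
    reconstruction = mse(pred_subject, target_subject)
    prior_preservation = mse(pred_prior, target_prior)
    return reconstruction + prior_weight * prior_preservation


if __name__ == "__main__":
    print(build_prompts("zwx", "dog"))  # ('a zwx dog', 'a dog')
    loss = dreambooth_loss([1.0, 2.0], [1.0, 4.0],
                           [0.0, 0.0], [1.0, 1.0], prior_weight=0.5)
    print(loss)  # 2.5
```

With prior_weight set to 0 the objective reduces to plain fine-tuning on the subject images alone. The prior term is what keeps the model's notion of the generic class intact while the identifier is being bound to the specific subject.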