DreamBooth
DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation
25 Aug 2022
Authors: Nataniel Ruiz, Yuanzhen Li, Varun Jampani, Yael Pritch, Michael Rubinstein, Kfir Aberman
Paper: https://arxiv.org/abs/2208.12242
https://paperswithcode.com/paper/dreambooth-fine-tuning-text-to-image
Open-source implementation (2.3k stars): https://github.com/XavierXiao/Dreambooth-Stable-Diffusion
Large text-to-image models achieved a remarkable leap in the evolution of AI, enabling high-quality and diverse synthesis of images from a given text prompt. However, these models lack the ability to mimic the appearance of subjects in a given reference set and synthesize novel renditions of them in different contexts. In this work, we present a new approach for "personalization" of text-to-image diffusion models (specializing them to users' needs). Given as input just a few images of a subject, we fine-tune a pretrained text-to-image model (Imagen, although our method is not limited to a specific model) such that it learns to bind a unique identifier with that specific subject. Once the subject is embedded in the output domain of the model, the unique identifier can then be used to synthesize fully-novel photorealistic images of the subject contextualized in different scenes. By leveraging the semantic prior embedded in the model with a new autogenous class-specific prior preservation loss, our technique enables synthesizing the subject in diverse scenes, poses, views, and lighting conditions that do not appear in the reference images. We apply our technique to several previously-unassailable tasks, including subject recontextualization, text-guided view synthesis, appearance modification, and artistic rendering (all while preserving the subject's key features). Project page: https://dreambooth.github.io/
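The "autogenous class-specific prior preservation loss" mentioned in the abstract supervises the model on two fronts at once: a reconstruction term on the few subject images (bound to the unique identifier) and a weighted term on samples of the subject's broader class, which counteracts overfitting and language drift. A minimal numpy sketch of how the two terms combine (the function and parameter names here are illustrative, not from the paper; in the actual method these targets come from a diffusion model's denoising objective):

```python
import numpy as np

def prior_preservation_loss(pred_instance, target_instance,
                            pred_prior, target_prior, lam=1.0):
    """Sketch of a DreamBooth-style combined loss.

    - instance term: reconstruction error on the few reference images
      of the specific subject (prompted with the unique identifier).
    - prior term: reconstruction error on class-prior samples generated
      by the frozen pretrained model, weighted by `lam`, which keeps the
      fine-tuned model from collapsing the whole class onto the subject.
    """
    instance = np.mean((pred_instance - target_instance) ** 2)
    prior = np.mean((pred_prior - target_prior) ** 2)
    return instance + lam * prior

# Toy usage with dummy 2x2 "predictions" and "targets":
loss = prior_preservation_loss(
    np.ones((2, 2)), np.zeros((2, 2)),   # instance pred/target
    np.ones((2, 2)), np.zeros((2, 2)),   # prior pred/target
    lam=1.0,
)
```

With `lam=0` this reduces to plain few-shot fine-tuning on the subject images; increasing `lam` trades subject fidelity for preservation of the class prior.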