MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation
Feb 2023
Omer Bar-Tal*, Lior Yariv*, Yaron Lipman, Tali Dekel (* Equal contribution)
[Weizmann Institute of Science]
https://arxiv.org/abs/2302.08113
https://github.com/omerbt/MultiDiffusion >285 stars
https://multidiffusion.github.io/ ★★★★★
Recent advances in text-to-image generation with diffusion models present transformative capabilities in image quality. However, user controllability of the generated image and fast adaptation to new tasks still remain an open challenge, currently addressed mostly by costly and lengthy re-training and fine-tuning, or by ad-hoc adaptations to specific image generation tasks. In this work, we present MultiDiffusion, a unified framework that enables versatile and controllable image generation, using a pre-trained text-to-image diffusion model, without any further training or fine-tuning. At the center of our approach is a new generation process, based on an optimization task that binds together multiple diffusion generation processes with a shared set of parameters or constraints. We show that MultiDiffusion can be readily applied to generate high-quality and diverse images that adhere to user-provided controls, such as desired aspect ratio (e.g., panorama), and spatial guiding signals, ranging from tight segmentation masks to bounding boxes. Project webpage: https://multidiffusion.github.io
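The fusion idea in the abstract — binding several diffusion processes over one shared canvas — has a simple closed form for the panorama case: at each denoising step, the shared latent is the per-pixel average of the denoised overlapping crops. A minimal NumPy sketch of that fusion step follows; the function name `multidiffusion_step`, the `denoise_window` callback, and the window/stride values are illustrative assumptions, not the authors' code.

```python
import numpy as np

def multidiffusion_step(latent, denoise_window, window=64, stride=32):
    """One fusion step (sketch): denoise overlapping crops of a large latent
    and average the results back, weighted by per-pixel window coverage.
    `denoise_window` stands in for one step of a pre-trained diffusion model
    applied to a single crop (hypothetical interface)."""
    H, W = latent.shape[-2:]
    fused = np.zeros_like(latent)
    count = np.zeros((H, W))
    ys = list(range(0, H - window + 1, stride)) or [0]
    xs = list(range(0, W - window + 1, stride)) or [0]
    for y in ys:
        for x in xs:
            crop = latent[..., y:y + window, x:x + window]
            fused[..., y:y + window, x:x + window] += denoise_window(crop)
            count[y:y + window, x:x + window] += 1
    # each pixel is averaged over every window that covers it
    return fused / count

# toy usage: an identity "denoiser" must leave the latent unchanged
lat = np.random.default_rng(0).standard_normal((4, 64, 128))
out = multidiffusion_step(lat, lambda c: c)
assert np.allclose(out, lat)
```

Averaging the overlapping predictions is what keeps the seams between windows consistent, which is why the method can produce wide panoramas from a model trained only on fixed-size crops.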