Composer: Creative and Controllable Image Synthesis with Composable Conditions

Author: Valar_Morghulis | Published: 2023-02-26 09:24

Composer: Creative and Controllable Image Synthesis with Composable Conditions

Feb 2023

Lianghua Huang, Di Chen, Yu Liu, Yujun Shen, Deli Zhao, Jingren Zhou

[Alibaba Group, Ant Group]

https://arxiv.org/abs/2302.09778

https://paperswithcode.com/paper/composer-creative-and-controllable-image

https://github.com/damo-vilab/composer

Recent large-scale generative models learned on big data are capable of synthesizing incredible images yet suffer from limited controllability. This work offers a new generation paradigm that allows flexible control of the output image, such as spatial layout and palette, while maintaining the synthesis quality and model creativity. With compositionality as the core idea, we first decompose an image into representative factors, and then train a diffusion model with all these factors as the conditions to recompose the input. At the inference stage, the rich intermediate representations work as composable elements, leading to a huge design space (i.e., exponentially proportional to the number of decomposed factors) for customizable content creation. It is noteworthy that our approach, which we call Composer, supports various levels of conditions, such as text description as the global information, depth map and sketch as the local guidance, color histogram for low-level details, etc. Besides improving controllability, we confirm that Composer serves as a general framework and facilitates a wide range of classical generative tasks without retraining. Code and models will be made available.
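
To make the "exponentially proportional" remark concrete, here is a minimal sketch that enumerates the ways a set of decomposed factors could be switched on or off at inference time. The factor names are only the four examples mentioned in the abstract, and `condition_subsets` is a hypothetical helper written for illustration; it is not part of the Composer codebase and says nothing about how the model actually consumes its conditions.

```python
from itertools import combinations

# Illustrative factor names taken from the abstract's examples (an assumption;
# the actual model defines its own set of conditions).
FACTORS = ["text", "depth_map", "sketch", "color_histogram"]


def condition_subsets(factors):
    """Return every subset of `factors`, including the empty (unconditional) one."""
    subsets = []
    for k in range(len(factors) + 1):
        # combinations() yields each k-element subset exactly once, as a tuple.
        subsets.extend(combinations(factors, k))
    return subsets


subsets = condition_subsets(FACTORS)

# With n independent factors there are 2 ** n on/off combinations.
print(len(subsets), 2 ** len(FACTORS))
```

Each additional factor doubles the number of subsets, which is the sense in which the design space grows exponentially with the number of decomposed factors.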