MAXIM: Multi-Axis MLP for Image Processing


Author: Valar_Morghulis | Published: 2022-06-23 07:52

MAXIM: Multi-Axis MLP for Image Processing

9 January, 2022

CVPR 2022 Oral

(one of 33 best paper finalists)

https://arxiv.org/abs/2201.02973

https://github.com/google-research/maxim

Authors: Zhengzhong Tu, Hossein Talebi, Han Zhang, Feng Yang, Peyman Milanfar, Alan Bovik, Yinxiao Li

Abstract: Recent progress on Transformers and multi-layer perceptron (MLP) models provides new network architectural designs for computer vision tasks. Although these models have proved effective in many vision tasks such as image recognition, there remain challenges in adapting them for low-level vision. The inflexibility to support high-resolution images and the limitations of local attention are perhaps the main bottlenecks. In this work, we present a multi-axis MLP based architecture called MAXIM that can serve as an efficient and flexible general-purpose vision backbone for image processing tasks. MAXIM uses a UNet-shaped hierarchical structure and supports long-range interactions enabled by spatially-gated MLPs. Specifically, MAXIM contains two MLP-based building blocks: a multi-axis gated MLP that allows for efficient and scalable spatial mixing of local and global visual cues, and a cross-gating block, an alternative to cross-attention, which accounts for cross-feature conditioning. Both modules are based exclusively on MLPs, but also benefit from being both global and "fully convolutional", two properties that are desirable for image processing. Our extensive experimental results show that the proposed MAXIM model achieves state-of-the-art performance on more than ten benchmarks across a range of image processing tasks, including denoising, deblurring, deraining, dehazing, and enhancement, while requiring fewer or a comparable number of parameters and FLOPs relative to competitive models. The source code and trained models will be available at https://github.com/google-research/maxim.
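
To make the multi-axis idea in the abstract more concrete, here is a minimal NumPy sketch. It is not the authors' implementation (the official code in the repository above is written in JAX). It assumes the block/grid partitioning described in the paper: half of the channels are mixed inside non-overlapping b x b windows (local), and the other half are mixed across a fixed g x g grid (global). All function names, the parameter dictionary, and the simplified gate `u * (W v + bias)` are illustrative assumptions; the real blocks also contain normalization, projection layers, and residual connections, which are omitted here.

```python
import numpy as np

def block_partition(x, b):
    """(H, W, C) -> (num_windows, b*b, C): non-overlapping b x b windows."""
    H, W, C = x.shape
    x = x.reshape(H // b, b, W // b, b, C).transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, b * b, C)

def block_unpartition(t, H, W, b):
    """Inverse of block_partition."""
    C = t.shape[-1]
    x = t.reshape(H // b, W // b, b, b, C).transpose(0, 2, 1, 3, 4)
    return x.reshape(H, W, C)

def grid_partition(x, g):
    """(H, W, C) -> (g*g, cell_size, C): axis 0 indexes a fixed g x g grid."""
    H, W, C = x.shape
    x = x.reshape(g, H // g, g, W // g, C).transpose(0, 2, 1, 3, 4)
    return x.reshape(g * g, -1, C)

def grid_unpartition(t, H, W, g):
    """Inverse of grid_partition."""
    C = t.shape[-1]
    x = t.reshape(g, g, H // g, W // g, C).transpose(0, 2, 1, 3, 4)
    return x.reshape(H, W, C)

def spatial_gate(tokens, weight, bias, axis):
    """Simplified spatial gating (assumption): split channels into (u, v),
    mix v linearly along the token `axis`, then gate u with the result."""
    u, v = np.split(tokens, 2, axis=-1)
    v = np.moveaxis(v, axis, -1)      # bring the token axis last
    v = v @ weight.T + bias           # fixed-size mixing across tokens
    v = np.moveaxis(v, -1, axis)      # restore the original layout
    return u * v

def init_params(b, g, rng):
    """Hypothetical parameter container; sizes depend only on b and g."""
    return {
        "w_local": rng.normal(scale=0.02, size=(b * b, b * b)),
        "b_local": np.ones(b * b),
        "w_global": rng.normal(scale=0.02, size=(g * g, g * g)),
        "b_global": np.ones(g * g),
    }

def multi_axis_gated_mixing(x, params, b, g):
    """Local branch mixes within windows, global branch mixes across the grid.
    H and W must be divisible by both b and g; C must be divisible by 4.
    Each gate halves its channels, so the output has C // 2 channels."""
    H, W, _ = x.shape
    local_in, global_in = np.split(x, 2, axis=-1)
    loc = spatial_gate(block_partition(local_in, b),
                       params["w_local"], params["b_local"], axis=1)
    glo = spatial_gate(grid_partition(global_in, g),
                       params["w_global"], params["b_global"], axis=0)
    return np.concatenate([block_unpartition(loc, H, W, b),
                           grid_unpartition(glo, H, W, g)], axis=-1)

rng = np.random.default_rng(0)
params = init_params(b=4, g=4, rng=rng)
# The same parameters are applied to two different image sizes.
for shape in [(16, 16, 8), (32, 48, 8)]:
    out = multi_axis_gated_mixing(rng.normal(size=shape), params, b=4, g=4)
    print(shape, "->", out.shape)
```

In this sketch the mixing weights have shapes (b*b, b*b) and (g*g, g*g), so they do not depend on the image height or width. That is one way to read the abstract's claim that the module is both global and fully convolutional: the grid branch connects positions across the whole image, while the same weights can be reused at any resolution divisible by b and g.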

