
Frequency Principle: DNNs often fit target functions from low to high frequency

Author: Valar_Morghulis | Published 2022-06-07 10:04

A key researcher in this field: Zhi-Qin John Xu (许志钦)

His Zhihu profile: https://www.zhihu.com/people/man-98

Overview frequency principle/spectral bias in deep learning

https://arxiv.org/abs/2201.07395

Authors: Zhi-Qin John Xu, Yaoyu Zhang, Tao Luo        

18 January 2022

Abstract: Understanding deep learning is increasingly emergent as it penetrates more and more into industry and science. In recent years, a research line from Fourier analysis sheds lights into this magical "black box" by showing a Frequency Principle (F-Principle or spectral bias) of the training behavior of deep neural networks (DNNs) -- DNNs often fit functions from low to high frequency during the training. The F-Principle is first demonstrated by one-dimensional synthetic data followed by the verification in high-dimensional real datasets. A series of works subsequently enhance the validity of the F-Principle. This low-frequency implicit bias reveals the strength of neural network in learning low-frequency functions as well as its deficiency in learning high-frequency functions. Such understanding inspires the design of DNN-based algorithms in practical problems, explains experimental phenomena emerging in various scenarios, and further advances the study of deep learning from the frequency perspective. Although incomplete, we provide an overview of F-Principle and propose some open problems for future research. 


Schematic of the DNN training process. The training data are sampled from the target function sin(x)+sin(5x); the red, green, and black curves show the DNN output, sin(x), and sin(x)+sin(5x), respectively. (Figure from: Overview frequency principle/spectral bias in deep learning, https://arxiv.org/abs/2201.07395)
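
A minimal sketch of this experiment (assuming PyTorch; the network width, optimizer, and step counts are illustrative choices, not taken from the papers): train a small tanh network on samples of sin(x)+sin(5x) and track the relative error of the k=1 and k=5 DFT components of its output. Under the F-Principle one expects the k=1 error to drop well before the k=5 error.

# Minimal sketch of the 1D F-Principle experiment described above: fit
# f(x) = sin(x) + sin(5x) with a small fully-connected network and watch
# how fast the two frequency components are captured.
import numpy as np
import torch
import torch.nn as nn

torch.manual_seed(0)

# Training data: one period of the target on a uniform grid (endpoint dropped for a clean DFT).
n = 256
x = torch.linspace(0, 2 * np.pi, n + 1)[:-1].unsqueeze(1)
y = torch.sin(x) + torch.sin(5 * x)

model = nn.Sequential(nn.Linear(1, 200), nn.Tanh(), nn.Linear(200, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def component_error(pred, target, k):
    """Relative error of the k-th DFT coefficient of the prediction."""
    p = torch.fft.rfft(pred.squeeze())
    t = torch.fft.rfft(target.squeeze())
    return (abs(p[k] - t[k]) / abs(t[k])).item()

for step in range(5001):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
    if step % 1000 == 0:
        with torch.no_grad():
            pred = model(x)
            e1 = component_error(pred, y, 1)   # low-frequency component sin(x)
            e5 = component_error(pred, y, 5)   # high-frequency component sin(5x)
        print(f"step {step:5d}  loss {loss.item():.4f}  err(k=1) {e1:.3f}  err(k=5) {e5:.3f}")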

Frequency Principle: Fourier Analysis Sheds Light on Deep Neural Networks

https://arxiv.org/abs/1901.06523

20 September 2019

Authors: Zhi-Qin John Xu, Yaoyu Zhang, Tao Luo, Yanyang Xiao, Zheng Ma

Abstract: We study the training process of Deep Neural Networks (DNNs) from the Fourier analysis perspective. We demonstrate a very universal Frequency Principle (F-Principle) --- DNNs often fit target functions from low to high frequencies --- on high-dimensional benchmark datasets such as MNIST/CIFAR10 and deep neural networks such as VGG16. This F-Principle of DNNs is opposite to the behavior of most conventional iterative numerical schemes (e.g., Jacobi method), which exhibit faster convergence for higher frequencies for various scientific computing problems. With a simple theory, we illustrate that this F-Principle results from the regularity of the commonly used activation functions. The F-Principle implies an implicit bias that DNNs tend to fit training data by a low-frequency function. This understanding provides an explanation of good generalization of DNNs on most real datasets and bad generalization of DNNs on parity function or randomized dataset.
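
The contrast with the Jacobi method can be checked directly. For the 1D Poisson problem, a sinusoidal error mode of frequency k is damped by roughly cos(kπ/N) per Jacobi sweep, so higher-frequency modes (in the lower half of the spectrum) shrink faster. The sketch below is a standard numerical-analysis illustration, not code from the paper; the grid size and mode indices are arbitrary.

# Jacobi iteration for the 1D Poisson problem -u'' = 0 with zero boundary
# conditions: any initial guess is pure error, and each sinusoidal error mode
# sin(k*pi*j/N) is damped by about cos(k*pi/N) per sweep, so higher k decays
# faster -- the opposite ordering to the F-Principle.
import numpy as np

N = 64                      # number of grid intervals
j = np.arange(1, N)         # interior grid points

def decay(k, sweeps=200):
    """Remaining fraction of the error mode sin(k*pi*j/N) after some Jacobi sweeps."""
    e = np.sin(k * np.pi * j / N)              # initial error (exact solution is 0)
    for _ in range(sweeps):
        padded = np.concatenate(([0.0], e, [0.0]))
        e = 0.5 * (padded[:-2] + padded[2:])   # Jacobi update for the 1D Laplacian
    return np.linalg.norm(e) / np.linalg.norm(np.sin(k * np.pi * j / N))

print("k = 1 :", decay(1))    # low-frequency error, barely reduced
print("k = 10:", decay(10))   # higher-frequency error, reduced far more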


On the Spectral Bias of Neural Networks

https://arxiv.org/abs/1806.08734

22 June 2018

ICML 2019

Authors: Nasim Rahaman, Aristide Baratin, Devansh Arpit, Felix Draxler, Min Lin, Fred A. Hamprecht, Yoshua Bengio, Aaron Courville

Abstract: Neural networks are known to be a class of highly expressive functions able to fit even random input-output mappings with 100% accuracy. In this work, we present properties of neural networks that complement this aspect of expressivity. By using tools from Fourier analysis, we show that deep ReLU networks are biased towards low frequency functions, meaning that they cannot have local fluctuations without affecting their global behavior. Intuitively, this property is in line with the observation that over-parameterized networks find simple patterns that generalize across data samples. We also investigate how the shape of the data manifold affects expressivity by showing evidence that learning high frequencies gets easier with increasing manifold complexity, and present a theoretical understanding of this behavior. Finally, we study the robustness of the frequency components with respect to parameter perturbation, to develop the intuition that the parameters must be finely tuned to express high frequency functions.
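
A back-of-the-envelope version of the spectral-decay argument (a one-dimensional simplification, not the paper's exact theorem): restricted to a bounded interval, a ReLU network computes a continuous piecewise-linear function, and such functions have Fourier spectra that decay at least quadratically.

% A ReLU network on a bounded interval is continuous and piecewise linear, so its
% second (distributional) derivative is a finite sum of Dirac impulses at the
% breakpoints x_m, with weights c_m equal to the jumps in slope:
\[
  f''(x) = \sum_m c_m\,\delta(x - x_m),
  \qquad
  \widehat{f}(k) = -\frac{1}{k^2}\,\widehat{f''}(k)
                 = -\frac{1}{k^2}\sum_m c_m\, e^{-\mathrm{i}kx_m},
\]
% so |\widehat{f}(k)| \le (\sum_m |c_m|)/k^2 (boundary terms ignored). A sizeable
% high-frequency component therefore requires many breakpoints or large slope
% changes, i.e. finely tuned parameters, consistent with the paper's
% parameter-perturbation experiments.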


Understanding training and generalization in deep learning by Fourier analysis

https://arxiv.org/abs/1808.04295

13 August 2018

Zhiqin John Xu

Background: It is still an open research area to theoretically understand why Deep Neural Networks (DNNs)---equipped with many more parameters than training data and trained by (stochastic) gradient-based methods---often achieve remarkably low generalization error. Contribution: We study DNN training by Fourier analysis. Our theoretical framework explains: i) DNN with (stochastic) gradient-based methods often endows low-frequency components of the target function with a higher priority during the training; ii) Small initialization leads to good generalization ability of DNN while preserving the DNN's ability to fit any function. These results are further confirmed by experiments of DNNs fitting the following datasets, that is, natural images, one-dimensional functions and MNIST dataset.
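
The mechanism behind point (i) can be sketched for a single neuron (a simplified, one-dimensional version of the argument, not the paper's exact statement): the activation's Fourier transform weights how strongly each frequency of the residual contributes to the gradient.

% For a single unit f(x) = a\,\sigma(wx + b), a change of variables gives
\[
  \widehat{f}(k) = \frac{a}{|w|}\, e^{\mathrm{i}bk/w}\,
                   \widehat{\sigma}\!\left(\frac{k}{w}\right),
\]
% and, writing the L^2 loss against the target g in frequency space (Parseval),
\[
  L = \frac{1}{2}\int \big|\widehat{f}(k)-\widehat{g}(k)\big|^2\,\mathrm{d}k,
  \qquad
  \frac{\partial L}{\partial \theta}
    = \int \operatorname{Re}\!\Big[\big(\widehat{f}(k)-\widehat{g}(k)\big)^{*}\,
      \frac{\partial \widehat{f}(k)}{\partial \theta}\Big]\,\mathrm{d}k .
\]
% Since \widehat{\sigma} decays very fast for tanh, \partial\widehat{f}(k)/\partial\theta
% is strongly suppressed for |k| much larger than |w|: with small weights the gradient
% is dominated by low-frequency discrepancies, which is the content of points (i) and
% (ii) in the abstract.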


Training behavior of deep neural network in frequency domain

https://arxiv.org/abs/1807.01251

3 July 2018

Authors: Zhi-Qin John Xu, Yaoyu Zhang, Yanyang Xiao

Abstract: Why deep neural networks (DNNs) capable of overfitting often generalize well in practice is a mystery [Zhang et al., 2016]. To find a potential mechanism, we focus on the study of implicit biases underlying the training process of DNNs. In this work, for both real and synthetic datasets, we empirically find that a DNN with common settings first quickly captures the dominant low-frequency components, and then relatively slowly captures the high-frequency ones. We call this phenomenon Frequency Principle (F-Principle). The F-Principle can be observed over DNNs of various structures, activation functions, and training algorithms in our experiments. We also illustrate how the F-Principle help understand the effect of early-stopping as well as the generalization of DNNs. This F-Principle potentially provides insights into a general principle underlying DNN optimization and generalization.
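
The link to early stopping can be illustrated with a noisy variant of the 1D experiment above (again a sketch assuming PyTorch; the noise level and hyperparameters are arbitrary, not from the paper): because label noise is dominated by high frequencies, which the network fits last, validation error on the clean target typically bottoms out long before the training error does.

# Minimal early-stopping illustration: fit noisy samples of the low-frequency
# target sin(x) and track train/validation loss. The noise lives mostly at high
# frequencies, which are fit last, so the validation minimum tends to come early.
import numpy as np
import torch
import torch.nn as nn

torch.manual_seed(0)

x_train = torch.rand(80, 1) * 2 * np.pi
y_train = torch.sin(x_train) + 0.3 * torch.randn_like(x_train)   # noisy labels
x_val = torch.rand(200, 1) * 2 * np.pi
y_val = torch.sin(x_val)                                          # clean target

model = nn.Sequential(nn.Linear(1, 200), nn.Tanh(), nn.Linear(200, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

best_val, best_step = float("inf"), 0
for step in range(20001):
    opt.zero_grad()
    loss = loss_fn(model(x_train), y_train)
    loss.backward()
    opt.step()
    if step % 500 == 0:
        with torch.no_grad():
            val = loss_fn(model(x_val), y_val).item()
        if val < best_val:
            best_val, best_step = val, step
        print(f"step {step:5d}  train {loss.item():.4f}  val {val:.4f}")

print(f"best validation loss {best_val:.4f} at step {best_step} (early-stopping point)")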


References:

https://zhuanlan.zhihu.com/p/380553780    (★★★★★)

https://zhuanlan.zhihu.com/p/160806229

https://zhuanlan.zhihu.com/p/56077603
