

Author: Valar_Morghulis | Published 2022-06-07 10:04

A key researcher in this field: Zhi-Qin John Xu (许志钦)


Overview frequency principle/spectral bias in deep learning


Authors: Zhi-Qin John Xu, Yaoyu Zhang, Tao Luo        

18 January, 2022

Abstract: Understanding deep learning is increasingly emergent as it penetrates more and more into industry and science. In recent years, a research line from Fourier analysis sheds light on this magical "black box" by showing a Frequency Principle (F-Principle or spectral bias) of the training behavior of deep neural networks (DNNs) -- DNNs often fit functions from low to high frequency during the training. The F-Principle is first demonstrated by one-dimensional synthetic data followed by the verification in high-dimensional real datasets. A series of works subsequently enhance the validity of the F-Principle. This low-frequency implicit bias reveals the strength of neural networks in learning low-frequency functions as well as their deficiency in learning high-frequency functions. Such understanding inspires the design of DNN-based algorithms in practical problems, explains experimental phenomena emerging in various scenarios, and further advances the study of deep learning from the frequency perspective. Although incomplete, we provide an overview of the F-Principle and propose some open problems for future research.


Illustration of the DNN training process. The training data are sampled from the target function sin(x)+sin(5x). The red, green, and black curves show the DNN output, sin(x), and sin(x)+sin(5x), respectively. Image source: Overview frequency principle/spectral bias in deep learning, https://arxiv.org/abs/2201.07395
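The setup in the figure (a network fitting sin(x)+sin(5x)) can be reproduced roughly in a few lines of NumPy. The following is only a minimal sketch, not the authors' code: the one-hidden-layer tanh network, the width, the learning rate, the step count, and the helper names `sine_amplitude` and `train_and_track` are all assumptions made for illustration. It logs how much of sin(x) and sin(5x) the network output contains as training proceeds.

```python
import numpy as np


def sine_amplitude(values, x, k):
    """Amplitude of sin(k*x) in `values`, for x on a uniform grid over one period."""
    return 2.0 * np.mean(values * np.sin(k * x))


def train_and_track(steps=5000, width=50, lr=0.005, log_every=500, seed=0):
    """Fit sin(x) + sin(5x) with a one-hidden-layer tanh network (full-batch GD).

    Returns an array with rows (step, amplitude at k=1, amplitude at k=5).
    All hyperparameters are illustrative assumptions, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    x = np.linspace(-np.pi, np.pi, 200, endpoint=False)
    y = np.sin(x) + np.sin(5 * x)
    X = x[:, None]

    W1 = rng.normal(0.0, 1.0, size=(1, width))
    b1 = rng.normal(0.0, 1.0, size=width)
    w2 = rng.normal(0.0, 1.0 / np.sqrt(width), size=width)
    b2 = 0.0

    history = []
    for step in range(steps + 1):
        h = np.tanh(X @ W1 + b1)   # hidden activations, shape (200, width)
        out = h @ w2 + b2          # network output, shape (200,)
        if step % log_every == 0:
            history.append(
                (step, sine_amplitude(out, x, 1), sine_amplitude(out, x, 5))
            )

        # Gradients of the loss 0.5 * mean((out - y)**2).
        d_out = (out - y) / x.size
        d_h = np.outer(d_out, w2) * (1.0 - h ** 2)
        W1 -= lr * (X.T @ d_h)
        b1 -= lr * d_h.sum(axis=0)
        w2 -= lr * (h.T @ d_out)
        b2 -= lr * d_out.sum()
    return np.array(history)


if __name__ == "__main__":
    for step, a1, a5 in train_and_track():
        print(f"step {int(step):5d}  sin(x): {a1:+.3f}  sin(5x): {a5:+.3f}")
```

Both target amplitudes equal 1, so if the F-Principle holds for this configuration, the sin(x) column should approach 1 noticeably earlier than the sin(5x) column. How fast each one converges depends on the assumed hyperparameters.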

Frequency Principle: Fourier Analysis Sheds Light on Deep Neural Networks


20 September, 2019

Authors: Zhi-Qin John Xu, Yaoyu Zhang, Tao Luo, Yanyang Xiao, Zheng Ma

Abstract: We study the training process of Deep Neural Networks (DNNs) from the Fourier analysis perspective. We demonstrate a very universal Frequency Principle (F-Principle) --- DNNs often fit target functions from low to high frequencies --- on high-dimensional benchmark datasets such as MNIST/CIFAR10 and deep neural networks such as VGG16. This F-Principle of DNNs is opposite to the behavior of most conventional iterative numerical schemes (e.g., Jacobi method), which exhibit faster convergence for higher frequencies for various scientific computing problems. With a simple theory, we illustrate that this F-Principle results from the regularity of the commonly used activation functions. The F-Principle implies an implicit bias that DNNs tend to fit training data by a low-frequency function. This understanding provides an explanation of good generalization of DNNs on most real datasets and bad generalization of DNNs on parity function or randomized dataset.
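The contrast with conventional iterative schemes can be checked numerically. The sketch below applies Jacobi sweeps to the discretized 1D problem -u'' = 0 with zero boundary values, so the iterate itself is the error. One assumption to flag: it uses the weighted (damped) Jacobi variant with omega = 2/3 rather than plain Jacobi, because plain Jacobi also damps the very highest grid modes slowly. The grid size, the sweep count, and the helper names are illustrative choices.

```python
import numpy as np


def weighted_jacobi_sweeps(u, sweeps, omega=2.0 / 3.0):
    """Apply weighted Jacobi sweeps to -u'' = 0 with zero Dirichlet boundaries."""
    u = u.copy()
    for _ in range(sweeps):
        padded = np.concatenate(([0.0], u, [0.0]))
        neighbour_avg = 0.5 * (padded[:-2] + padded[2:])
        u = (1.0 - omega) * u + omega * neighbour_avg
    return u


def remaining_error(k, n=63, sweeps=10):
    """Fraction of the k-th sine error mode left after `sweeps` sweeps."""
    j = np.arange(1, n + 1)
    mode = np.sin(k * np.pi * j / (n + 1))
    smoothed = weighted_jacobi_sweeps(mode, sweeps)
    return np.linalg.norm(smoothed) / np.linalg.norm(mode)


if __name__ == "__main__":
    for k in (1, 16, 48):
        print(f"mode k={k:2d}: remaining error fraction {remaining_error(k):.3e}")
```

Each sine mode is an eigenvector of the sweep with factor 1 - omega * (1 - cos(k*pi/(n+1))), so the k = 1 error barely shrinks in ten sweeps while the k = 48 error is reduced by many orders of magnitude. This is the opposite ordering to the low-to-high fitting described for DNNs.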


On the Spectral Bias of Neural Networks


22 Jun 2018

ICML 2019

Nasim Rahaman, Aristide Baratin, Devansh Arpit, Felix Draxler, Min Lin, Fred A. Hamprecht, Yoshua Bengio, Aaron Courville

Abstract: Neural networks are known to be a class of highly expressive functions able to fit even random input-output mappings with 100% accuracy. In this work, we present properties of neural networks that complement this aspect of expressivity. By using tools from Fourier analysis, we show that deep ReLU networks are biased towards low frequency functions, meaning that they cannot have local fluctuations without affecting their global behavior. Intuitively, this property is in line with the observation that over-parameterized networks find simple patterns that generalize across data samples. We also investigate how the shape of the data manifold affects expressivity by showing evidence that learning high frequencies gets easier with increasing manifold complexity, and present a theoretical understanding of this behavior. Finally, we study the robustness of the frequency components with respect to parameter perturbation, to develop the intuition that the parameters must be finely tuned to express high frequency functions.


Understanding training and generalization in deep learning by Fourier analysis


13 Aug 2018

Zhiqin John Xu

Background: It is still an open research area to theoretically understand why Deep Neural Networks (DNNs)---equipped with many more parameters than training data and trained by (stochastic) gradient-based methods---often achieve remarkably low generalization error. Contribution: We study DNN training by Fourier analysis. Our theoretical framework explains: i) DNN with (stochastic) gradient-based methods often endows low-frequency components of the target function with a higher priority during the training; ii) Small initialization leads to good generalization ability of DNN while preserving the DNN's ability to fit any function. These results are further confirmed by experiments of DNNs fitting the following datasets, that is, natural images, one-dimensional functions and MNIST dataset.


Training behavior of deep neural network in frequency domain


3 July, 2018

Authors: Zhi-Qin John Xu, Yaoyu Zhang, Yanyang Xiao

Abstract: Why deep neural networks (DNNs) capable of overfitting often generalize well in practice is a mystery [#zhang2016understanding]. To find a potential mechanism, we focus on the study of implicit biases underlying the training process of DNNs. In this work, for both real and synthetic datasets, we empirically find that a DNN with common settings first quickly captures the dominant low-frequency components, and then relatively slowly captures the high-frequency ones. We call this phenomenon Frequency Principle (F-Principle). The F-Principle can be observed over DNNs of various structures, activation functions, and training algorithms in our experiments. We also illustrate how the F-Principle helps understand the effect of early-stopping as well as the generalization of DNNs. This F-Principle potentially provides insights into a general principle underlying DNN optimization and generalization.



https://zhuanlan.zhihu.com/p/380553780 (★★★★★)





