Natural Gradient Works Efficiently in Learning

Author: 初七123 | Published 2019-01-12 12:42

Introduction

The stochastic gradient method (Widrow, 1963; Amari, 1967; Tsypkin, 1973; Rumelhart, Hinton, & Williams, 1986) is a popular learning method in the general nonlinear optimization framework. In many cases, however, the parameter space is not Euclidean but has a Riemannian metric structure. In these cases, the ordinary gradient does not give the steepest direction of a target function; the steepest direction is instead given by the natural (or contravariant) gradient.

Natural Gradient

The squared length of a small incremental vector $dw$ connecting $w$ and $w + dw$ is given by

$$|dw|^2 = \sum_{i,j} g_{ij}(w)\, dw_i\, dw_j$$

when the coordinate system is nonorthogonal, where $G(w) = \big(g_{ij}(w)\big)$ is the Riemannian metric tensor; in the Euclidean orthonormal case, $g_{ij} = \delta_{ij}$ and the usual squared length is recovered.
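To make this concrete, here is a minimal numerical sketch (the metric $G$ and the gradient value are illustrative choices, not from the paper). It checks that among increments of equal Riemannian length, the direction $-G^{-1}\nabla L$ gives the largest first-order decrease of $L$:

```python
import numpy as np

rng = np.random.default_rng(0)
G = np.array([[3.0, 1.0], [1.0, 2.0]])   # illustrative Riemannian metric tensor
grad_L = np.array([1.0, 4.0])            # illustrative ordinary gradient of L at w

def squared_length(dw):
    # |dw|^2 = sum_ij g_ij(w) dw_i dw_j, as in the formula above
    return dw @ G @ dw

nat = np.linalg.solve(G, grad_L)                # natural gradient G^{-1} grad L
nat_dir = -nat / np.sqrt(squared_length(nat))   # descent direction, unit length in G

# Among many random directions normalized to the same Riemannian length,
# none decreases L (to first order) faster than the natural gradient direction.
random_best = min(
    grad_L @ (d / np.sqrt(squared_length(d)))
    for d in rng.standard_normal((10_000, 2))
)
print(grad_L @ nat_dir, "<=", random_best)
```

By the Cauchy-Schwarz inequality in the $G$-inner product, no direction of the same length can do better, which is exactly the sense in which the natural gradient is steepest.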

Natural Gradient Learning
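In the paper, natural gradient learning replaces the ordinary online update $w_{t+1} = w_t - \eta_t \nabla l(z_t; w_t)$ with

$$w_{t+1} = w_t - \eta_t\, G^{-1}(w_t)\, \nabla l(z_t; w_t),$$

where $\eta_t$ is the learning rate and $l$ is the per-example loss. A minimal sketch of one such step (the function and argument names are my own, not from the paper):

```python
import numpy as np

def natural_gradient_step(w, grad_l, G, eta):
    """One online natural gradient update: w - eta * G^{-1} grad_l.

    w      : current parameter vector
    grad_l : ordinary gradient of the per-example loss l(z_t; w)
    G      : Fisher information matrix at w (or an estimate of it)
    eta    : learning rate eta_t
    """
    # Solve G x = grad_l instead of forming G^{-1} explicitly.
    return w - eta * np.linalg.solve(G, grad_l)
```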

Statistical Estimation of Probability Density Function

Multilayer Neural Network

The model specifies the probability density of $z = (x, y)$ as

$$p(z; w) = q(x)\, c \exp\left\{-\tfrac{1}{2}\,\|y - f(x, w)\|^2\right\},$$

where $c$ is a normalizing constant, $q(x)$ is the (unknown) input density, and $f(x, w)$ is the network's input-output function, and the loss function (see equation 3.6) is rewritten as

$$l(x, y; w) = \tfrac{1}{2}\,\|y - f(x, w)\|^2,$$

up to additive terms that do not depend on $w$.
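Under this Gaussian-noise model, minimizing the loss is maximum likelihood estimation, and the Fisher information matrix $G(w) = E[\nabla l\, \nabla l^T]$ needed by the natural gradient can be estimated from per-example gradients. A minimal sketch, assuming the gradients of the squared-error loss are already available (the helper is hypothetical, not from the paper):

```python
import numpy as np

def empirical_fisher(per_example_grads):
    """Monte Carlo estimate of G(w) = E[grad l grad l^T].

    per_example_grads : array of shape (n, dim); row t is the gradient of
                        l(x_t, y_t; w) = 0.5 * ||y_t - f(x_t, w)||^2  w.r.t. w
    """
    g = np.asarray(per_example_grads)
    return g.T @ g / len(g)
```

At the true parameter this average is unbiased for $G$; away from it, it is the commonly used empirical-Fisher approximation.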

Natural Gradient Gives Fisher-Efficient Online Learning Algorithms

This section studies the accuracy of natural gradient learning from the statistical point of view. A statistical estimator that asymptotically gives the best possible result is said to be Fisher efficient. We prove that natural gradient online learning attains Fisher efficiency when the learning rate is annealed as $\eta_t = 1/t$.

The Cramér-Rao theorem states that the expected squared error of an unbiased estimator $\hat{w}_t$ based on $t$ independent examples satisfies

$$E\left[(\hat{w}_t - w^*)(\hat{w}_t - w^*)^T\right] \ge \frac{1}{t}\, G^{-1}(w^*),$$

where the inequality holds in the sense of positive definiteness of matrices and $G$ is the Fisher information matrix. An estimator is Fisher efficient when it attains this bound asymptotically.
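A small simulation can illustrate the bound. The toy problem below (my own example, not from the paper) estimates the mean of a Gaussian with known covariance by online natural gradient with $\eta_t = 1/t$; the rescaled error covariance approaches $G^{-1}$, i.e., the estimator attains the Cramér-Rao limit:

```python
import numpy as np

rng = np.random.default_rng(0)
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])  # known noise covariance (toy choice)
G = np.linalg.inv(Sigma)                    # Fisher information of the Gaussian mean
w_true = np.array([1.0, -1.0])
T, trials = 1_000, 400

errors = []
for _ in range(trials):
    w = np.zeros(2)
    zs = rng.multivariate_normal(w_true, Sigma, size=T)
    for t, z in enumerate(zs, start=1):
        grad = G @ (w - z)               # gradient of l(z; w) = 0.5 (w-z)^T G (w-z)
        w -= (1.0 / t) * (Sigma @ grad)  # natural gradient step with eta_t = 1/t
    errors.append(w - w_true)

# Rescaled error covariance: should be close to G^{-1} = Sigma (Fisher efficiency).
print(T * np.cov(np.array(errors).T))
print(Sigma)
```

For this model the natural gradient step with $\eta_t = 1/t$ reduces exactly to the running sample mean, the maximum likelihood estimator, which makes the efficiency easy to see.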
