Introduction
The stochastic gradient method (Widrow, 1963; Amari, 1967; Tsypkin, 1973; Rumelhart, Hinton, & Williams, 1986) is a popular learning method in the general nonlinear optimization framework. In many cases, however, the parameter space is not Euclidean but has a Riemannian metric structure. In these cases, the ordinary gradient does not give the steepest direction of a target function; rather, the steepest direction is given by the natural (or contravariant) gradient.
Natural Gradient
In a Euclidean space with an orthonormal coordinate system, the squared length of a small incremental vector $dw$ connecting $w$ and $w + dw$ is given by
$$
|dw|^2 = \sum_i (dw_i)^2 .
$$
When the coordinate system is nonorthonormal, the squared length is instead given by the quadratic form
$$
|dw|^2 = \sum_{i,j} g_{ij}(w)\, dw_i\, dw_j ,
$$
where $G(w) = \bigl(g_{ij}(w)\bigr)$ is the Riemannian metric tensor, a positive definite matrix that depends on $w$ and reduces to the identity in the Euclidean orthonormal case.
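As a quick numerical illustration (added here for exposition; the matrix $G$ below is an arbitrary positive definite stand-in for the metric tensor), the quadratic form can be evaluated directly and compared with the Euclidean length:

```python
import numpy as np

# Hypothetical metric tensor at a point w (any positive definite matrix).
G = np.array([[2.0, 0.5],
              [0.5, 1.0]])

dw = np.array([0.1, -0.2])

euclidean_sq = dw @ dw        # |dw|^2 under the identity metric
riemannian_sq = dw @ G @ dw   # |dw|^2 = sum_ij g_ij dw_i dw_j

print(euclidean_sq, riemannian_sq)  # 0.05 0.04
```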
The steepest descent direction of a loss function $L(w)$ in the Riemannian space is then
$$
-\tilde{\nabla} L(w) = -G^{-1}(w)\, \nabla L(w),
$$
where $\nabla L$ is the ordinary gradient and $\tilde{\nabla} L = G^{-1} \nabla L$ is called the natural gradient of $L$. In a Euclidean space with an orthonormal coordinate system, $G$ is the identity and the two gradients coincide.
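The following sketch (an illustration constructed for this exposition, with $L$ and $G$ chosen arbitrarily) contrasts the two directions on a simple quadratic loss; when the metric matches the curvature, the natural gradient points straight at the minimum while the ordinary gradient is biased toward the high-curvature axis:

```python
import numpy as np

# Quadratic loss L(w) = 0.5 * w^T A w with anisotropic curvature A.
A = np.array([[10.0, 0.0],
              [0.0, 1.0]])

def grad_L(w):
    return A @ w

# A hypothetical Riemannian metric G (here set equal to A purely for
# illustration; in general G comes from the problem, e.g., Fisher information).
G = A

w = np.array([1.0, 1.0])
g = grad_L(w)

ordinary_direction = -g                      # steepest descent in the Euclidean metric
natural_direction = -np.linalg.solve(G, g)   # steepest descent in the metric G

print(ordinary_direction)  # [-10.  -1.]: skewed toward the high-curvature axis
print(natural_direction)   # [ -1.  -1.]: points straight at the minimum
```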
Natural Gradient Learning
Natural gradient learning replaces the ordinary gradient in the stochastic gradient update by the natural gradient, giving the online rule
$$
w_{t+1} = w_t - \eta_t \tilde{\nabla} l(x_t; w_t) = w_t - \eta_t G^{-1}(w_t)\, \nabla l(x_t; w_t),
$$
where $x_t$ is the example observed at step $t$, $l(x; w)$ is the per-example loss, and $\eta_t$ is the learning rate.
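A minimal sketch of one step of this rule, assuming the caller supplies the per-example loss gradient and the metric $G$ at the current point (both are problem dependent; the function name and signature are illustrative):

```python
import numpy as np

def natural_gradient_step(w, grad_l, metric_G, lr):
    """One online natural gradient update: w <- w - lr * G(w)^{-1} grad l(x; w).

    grad_l:   gradient of the per-example loss at w (vector)
    metric_G: Riemannian metric at w (positive definite matrix)
    """
    # Solve G d = grad_l rather than forming G^{-1} explicitly.
    direction = np.linalg.solve(metric_G, grad_l)
    return w - lr * direction
```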
Statistical Estimation of Probability Density Function
In the density estimation setting, independent examples $x_1, x_2, \ldots$ are drawn from an unknown density, which we model by a parameterized family $p(x; w)$. The loss is the negative log likelihood
$$
l(x; w) = -\log p(x; w),
$$
and the natural Riemannian metric on the parameter space is the Fisher information matrix
$$
G(w) = E\bigl[\nabla \log p(x; w)\, \nabla \log p(x; w)^{\top}\bigr].
$$
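A small sketch under a concrete assumption (a one-dimensional Gaussian family with parameters $w = (\mu, \sigma)$, chosen only for illustration): for this family the Fisher information has the closed form $\mathrm{diag}(1/\sigma^2,\, 2/\sigma^2)$, so the natural gradient of the log loss can be computed explicitly.

```python
import numpy as np

def loss_grad(x, mu, sigma):
    """Gradient of l(x; w) = -log p(x; mu, sigma) for a 1-D Gaussian."""
    d_mu = -(x - mu) / sigma**2
    d_sigma = 1.0 / sigma - (x - mu) ** 2 / sigma**3
    return np.array([d_mu, d_sigma])

def fisher(mu, sigma):
    """Fisher information G(w) in the (mu, sigma) parameterization."""
    return np.diag([1.0 / sigma**2, 2.0 / sigma**2])

# Natural gradient of the loss at one observed example.
x, mu, sigma = 0.7, 0.0, 2.0
nat_grad = np.linalg.solve(fisher(mu, sigma), loss_grad(x, mu, sigma))
print(nat_grad)  # metric-corrected steepest direction for (mu, sigma)
```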
Multilayer Neural Network


For a multilayer network realizing an input-output function $f(x, w)$, suppose the observed output is $y = f(x, w) + n$ with additive Gaussian noise $n$, let $q(x)$ denote the density of the inputs, and write $z = (x, y)$. Then the model specifies the probability density of $z$ as
$$
p(z; w) = c\, q(x) \exp\Bigl(-\tfrac{1}{2}\,\|y - f(x, w)\|^2\Bigr),
$$
where $c$ is a normalizing constant, and the loss function (see equation 3.6) is rewritten as
$$
l(z; w) = \tfrac{1}{2}\,\|y - f(x, w)\|^2 + \text{const},
$$
where the constant collects terms not depending on $w$.
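A sketch of natural gradient descent for this squared-error loss on a tiny one-hidden-layer network. The Fisher matrix is approximated below by an average of per-example gradient outer products with a damping term for invertibility; the network size, data, and constants are all illustrative assumptions, not the paper's construction:

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny 1-3-1 network: f(x, w) with w = (W1, b1, W2, b2) flattened into 10 numbers.
def unpack(w):
    return w[:3], w[3:6], w[6:9], w[9]

def f(x, w):
    W1, b1, W2, b2 = unpack(w)
    h = np.tanh(W1 * x + b1)          # hidden activations
    return W2 @ h + b2                # scalar output

def grad_loss(x, y, w):
    """Gradient of l = 0.5 * (y - f(x, w))^2 with respect to w (manual backprop)."""
    W1, b1, W2, b2 = unpack(w)
    h = np.tanh(W1 * x + b1)
    err = (W2 @ h + b2) - y           # residual f(x, w) - y
    dh = err * W2 * (1.0 - h**2)      # backpropagated through tanh
    return np.concatenate([dh * x, dh, err * h, [err]])

def natural_step(w, batch, lr=0.1, damping=1e-2):
    """One natural gradient step with an empirical-Fisher approximation of G."""
    grads = np.array([grad_loss(x, y, w) for x, y in batch])
    g = grads.mean(axis=0)
    G = grads.T @ grads / len(batch) + damping * np.eye(len(w))
    return w - lr * np.linalg.solve(G, g)

# Fit y = sin(x) on 200 random points.
data = [(x, np.sin(x)) for x in rng.uniform(-2.0, 2.0, size=200)]
w = rng.normal(scale=0.5, size=10)
for _ in range(300):
    w = natural_step(w, data)
print(np.mean([(y - f(x, w)) ** 2 for x, y in data]))  # mean squared training error
```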
Natural Gradient Gives Fisher-Efficient Online Learning Algorithms
This section studies the accuracy of natural gradient learning from the statistical point of view. A statistical estimator that asymptotically gives the best possible result is said to be Fisher efficient. We prove that natural gradient learning attains Fisher efficiency.


The Cramér-Rao theorem states that the expected squared error of an unbiased estimator $\hat{w}_T$ computed from $T$ independent examples satisfies
$$
E\bigl[(\hat{w}_T - w)(\hat{w}_T - w)^{\top}\bigr] \ge \frac{1}{T}\, G^{-1}(w),
$$
where $G$ is the Fisher information matrix and the inequality holds in the sense of the positive definiteness of matrices.
An estimator attaining this bound asymptotically is Fisher efficient. Natural gradient learning with the learning rate schedule $\eta_t = 1/t$ attains it: the expected squared error of the online estimator behaves as $\frac{1}{t} G^{-1}$ up to higher-order terms, so the algorithm is asymptotically as accurate as the optimal batch estimator even though each example is used only once.
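An empirical illustration constructed for this write-up (not from the paper): for a Gaussian with unknown mean and known variance, $G = 1/\sigma^2$, and natural gradient learning with $\eta_t = 1/t$ reduces exactly to the running sample mean, whose squared error matches the Cramér-Rao bound $\sigma^2 / T$.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma, true_mu = 2.0, 0.5
T, trials = 1000, 2000

errors = []
for _ in range(trials):
    mu = 0.0
    for t in range(1, T + 1):
        x = rng.normal(true_mu, sigma)
        # Ordinary gradient of -log p is -(x - mu)/sigma^2; multiplying by
        # G^{-1} = sigma^2 gives the natural gradient -(x - mu).
        mu = mu + (1.0 / t) * (x - mu)    # eta_t = 1/t: the running sample mean
    errors.append((mu - true_mu) ** 2)

print(np.mean(errors))   # empirical E[(mu_T - mu)^2]
print(sigma**2 / T)      # Cramér-Rao bound G^{-1}/T = sigma^2/T
```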