Introduction
The stochastic gradient method (Widrow, 1963; Amari, 1967; Tsypkin, 1973; Rumelhart, Hinton, & Williams, 1986) is a popular learning method in the general nonlinear optimization framework. In many cases, however, the parameter space is not Euclidean but has a Riemannian metric structure. In these cases, the ordinary gradient does not give the steepest direction of a target function; rather, the steepest direction is given by the natural (or contravariant) gradient.
Natural Gradient
When the parameter space is Euclidean with an orthonormal coordinate system w, the squared length of a small incremental vector dw connecting w and w + dw is given by

$$|dw|^2 = \sum_i (dw_i)^2.$$

However, when the coordinate system is nonorthonormal, the squared length is given by the quadratic form

$$|dw|^2 = \sum_{i,j} g_{ij}(w)\, dw_i\, dw_j,$$

where the matrix G = (g_{ij}) is the Riemannian metric tensor, which in general depends on w. In this case the steepest descent direction of a target function L(w) is the natural gradient $\tilde{\nabla} L(w) = G^{-1}(w)\, \nabla L(w)$.
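To make the role of the metric concrete, here is a minimal numerical sketch (my own illustration, not from the paper): the fixed metric tensor G, the quadratic toy loss, and the step size eta are all assumptions chosen for brevity. It computes the squared length of an increment under G and contrasts an ordinary gradient step with a natural gradient step using $G^{-1}\nabla L$.

```python
import numpy as np

# Illustrative symmetric positive-definite metric tensor G = (g_ij);
# in general G depends on w, but a fixed G suffices for this sketch.
G = np.array([[2.0, 0.5],
              [0.5, 1.0]])

def grad_L(w):
    return w                  # ordinary gradient of the toy loss L(w) = |w|^2 / 2

# Squared length of a small increment dw under the Riemannian metric:
dw = np.array([1e-3, -2e-3])
sq_len = dw @ G @ dw          # |dw|^2 = sum_ij g_ij(w) dw_i dw_j

# Ordinary versus natural gradient step from the same point:
w, eta = np.array([1.0, 1.0]), 0.1
w_ordinary = w - eta * grad_L(w)
w_natural = w - eta * np.linalg.solve(G, grad_L(w))   # applies G^{-1}
print(sq_len, w_ordinary, w_natural)
```

The two updated points differ in direction, not just length: the metric reweights the coordinates, which is exactly why the ordinary gradient is not the steepest direction in a non-Euclidean parameter space.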
Natural Gradient Learning
Statistical Estimation of Probability Density Function
Multilayer Neural Network
the model specifies the probability density of z = (x, y) as

$$p(z; w) = c\, q(x) \exp\left\{ -\tfrac{1}{2} \left( y - f(x, w) \right)^2 \right\},$$

where f(x, w) is the input-output function of the network, c is a normalizing constant, and the loss function (see equation 3.6) is rewritten as

$$l(z; w) = -\log p(z; w) = \tfrac{1}{2} \left( y - f(x, w) \right)^2 - \log q(x) - \log c.$$
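As a sketch of how this model translates into computation (my own illustration, with two assumptions: a toy single-layer model f(x, w) = tanh(w·x) in place of the paper's multilayer network, and a standard normal input density q(x)), note that with unit-variance Gaussian noise the Fisher information reduces to $G(w) = E_x[\nabla_w f\, \nabla_w f^T]$, which can be estimated by a sample average:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x, w):
    return np.tanh(x @ w)      # toy stand-in for the multilayer network

def grad_f(x, w):
    # Per-sample derivative of f with respect to w, shape (N, d).
    return (1.0 - np.tanh(x @ w) ** 2)[:, None] * x

def fisher(x, w):
    # With unit-variance Gaussian noise, G(w) = E_x[grad_f grad_f^T],
    # estimated by averaging over a sample of inputs x.
    g = grad_f(x, w)
    return g.T @ g / len(x)

d = 3
w_true = rng.normal(size=d)
x = rng.normal(size=(5000, d))             # q(x): standard normal (assumption)
y = f(x, w_true) + rng.normal(size=5000)   # unit-variance Gaussian noise

# One natural gradient step on the empirical loss l(z; w):
w = np.zeros(d)
grad_l = ((f(x, w) - y)[:, None] * grad_f(x, w)).mean(axis=0)
w -= np.linalg.solve(fisher(x, w), grad_l)
```

Because the terms $-\log q(x) - \log c$ do not depend on w, only the squared-error term contributes to the gradient, so the update above uses just $(f - y)\,\nabla_w f$.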
Natural Gradient Gives Fisher-Efficient Online Learning Algorithms
This section studies the accuracy of natural gradient learning from the statistical point of view. A statistical estimator whose error asymptotically attains the best possible bound, the Cramér-Rao bound given below, is said to be Fisher efficient. We prove that natural gradient learning attains Fisher efficiency.
The Cramér-Rao theorem states that the expected squared error of an unbiased estimator $\hat{w}_t$ based on t independent observations satisfies

$$E\left[ (\hat{w}_t - w)(\hat{w}_t - w)^T \right] \ge \frac{1}{t}\, G^{-1},$$

where G is the Fisher information matrix and the inequality holds in the sense that the difference of the two sides is positive semidefinite.
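The bound can be checked numerically. The following simulation is my own sketch under a deliberately simple assumption, a Gaussian location model z ~ N(w, I) for which the Fisher information is G = I; there, the online natural gradient update with learning rate 1/t reduces to the running sample mean, and the scaled empirical covariance approaches $G^{-1}$, attaining the bound.

```python
import numpy as np

rng = np.random.default_rng(1)
d, T, runs = 2, 500, 2000
w_true = np.array([0.5, -1.0])       # true parameter (illustrative)

# Gaussian location model z ~ N(w, I): loss l(z; w) = |z - w|^2 / 2,
# gradient w - z, and Fisher information G = I, so the natural gradient
# online update with rate 1/t is w_t = w_{t-1} - (w_{t-1} - z_t) / t.
w_hat = np.zeros((runs, d))
for t in range(1, T + 1):
    z = w_true + rng.normal(size=(runs, d))
    w_hat -= (w_hat - z) / t

err = w_hat - w_true
print(T * err.T @ err / runs)        # approaches G^{-1}, the identity
```

Running this prints a matrix close to the 2 x 2 identity, showing the estimator's covariance shrinking at the Cramér-Rao rate $G^{-1}/t$.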