Introduction
The stochastic gradient method (Widrow, 1963; Amari, 1967; Tsypkin, 1973; Rumelhart, Hinton, & Williams, 1986) is a popular learning method in the general nonlinear optimization framework. In many cases, however, the parameter space is not Euclidean but has a Riemannian metric structure. In these cases, the ordinary gradient does not give the steepest direction of a target function; rather, the steepest direction is given by the natural (or contravariant) gradient.
Natural Gradient
When the parameter space is Euclidean with an orthonormal coordinate system w, the squared length of a small incremental vector dw connecting w and w + dw is given by

$$|dw|^2 = \sum_i (dw_i)^2.$$

However, when the coordinate system is nonorthonormal, the squared length is given by the quadratic form

$$|dw|^2 = \sum_{i,j} g_{ij}(w)\, dw_i\, dw_j,$$

where the matrix G = (g_{ij}) is the Riemannian metric tensor, which in general depends on w. In this case the steepest descent direction of a target function L(w) is the natural gradient $\tilde{\nabla} L(w) = G^{-1}(w)\, \nabla L(w)$.
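To make the role of the metric concrete, here is a minimal numerical sketch (my own illustration, not from the paper): the fixed metric tensor G, the quadratic toy loss, and the step size eta are all assumptions chosen for brevity. It computes the squared length of an increment under G and contrasts an ordinary gradient step with a natural gradient step using $G^{-1}\nabla L$.

```python
import numpy as np

# Illustrative symmetric positive-definite metric tensor G = (g_ij);
# in general G depends on w, but a fixed G suffices for this sketch.
G = np.array([[2.0, 0.5],
              [0.5, 1.0]])

def grad_L(w):
    return w                  # ordinary gradient of the toy loss L(w) = |w|^2 / 2

# Squared length of a small increment dw under the Riemannian metric:
dw = np.array([1e-3, -2e-3])
sq_len = dw @ G @ dw          # |dw|^2 = sum_ij g_ij(w) dw_i dw_j

# Ordinary versus natural gradient step from the same point:
w, eta = np.array([1.0, 1.0]), 0.1
w_ordinary = w - eta * grad_L(w)
w_natural = w - eta * np.linalg.solve(G, grad_L(w))   # applies G^{-1}
print(sq_len, w_ordinary, w_natural)
```

The two updated points differ in direction, not just length: the metric reweights the coordinates, which is exactly why the ordinary gradient is not the steepest direction in a non-Euclidean parameter space.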
Natural Gradient Learning
Statistical Estimation of Probability Density Function
Multilayer Neural Network
the model specifies the probability density of z = (x, y) as

$$p(z; w) = c\, q(x) \exp\left\{ -\tfrac{1}{2} \left( y - f(x, w) \right)^2 \right\},$$

where f(x, w) is the input-output function of the network, c is a normalizing constant, and the loss function (see equation 3.6) is rewritten as

$$l(z; w) = -\log p(z; w) = \tfrac{1}{2} \left( y - f(x, w) \right)^2 - \log q(x) - \log c.$$
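As a sketch of how this model translates into computation (my own illustration, with two assumptions: a toy single-layer model f(x, w) = tanh(w·x) in place of the paper's multilayer network, and a standard normal input density q(x)), note that with unit-variance Gaussian noise the Fisher information reduces to $G(w) = E_x[\nabla_w f\, \nabla_w f^T]$, which can be estimated by a sample average:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x, w):
    return np.tanh(x @ w)      # toy stand-in for the multilayer network

def grad_f(x, w):
    # Per-sample derivative of f with respect to w, shape (N, d).
    return (1.0 - np.tanh(x @ w) ** 2)[:, None] * x

def fisher(x, w):
    # With unit-variance Gaussian noise, G(w) = E_x[grad_f grad_f^T],
    # estimated by averaging over a sample of inputs x.
    g = grad_f(x, w)
    return g.T @ g / len(x)

d = 3
w_true = rng.normal(size=d)
x = rng.normal(size=(5000, d))             # q(x): standard normal (assumption)
y = f(x, w_true) + rng.normal(size=5000)   # unit-variance Gaussian noise

# One natural gradient step on the empirical loss l(z; w):
w = np.zeros(d)
grad_l = ((f(x, w) - y)[:, None] * grad_f(x, w)).mean(axis=0)
w -= np.linalg.solve(fisher(x, w), grad_l)
```

Because the terms $-\log q(x) - \log c$ do not depend on w, only the squared-error term contributes to the gradient, so the update above uses just $(f - y)\,\nabla_w f$.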
Natural Gradient Gives Fisher-Efficient Online Learning Algorithms
This section studies the accuracy of natural gradient learning from the statistical point of view. A statistical estimator whose error asymptotically attains the best possible bound, the Cramér-Rao bound given below, is said to be Fisher efficient. We prove that natural gradient learning attains Fisher efficiency.
The Cramér-Rao theorem states that the expected squared error of an unbiased estimator $\hat{w}_t$ based on t independent observations satisfies

$$E\left[ (\hat{w}_t - w)(\hat{w}_t - w)^T \right] \ge \frac{1}{t}\, G^{-1},$$

where G is the Fisher information matrix and the inequality holds in the sense that the difference of the two sides is positive semidefinite.
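The bound can be checked numerically. The following simulation is my own sketch under a deliberately simple assumption, a Gaussian location model z ~ N(w, I) for which the Fisher information is G = I; there, the online natural gradient update with learning rate 1/t reduces to the running sample mean, and the scaled empirical covariance approaches $G^{-1}$, attaining the bound.

```python
import numpy as np

rng = np.random.default_rng(1)
d, T, runs = 2, 500, 2000
w_true = np.array([0.5, -1.0])       # true parameter (illustrative)

# Gaussian location model z ~ N(w, I): loss l(z; w) = |z - w|^2 / 2,
# gradient w - z, and Fisher information G = I, so the natural gradient
# online update with rate 1/t is w_t = w_{t-1} - (w_{t-1} - z_t) / t.
w_hat = np.zeros((runs, d))
for t in range(1, T + 1):
    z = w_true + rng.normal(size=(runs, d))
    w_hat -= (w_hat - z) / t

err = w_hat - w_true
print(T * err.T @ err / runs)        # approaches G^{-1}, the identity
```

Running this prints a matrix close to the 2 x 2 identity, showing the estimator's covariance shrinking at the Cramér-Rao rate $G^{-1}/t$.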