Background and notation
Neural Networks
The Natural Gradient
Fisher information matrix
we will instead compute $F$ using the training distribution $\hat{Q}_x$ over inputs $x$
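As a minimal sketch of what averaging over the training inputs could look like, the snippet below computes the Fisher for a small logistic-regression model. The model, the function names `sigmoid` and `fisher_logistic`, and the data layout (one input per row of `X`) are assumptions made for illustration and do not come from the source. Inputs are taken from the training set, while the expectation over targets uses the model's own predictive distribution, following the standard definition of the Fisher.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fisher_logistic(theta, X):
    """Fisher of a logistic-regression model (an illustrative assumption),
    averaged over the training inputs stored as rows of X."""
    p = sigmoid(X @ theta)                      # model probability of y = 1 for each input
    F = np.zeros((theta.size, theta.size))
    for y in (0.0, 1.0):                        # exact expectation over the model's own targets
        w = p if y == 1.0 else 1.0 - p          # P(y | x, theta)
        g = (y - p)[:, None] * X                # gradient of log p(y | x, theta) w.r.t. theta, one row per x
        F += (w[:, None] * g).T @ g             # sum over inputs of w * g g^T
    return F / X.shape[0]                       # average over the training inputs

# Usage with synthetic inputs (hypothetical data).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
theta = rng.normal(size=3)
F = fisher_logistic(theta, X)
```

For this particular model the result reduces to the average of $p(1-p)\,x x^\top$ over the inputs, which gives a convenient check on the code.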
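The well-known natural gradient (Amari, 1998) is defined as $F^{-1} \nabla h(\theta)$, where $F$ is the Fisher information matrix and $h(\theta)$ denotes the objective.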
The natural gradient defines the direction in parameter space which gives the largest change in the objective per unit of change in the model, as measured by the KL-divergence.
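The following sketch shows one way to obtain a natural-gradient direction numerically: solve a linear system with the Fisher instead of forming its inverse. The function name `natural_gradient` and the `damping` term are assumptions added for illustration (damping keeps the system well-conditioned when the Fisher is near-singular); they are not part of the definition in the text. The link to the KL-divergence is that, to second order, the KL-divergence between the model at $\theta$ and at $\theta + \delta$ is $\tfrac{1}{2}\delta^\top F \delta$.

```python
import numpy as np

def natural_gradient(F, grad, damping=0.0):
    """Return d solving (F + damping * I) d = grad.
    `damping` is an illustrative assumption, not part of the definition."""
    A = F + damping * np.eye(F.shape[0])
    return np.linalg.solve(A, grad)             # avoids computing an explicit inverse

# Usage: a diagonal Fisher rescales each gradient coordinate by its curvature.
F = np.diag([4.0, 1.0])
grad = np.array([4.0, 1.0])
d = natural_gradient(F, grad)                   # array([1., 1.])
```

In the usage example the first coordinate has four times the curvature of the second, so its gradient component is scaled down by a factor of four.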