Optimizing Neural Networks with

Author: 初七123 | Published: 2019-01-09 14:44 | Views: 3

Background and notation

Neural Networks

The Natural Gradient

Fisher information matrix

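Since the true input distribution is not available, we will instead compute F using the training distribution Q̂_x over inputs x. The targets y are still drawn from the model's own predictive distribution, not from the training labels.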
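As a concrete illustration of averaging over training inputs while taking the expectation over y under the model, here is a minimal sketch for logistic regression. The model, the function name `fisher_logistic`, and the random data are assumptions made for this example and do not come from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fisher_logistic(w, X):
    """Fisher matrix of the model p(y=1 | x) = sigmoid(w . x).

    Inputs x are averaged over the rows of X (the training input
    distribution). Labels y are averaged under the model's own
    predictive distribution, so no training labels are needed.
    """
    n, d = X.shape
    F = np.zeros((d, d))
    for x in X:
        p = sigmoid(w @ x)
        # Enumerate both outcomes of y, weighted by the model's probability.
        for y, prob in ((1.0, p), (0.0, 1.0 - p)):
            g = (y - p) * x              # gradient of log p(y | x, w) w.r.t. w
            F += prob * np.outer(g, g)
    return F / n

# Hypothetical toy data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
w = rng.normal(size=3)
F = fisher_logistic(w, X)
```

For this particular model the inner expectation has the closed form p(1 - p) x xᵀ, which gives a quick way to sanity-check the loop.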

The well-known natural gradient (Amari, 1998) is defined as F⁻¹∇h(θ), where h(θ) is the training objective and F is the Fisher information matrix.

The natural gradient defines the direction in parameter space that gives the largest change in the objective per unit of change in the model, as measured by the KL-divergence.
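A minimal sketch of a natural-gradient direction, computed by solving a linear system with the Fisher matrix instead of inverting it explicitly. The function name `natural_gradient` and the `damping` argument (a small multiple of the identity added for numerical stability) are assumptions of this example, not something taken from the notes above.

```python
import numpy as np

def natural_gradient(grad, F, damping=1e-3):
    """Return the solution d of (F + damping * I) d = grad.

    With damping = 0 this is F^{-1} grad, the natural gradient.
    The damping term is an assumption added to keep the solve
    well conditioned when F is close to singular.
    """
    d = F.shape[0]
    return np.linalg.solve(F + damping * np.eye(d), grad)

# Toy example: the Fisher rescales each coordinate of the gradient.
F_toy = np.diag([4.0, 1.0])
grad_toy = np.array([4.0, 1.0])
step = natural_gradient(grad_toy, F_toy, damping=0.0)   # equals [1, 1]
```

In the toy case the plain gradient points mostly along the first coordinate, while the natural gradient weights both coordinates equally, because the model's distribution is four times more sensitive to the first parameter.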

A block-wise Kronecker-factored Fisher approximation
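The practical benefit of a Kronecker-factored block can be sketched as follows. Assume one layer's Fisher block is approximated as A ⊗ G, where A is the second-moment matrix of the layer's inputs and G is the second-moment matrix of the gradients with respect to the layer's outputs. Then the identity (A ⊗ G)⁻¹ vec(V) = vec(G⁻¹ V A⁻¹), with column-stacking vec and symmetric A, lets us apply the inverse using only the two small factors. The function name, shapes, and random matrices below are hypothetical and chosen for illustration.

```python
import numpy as np

def kron_factored_solve(A, G, V):
    """Return U such that vec(U) = (A ⊗ G)^{-1} vec(V).

    A: (n_in, n_in) symmetric positive definite
    G: (n_out, n_out) symmetric positive definite
    V: (n_out, n_in), e.g. the gradient of one layer's weight matrix
    The (n_in * n_out) x (n_in * n_out) matrix A ⊗ G is never formed.
    """
    Y = np.linalg.solve(G, V)            # Y = G^{-1} V
    return np.linalg.solve(A, Y.T).T     # U = Y A^{-1}, using A = A^T

def random_spd(n, rng):
    """Hypothetical helper: a random symmetric positive definite matrix."""
    M = rng.normal(size=(n, n))
    return M @ M.T + n * np.eye(n)

rng = np.random.default_rng(0)
n_in, n_out = 4, 3
A = random_spd(n_in, rng)
G = random_spd(n_out, rng)
V = rng.normal(size=(n_out, n_in))
U = kron_factored_solve(A, G, V)
```

The two small solves cost roughly O(n_in³ + n_out³), compared with O((n_in · n_out)³) for a dense solve against the full block.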

Additional approximations to F̃ and inverse computations

Update damping

Pseudocode for K-FAC



