This prevents computing or even storing G(t) for moderately large-dimensional dynamical systems, such as recurrent neural networks.
1 The NoBackTrack algorithm
1.1 The rank-one trick: an expectation-preserving reduction
We propose to build an approximation $\tilde G(t)$ of $G(t)$ whose expectation at every time equals the true value $G(t)$. The construction of such an unbiased $\tilde G(t)$ is based on the following "rank-one trick": given a matrix $A$ decomposed as a sum of rank-one terms, $A = \sum_i v_i w_i^\top$, the random rank-one matrix
$$\tilde A := \Big(\sum_i \varepsilon_i v_i\Big)\Big(\sum_i \varepsilon_i w_i\Big)^{\top},$$
where the $\varepsilon_i$ are independent uniform random signs in $\{-1, 1\}$, satisfies $\mathbb{E}\,\tilde A = A$, because $\mathbb{E}[\varepsilon_i \varepsilon_j] = \delta_{ij}$ cancels the cross terms.
The rank-one reduction $\tilde A$ depends, not only on the value of $A$, but also on the way $A$ is decomposed as a sum of rank-one terms. In the applications to recurrent networks below, there is a natural such choice.
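For concreteness, here is a minimal NumPy sketch of the reduction together with an empirical check of its unbiasedness. It is not from the paper; the function name rank_one_reduce and the toy dimensions are illustrative assumptions.

```python
import numpy as np

def rank_one_reduce(vs, ws, rng):
    """Collapse A = sum_i v_i w_i^T into a single random rank-one factor pair.

    vs, ws: lists of vectors giving the rank-one decomposition of A.
    Returns (v, w) such that E[v w^T] = A: the independent random signs
    satisfy E[eps_i eps_j] = delta_ij, so cross terms cancel in expectation.
    """
    eps = rng.choice([-1.0, 1.0], size=len(vs))  # independent uniform signs
    v = sum(e * vi for e, vi in zip(eps, vs))
    w = sum(e * wi for e, wi in zip(eps, ws))
    return v, w

# Empirical check: averaging many independent reductions recovers A.
rng = np.random.default_rng(0)
vs = [rng.standard_normal(4) for _ in range(3)]
ws = [rng.standard_normal(5) for _ in range(3)]
A = sum(np.outer(v, w) for v, w in zip(vs, ws))
avg = np.mean([np.outer(*rank_one_reduce(vs, ws, rng)) for _ in range(50000)],
              axis=0)
print(np.max(np.abs(avg - A)))  # close to 0: the estimate is unbiased
```

Since only the factor pair $(v, w)$ is kept, storage drops from $O(nm)$ for the full matrix to $O(n+m)$; the price is variance in the estimate, which depends on the chosen decomposition.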