TensorFlow Gradients Are NaN


Author: 风果 | Published 2018-01-05 08:57 | 250 reads

    (from Stack Overflow)

    https://stackoverflow.com/questions/41918795/minimize-a-function-of-one-variable-in-tensorflow

    Many of the other solutions use clipping to avoid an undefined gradient. Depending on your problem, clipping introduces bias and may not be acceptable in all cases. As the following code demonstrates, we need only handle the point of discontinuity itself, not the region near it.
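    For comparison, a clipping-based workaround might look like the sketch below. The function name and epsilon are illustrative, not from the original answer; the point is that clipping keeps log() finite but silently alters the loss, and its gradient, whenever a value falls below the clip floor.

    import tensorflow as tf

    # Hypothetical clipping variant, shown only to contrast with the
    # tf.where-based answer below.
    def cross_entropy_clipped(x, y, axis=-1, eps=1e-10):
      # Clipping avoids log(0) = -inf, but any y < eps is silently
      # replaced by eps, which biases both the value and the gradient.
      return -tf.reduce_sum(x * tf.log(tf.clip_by_value(y, eps, 1.0)), axis)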

    Specific Answer

    import tensorflow as tf

    def cross_entropy(x, y, axis=-1):
      # Where x == 0, feed log() a 1 instead of y; those terms contribute 0 to the sum anyway.
      safe_y = tf.where(tf.equal(x, 0.), tf.ones_like(y), y)
      return -tf.reduce_sum(x * tf.log(safe_y), axis)

    def entropy(x, axis=-1):
      return cross_entropy(x, x, axis)

    But did it work?

    x = tf.constant([0.1, 0.2, 0., 0.7])
    e = entropy(x)
    # ==> 0.80181855
    g = tf.gradients(e, x)[0]
    # ==> array([ 1.30258512,  0.60943794,  0., -0.64332503], dtype=float32)
    # Yay! No NaN.
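    These snippets are TensorFlow 1.x graph-mode code, so evaluating them needs an active session. A minimal way to reproduce the numbers above (assuming the entropy definition from the Specific Answer) is:

    import tensorflow as tf

    sess = tf.InteractiveSession()  # lets .eval() work on the tensors below
    x = tf.constant([0.1, 0.2, 0., 0.7])
    print(entropy(x).eval())                      # ==> 0.80181855
    print(tf.gradients(entropy(x), x)[0].eval())  # ==> [ 1.3025851  0.6094379  0. -0.643325 ]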


    General Recipe

    Use an inner tf.where to ensure the function has no asymptote. That is, alter the input to the inf-generating function such that no inf can be created. Then use a second tf.where to always select the valid code path. That is, implement the mathematical condition as you would "normally", i.e., the "naive" implementation.

    In Python code, the recipe is:

    Instead of this:

    tf.where(x_ok, f(x), safe_f(x))

    Do this:

    safe_x = tf.where(x_ok, x, safe_alt_x)  # safe_alt_x: any value on which f is finite, e.g. tf.ones_like(x)
    tf.where(x_ok, f(safe_x), safe_f(x))
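    To make the recipe concrete, here is a hypothetical application of the same two-where pattern to log(x), defined as 0 for x <= 0. This example is not from the original answer; it only restates the recipe with a different asymptote.

    def safe_log(x):
      x_ok = x > 0.
      # Inner where: feed log() a harmless value (1.) wherever x is not ok,
      # so no inf/NaN is ever created.
      safe_x = tf.where(x_ok, x, tf.ones_like(x))
      # Outer where: select the intended result for each element.
      return tf.where(x_ok, tf.log(safe_x), tf.zeros_like(x))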

    Example

    Suppose you wish to compute:

    f(x) = { 1/x,  x != 0
           { 0,    x == 0

    A naive implementation results in NaNs in the gradient, i.e.,

    def f(x):
      x_ok = tf.not_equal(x, 0.)
      f = lambda x: 1. / x
      safe_f = tf.zeros_like
      # 1. / x is still evaluated at x == 0, so its gradient is NaN there,
      # and tf.where propagates that NaN into the overall gradient.
      return tf.where(x_ok, f(x), safe_f(x))

    Does it work?

    x = tf.constant([-1., 0, 1])
    tf.gradients(f(x), x)[0].eval()
    # ==> array([-1., nan, -1.], dtype=float32)
    # ...bah! We have a NaN at the asymptote despite not having
    # an asymptote in the non-differentiated result.

    The basic pattern for avoiding NaN gradients when using tf.where is to call tf.where twice. The innermost tf.where ensures that the result f(x) is always finite. The outermost tf.where ensures the correct result is chosen. For the running example, the trick plays out like this:

    def safe_f(x):
      x_ok = tf.not_equal(x, 0.)
      f = lambda x: 1. / x
      safe_f = tf.zeros_like
      # Inner where: replace 0 with 1 so 1. / x never produces inf.
      safe_x = tf.where(x_ok, x, tf.ones_like(x))
      # Outer where: still return 0 at x == 0.
      return tf.where(x_ok, f(safe_x), safe_f(x))

    But did it work?

    x = tf.constant([-1., 0, 1])
    tf.gradients(safe_f(x), x)[0].eval()
    # ==> array([-1.,  0., -1.], dtype=float32)
    # ...yay! The double-where trick worked. Notice that the gradient
    # is now a constant at the asymptote (as opposed to being NaN).
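    The snippets above assume TensorFlow 1.x (tf.gradients, .eval()). A rough sketch of the same check under TensorFlow 2.x eager execution with tf.GradientTape might look like this; it is the same trick, only the gradient API differs, and the function name here is illustrative.

    import tensorflow as tf  # TensorFlow 2.x

    def safe_reciprocal(x):
      x_ok = tf.not_equal(x, 0.)
      safe_x = tf.where(x_ok, x, tf.ones_like(x))           # inner where
      return tf.where(x_ok, 1. / safe_x, tf.zeros_like(x))  # outer where

    x = tf.constant([-1., 0., 1.])
    with tf.GradientTape() as tape:
      tape.watch(x)
      y = safe_reciprocal(x)
    print(tape.gradient(y, x))  # expect [-1., 0., -1.], no NaN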
