TensorFlow Gradients Are NaN


Author: 风果 | Published 2018-01-05 08:57 | 250 reads

    (from Stack Overflow)

    https://stackoverflow.com/questions/41918795/minimize-a-function-of-one-variable-in-tensorflow

    Many of the other solutions use clipping to avoid an undefined gradient. Depending on your problem, clipping introduces bias and may not be acceptable in all cases. As the following code demonstrates, we need only handle the point of discontinuity itself, not the region near it.
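    For comparison, a clipping-based workaround might look like the sketch below. The function name and epsilon are illustrative, not from the original answer; the point is that clipping keeps log() finite but silently alters the loss, and its gradient, whenever a value falls below the clip floor.

    import tensorflow as tf

    # Hypothetical clipping variant, shown only to contrast with the
    # tf.where-based answer below.
    def cross_entropy_clipped(x, y, axis=-1, eps=1e-10):
      # Clipping avoids log(0) = -inf, but any y < eps is silently
      # replaced by eps, which biases both the value and the gradient.
      return -tf.reduce_sum(x * tf.log(tf.clip_by_value(y, eps, 1.0)), axis)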

    Specific Answer

    import tensorflow as tf

    def cross_entropy(x, y, axis=-1):
      # Where x == 0, feed log() a 1 instead of y; those terms contribute 0 to the sum anyway.
      safe_y = tf.where(tf.equal(x, 0.), tf.ones_like(y), y)
      return -tf.reduce_sum(x * tf.log(safe_y), axis)

    def entropy(x, axis=-1):
      return cross_entropy(x, x, axis)

    But did it work?

    x = tf.constant([0.1, 0.2, 0., 0.7])
    e = entropy(x)
    # ==> 0.80181855
    g = tf.gradients(e, x)[0]
    # ==> array([ 1.30258512,  0.60943794,  0., -0.64332503], dtype=float32)
    # Yay! No NaN.
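    These snippets are TensorFlow 1.x graph-mode code, so evaluating them needs an active session. A minimal way to reproduce the numbers above (assuming the entropy definition from the Specific Answer) is:

    import tensorflow as tf

    sess = tf.InteractiveSession()  # lets .eval() work on the tensors below
    x = tf.constant([0.1, 0.2, 0., 0.7])
    print(entropy(x).eval())                      # ==> 0.80181855
    print(tf.gradients(entropy(x), x)[0].eval())  # ==> [ 1.3025851  0.6094379  0. -0.643325 ]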


    General Recipe

    Use an inner tf.where to ensure the function has no asymptote. That is, alter the input to the inf-generating function such that no inf can be created. Then use a second tf.where to always select the valid code path. That is, implement the mathematical condition as you would "normally", i.e., the "naive" implementation.

    In Python code, the recipe is:

    Instead of this:

    tf.where(x_ok, f(x), safe_f(x))

    Do this:

    safe_x = tf.where(x_ok, x, safe_alt_x)  # safe_alt_x: any value on which f is finite, e.g. tf.ones_like(x)
    tf.where(x_ok, f(safe_x), safe_f(x))
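    To make the recipe concrete, here is a hypothetical application of the same two-where pattern to log(x), defined as 0 for x <= 0. This example is not from the original answer; it only restates the recipe with a different asymptote.

    def safe_log(x):
      x_ok = x > 0.
      # Inner where: feed log() a harmless value (1.) wherever x is not ok,
      # so no inf/NaN is ever created.
      safe_x = tf.where(x_ok, x, tf.ones_like(x))
      # Outer where: select the intended result for each element.
      return tf.where(x_ok, tf.log(safe_x), tf.zeros_like(x))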

    Example

    Suppose you wish to compute:

    f(x) = { 1/x,  x != 0
           { 0,    x == 0

    A naive implementation results in NaNs in the gradient, i.e.,

    def f(x):
      x_ok = tf.not_equal(x, 0.)
      f = lambda x: 1. / x
      safe_f = tf.zeros_like
      # 1. / x is still evaluated at x == 0, so its gradient is NaN there,
      # and tf.where propagates that NaN into the overall gradient.
      return tf.where(x_ok, f(x), safe_f(x))

    Does it work?

    x = tf.constant([-1., 0, 1])
    tf.gradients(f(x), x)[0].eval()
    # ==> array([-1., nan, -1.], dtype=float32)
    # ...bah! We have a NaN at the asymptote despite not having
    # an asymptote in the non-differentiated result.

    The basic pattern for avoiding NaN gradients when using tf.where is to call tf.where twice. The innermost tf.where ensures that the result f(x) is always finite. The outermost tf.where ensures the correct result is chosen. For the running example, the trick plays out like this:

    def safe_f(x):
      x_ok = tf.not_equal(x, 0.)
      f = lambda x: 1. / x
      safe_f = tf.zeros_like
      # Inner where: replace 0 with 1 so 1. / x never produces inf.
      safe_x = tf.where(x_ok, x, tf.ones_like(x))
      # Outer where: still return 0 at x == 0.
      return tf.where(x_ok, f(safe_x), safe_f(x))

    But did it work?

    x = tf.constant([-1., 0, 1])
    tf.gradients(safe_f(x), x)[0].eval()
    # ==> array([-1.,  0., -1.], dtype=float32)
    # ...yay! The double-where trick worked. Notice that the gradient
    # is now a constant at the asymptote (as opposed to being NaN).
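    The snippets above assume TensorFlow 1.x (tf.gradients, .eval()). A rough sketch of the same check under TensorFlow 2.x eager execution with tf.GradientTape might look like this; it is the same trick, only the gradient API differs, and the function name here is illustrative.

    import tensorflow as tf  # TensorFlow 2.x

    def safe_reciprocal(x):
      x_ok = tf.not_equal(x, 0.)
      safe_x = tf.where(x_ok, x, tf.ones_like(x))           # inner where
      return tf.where(x_ok, 1. / safe_x, tf.zeros_like(x))  # outer where

    x = tf.constant([-1., 0., 1.])
    with tf.GradientTape() as tape:
      tape.watch(x)
      y = safe_reciprocal(x)
    print(tape.gradient(y, x))  # expect [-1., 0., -1.], no NaN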
