In Sanity Checks for Saliency Maps, the authors discuss the case of a single convolutional layer with ReLU, where the gradient acts as an edge detector. Roughly, for l(x) = Σ_i ReLU((w ⊛ x)_i), the gradient is ∂l(x)/∂x_j = Σ_i 1[(w ⊛ x)_i > 0] · w_{i−j}, so it depends only on the ReLU activation pattern of the convolution output.
It is now clear why edges are visible in the produced gradient map: regions of the image corresponding to an "edge" have a distinct activation pattern from the surrounding pixels. In contrast, pixel regions that are more uniform all share the same activation pattern, and thus the same value of ∂l(x)/∂x.
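To make this concrete, here is a small numpy sketch (my own toy example, not code from the paper): a 1-D step-edge "image" passed through one random conv filter plus ReLU. The hand-derived input gradient is constant inside each uniform region and takes distinct values only around the edge.

import numpy as np

# Toy illustration (my own example): one conv layer + ReLU on a 1-D step edge.
np.random.seed(0)
x = np.concatenate([-np.ones(20), np.ones(20)])   # two uniform regions separated by a step edge
w = np.random.randn(3)                            # random conv filter

z = np.convolve(x, w, mode="valid")               # pre-activations (w ⊛ x)_i
mask = (z > 0).astype(float)                      # ReLU activation pattern

# gradient of l(x) = sum_i ReLU((w ⊛ x)_i) with respect to each pixel x_j
grad = np.zeros_like(x)
for i in range(len(z)):
    grad[i:i + len(w)] += mask[i] * w[::-1]       # np.convolve applies the flipped filter

print(np.round(grad[5:10], 3))    # inside a uniform region: identical values
print(np.round(grad[17:22], 3))   # around the edge: a distinct pattern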
But why, in a multi-layer CNN, is guided back-propagation not sensitive to random weights and random data labels?
My thought is that the learnt weights follow a particular distribution, while randomly initialized weights have zero mean.
Thus for gradient methods, the output is a weighted combination with zero-mean weights, so the differences largely average out.
For guided BP, the output is instead multiplied by ReLU(W), whose mean is positive, so the edges can still be recognized. The toy experiment below illustrates this.
import numpy as np

def ReLU(matrix):
    # element-wise ReLU: keep positive entries, zero out the rest
    return np.maximum(matrix, 0)

# a1, a2: two activation vectors with clearly different means (two points across an "edge")
a1 = np.random.normal(1000, 1, 10000)
a2 = np.random.normal(100, 1, 10000)
# w: random weights with zero mean
w = np.random.normal(0, 1, 10000)

# "gradient": ReLU is applied after multiplying by the zero-mean weights w
b1 = np.sum(ReLU(np.dot(a1, w) * w))
b2 = np.sum(ReLU(np.dot(a2, w) * w))
# "GBP": only the positive part of the weights, ReLU(w), is used
c1 = np.sum(np.dot(a1, w) * ReLU(w))
c2 = np.sum(np.dot(a2, w) * ReLU(w))

print("b1-b2: gradient:", (b1 - b2) / 10000)
print("c1-c2: GBP:", (c1 - c2) / 10000)
b1-b2: gradient: 653.812524326697
c1-c2: GBP: -158205.37690500243
This code shows that when the difference between the inputs a1 and a2 is large (representing two different points across an edge), the gradient difference under random weights is much, much smaller than the GBP difference.
The difference between guided back-propagation and the plain gradient method lies in how they handle w; a minimal sketch of the two backward rules is given after the screenshot below.
[Screenshot: back-propagation; mind the bottom-right part.]
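For reference, here is a minimal sketch of the two backward rules through a ReLU, written from the standard definitions (my own illustration, not the exact formulas in the screenshots): the plain gradient only masks by the forward activation, while guided BP additionally zeroes negative upstream gradients, so only positive, w-weighted evidence flows back.

import numpy as np

def relu_backward_plain(grad_out, pre_act):
    # plain gradient / back-propagation: pass the upstream gradient wherever
    # the forward pre-activation was positive, regardless of the gradient's sign
    return grad_out * (pre_act > 0)

def relu_backward_guided(grad_out, pre_act):
    # guided back-propagation: additionally zero out negative upstream gradients,
    # so only "positive evidence" flows back toward the input
    return grad_out * (pre_act > 0) * (grad_out > 0)

pre_act = np.array([1.5, -0.3, 2.0, 0.7])     # forward pre-activations of the ReLU
grad_out = np.array([-0.8, 0.5, 1.2, -0.1])   # upstream gradient arriving at the ReLU

print(relu_backward_plain(grad_out, pre_act))   # keeps -0.8, 1.2, -0.1; zeroes the second entry
print(relu_backward_guided(grad_out, pre_act))  # keeps only 1.2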