shouldn't have a single MLP with one output for the whole recording
at least one of the outputs should be close to 1 if "welcome" is somewhere in the recording
with weight sharing, w_ij = w_mn = w^s: the gradient for the shared weight is just the sum of the two derivatives (sum them up), so one step on the shared weight affects all the connections that use it
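a tiny Python sketch of that shared-weight update (the two-term squared-error loss and the inputs are made up purely for illustration):

# two connections share one weight: w_ij = w_mn = w_s
x1, x2, t1, t2 = 0.5, -1.0, 1.0, 0.0
w_s = 0.3
# per-connection derivatives dL/dw_ij and dL/dw_mn for L = (x1*w_s - t1)**2 + (x2*w_s - t2)**2
g_ij = 2 * (x1 * w_s - t1) * x1
g_mn = 2 * (x2 * w_s - t2) * x2
# the gradient for the shared weight is just their sum
g_shared = g_ij + g_mn
w_s = w_s - 0.1 * g_shared   # one step moves every connection that uses w_s
print(g_ij, g_mn, g_shared, w_s)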
all the convolution layers (every layer except the input layer and the softmax layer) can be computed in parallel
changing the order of computation to parallelize the algorithm
as long as the first layer has done its job, the second layer can start
input channel 3 -> first layer channel 4 -> second layer channel 2 -> output channel 1 (flatten)
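a rough PyTorch sketch of that channel progression (the 3x3 kernels, padding, and 28x28 input are assumptions; only the channel counts come from the note above):

import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Conv2d(3, 4, kernel_size=3, padding=1),   # input 3 channels -> 4
    nn.ReLU(),
    nn.Conv2d(4, 2, kernel_size=3, padding=1),   # 4 -> 2
    nn.ReLU(),
    nn.Conv2d(2, 1, kernel_size=3, padding=1),   # 2 -> 1
    nn.Flatten(),                                # 1 x H x W -> vector
)
x = torch.randn(1, 3, 28, 28)                    # (batch, channel, height, width)
print(net(x).shape)                              # torch.Size([1, 784])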
layers first, then dimension: distributing the patterns through the layers gives a better representation
the first layer looks only at a small region
the second layer behaves like the first layer, but one dot in the 2nd layer looks at a much larger region of the input
higher layers implicitly learn the arrangement of the sub-patterns
extend this process to three layers: the first layer's region is even smaller
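a small Python sketch of how the region seen by one dot grows with depth (3x3 kernels with stride 1 are an assumption, just to make the numbers concrete):

def receptive_field(layers):
    # layers: list of (kernel_size, stride), applied in order
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf

print(receptive_field([(3, 1)]))       # first layer: one dot sees 3x3 of the input
print(receptive_field([(3, 1)] * 2))   # second layer: 5x5
print(receptive_field([(3, 1)] * 3))   # third layer: 7x7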
this doesn't seem to be a tensor inner product, is it just an elementwise multiplication???
eg:
W: 9*3*5*5
Y: 3*5*5
W*Y = 9*1
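a numpy check with the shapes from the example above: each of the 9 filters is multiplied elementwise with the 3x5x5 patch and then summed, so overall it is an inner product (contraction) over the matching axes:

import numpy as np

W = np.random.randn(9, 3, 5, 5)   # 9 filters, each 3x5x5
Y = np.random.randn(3, 5, 5)      # one 3x5x5 patch

out = (W * Y).sum(axis=(1, 2, 3))                        # elementwise multiply, then sum -> (9,)
out2 = np.tensordot(W, Y, axes=([1, 2, 3], [0, 1, 2]))   # same result as a tensor contraction
print(out.shape, np.allclose(out, out2))                 # (9,) True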
max pooling is also a filter
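a minimal numpy sketch of 2x2 max pooling as a sliding window that keeps the max instead of a weighted sum (the 4x4 input is made up):

import numpy as np

x = np.arange(16, dtype=float).reshape(4, 4)     # a 4x4 feature map
pooled = x.reshape(2, 2, 2, 2).max(axis=(1, 3))  # 2x2 windows, stride 2, keep the max
print(pooled)                                    # [[ 5.  7.] [13. 15.]]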
a NN trained at one scale cannot be applied to a different scale -> the NN automatically learns the size of the object (say, a flower) -> if the dataset contains flowers at different scales, the NN finds it much harder to learn