






shouldn't have a single MLP that produces the output














sum them up
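A minimal sketch, assuming the notes mean: instead of one MLP over the whole input, run the same small MLP at every window position (a "scanning" MLP) and sum the per-position outputs. The 1-D input, window width, weights, and the names `mlp` and `scan_and_sum` are hypothetical, chosen only for illustration.

```python
def relu(v):
    return max(0.0, v)

def mlp(window, w1, w2):
    """Tiny shared MLP (hypothetical): one hidden ReLU layer, scalar output."""
    hidden = [relu(sum(w * x for w, x in zip(row, window))) for row in w1]
    return sum(w * h for w, h in zip(w2, hidden))

def scan_and_sum(signal, width, w1, w2):
    """Apply the SAME MLP at every window position, then sum the outputs."""
    outputs = [mlp(signal[t:t + width], w1, w2)
               for t in range(len(signal) - width + 1)]
    return sum(outputs), outputs

# Hypothetical weights: 2 hidden units, each looking at a window of width 3.
W1 = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]
W2 = [1.0, 2.0]
total, per_position = scan_and_sum([1.0, 2.0, 0.0, 3.0, 1.0], 3, W1, W2)
# per_position == [4.0, 5.0, 4.0], total == 13.0
```

Because the weights are shared across positions, the detector responds the same way wherever the pattern appears; summing (or taking a max) is one simple way to combine the per-position outputs.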



all the convolution layers (i.e., every layer except the input layer and the softmax layer) can be computed in parallel




changing the order of computation to parallelize the algorithm
as long as the first layer has done its job, the second layer can start
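A sketch of the reordering, assuming a two-layer scanning MLP whose second layer only sees the hidden units at the same position (the weights, sizes, and function names are hypothetical). Running the whole MLP window by window gives exactly the same outputs as letting layer 1 scan the entire input first and then running layer 2 on the stored layer-1 maps, which is what turns each layer into a convolution.

```python
def relu(v):
    return max(0.0, v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def scan_window_by_window(x, k, w1, w2):
    """Original order: at each position, run the full MLP (layer 1, then layer 2)."""
    out = []
    for t in range(len(x) - k + 1):
        hidden = [relu(dot(row, x[t:t + k])) for row in w1]
        out.append(dot(w2, hidden))
    return out

def scan_layer_by_layer(x, k, w1, w2):
    """Reordered: layer 1 scans the whole input (a conv layer with len(w1) output
    channels); layer 2 then runs on the stored maps (a 1-wide conv)."""
    positions = range(len(x) - k + 1)
    # Layer 1: positions are independent, so this could be computed in parallel.
    maps = [[relu(dot(row, x[t:t + k])) for t in positions] for row in w1]
    # Layer 2: starts once layer 1 has produced its maps.
    return [dot(w2, [m[t] for m in maps]) for t in positions]

x = [1.0, 2.0, 0.0, 3.0, 1.0]
W1 = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]  # hypothetical layer-1 weights
W2 = [1.0, 2.0]                           # hypothetical layer-2 weights
a = scan_window_by_window(x, 3, W1, W2)
b = scan_layer_by_layer(x, 3, W1, W2)     # same values as a
```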


input channel 3 -> first layer channel 4 -> second layer channel 2 -> output channel 1 (flatten)
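A shape-tracing sketch for the 3 -> 4 -> 2 -> 1 channel progression, assuming stride-1 convolutions with no padding. The 8x8 input size and the kernel sizes (3x3, 3x3, then 4x4 for the last layer) are hypothetical; only the channel counts come from the notes. Each layer's filter bank has shape (C_out, C_in, k, k).

```python
def conv_output_shape(in_shape, out_channels, k, stride=1):
    """Shape of a 'valid' (no padding) convolution output: (C_out, H_out, W_out)."""
    _, h, w = in_shape
    return (out_channels, (h - k) // stride + 1, (w - k) // stride + 1)

def weight_shape(in_shape, out_channels, k):
    """One C_in x k x k filter per output channel."""
    return (out_channels, in_shape[0], k, k)

def trace(input_shape, layers):
    """layers: list of (out_channels, kernel_size), applied in order."""
    shapes, weights = [input_shape], []
    for out_c, k in layers:
        weights.append(weight_shape(shapes[-1], out_c, k))
        shapes.append(conv_output_shape(shapes[-1], out_c, k))
    c, h, w = shapes[-1]
    return shapes, weights, c * h * w  # last value: length after flattening

shapes, weights, flat = trace((3, 8, 8), [(4, 3), (2, 3), (1, 4)])
# shapes:  (3, 8, 8) -> (4, 6, 6) -> (2, 4, 4) -> (1, 1, 1)
# weights: (4, 3, 3, 3), (2, 4, 3, 3), (1, 2, 4, 4); flat == 1
```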






higher layers implicitly learn the arrangement of the sub-patterns
extending this process to three layers: the first layer's region is even smaller
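To make "the first layer's region is even smaller" concrete, here is a sketch using the standard receptive-field recurrence (each layer widens the view by (k - 1) times the current spacing between units). The kernel/stride values are hypothetical: two 5-wide layers and three layers of widths 3, 3, 5 both cover a 9-wide input region, but in the three-layer version each first-layer unit only sees 3 inputs.

```python
def receptive_field(layers):
    """Input width seen by one top-layer unit.
    layers: list of (kernel_width, stride), first layer first."""
    rf, jump = 1, 1
    for k, s in layers:
        rf += (k - 1) * jump  # widen by (k-1) steps of the current unit spacing
        jump *= s             # spacing between adjacent units, in input samples
    return rf

two_layers = receptive_field([(5, 1), (5, 1)])            # 9, first layer sees 5
three_layers = receptive_field([(3, 1), (3, 1), (5, 1)])  # 9, first layer sees 3
strided = receptive_field([(2, 2), (2, 2), (2, 2)])       # 8
```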






This doesn't look like a tensor inner product, just elementwise multiplication???

e.g.:
W: 9×3×5×5 (9 filters, each 3 channels × 5×5)
Y: 3×5×5
W*Y = 9×1 (one value per filter)
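A small check of the example, using random integer values as hypothetical stand-ins for W and Y: multiplying one filter and the input patch elementwise and then summing every product gives exactly the dot product of the two flattened tensors. So the operation is elementwise multiplication followed by a full sum, which is the tensor inner product, and doing it once per filter yields the 9 outputs.

```python
import random

F, C, K = 9, 3, 5  # 9 filters, 3 input channels, 5x5 window (sizes from the example)

def flatten(t):
    """Flatten a C x K x K nested list into one vector (channel, row, column order)."""
    return [v for plane in t for row in plane for v in row]

def elementwise_then_sum(w_f, y):
    """Multiply filter w_f and patch y entry by entry, then add up every product."""
    return sum(w_f[c][i][j] * y[c][i][j]
               for c in range(C) for i in range(K) for j in range(K))

def inner_product(a, b):
    """Ordinary dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

rng = random.Random(0)  # hypothetical integer weights/inputs, only for the check
W = [[[[rng.randint(-3, 3) for _ in range(K)] for _ in range(K)] for _ in range(C)]
     for _ in range(F)]
Y = [[[rng.randint(-3, 3) for _ in range(K)] for _ in range(K)] for _ in range(C)]

z_elementwise = [elementwise_then_sum(W[f], Y) for f in range(F)]
z_inner = [inner_product(flatten(W[f]), flatten(Y)) for f in range(F)]
# z_elementwise == z_inner, and both hold 9 numbers (one per filter)
```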










max pooling is also a filter
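A sketch of why max pooling can be viewed as a filter: it uses the same sliding-window scan as a convolution, only the per-window operation changes (a fixed max instead of a learned weighted sum). The helper names and the 4x4 input are hypothetical; `conv2d` here computes cross-correlation, as CNN layers usually do.

```python
def scan2d(x, k, stride, op):
    """Slide a k x k window over the 2-D list x; apply op to each flattened window."""
    h, w = len(x), len(x[0])
    out = []
    for i in range(0, h - k + 1, stride):
        row = []
        for j in range(0, w - k + 1, stride):
            window = [x[i + di][j + dj] for di in range(k) for dj in range(k)]
            row.append(op(window))
        out.append(row)
    return out

def conv2d(x, kernel, stride=1):
    """Convolution-style filter: weighted sum of the window (kernel is k x k)."""
    flat = [v for row in kernel for v in row]
    return scan2d(x, len(kernel), stride,
                  lambda win: sum(a * b for a, b in zip(win, flat)))

def max_pool(x, k=2, stride=2):
    """Max-pooling filter: the same scan, but each window is reduced with max."""
    return scan2d(x, k, stride, max)

x = [[1, 3, 2, 0],
     [4, 2, 1, 5],
     [0, 1, 3, 2],
     [6, 2, 4, 1]]
print(max_pool(x))                     # [[4, 5], [6, 4]]
print(max_pool(x, k=2, stride=1))      # [[4, 3, 5], [4, 3, 5], [6, 4, 4]]
print(conv2d(x, [[1, 0], [0, 1]], 2))  # [[3, 7], [2, 4]]
```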







an NN trained at one scale cannot be applied at a different scale -> the NN automatically learns the size of the object (say, a flower) -> given flowers at different scales in the dataset, it finds it much harder to learn