compared with fully-connected neural networks, convolutional networks perform better at image recognition because of the different connection structure between adjacent layers
structure:
input: the raw pixels of the image, a 3-D matrix
output: the confidence score for each class
- input layer: the pixel matrix, with the RGB channels as its depth
- convolutional layer: each node takes a small block of the previous layer as input, which extracts deeper features
- pooling layer: keeps the depth of the previous layer but shrinks its spatial scale
- fully-connected layer: classifies based on the extracted features
- softmax layer: classifies by producing a probability for every class
convolutional layer (filter or kernel)
the processed block has the same scale as the filter
the filter depth is the number of output channels
output matrix scale (for 'VALID' padding; take the ceiling when the stride does not divide evenly):
out_length=(in_length-fil_length+1)/stride_length
out_width=(in_width-fil_width+1)/stride_width
filter_parameter_amount=fil_length*fil_width*in_depth*fil_depth (plus fil_depth biases)
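The two formulas above can be checked numerically. A minimal sketch in plain Python (the function and variable names are my own, not from any library):

```python
# Quick numeric check of the output-size and parameter-count formulas.
def conv_output_size(in_len, fil_len, stride, padding='VALID'):
    """Output length of a conv/pool layer along one spatial dimension."""
    if padding == 'SAME':
        return -(-in_len // stride)                  # ceil(in_len / stride)
    return -(-(in_len - fil_len + 1) // stride)      # ceil((in - fil + 1) / stride)

def conv_param_count(fil_len, fil_wid, in_depth, fil_depth):
    """Shared filter weights plus one bias per output channel."""
    return fil_len * fil_wid * in_depth * fil_depth + fil_depth

# 32x32 input, 5x5 filter, stride 1, 'VALID' padding -> 28
print(conv_output_size(32, 5, 1))       # 28
# 5x5x3 filter with 16 output channels -> 5*5*3*16 + 16 = 1216
print(conv_param_count(5, 5, 3, 16))    # 1216
```

Note that the parameter count depends only on the filter, not on the image size: the same weights are shared across every position.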
fil_weights=tf.get_variable('weights',[fil_length,fil_width,in_depth,fil_depth],initializer=tf...)
biases=tf.get_variable('biases',[fil_depth],initializer=tf...)
# conv2d performs the forward propagation
conv=tf.nn.conv2d(input,fil_weights,strides=[1,len_stride,wid_stride,1],padding='SAME') # 'VALID' means no zero-padding
# bias_add adds the bias to every position; note: do not add it directly with +
bias=tf.nn.bias_add(conv,biases)
activation=tf.nn.relu(bias)
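What `tf.nn.conv2d` computes per position can be spelled out with NumPy. A naive sketch for stride 1 and 'VALID' padding (my own illustrative function, not a library API):

```python
import numpy as np

def naive_conv2d_valid(x, w, b):
    """x: [H, W, C_in], w: [kH, kW, C_in, C_out], b: [C_out]; stride 1, 'VALID'."""
    kh, kw, _, c_out = w.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow, c_out))
    for i in range(oh):
        for j in range(ow):
            patch = x[i:i + kh, j:j + kw, :]              # block the filter covers
            out[i, j] = np.tensordot(patch, w, axes=3) + b  # dot over kH, kW, C_in
    return out

x = np.random.rand(28, 28, 1)   # one 28x28 grayscale image
w = np.random.rand(5, 5, 1, 6)  # 5x5 filter, 6 output channels
b = np.zeros(6)
print(naive_conv2d_valid(x, w, b).shape)  # (24, 24, 6)
```

The real `conv2d` does the same contraction, batched over NHWC tensors and heavily optimized.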
pooling layer
usage: shrink the spatial scale to speed up computation and reduce over-fitting
max pooling
simply takes the maximum over each window
# as with convolution, you set the strides and padding, but unlike a convolutional layer,
# a stride in the depth dimension is valid for pooling; ksize is the scale of the pooling window
pool=tf.nn.max_pool(activation,ksize=[1,fil_len,fil_wid,1],strides=[1,len_stride,wid_stride,1],padding='SAME')
average pooling
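The two pooling variants differ only in the reduction applied per window. A plain NumPy sketch (the `pool2d` helper is my own, for illustration):

```python
import numpy as np

def pool2d(x, k, stride, op=np.max):
    """x: [H, W, C]; pool each kxk window; non-overlapping when k == stride."""
    oh = (x.shape[0] - k) // stride + 1
    ow = (x.shape[1] - k) // stride + 1
    out = np.zeros((oh, ow, x.shape[2]))
    for i in range(oh):
        for j in range(ow):
            window = x[i*stride:i*stride+k, j*stride:j*stride+k, :]
            out[i, j] = op(window, axis=(0, 1))   # reduce over the spatial window
    return out

x = np.arange(16, dtype=float).reshape(4, 4, 1)
print(pool2d(x, 2, 2, np.max)[:, :, 0])   # [[ 5.  7.] [13. 15.]]
print(pool2d(x, 2, 2, np.mean)[:, :, 0])  # [[ 2.5  4.5] [10.5 12.5]]
```

Either way the depth is untouched: each channel is pooled independently, which is why pooling never changes the depth of the previous layer.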
classical models
LeNet-5
- first layer: convolutional layer
input: 32*32*1
filter: 5*5*6, padding='VALID'
stride: [1,1,1,1]
output: 28*28*6
- second layer: pooling layer
input: output of the convolutional layer
filter: [1,2,2,1]
stride: [1,2,2,1]
output: 14*14*6
- third layer: convolutional layer
input: output of the last layer
filter: 5*5*16, padding='VALID'
stride: [1,1,1,1]
output: 10*10*16
- fourth layer: pooling layer
input: output of the last layer
filter: 2*2
stride: [1,2,2,1]
output: 5*5*16
- fifth layer: fully-connected (similar to a convolutional layer)
input: output of the last layer
filter: 5*5
output: 120
para.: 5*5*16*120+120
- sixth layer: fully-connected
input: output of the last layer
output: 84
para.: 120*84+84
- seventh layer: fully-connected
input: output of the last layer
output: 10
para.: 84*10+10
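The shapes and parameter counts listed above can be reproduced by chaining the 'VALID' output-size formula through the seven layers. A plain-Python check (variable names are my own):

```python
def out_size(n, f, s):
    """'VALID' output length: ceil((n - f + 1) / s)."""
    return -(-(n - f + 1) // s)

size, depth = 32, 1
size, depth = out_size(size, 5, 1), 6     # conv1: 28x28x6
size = out_size(size, 2, 2)               # pool1: 14x14x6
size, depth = out_size(size, 5, 1), 16    # conv2: 10x10x16
size = out_size(size, 2, 2)               # pool2: 5x5x16
fc1_params = size * size * depth * 120 + 120   # 5*5*16*120 + 120 = 48120
fc2_params = 120 * 84 + 84                     # 10164
fc3_params = 84 * 10 + 10                      # 850
print(size, depth, fc1_params, fc2_params, fc3_params)  # 5 16 48120 10164 850
```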
xs=tf.placeholder(tf.float32,[Batch_size,mnist_inference.IMAGE_SIZE,mnist_inference.IMAGE_SIZE,mnist_inference.NUM_CHANNELS],name='x-input')
# in the training loop, the numpy batch (not the placeholder itself) is reshaped to the same 4-D form:
reshaped_xs=np.reshape(xs,(Batch_size,mnist_inference.IMAGE_SIZE,mnist_inference.IMAGE_SIZE,mnist_inference.NUM_CHANNELS))
def inference(tensor,train,regularizer):
with tf.variable_scope('layer1-conv1'):
conv1_weights=tf.get_variable('weights',[CONV1_SIZE,CONV1_SIZE,NUM_CHANNELS,CONV1_DEEP],initializer=...)
conv1_biases=tf...
conv1=tf.nn.conv2d(...)
relu1=tf.nn.relu(tf.nn.bias_add(...))
with tf.name_scope('layer2-pool1'):
pool1=tf.nn.max_pool(relu1,ksize=...,strides=...,padding=...)
with tf.variable_scope('layer3-conv2'):
...
with tf.name_scope(...):
pool2=...
# reshape the data form to prepare for next fully-connected layer
pool_shape=pool2.get_shape().as_list()
nodes=pool_shape[1]*pool_shape[2]*pool_shape[3]
reshaped=tf.reshape(pool2,[pool_shape[0],nodes])
with tf.variable_scope('layer5-fc1'):
fc1_weights=tf.get_variable('weights',[nodes,FC_SIZE],initializer=...)
if regularizer!=None:
tf.add_to_collection('losses',regularizer(fc1_weights))
fc1_biases=tf.get_variable('bias',[FC_SIZE],initializer=...)
fc1=tf.nn.relu(...)
if train:fc1=tf.nn.dropout(fc1,0.5)
with ...
...
logit=tf.matmul(fc1,fc2_weights)+fc2_biases
return logit
note: input -> (convolutional layer+ -> pooling layer?) -> fully-connected -> softmax -> output ('+' means one or more, '?' means optional)
Inception-v3
core method
in an Inception module, several kernels of different sizes process the input simultaneously and their outputs are then concatenated together
for that, we set the stride to 1 and the padding to 'SAME', so every branch keeps the input's spatial size
# predetermine the default parameters of these functions
with slim.arg_scope([slim.conv2d,slim.max_pool2d,slim.avg_pool2d],stride=1,padding='SAME'):
# inception module namespace
with tf.variable_scope('...'):
#for every path
with tf.variable_scope('...1'):
with tf.variable_scope('...2'):
with tf.variable_scope('...3'):
net=tf.concat(3,[...1,...2,...3])
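Since stride 1 with 'SAME' padding preserves the spatial size, the branch outputs differ only in depth and can be stacked along that axis, which is what the `tf.concat` call does. A NumPy sketch with hypothetical branch depths of my choosing:

```python
import numpy as np

h, w = 28, 28
# Three hypothetical branch outputs: same spatial size, different depths.
branch1 = np.random.rand(h, w, 64)    # e.g. from 1x1 convolutions
branch2 = np.random.rand(h, w, 128)   # e.g. from 3x3 convolutions
branch3 = np.random.rand(h, w, 32)    # e.g. from 5x5 convolutions

# Depth is the last axis here (axis 3 in the batched NHWC tensors above).
net = np.concatenate([branch1, branch2, branch3], axis=2)
print(net.shape)  # (28, 28, 224)
```

The module's output depth is simply the sum of the branch depths; the next layer sees it as an ordinary feature map.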