preface:
deep learning: a family of data-modeling algorithms that learn highly complex representations through multiple layers of nonlinear transformations
in a sense, deep learning is roughly equivalent to deep neural networks (DNNs)
linear models have serious limitations
stacking multiple linear layers is equivalent to a single linear layer
activation functions are what introduce the nonlinearity
tensorflow provides 7 activation functions
such as: tf.nn.relu, tf.nn.sigmoid, tf.nn.tanh, etc.
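A minimal sketch (assuming TensorFlow 1.x) of applying three of these activations to the output of a linear layer; the tensors `x`, `w`, and `b` are illustrative placeholders:

```python
import tensorflow as tf

x = tf.placeholder(tf.float32, shape=(None, 2))        # input batch
w = tf.Variable(tf.random_normal([2, 3], stddev=1.0))
b = tf.Variable(tf.zeros([3]))

linear = tf.matmul(x, w) + b        # linear transformation only
h_relu = tf.nn.relu(linear)         # max(0, z)
h_sigmoid = tf.nn.sigmoid(linear)   # 1 / (1 + exp(-z))
h_tanh = tf.nn.tanh(linear)         # tanh(z)
```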
multiple layers are needed to solve the exclusive-OR (XOR) problem
typical point: hidden layers extract compound features (see the sketch below)
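A hidden layer with a nonlinear activation lets the network build compound features from the two inputs, which is what makes XOR learnable. A hedged sketch (TensorFlow 1.x; the layer sizes and learning rate are illustrative):

```python
import tensorflow as tf

x  = tf.placeholder(tf.float32, shape=(None, 2))   # the two XOR inputs
y_ = tf.placeholder(tf.float32, shape=(None, 1))   # target 0/1

# hidden layer: the nonlinearity lets it form compound features of x1 and x2
w1 = tf.Variable(tf.random_normal([2, 4], stddev=1.0))
b1 = tf.Variable(tf.zeros([4]))
hidden = tf.nn.relu(tf.matmul(x, w1) + b1)

# output layer
w2 = tf.Variable(tf.random_normal([4, 1], stddev=1.0))
b2 = tf.Variable(tf.zeros([1]))
y = tf.nn.sigmoid(tf.matmul(hidden, w2) + b2)

loss = tf.reduce_mean(tf.square(y_ - y))
train_step = tf.train.GradientDescentOptimizer(0.1).minimize(loss)
```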
loss function
- cross entropy
- cross_entropy = -tf.reduce_mean(y_ * tf.log(y))
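A sketch of this in TensorFlow 1.x, where y is the predicted distribution and y_ the one-hot labels; the tf.clip_by_value call is added here to avoid log(0):

```python
import tensorflow as tf

y  = tf.placeholder(tf.float32, shape=(None, 10))  # predicted distribution
y_ = tf.placeholder(tf.float32, shape=(None, 10))  # one-hot labels

# clip y away from 0 before taking the log; reduce_mean averages over the batch
cross_entropy = -tf.reduce_mean(
    y_ * tf.log(tf.clip_by_value(y, 1e-10, 1.0)))
```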
how do we turn the results of forward propagation into a probability distribution?
answer: softmax is introduced
softmax.png (figure: the softmax formula)
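A sketch of how softmax turns the raw forward-propagation outputs (logits) into a probability distribution, plus the fused TensorFlow 1.x op that combines softmax and cross entropy (tensor names are illustrative):

```python
import tensorflow as tf

logits = tf.placeholder(tf.float32, shape=(None, 10))  # raw network outputs
labels = tf.placeholder(tf.float32, shape=(None, 10))  # one-hot targets

# softmax(y)_i = exp(y_i) / sum_j exp(y_j)
probs = tf.nn.softmax(logits)

# numerically more stable than applying softmax and cross entropy separately
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits))
```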
MSE (mean squared error)
MSE.png (figure: the MSE formula); custom loss functions can also be defined to match the needs of the application
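A sketch of both: the standard MSE, and a hand-written asymmetric loss of the kind the note alludes to (TensorFlow 1.x; the cost constants 1 and 10 are made up for illustration):

```python
import tensorflow as tf

y  = tf.placeholder(tf.float32, shape=(None, 1))  # prediction
y_ = tf.placeholder(tf.float32, shape=(None, 1))  # ground truth

# mean squared error
mse = tf.reduce_mean(tf.square(y_ - y))

# custom loss: penalize over-prediction and under-prediction differently
loss_more, loss_less = 1.0, 10.0          # illustrative costs
custom_loss = tf.reduce_sum(
    tf.where(tf.greater(y, y_),
             (y - y_) * loss_more,        # predicted too much
             (y_ - y) * loss_less))       # predicted too little
```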
optimization algorithms
- gradient descent
- back-propagation
- stochastic gradient descent
- mini-batch gradient descent as a trade-off between the two (see the sketch after this list)
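A sketch of how these fit together in TensorFlow 1.x: the optimizer computes gradients by back-propagation inside minimize(), and feeding small batches per step gives the stochastic/mini-batch trade-off (the dataset X, Y and batch_size are illustrative):

```python
import tensorflow as tf
import numpy as np

x  = tf.placeholder(tf.float32, shape=(None, 2))
y_ = tf.placeholder(tf.float32, shape=(None, 1))
w  = tf.Variable(tf.random_normal([2, 1], stddev=1.0))
y  = tf.matmul(x, w)

loss = tf.reduce_mean(tf.square(y_ - y))
# gradients are computed by back-propagation inside minimize()
train_step = tf.train.GradientDescentOptimizer(0.001).minimize(loss)

X = np.random.rand(128, 2)                 # illustrative dataset
Y = np.sum(X, axis=1, keepdims=True)
batch_size = 8

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(1000):
        start = (i * batch_size) % 128     # pick a mini-batch, not the full set
        end = start + batch_size
        sess.run(train_step, feed_dict={x: X[start:end], y_: Y[start:end]})
```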
about the learning rate
- setup
exponential decay
# tf.train.exponential_decay
# formula
decayed_learning_rate = learning_rate * decay_rate ^ (global_step / decay_steps)
# ---------- usage
learning_rate = tf.train.exponential_decay(0.1, global_step, 100, 0.96, staircase=True)
# staircase=True, so the learning rate is multiplied by 0.96 every 100 steps, i.e. the decay curve is stair-shaped (see the sketch below)
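A sketch wiring the decayed rate into an optimizer (TensorFlow 1.x); passing global_step to minimize() is what increments the counter, so the rate actually decays. The loss here is a stand-in:

```python
import tensorflow as tf

global_step = tf.Variable(0, trainable=False)       # counts training steps
learning_rate = tf.train.exponential_decay(
    0.1, global_step, 100, 0.96, staircase=True)

w = tf.Variable(tf.random_normal([2, 1], stddev=1.0))
loss = tf.reduce_sum(tf.square(w))                   # illustrative loss

# global_step is incremented by 1 each time train_step runs
train_step = tf.train.GradientDescentOptimizer(learning_rate) \
                     .minimize(loss, global_step=global_step)
```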
over-fitting
definition: the model memorizes the random noise in the training data instead of learning the overall trend
way to avoid it: regularization
regularization adds to the loss a penalty term that measures the complexity of the model's weights
- L1 (one-norm) regularization, which makes the parameters sparse (more zeros)
- L2 (two-norm) regularization, which is differentiable everywhere and therefore more commonly used (sketched below)
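A sketch of adding the regularization term to the loss (TensorFlow 1.x; tf.contrib.layers provides ready-made L1/L2 regularizers, and the regularization weight 0.01 is illustrative):

```python
import tensorflow as tf

x  = tf.placeholder(tf.float32, shape=(None, 2))
y_ = tf.placeholder(tf.float32, shape=(None, 1))
w  = tf.Variable(tf.random_normal([2, 1], stddev=1.0))
y  = tf.matmul(x, w)

base_loss = tf.reduce_mean(tf.square(y_ - y))

# L1: scale * sum(|w|)     -> pushes many weights to exactly zero (sparse)
l1_term = tf.contrib.layers.l1_regularizer(0.01)(w)
# L2: scale * sum(w^2) / 2 -> differentiable everywhere, shrinks weights smoothly
l2_term = tf.contrib.layers.l2_regularizer(0.01)(w)

loss = base_loss + l2_term   # total loss = data term + complexity penalty
```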
- tf.train.ExponentialMovingAverage
# it is a class (an object is constructed from it)
# constructor parameters:
def __init__(self, decay, num_updates=None, zero_debias=False)
# args: decay controls how the shadow variable, i.e. object.average(variable), is computed;
# num_updates is used to update decay and is usually global_step
# global_step is an auxiliary variable that is incremented by 1 at every training step
# the member function apply() is called to create the shadow variables and the op that updates their values
# object.apply(self, var_list=None)
# decay schedule: decay = min{DECAY, (1.0 + num_updates) / (10.0 + num_updates)}, where DECAY is the fixed value passed in
# object.average() returns the value: shadow_variable = decay * shadow_variable + (1 - decay) * variable
# a decay close to 1 keeps the shadow value close to its history and slows down its change
# this mechanism does not change the parameters themselves; it affects the results of forward propagation
# by using the averaged (shadow) values, e.g. when evaluating the model
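A runnable sketch of the whole flow (TensorFlow 1.x): construct the object, call apply() to get the update op, and read the shadow value with average(); the decay 0.99 and the single variable v are illustrative:

```python
import tensorflow as tf

v = tf.Variable(0, dtype=tf.float32)
global_step = tf.Variable(0, trainable=False)

# decay = min(0.99, (1 + global_step) / (10 + global_step))
ema = tf.train.ExponentialMovingAverage(0.99, num_updates=global_step)
maintain_averages_op = ema.apply([v])     # creates the shadow variable and its update op

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    sess.run(tf.assign(v, 5))
    sess.run(maintain_averages_op)        # shadow = decay*shadow + (1-decay)*v
    print(sess.run([v, ema.average(v)]))  # [5.0, 4.5], since decay = 0.1 here
```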