TensorFlow Preschool

Author: 小聪明李良才 | Published 2017-02-13 15:21

    This article is my summary of week 3 of Udacity's Deep Learning Foundations course. The theme is: before learning TensorFlow itself, build a miniflow of your own. Through this week's work I gained a basic understanding of how TensorFlow operates. The project is on GitHub at https://github.com/zhuanxuhit/nd101 — feel free to follow it.

    We know that the general steps for building a neural network are:

    1. normalization
    2. learning hyperparameters
    3. initializing weights
    4. forward propagation
    5. calculate error
    6. backpropagation

    When we implement the steps above in TensorFlow, the workflow generally boils down to two phases (a short sketch follows the list below):

    1. Define the graph of nodes and edges.
    2. Propagate values through the graph.
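
    A minimal sketch of this define-then-run pattern, assuming the TensorFlow 1.x API (tf.placeholder / tf.Session) that was current when this article was written; it is not part of miniflow:

    import tensorflow as tf
    
    # 1. Define the graph of nodes and edges.
    x = tf.placeholder(tf.float32)
    y = tf.placeholder(tf.float32)
    product = tf.multiply(x, y)
    
    # 2. Propagate values through the graph by running it in a session.
    with tf.Session() as sess:
        print(sess.run(product, feed_dict={x: 4.0, y: 5.0}))  # 20.0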

    Next, when implementing miniflow, we will first define the Node and the graph, and then implement forward propagation and backpropagation.

    1. Node

    Let's start with the concept of a node by looking at a simple neural network:


    The network above is one big graph. Every node has inputs and outputs, and each node computes its output from its inputs, so let's first define the Node class:

    class Node(object):
        def __init__(self, inbound_nodes=[]):
            # Nodes this node receives values from
            self.inbound_nodes = inbound_nodes
            # Nodes this node passes its value to
            self.outbound_nodes = []
            # Register this node as an outbound node of each of its inputs
            for n in self.inbound_nodes:
                n.outbound_nodes.append(self)
            # The calculated value of this node
            self.value = None
    

    With this minimal Node in place, the next step is to implement forward propagation.

    Forward propagation

    To compute a node we need its inputs, and those inputs in turn depend on the outputs of other nodes. Ordering all of a node's prerequisite nodes before computing the node itself is called a topological sort.
    Represented as a graph it looks like the figure below:


    To compute the final node F above, one feasible evaluation order is shown. Here we directly use Kahn's Algorithm; the code is as follows:
    def topological_sort(feed_dict):
        """Sort the nodes in topological order using Kahn's Algorithm.
        `feed_dict`: a dict where each key is an Input node and the value is
        the value to feed to that node."""
        input_nodes = [n for n in feed_dict.keys()]
    
        G = {}
        nodes = [n for n in input_nodes]
        while len(nodes) > 0:
            n = nodes.pop(0)
            if n not in G:
                G[n] = {'in': set(), 'out': set()}
            for m in n.outbound_nodes:
                if m not in G:
                    G[m] = {'in': set(), 'out': set()}
                G[n]['out'].add(m)
                G[m]['in'].add(n)
                nodes.append(m)
    
        L = []
        S = set(input_nodes)
        while len(S) > 0:
            n = S.pop()
    
            if isinstance(n, Input):
                n.value = feed_dict[n]
    
            L.append(n)
            for m in n.outbound_nodes:
                G[n]['out'].remove(m)
                G[m]['in'].remove(n)
                # if no other incoming edges add to S
                if len(G[m]['in']) == 0:
                    S.add(m)
        return L
    
    def forward_pass(output_node, sorted_nodes):
        """Run the forward pass over the topologically sorted nodes and
        return the value of `output_node`."""
        for n in sorted_nodes:
            n.forward()
    
        return output_node.value
    

    Now let's implement some simple Node types. The first is the Input type:

    class Input(Node):
        def __init__(self):
            # An Input node has no inbound nodes
            Node.__init__(self)
    
        def forward(self, value=None):
            # The value can be set explicitly, or by topological_sort via feed_dict
            if value is not None:
                self.value = value
    

    Next is the Mul type:

    class Mul(Node):
        def __init__(self, *inputs):
            Node.__init__(self, inputs)
    
        def forward(self):
            # Multiply the values of all inbound nodes together
            product = 1.0
            for n in self.inbound_nodes:
                product *= n.value
            self.value = product
    

    It is used like this:

    x, y, z = Input(), Input(), Input()
    
    f = Mul(x, y, z)
    
    feed_dict = {x: 4, y: 5, z: 10}
    
    graph = topological_sort(feed_dict)
    output = forward_pass(f, graph)
    
    # should output 200
    print("{} * {} * {} = {} (according to miniflow)".format(feed_dict[x], feed_dict[y], feed_dict[z], output))
    
    4 * 5 * 10 = 200.0 (according to miniflow)
    

    Next let's implement a slightly more complex Node type: the Linear node.

    class Linear(Node):
        def __init__(self, inputs, weights, bias):
            Node.__init__(self, [inputs, weights, bias])
    
        def forward(self):
            inputs = self.inbound_nodes[0].value
            weights = self.inbound_nodes[1].value
            bias = self.inbound_nodes[2].value
    
            # Weighted sum of the inputs, plus the bias
            weighted_sum = 0
            for i in range(len(inputs)):
                weighted_sum += inputs[i] * weights[i]
    
            self.value = weighted_sum + bias
    

    With the Linear node we can now do the following computation:

    inputs, weights, bias = Input(), Input(), Input()
    
    f = Linear(inputs, weights, bias)
    
    feed_dict = {
        inputs: [6, 20, 4],
        weights: [0.5, 0.25, 1.5],
        bias: 2
    }
    
    graph = topological_sort(feed_dict)
    output = forward_pass(f, graph)
    
    print(output)
    
    16.0
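
    For reference, the same computation can be written as a vectorized dot product, which is what the final version of Linear below does. A minimal numpy sketch (not part of miniflow), using the same values as the feed_dict above:

    import numpy as np
    
    inputs = np.array([6, 20, 4])
    weights = np.array([0.5, 0.25, 1.5])
    bias = 2
    
    # equivalent to the element-wise loop in Linear.forward
    print(np.dot(inputs, weights) + bias)  # 16.0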
    

    Besides the Linear node, we can also define a Sigmoid node.

    import numpy as np
    
    class Sigmoid(Node):
        def __init__(self, node):
            Node.__init__(self, [node])
    
        def _sigmoid(self, x):
            return 1. / (1. + np.exp(-x))
    
        def forward(self):
            input_value = self.inbound_nodes[0].value
            self.value = self._sigmoid(input_value)
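
    The sigmoid function and its derivative, which the backward pass will rely on later, are:

    $$\sigma(x) = \frac{1}{1 + e^{-x}}, \qquad \sigma'(x) = \sigma(x)\,\bigl(1 - \sigma(x)\bigr)$$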
    

    Having defined the nodes, the next step is to define a criterion for how good the output is.

    2. Defining the cost function

    When training a neural network we need a goal: make the output as accurate as possible. How do we measure that? We can use the mean squared error (MSE), which can itself be modeled as an MSE node.
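
    For m samples with labels y and predictions a, the mean squared error is:

    $$\text{MSE} = \frac{1}{m}\sum_{i=1}^{m}\left(y_i - a_i\right)^2$$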

    class MSE(Node):
        def __init__(self, y, a):
            Node.__init__(self, [y, a])
    
        def forward(self):
            y = self.inbound_nodes[0].value.reshape(-1, 1)
            a = self.inbound_nodes[1].value.reshape(-1, 1)
            m = len(y)
            # Mean of the squared differences
            squared_error = 0.
            for (yi, ai) in zip(y, a):
                squared_error += np.square(yi - ai)
            self.value = squared_error / m
    

    3. Defining backpropagation

    Now that we have a function that measures how good the output is, we need a way to make the output as good as possible, quickly. This is where Gradient Descent comes in: the gradient is the slope, and we use it to decide which direction to move in when optimizing. For more details see the article 停下来思考下神经网络.
    With the concept of the gradient in hand, let's look at a neural network graph:


    To compute the gradient of the MSE with respect to w1, we follow the red line in the figure; that computation is nothing but the chain rule from calculus, which lets us compute the gradient of the cost with respect to any variable. Below we give the gradient-computation code (shown after the chain-rule equation); compared with the earlier Node classes, each node gains a backward method.
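
    In symbols, assuming the two-layer network used later in the SGD example (l1 = X·W1 + b1, s1 = σ(l1), l2 = s1·W2 + b2, cost = MSE(y, l2)), the gradient of the cost with respect to W1 decomposes as:

    $$\frac{\partial\,\text{MSE}}{\partial W_1} = \frac{\partial\,\text{MSE}}{\partial l_2}\cdot\frac{\partial l_2}{\partial s_1}\cdot\frac{\partial s_1}{\partial l_1}\cdot\frac{\partial l_1}{\partial W_1}$$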
    import numpy as np
    
    
    class Node(object):
        def __init__(self, inbound_nodes=[]):
            self.inbound_nodes = inbound_nodes
            self.value = None
            self.outbound_nodes = []
            # Partials of the cost with respect to each inbound node, keyed by node
            self.gradients = {}
            for node in inbound_nodes:
                node.outbound_nodes.append(self)
    
        def forward(self):
            raise NotImplementedError
    
        def backward(self):
            raise NotImplementedError
    
    
    class Input(Node):
        def __init__(self):
            Node.__init__(self)
    
        def forward(self):        
            pass
    
        def backward(self):
            self.gradients = {self: 0}
            # The gradient of an Input node is the sum of the gradients
            # flowing back from all of its outbound nodes
            for n in self.outbound_nodes:
                grad_cost = n.gradients[self]
                self.gradients[self] += grad_cost * 1
    
    
    class Linear(Node):
        def __init__(self, X, W, b):
            Node.__init__(self, [X, W, b])
    
        def forward(self):
            X = self.inbound_nodes[0].value
            W = self.inbound_nodes[1].value
            b = self.inbound_nodes[2].value
            self.value = np.dot(X, W) + b
    
        def backward(self):
            # Initialize a gradient of zeros for each inbound node
            self.gradients = {n: np.zeros_like(n.value) for n in self.inbound_nodes}
            for n in self.outbound_nodes:
                grad_cost = n.gradients[self]
                # y = XW + b; accumulate the partial of the cost with
                # respect to each inbound node
                # dcost/dX = grad_cost . W^T
                self.gradients[self.inbound_nodes[0]] += np.dot(grad_cost, self.inbound_nodes[1].value.T)
                # dcost/dW = X^T . grad_cost
                self.gradients[self.inbound_nodes[1]] += np.dot(self.inbound_nodes[0].value.T, grad_cost)
                # dcost/db = sum of grad_cost over the batch
                self.gradients[self.inbound_nodes[2]] += np.sum(grad_cost, axis=0, keepdims=False)
    
    
    class Sigmoid(Node):
    
        def __init__(self, node):
            # The base class constructor.
            Node.__init__(self, [node])
    
        def _sigmoid(self, x):
            return 1. / (1. + np.exp(-x))
    
        def forward(self):
            input_value = self.inbound_nodes[0].value
            self.value = self._sigmoid(input_value)
    
        def backward(self):
            # Initialize the gradients to 0.
            self.gradients = {n: np.zeros_like(n.value) for n in self.inbound_nodes}
    
            for n in self.outbound_nodes:
                # Get the partial of the cost with respect to this node.
                grad_cost = n.gradients[self]
                # d(sigmoid)/dx = sigmoid * (1 - sigmoid); accumulate it
                # times the gradient coming from the outbound node
                sigmoid = self.value
                self.gradients[self.inbound_nodes[0]] += sigmoid * (1 - sigmoid) * grad_cost
    
    
    class MSE(Node):
        def __init__(self, y, a):
           
            # Call the base class' constructor.
            Node.__init__(self, [y, a])
    
        def forward(self):
            
            y = self.inbound_nodes[0].value.reshape(-1, 1)
            a = self.inbound_nodes[1].value.reshape(-1, 1)
    
            self.m = self.inbound_nodes[0].value.shape[0]
           
            self.diff = y - a
            self.value = np.mean(self.diff**2)
    
        def backward(self):
            # d(MSE)/dy = (2/m)(y - a), d(MSE)/da = -(2/m)(y - a)
            self.gradients[self.inbound_nodes[0]] = (2 / self.m) * self.diff
            self.gradients[self.inbound_nodes[1]] = (-2 / self.m) * self.diff
    
    
    def topological_sort(feed_dict):
    
        input_nodes = [n for n in feed_dict.keys()]
    
        G = {}
        nodes = [n for n in input_nodes]
        while len(nodes) > 0:
            n = nodes.pop(0)
            if n not in G:
                G[n] = {'in': set(), 'out': set()}
            for m in n.outbound_nodes:
                if m not in G:
                    G[m] = {'in': set(), 'out': set()}
                G[n]['out'].add(m)
                G[m]['in'].add(n)
                nodes.append(m)
    
        L = []
        S = set(input_nodes)
        while len(S) > 0:
            n = S.pop()
    
            if isinstance(n, Input):
                n.value = feed_dict[n]
    
            L.append(n)
            for m in n.outbound_nodes:
                G[n]['out'].remove(m)
                G[m]['in'].remove(n)
                # if no other incoming edges add to S
                if len(G[m]['in']) == 0:
                    S.add(m)
        return L
    
    
    def forward_and_backward(graph):
        # Forward pass
        for n in graph:
            n.forward()
    
        # Backward pass
        # see: https://docs.python.org/2.3/whatsnew/section-slices.html
        for n in graph[::-1]:
            n.backward()
    
    

    With all the required nodes and functions defined above, we can now run the following:

    X, W, b = Input(), Input(), Input()
    y = Input()
    f = Linear(X, W, b)
    a = Sigmoid(f)
    cost = MSE(y, a)
    
    X_ = np.array([[-1., -2.], [-1, -2]])
    W_ = np.array([[2.], [3.]])
    b_ = np.array([-3.])
    y_ = np.array([1, 2])
    
    feed_dict = {
        X: X_,
        y: y_,
        W: W_,
        b: b_,
    }
    
    graph = topological_sort(feed_dict)
    forward_and_backward(graph)
    # return the gradients for each Input
    gradients = [t.gradients[t] for t in [X, y, W, b]]
    
    print(gradients)
    
    [array([[ -3.34017280e-05,  -5.01025919e-05],
           [ -6.68040138e-05,  -1.00206021e-04]]), array([[ 0.9999833],
           [ 1.9999833]]), array([[  5.01028709e-05],
           [  1.00205742e-04]]), array([ -5.01028709e-05])]
    
    4. Stochastic Gradient Descent (SGD)
    I never really understood what SGD was until recently. If we compute the gradient over the entire dataset before every parameter update, we can run out of memory. So one strategy is: sample a subset of the data, compute the gradients on just that batch, and then update the parameters. That gives us the code below:
    
    def sgd_update(trainables, learning_rate=1e-2):
        # Take one gradient-descent step on every trainable Input node
        for n in trainables:
            n.value -= learning_rate * n.gradients[n]
            
    from sklearn.datasets import load_boston
    from sklearn.utils import shuffle, resample
    
    # Load data
    data = load_boston()
    X_ = data['data']
    y_ = data['target']
    
    # Normalize data
    X_ = (X_ - np.mean(X_, axis=0)) / np.std(X_, axis=0)
    
    n_features = X_.shape[1]
    n_hidden = 10
    W1_ = np.random.randn(n_features, n_hidden)
    b1_ = np.zeros(n_hidden)
    W2_ = np.random.randn(n_hidden, 1)
    b2_ = np.zeros(1)
    
    # Neural network
    X, y = Input(), Input()
    W1, b1 = Input(), Input()
    W2, b2 = Input(), Input()
    
    l1 = Linear(X, W1, b1)
    s1 = Sigmoid(l1)
    l2 = Linear(s1, W2, b2)
    cost = MSE(y, l2)
    
    feed_dict = {
        X: X_,
        y: y_,
        W1: W1_,
        b1: b1_,
        W2: W2_,
        b2: b2_
    }
    
    epochs = 10
    # Total number of examples
    m = X_.shape[0]
    batch_size = 11
    steps_per_epoch = m // batch_size
    
    graph = topological_sort(feed_dict)
    trainables = [W1, b1, W2, b2]
    
    print("Total number of examples = {}".format(m))
    
    # Step 4
    for i in range(epochs):
        loss = 0
        for j in range(steps_per_epoch):
            # Step 1
            # Randomly sample a batch of examples
            X_batch, y_batch = resample(X_, y_, n_samples=batch_size)
    
            # Reset value of X and y Inputs
            X.value = X_batch
            y.value = y_batch
    
            # Step 2
            forward_and_backward(graph)
    
            # Step 3
            sgd_update(trainables)
    
            loss += graph[-1].value
    
        print("Epoch: {}, Loss: {:.3f}".format(i+1, loss/steps_per_epoch))
            
    
    Total number of examples = 506
    Epoch: 1, Loss: 133.910
    Epoch: 2, Loss: 36.332
    Epoch: 3, Loss: 22.353
    Epoch: 4, Loss: 26.704
    Epoch: 5, Loss: 23.121
    Epoch: 6, Loss: 23.491
    Epoch: 7, Loss: 21.393
    Epoch: 8, Loss: 15.300
    Epoch: 9, Loss: 13.391
    Epoch: 10, Loss: 15.651
    

    Summary

    That is all there is to miniflow. We first defined Node, then defined the relationships between nodes to obtain a graph, computed the output with forward propagation, measured how good the output is with MSE, used the chain rule to compute gradients and update the parameters so the cost keeps shrinking, and finally used SGD to speed up the computation.

    
    
