2020-05-11 PyTorch Custom Autograd (torch.autograd.Function)

Author: lzjngu | Published 2020-05-11 23:56

    Reference: the official PyTorch documentation.

    Attributes (member variables)
    saved_tensors: the tensors that forward() saved via save_for_backward(); they are used in backward().
    needs_input_grad: a tuple of booleans of length num_inputs, indicating whether each input requires a gradient.
    It can be used to skip unnecessary gradient computations in backward().
    num_inputs: the number of arguments passed to forward().
    num_outputs: the number of values returned by forward().
    requires_grad: a boolean indicating whether backward() will ever need to be called.
    
    Member functions
    forward()
    forward() can take any number of inputs and produce any number of outputs, but the inputs and outputs must be tensors.
    backward()
    backward() has as many inputs as forward() has outputs, and as many outputs as forward() has inputs. Each input of
    backward() is the gradient with respect to the corresponding output of forward() (the gradient coming from the next
    node of the computation graph), and each output of backward() is the gradient with respect to the corresponding
    input of forward(). If an input does not require a gradient (check the needs_input_grad attribute) or is not
    differentiable, backward() may return None for it, as in the sketch below.
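
    A minimal sketch (a hypothetical ScaleAndShift op, not from the original article) illustrating this correspondence:
    backward() returns one gradient per forward() input, and None where no gradient is needed.

    import torch
    from torch.autograd import Function

    class ScaleAndShift(Function):
        @staticmethod
        def forward(ctx, x, shift):  # two inputs -> backward() must return two values
            # assume x and shift are tensors of the same shape
            return 2 * x + shift

        @staticmethod
        def backward(ctx, grad_output):  # one input: the gradient w.r.t. the single output
            grad_x = 2 * grad_output if ctx.needs_input_grad[0] else None
            grad_shift = grad_output if ctx.needs_input_grad[1] else None
            return grad_x, grad_shift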
    

    Notes on forward()

    1. Although a network's inputs are Variables, and each layer's output is a Variable as well, inside a custom
    autograd Function's forward() every Variable argument is unpacked into a plain tensor: before forward() is called,
    the autograd engine automatically unpacks Variables into Tensors. The input here is therefore a tensor, and
    arbitrary operations can be performed on it inside forward().
    2. ctx is the context object. ctx.save_for_backward() stores the given tensors so that backward() can later
    retrieve them (as ctx.saved_tensors); in other words, backward() works with these saved tensors/Variables.
    3. save_for_backward() only accepts Variables or Tensors. Values of any other type can instead be stored as
    attributes on the context, e.g. ctx.constant = constant for a plain Python constant, which must not be passed to
    ctx.save_for_backward() directly (see the MulConstant example further down).
    

    Notes on backward()

    Automatic differentiation works on the graph built from each op's backward()!
    Since autograd builds the computation graph out of the operations performed in backward(), everything in backward()
    should be done with Variables; forward() has no such requirement and can work on plain tensors.
    
    y = x*w + b   # our custom LinearFunction
    z = f(y)
    Below, grad_output = dz/dy.
    By the chain rule:
    1. dz/dx = dz/dy * dy/dx = grad_output * dy/dx = grad_output * w
    2. dz/dw = dz/dy * dy/dw = grad_output * dy/dw = grad_output * x
    3. dz/db = dz/dy * dy/db = grad_output * 1
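
    These formulas can be sanity-checked against PyTorch's built-in autograd. A minimal sketch with made-up shapes,
    taking f to be a plain sum so that z is a scalar and grad_output is a tensor of ones:

    import torch

    x = torch.randn(4, 3, requires_grad=True)
    w = torch.randn(5, 3, requires_grad=True)
    b = torch.randn(5, requires_grad=True)

    y = x.mm(w.t()) + b   # the same computation as LinearFunction below
    z = y.sum()           # f(y) = sum, so dz/dy (grad_output) is all ones
    z.backward()

    grad_output = torch.ones_like(y)
    assert torch.allclose(x.grad, grad_output.mm(w))      # dz/dx = grad_output * w
    assert torch.allclose(w.grad, grad_output.t().mm(x))  # dz/dw = grad_output^T * x
    assert torch.allclose(b.grad, grad_output.sum(0))     # dz/db = grad_output summed over the batch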
    
    import torch
    from torch.autograd import Function

    class LinearFunction(Function):
        # Subclass torch.autograd.Function; forward() and backward()
        # must both be static methods.
        @staticmethod
        # The first argument is ctx, the second is input, the rest are optional.
        # ctx plays a role similar to self: attributes stored on ctx in forward()
        # can be read back in backward().
        # In a custom Function's forward(), every Variable argument has already been
        # unpacked into a tensor by the autograd engine, so input here is a tensor.
        def forward(ctx, input, weight, bias=None):
            print(type(input))
            ctx.save_for_backward(input, weight, bias)  # save the tensors for backward()
            output = input.mm(weight.t())  # torch.t() transposes a 2-D tensor
            if bias is not None:
                output += bias.unsqueeze(0).expand_as(output)
                # unsqueeze(0) inserts a new dimension at position 0;
                # expand_as(tensor) is equivalent to expand(tensor.size()): expand to the given size
            return output
    
        @staticmethod
        def backward(ctx, grad_output):
            # grad_output is the gradient computed by the next node in the graph
            input, weight, bias = ctx.saved_tensors
            grad_input = grad_weight = grad_bias = None
            # the gradients with respect to input, weight and bias
            # only compute a gradient if the corresponding input requires one
            if ctx.needs_input_grad[0]:
                grad_input = grad_output.mm(weight)      # chain rule: dz/dx = grad_output * w
            if ctx.needs_input_grad[1]:
                grad_weight = grad_output.t().mm(input)  # chain rule: dz/dw = grad_output^T * x
            if bias is not None and ctx.needs_input_grad[2]:
                grad_bias = grad_output.sum(0)           # sum the gradient over the batch dimension

            return grad_input, grad_weight, grad_bias
    
    # It is recommended to wrap the new op in a plain function
    def linear(input, weight, bias=None):
        # Function.apply runs forward() and records the op in the graph,
        # so that backward() will be called during backpropagation.
        return LinearFunction.apply(input, weight, bias)

    # Or simply alias the apply method
    linear = LinearFunction.apply
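
    # A quick usage sketch of the wrapped op (hypothetical shapes: a batch of 4,
    # 3 input features, 5 output features; not part of the original article):
    x = torch.randn(4, 3, requires_grad=True)
    W = torch.randn(5, 3, requires_grad=True)
    b = torch.randn(5, requires_grad=True)
    out = linear(x, W, b)   # shape (4, 5); LinearFunction.forward runs here
    out.sum().backward()    # LinearFunction.backward runs here, filling x.grad, W.grad, b.grad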
    
    # Check that the backward() implementation is correct
    from torch.autograd import gradcheck
    # gradcheck takes a tuple of tensors as input, checks whether the gradients
    # evaluated with these tensors are close enough to numerical approximations,
    # and returns True if they all satisfy this condition.
    # Double precision is needed for the numerical approximation to be accurate.
    input = (torch.randn(20, 20, dtype=torch.double, requires_grad=True),
             torch.randn(30, 20, dtype=torch.double, requires_grad=True))
    test = gradcheck(LinearFunction.apply, input, eps=1e-6, atol=1e-4)
    print(test)  # prints True if everything is correct
    
    # An op that multiplies its tensor input by a (non-tensor) constant
    class MulConstant(Function):
        @staticmethod
        def forward(ctx, tensor, constant):
            # ctx is a context object that can be used to stash information
            # for backward computation
            ctx.constant = constant
            return tensor * constant
    
        @staticmethod
        def backward(ctx, grad_output):
            # We return as many input gradients as there were arguments.
            # Gradients of non-Tensor arguments to forward must be None.
            return grad_output * ctx.constant, None  # no gradient (None) for the non-tensor constant
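
    # Usage sketch (hypothetical values, not from the original article): the gradient
    # is simply the upstream gradient scaled by the constant, and the constant itself
    # receives no gradient.
    t = torch.randn(3, requires_grad=True)
    out = MulConstant.apply(t, 5.0)
    out.sum().backward()
    assert torch.allclose(t.grad, torch.full_like(t, 5.0))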
    
    # Build an nn.Module on top of the custom Function
    import torch.nn as nn
    class Linear(nn.Module):
        def __init__(self, input_features, output_features, bias=True):
            super(Linear, self).__init__()
            self.input_features = input_features
            self.output_features = output_features
            # nn.Parameter is a special kind of Variable, that will get
            # automatically registered as Module's parameter once it's assigned
            # Important: Parameters require gradients by default!
            self.weight = nn.Parameter(torch.Tensor(output_features, input_features))
            if bias:
                self.bias = nn.Parameter(torch.Tensor(output_features))
            else:
                # You should always register all possible parameters, but the
                # optional ones can be None if you want.
                self.register_parameter('bias', None)
            # Not a very smart way to initialize weights
            self.weight.data.uniform_(-0.1, 0.1)
            if self.bias is not None:
                self.bias.data.uniform_(-0.1, 0.1)
        def forward(self, input):
            # See the autograd section for explanation of what happens here.
            return LinearFunction.apply(input, self.weight, self.bias)
            # or: return linear(input, self.weight, self.bias), using the wrapper defined above
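
    # Usage sketch of the custom Module (hypothetical sizes, not from the original article):
    layer = Linear(3, 5)                       # 3 input features, 5 output features
    x = torch.randn(4, 3)                      # a batch of 4 samples
    out = layer(x)                             # calls Linear.forward -> LinearFunction.apply
    out.sum().backward()                       # gradients land in layer.weight.grad / layer.bias.grad
    print(out.shape, layer.weight.grad.shape)  # torch.Size([4, 5]) torch.Size([5, 3])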
    
    import torch
    from torch.autograd import Variable
    
    
    class MyReLU(torch.autograd.Function):
        """
        We can implement our own custom autograd Functions by subclassing
        torch.autograd.Function and implementing the forward and backward passes
        which operate on Tensors.
        """
    
        @staticmethod
        def forward(ctx, input):
            """
            In the forward pass we receive a Tensor containing the input and return
            a Tensor containing the output. ctx is a context object that can be used
            to stash information for backward computation. You can cache arbitrary
            objects for use in the backward pass using the ctx.save_for_backward method.
            """
            ctx.save_for_backward(input)
            return input.clamp(min=0)
    
        @staticmethod
        def backward(ctx, grad_output):
            """
            In the backward pass we receive a Tensor containing the gradient of the loss
            with respect to the output, and we need to compute the gradient of the loss
            with respect to the input.
            """
            input, = ctx.saved_tensors
            grad_input = grad_output.clone()
            grad_input[input < 0] = 0
            return grad_input
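
    # Sanity check (a sketch, not part of the original article): as with LinearFunction,
    # gradcheck can verify MyReLU.backward numerically. Double precision is used; random
    # inputs essentially never hit 0 exactly, where ReLU is not differentiable.
    from torch.autograd import gradcheck
    relu_input = (torch.randn(10, 10, dtype=torch.double, requires_grad=True),)
    print(gradcheck(MyReLU.apply, relu_input, eps=1e-6, atol=1e-4))  # True if the gradient is correct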
    
    
    dtype = torch.FloatTensor
    # dtype = torch.cuda.FloatTensor # Uncomment this to run on GPU
    
    # N is batch size; D_in is input dimension;
    # H is hidden dimension; D_out is output dimension.
    N, D_in, H, D_out = 64, 1000, 100, 10
    
    # Create random Tensors to hold input and outputs, and wrap them in Variables.
    x = Variable(torch.randn(N, D_in).type(dtype), requires_grad=False)
    y = Variable(torch.randn(N, D_out).type(dtype), requires_grad=False)
    
    # Create random Tensors for weights, and wrap them in Variables.
    w1 = Variable(torch.randn(D_in, H).type(dtype), requires_grad=True)
    w2 = Variable(torch.randn(H, D_out).type(dtype), requires_grad=True)
    
    learning_rate = 1e-6
    for t in range(500):
        # To apply our Function, we use Function.apply method. We alias this as 'relu'.
        relu = MyReLU.apply
    
        # Forward pass: compute predicted y using operations on Variables; we compute
        # ReLU using our custom autograd operation.
        y_pred = relu(x.mm(w1)).mm(w2)
    
        # Compute and print loss
        loss = (y_pred - y).pow(2).sum()
        print(t, loss.item())
    
        # Use autograd to compute the backward pass.
        loss.backward()
    
        # Update weights using gradient descent
        w1.data -= learning_rate * w1.grad.data
        w2.data -= learning_rate * w2.grad.data
    
        # Manually zero the gradients after updating weights
        w1.grad.data.zero_()
        w2.grad.data.zero_()
    
