PyTorch Pitfall Notes ~~ continuously updated......


Author: 苗书宇 | Published 2018-12-14 16:42 · 4883 reads

    1. Wrong argument type passed to nn.Conv2d()

    Error: TypeError: new() received an invalid combination of arguments - got (float, int, int, int), but expected one of:
    Full traceback:

      File "G:/python/project/model/A2net.py", line 36, in <module>
        model = A2Block(64)
      File "G:/python/project/model/A2net.py", line 15, in __init__
        self.dimension_reduction = nn.Conv2d(in_channels=inplanes, out_channels=inplanes/2, kernel_size=1, stride=1)
      File "C:\Users\MSY\Anaconda3\lib\site-packages\torch\nn\modules\conv.py", line 297, in __init__
        False, _pair(0), groups, bias)
      File "C:\Users\MSY\Anaconda3\lib\site-packages\torch\nn\modules\conv.py", line 33, in __init__
        out_channels, in_channels // groups, *kernel_size))
    TypeError: new() received an invalid combination of arguments - got (float, int, int, int), but expected one of:
     * (torch.device device)
     * (torch.Storage storage)
     * (Tensor other)
     * (tuple of ints size, torch.device device)
     * (object data, torch.device device)
    

    Where it happens: the traceback points to this line:

    self.dimension_reduction = nn.Conv2d(in_channels=inplanes, out_channels=inplanes/2, kernel_size=1, stride=1)
    

    Analysis: the error message says the call received a float where an int was expected. Looking at the line, only inplanes/2 can be a float: in Python 3, the / operator always performs true division and returns a float, while // performs floor division and returns an int. (One careless slash, one embarrassing bug.)
    Fix: change the output channel count from inplanes/2 to inplanes//2 and the problem is solved.
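    The root cause is plain Python 3 division semantics, which a two-line check makes obvious (a minimal sketch, independent of PyTorch):

```python
inplanes = 64

# In Python 3, "/" is true division and always yields a float,
# even when the result is a whole number; "//" is floor division
# and yields an int for int operands.
half_float = inplanes / 2    # 32.0 -> rejected by nn.Conv2d's out_channels
half_int = inplanes // 2     # 32   -> accepted

print(type(half_float).__name__, type(half_int).__name__)  # float int
```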


    2. make.sh fails while compiling NMS

    Error: OSError: The CUDA lib64 path could not be located in /usr/lib64
    Full traceback:

    Traceback (most recent call last):
      File "build.py", line 59, in <module>
        CUDA = locate_cuda()
      File "build.py", line 54, in locate_cuda
        raise EnvironmentError('The CUDA %s path could not be located in %s' % (k, v))
    OSError: The CUDA lib64 path could not be located in /usr/lib64
    

    Where it happens: open build.py (setup.py in some projects) and find:

    cudaconfig = {'home': home, 'nvcc': nvcc,
                      'include': pjoin(home, 'include'),
                      'lib64': pjoin(home, 'lib64')}
    

    Analysis: the script hard-codes the CUDA library directory name, but on this system the libraries live under lib rather than lib64.
    Fix: change 'lib64' in pjoin(home, 'lib64') to 'lib' and the problem is solved.
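    Rather than hand-editing the string, the helper can fall back from lib64 to lib when the former does not exist. This is a sketch modeled on the typical locate_cuda helper found in these build scripts; the exact names and surrounding code vary by project:

```python
import os
from os.path import join as pjoin


def locate_cuda(home='/usr/local/cuda'):
    """Build the cudaconfig dict, preferring lib64 but falling back
    to lib (some CUDA installs only ship a lib directory)."""
    lib_dir = pjoin(home, 'lib64')
    if not os.path.isdir(lib_dir):
        lib_dir = pjoin(home, 'lib')   # the fallback applied in the fix above
    cudaconfig = {'home': home,
                  'nvcc': pjoin(home, 'bin', 'nvcc'),
                  'include': pjoin(home, 'include'),
                  'lib64': lib_dir}
    # Same sanity check that produced the OSError in the traceback
    for k, v in cudaconfig.items():
        if not os.path.exists(v):
            raise EnvironmentError(
                'The CUDA %s path could not be located in %s' % (k, v))
    return cudaconfig
```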


    3、one of the variables needed for gradient computation has been modified by an inplace operation

    Error: one of the variables needed for gradient computation has been modified by an inplace operation
    Full traceback:

    Traceback (most recent call last):
      File "train_test.py", line 454, in <module>
        train()
      File "train_test.py", line 327, in train
        loss.backward()
      File "/home/miao/anaconda3/lib/python3.6/site-packages/torch/tensor.py", line 93, in backward
        torch.autograd.backward(self, gradient, retain_graph, create_graph)
      File "/home/miao/anaconda3/lib/python3.6/site-packages/torch/autograd/__init__.py", line 90, in backward
        allow_unreachable=True)  # allow_unreachable flag
    RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation
    

    Where it happens: the most painful part of this bug is that the traceback gives no useful location at all.
    Analysis: this appeared while test-running an SSD implementation for PyTorch found online; judging from explanations on the web, it is caused by behavioral differences between PyTorch 0.3 and 0.4. The commonly suggested fixes are:
    1. If you are on PyTorch 0.4.0, roll back to PyTorch 0.3.0.
    2. If a module has an inplace argument, set it to False.
    3. Since PyTorch 0.4.0 tensors no longer tolerate these in-place modifications, so remove all in-place operations.
    The blog post "modified by an inplace operation" finally gave a fitting answer. In short: rewrite x += 1 as x = x + 1. The former updates the tensor in place (the inplace=True case), while the latter first computes x + 1 into a new tensor and then rebinds x (the inplace=False case).
    Because my codebase was fairly large, it was hard to pinpoint the offending line at first; comparing against a known-good version with Beyond Compare (a great tool, highly recommended~~~) finally exposed the error:

    x /= norm  # the original buggy line
    

    For a detailed explanation of in-place operations, see the post "pytorch 学习笔记(二十二):关于 inplace operation".
    Fix: change x /= norm to x = x / norm and the problem is solved.
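    The difference between the two forms can be reproduced in a few lines. This is a sketch using torch.sigmoid (whose backward pass saves its own output) as a stand-in, not the post's actual SSD normalization code:

```python
import torch

w = torch.randn(3, requires_grad=True)

# Broken: sigmoid saves its *output* for the backward pass, so the
# in-place multiply (the analogue of `x /= norm`) overwrites a saved
# tensor and backward() raises the RuntimeError quoted above.
x = torch.sigmoid(w)
inplace_failed = False
try:
    x *= 2
    x.sum().backward()
except RuntimeError:
    inplace_failed = True

# Fixed: the out-of-place form (the analogue of `x = x / norm`) writes
# the result into a new tensor, leaving the saved output intact.
x = torch.sigmoid(w)
x = x * 2
x.sum().backward()   # succeeds, and w.grad is populated
```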



        Original link: https://www.haomeiwen.com/subject/fymghqtx.html