Theano Pitfalls: Unable to Use the GPU


Author: 兜里有颗糖儿 | Published: 2018-07-11 01:35

    Problems solved:

    1> Getting theano to use the GPU <--- the main problem
    2> nvcc path: nvcc compiler not found on $PATH
    3> nvcc permissions: .sh: /usr/local/cuda-8.0/bin/../nvvm/bin/cicc: 权限不够 (Permission denied)
    4> Installing pygpu: ImportError: No module named pygpu
    5> Installing theano
    6> cuDNN version mismatch: RuntimeError: Mixed dnn version. The header is version 5110 while the library is version 6021.
    7> Installing cuDNN 6.0 (6.0.21)

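    Prerequisites:

    1. cuda, cudnn, and the nvidia driver are already installed on the server.
    2. Configuration for GPU acceleration:
    Method 1:

    On every run, use THEANO_FLAGS=mode=FAST_RUN,device=cuda,floatX=float32 python XXX.py

    Method 2:

    Create a .theanorc file
    The ".theanorc" file holds the configuration for the theano library. First create it with: vim ~/.theanorc
    Then write the following into it: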

    [global]
    device=cuda
    floatX=float32
    # cuda-8.0 here; adjust to your own installation
    root=/usr/local/cuda-8.0
    
    [nvcc]
    fastmath = True
    [blas]
    ldflags = -lopenblas
    
    [cuda]
    root = /usr/local/cuda-8.0
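
As a quick sanity check on a file like the one above, here is a minimal sketch that parses .theanorc-style text with Python's standard configparser and prints the device settings. The sample text and the helper name read_theanorc are assumptions made for illustration; this is not how Theano itself loads its configuration.

```python
import configparser

# Illustrative copy of the settings above (not read from disk).
SAMPLE_THEANORC = """\
[global]
device = cuda
floatX = float32

[nvcc]
fastmath = True

[blas]
ldflags = -lopenblas

[cuda]
root = /usr/local/cuda-8.0
"""


def read_theanorc(text):
    """Parse .theanorc-style INI text into {section: {key: value}}."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep key case, e.g. floatX
    parser.read_string(text)
    return {name: dict(parser[name]) for name in parser.sections()}


settings = read_theanorc(SAMPLE_THEANORC)
print(settings["global"]["device"], settings["global"]["floatX"])
```

To check a real file, the same helper could be fed the contents of ~/.theanorc.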
    

    ====================== Troubleshooting walkthrough =========================

    I. On zoe_env@218:

    [fanyiyi_env@218 ~]$ THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python theano_gpu.py
    WARNING (theano.tensor.blas): Failed to import scipy.linalg.blas, and Theano flag blas.ldflags is empty. Falling back on slower implementations for dot(matrix, vector), dot(vector, matrix) and dot(vector, vector) (No module named scipy.linalg.blas)
    WARNING (theano.sandbox.cuda): The cuda backend is deprecated and will be removed in the next release (v0.10).  Please switch to the gpuarray backend. You can get more information about how to switch at this URL:
     https://github.com/Theano/Theano/wiki/Converting-to-the-new-gpu-back-end%28gpuarray%29
    
    ERROR (theano.sandbox.cuda): nvcc compiler not found on $PATH. Check your nvcc installation and try again.
    [Elemwise{exp,no_inplace}(<TensorType(float32, vector)>)]
    Looping 1000 times took 1.82431101799 seconds
    Result is [1.2317803 1.6187934 1.5227807 ... 2.2077181 2.2996776 1.6232328]
    Used the cpu
    

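    Following the error "nvcc compiler not found on $PATH. Check your nvcc installation and try again.",
    check whether nvcc is installed on the system: $ nvcc -V
    This prints: -bash: /usr/local/cuda-8.0/bin/nvcc: 权限不够 (Permission denied)
    Fix:
    $ cd /usr/local/cuda-8.0/bin
    $ sudo chmod 755 nvcc

    After that, run nvcc -V again; it now prints the nvcc version information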

    [zoe_env@218 ~]$ nvcc -V
    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2016 NVIDIA Corporation
    Built on Tue_Jan_10_13:22:03_CST_2017
    Cuda compilation tools, release 8.0, V8.0.61
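
When this check needs to be scripted, the release number can be pulled out of the nvcc -V text. The sketch below is a minimal example; the helper name parse_nvcc_release is hypothetical and the regular expression assumes the "release X.Y, VX.Y.Z" wording shown in the output above.

```python
import re


def parse_nvcc_release(output):
    """Return (release, build) from `nvcc -V` text, or None if absent."""
    match = re.search(r"release\s+(\d+\.\d+),\s+V(\d+(?:\.\d+)*)", output)
    return (match.group(1), match.group(2)) if match else None


sample = "Cuda compilation tools, release 8.0, V8.0.61"
print(parse_nvcc_release(sample))
```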
    

    This shows nvcc is installed on this Linux machine. Next, add nvcc to the path
    $ vim ~/.bashrc
    Add the following lines to .bashrc
    export PATH=/usr/local/cuda-8.0/bin/:$PATH
    export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64:$LD_LIBRARY_PATH

    $ source ~/.bashrc
    nvcc has been added to the path
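
The two nvcc failures so far (not on $PATH, and present but lacking execute permission) look different to a script as well. A minimal sketch of a check that tells them apart follows; the helper name diagnose_tool is hypothetical and it only inspects the file system, it does not run nvcc.

```python
import os


def diagnose_tool(name, search_path=None):
    """Classify a command as 'ok', 'not executable', or 'not on PATH'."""
    if search_path is None:
        search_path = os.environ.get("PATH", "")
    for directory in search_path.split(os.pathsep):
        if not directory:
            continue  # skip empty PATH entries
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            # Report on the first file found with that name.
            return "ok" if os.access(candidate, os.X_OK) else "not executable"
    return "not on PATH"


print("nvcc:", diagnose_tool("nvcc"))
```

A result of "not executable" points at the chmod fix, while "not on PATH" points at the .bashrc edit.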

    Continue

    THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python theano_gpu.py
    
    # Output
    ===============================
    nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
    In file included from /usr/include/python2.7/pyconfig.h:6:0,
                     from /usr/include/python2.7/Python.h:8,
                     from mod.cu:3:
    /usr/include/python2.7/pyconfig-64.h:1188:0: 警告:“_POSIX_C_SOURCE”重定义 [默认启用]
     #define _POSIX_C_SOURCE 200112L
     ^
    In file included from /usr/local/cuda-8.0/bin/..//include/host_config.h:173:0,
                     from /usr/local/cuda-8.0/bin/..//include/cuda_runtime.h:78,
                     from <命令行>:0:
    /usr/include/features.h:168:0: 附注:这是先前定义的位置
     # define _POSIX_C_SOURCE 200809L
     ^
    In file included from /usr/include/python2.7/pyconfig.h:6:0,
                     from /usr/include/python2.7/Python.h:8,
                     from mod.cu:3:
    /usr/include/python2.7/pyconfig-64.h:1210:0: 警告:“_XOPEN_SOURCE”重定义 [默认启用]
     #define _XOPEN_SOURCE 600
     ^
    In file included from /usr/local/cuda-8.0/bin/..//include/host_config.h:173:0,
                     from /usr/local/cuda-8.0/bin/..//include/cuda_runtime.h:78,
                     from <命令行>:0:
    /usr/include/features.h:170:0: 附注:这是先前定义的位置
     # define _XOPEN_SOURCE 700
     ^
    mod.cu(940): warning: pointless comparison of unsigned integer with zero
    mod.cu(3000): warning: conversion from a string literal to "char *" is deprecated
    mod.cu(3003): warning: conversion from a string literal to "char *" is deprecated
    mod.cu(3005): warning: conversion from a string literal to "char *" is deprecated
    mod.cu(3008): warning: conversion from a string literal to "char *" is deprecated
    mod.cu(3010): warning: conversion from a string literal to "char *" is deprecated
    mod.cu(3013): warning: conversion from a string literal to "char *" is deprecated
    mod.cu(3016): warning: conversion from a string literal to "char *" is deprecated
    mod.cu(3019): warning: conversion from a string literal to "char *" is deprecated
    mod.cu(3021): warning: conversion from a string literal to "char *" is deprecated
    mod.cu(3024): warning: conversion from a string literal to "char *" is deprecated
    mod.cu(3026): warning: conversion from a string literal to "char *" is deprecated
    mod.cu(3029): warning: conversion from a string literal to "char *" is deprecated
    mod.cu(3031): warning: conversion from a string literal to "char *" is deprecated
    mod.cu(3034): warning: conversion from a string literal to "char *" is deprecated
    mod.cu(3037): warning: conversion from a string literal to "char *" is deprecated
    mod.cu(3040): warning: conversion from a string literal to "char *" is deprecated
    mod.cu(3042): warning: conversion from a string literal to "char *" is deprecated
    mod.cu(3045): warning: conversion from a string literal to "char *" is deprecated
    mod.cu(3047): warning: conversion from a string literal to "char *" is deprecated
    mod.cu(3050): warning: conversion from a string literal to "char *" is deprecated
    In file included from /usr/include/python2.7/pyconfig.h:6:0,
                     from /usr/include/python2.7/Python.h:8,
                     from mod.cu:3:
    /usr/include/python2.7/pyconfig-64.h:1188:0: 警告:“_POSIX_C_SOURCE”重定义 [默认启用]
     #define _POSIX_C_SOURCE 200112L
     ^
    In file included from /usr/local/cuda-8.0/bin/..//include/host_config.h:173:0,
                     from /usr/local/cuda-8.0/bin/..//include/cuda_runtime.h:78,
                     from <命令行>:0:
    /usr/include/features.h:168:0: 附注:这是先前定义的位置
     # define _POSIX_C_SOURCE 200809L
     ^
    In file included from /usr/include/python2.7/pyconfig.h:6:0,
                     from /usr/include/python2.7/Python.h:8,
                     from mod.cu:3:
    /usr/include/python2.7/pyconfig-64.h:1210:0: 警告:“_XOPEN_SOURCE”重定义 [默认启用]
     #define _XOPEN_SOURCE 600
     ^
    In file included from /usr/local/cuda-8.0/bin/..//include/host_config.h:173:0,
                     from /usr/local/cuda-8.0/bin/..//include/cuda_runtime.h:78,
                     from <命令行>:0:
    /usr/include/features.h:170:0: 附注:这是先前定义的位置
     # define _XOPEN_SOURCE 700
     ^
    mod.cu(940): warning: pointless comparison of unsigned integer with zero
    mod.cu(3000): warning: conversion from a string literal to "char *" is deprecated
    mod.cu(3003): warning: conversion from a string literal to "char *" is deprecated
    mod.cu(3005): warning: conversion from a string literal to "char *" is deprecated
    mod.cu(3008): warning: conversion from a string literal to "char *" is deprecated
    mod.cu(3010): warning: conversion from a string literal to "char *" is deprecated
    mod.cu(3013): warning: conversion from a string literal to "char *" is deprecated
    mod.cu(3016): warning: conversion from a string literal to "char *" is deprecated
    mod.cu(3019): warning: conversion from a string literal to "char *" is deprecated
    mod.cu(3021): warning: conversion from a string literal to "char *" is deprecated
    mod.cu(3024): warning: conversion from a string literal to "char *" is deprecated
    mod.cu(3026): warning: conversion from a string literal to "char *" is deprecated
    mod.cu(3029): warning: conversion from a string literal to "char *" is deprecated
    mod.cu(3031): warning: conversion from a string literal to "char *" is deprecated
    mod.cu(3034): warning: conversion from a string literal to "char *" is deprecated
    mod.cu(3037): warning: conversion from a string literal to "char *" is deprecated
    mod.cu(3040): warning: conversion from a string literal to "char *" is deprecated
    mod.cu(3042): warning: conversion from a string literal to "char *" is deprecated
    mod.cu(3045): warning: conversion from a string literal to "char *" is deprecated
    mod.cu(3047): warning: conversion from a string literal to "char *" is deprecated
    mod.cu(3050): warning: conversion from a string literal to "char *" is deprecated
    sh: /usr/local/cuda-8.0/bin/../nvvm/bin/cicc: 权限不够
    
    ['nvcc', '-shared', '-O3', '-m64', '-Xcompiler', '-DCUDA_NDARRAY_CUH=c72d035fdf91890f3b36710688069b2e,-DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION,-fPIC,-fvisibility=hidden', '-Xlinker', '-rpath,/home/zoe_env/.theano/compiledir_Linux-3.10-el7.x86_64-x86_64-with-centos-7.2.1511-Core-x86_64-2.7.5-64/cuda_ndarray', '-I/usr/lib/python2.7/site-packages/theano/sandbox/cuda', '-I/usr/lib64/python2.7/site-packages/numpy/core/include', '-I/usr/include/python2.7', '-I/usr/lib/python2.7/site-packages/theano/gof', '-L/usr/lib64', '-o', '/home/zoe_env/.theano/compiledir_Linux-3.10-el7.x86_64-x86_64-with-centos-7.2.1511-Core-x86_64-2.7.5-64/cuda_ndarray/cuda_ndarray.so', 'mod.cu', '-lcublas', '-lpython2.7', '-lcudart']
    ERROR (theano.sandbox.cuda): Failed to compile cuda_ndarray.cu: ('nvcc return status', 126, 'for cmd', 'nvcc -shared -O3 -m64 -Xcompiler -DCUDA_NDARRAY_CUH=c72d035fdf91890f3b36710688069b2e,-DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION,-fPIC,-fvisibility=hidden -Xlinker -rpath,/home/zoe_env/.theano/compiledir_Linux-3.10-el7.x86_64-x86_64-with-centos-7.2.1511-Core-x86_64-2.7.5-64/cuda_ndarray -I/usr/lib/python2.7/site-packages/theano/sandbox/cuda -I/usr/lib64/python2.7/site-packages/numpy/core/include -I/usr/include/python2.7 -I/usr/lib/python2.7/site-packages/theano/gof -L/usr/lib64 -o /home/zoe_env/.theano/compiledir_Linux-3.10-el7.x86_64-x86_64-with-centos-7.2.1511-Core-x86_64-2.7.5-64/cuda_ndarray/cuda_ndarray.so mod.cu -lcublas -lpython2.7 -lcudart')
    ERROR (theano.gpuarray): pygpu was configured but could not be imported or is too old (version 0.6 or higher required)
    Traceback (most recent call last):
      File "/usr/lib/python2.7/site-packages/theano/gpuarray/__init__.py", line 21, in <module>
        import pygpu
    ImportError: No module named pygpu
    [Elemwise{exp,no_inplace}(<TensorType(float32, vector)>)]
    Looping 1000 times took 1.82314801216 seconds
    Result is [1.2317803 1.6187934 1.5227807 ... 2.2077181 2.2996776 1.6232328]
    Used the cpu
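
The ('nvcc return status', 126, ...) in the log above is consistent with the cicc "权限不够" (Permission denied) line: by shell convention, exit status 126 means a command was found but could not be executed, and 127 means it was not found. The sketch below demonstrates that convention with a throwaway file, assuming a POSIX sh is available; the helper name shell_exit_status is hypothetical and nothing here touches the CUDA installation.

```python
import os
import shlex
import subprocess
import tempfile


def shell_exit_status(path):
    """Ask sh to run `path` and return the shell's exit status."""
    result = subprocess.run(
        ["sh", "-c", shlex.quote(path)],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode


with tempfile.TemporaryDirectory() as workdir:
    script = os.path.join(workdir, "tool")
    with open(script, "w") as handle:
        handle.write("#!/bin/sh\nexit 0\n")
    os.chmod(script, 0o644)  # readable, but no execute bit
    no_exec_bit = shell_exit_status(script)
    missing = shell_exit_status(os.path.join(workdir, "absent"))

print("no execute bit:", no_exec_bit)
print("missing file:", missing)
```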
    

    Following the message from the previous step, "ImportError: No module named pygpu"

    install pygpu

    >>>>>>>>>>>>>>>>>>>>>>>>Attempt 1>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
    [zoe_env@218 ~]$ pip install pygpu
    Downloading/unpacking pygpu
      Could not find any downloads that satisfy the requirement pygpu
    Cleaning up...
    No distributions at all found for pygpu
    Storing debug log for failure in /home/fanyiyi_env/.pip/pip.log
    
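    >>>>>>>>>>>>>>>>>>>>>>>>Attempt 2>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
    Checked the pip version, and tried installing from the Tsinghua and Douban mirrors; neither worked:
    [zoe_env@218 ~]$ pip -V   # pip is already up to date, which rules out an outdated pip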
    pip 10.0.1 from /home/fanyiyi_env/anaconda2/lib/python2.7/site-packages/pip (python 2.7)     
    [zoe_env@218 ~]$ pip install pygpu  -i http://pypi.douban.com/simple/ --trusted-host pypi.douban.com
    Looking in indexes: http://pypi.douban.com/simple/
    Collecting pygpu
      Could not find a version that satisfies the requirement pygpu (from versions: )
    No matching distribution found for pygpu
    
    >>>>>>>>>>>>>>>>>>>>>>>>Attempt 3>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
    Checked the network with ping: connected, no outage
    >>>>>>>>>>>>>>>>>>>>>>>>Attempt 4>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
    conda remove pygpu   # remove the existing pygpu
    conda install -c conda-forge pygpu   # reinstall
    

    import pygpu now succeeds, so pygpu is installed. Tears of joy 😹

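    ******************* Background notes on conda-forge *******************
    conda-forge is an additional channel from which packages can be installed. In that sense it is no more special than the default channel, or any of the hundreds (thousands) of other channels to which people publish packages. If you register at https://anaconda.org and upload your own conda packages, you can add a channel of your own.
    There are two ways to change the channel options. One is to specify a channel every time you install a package:
    conda install -c some-channel packagename
    Of course, the package must exist on that channel. If you use the same channel often, you may want to add it to your configuration. You can write
    conda config --add channels some-channel
    to add the channel some-channel to the top of the channels configuration list. This gives some-channel the highest priority (priority determines, in part, which channel is chosen when several channels carry a given package). To add the channel to the end of the list and give it the lowest priority, type
    conda config --append channels some-channel
    If you want to remove a channel you added, you can do so with
    conda config --remove channels some-channel
    See conda config -h for more options.

    In summary, there are three main reasons to use the conda-forge channel instead of the defaults channel maintained by Continuum:

    1. Packages on conda-forge may be more up to date than those on the defaults channel
    2. conda-forge has packages that are not available from defaults
    3. You would rather use a dependency such as openblas (from conda-forge) than mkl (from defaults).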
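
The effect of --add, --append, and --remove on channel priority can be pictured as operations on an ordered list, with the first entry having the highest priority. The following is a toy model only, written for illustration; the function names are hypothetical and conda's real behavior (warnings, strict priority settings) is not reproduced.

```python
def add_channel(channels, name):
    """Model `conda config --add channels NAME`: NAME goes to the top."""
    return [name] + [c for c in channels if c != name]


def append_channel(channels, name):
    """Model `conda config --append channels NAME`: NAME goes to the end."""
    return [c for c in channels if c != name] + [name]


def remove_channel(channels, name):
    """Model `conda config --remove channels NAME`."""
    return [c for c in channels if c != name]


channels = ["defaults"]
channels = add_channel(channels, "conda-forge")
print(channels)  # highest priority first
```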

    With pygpu installed, install theano

    >>> import pygpu      # pygpu imports, so the install succeeded
    >>> import theano
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ImportError: No module named theano
    >>> exit()
    
    # Install theano
    conda install theano
    # Output
    Solving environment: done
    
    ## Package Plan ##
    
      environment location: /home/zoe_env/anaconda2
    
      added / updated specs:
        - theano
    
    
    The following packages will be downloaded:
    
        package                    |            build
        ---------------------------|-----------------
        conda-4.5.5                |           py27_0         1.0 MB
        certifi-2018.4.16          |           py27_0         142 KB
        gcc_linux-64-7.2.0         |      h550dcbe_27           9 KB
        gxx_impl_linux-64-7.2.0    |       hdf63c60_3        18.6 MB
        gxx_linux-64-7.2.0         |      h550dcbe_27           8 KB
        theano-1.0.2               |   py27h6bb024c_0         3.6 MB
        binutils_linux-64-7.2.0    |      had2808c_27           8 KB
        gcc_impl_linux-64-7.2.0    |       habb00fd_3        72.4 MB
        binutils_impl_linux-64-2.28.1|       had2808c_3        16.1 MB
        ------------------------------------------------------------
                                               Total:       111.9 MB
    
    The following NEW packages will be INSTALLED:
    
        binutils_impl_linux-64: 2.28.1-had2808c_3
        binutils_linux-64:      7.2.0-had2808c_27
        gcc_impl_linux-64:      7.2.0-habb00fd_3
        gcc_linux-64:           7.2.0-h550dcbe_27
        gxx_impl_linux-64:      7.2.0-hdf63c60_3
        gxx_linux-64:           7.2.0-h550dcbe_27
        theano:                 1.0.2-py27h6bb024c_0
    
    The following packages will be UPDATED:
    
        certifi:                2018.4.16-py27_0     conda-forge --> 2018.4.16-py27_0
    
    The following packages will be DOWNGRADED:
    
        conda:                  4.5.7-py27_0         conda-forge --> 4.5.5-py27_0
    
    Proceed ([y]/n)? y
    
    
    Downloading and Extracting Packages
    conda-4.5.5          |  1.0 MB | ############################################################################################################################################################### | 100%
    certifi-2018.4.16    |  142 KB | ############################################################################################################################################################### | 100%
    gcc_linux-64-7.2.0   |    9 KB | ############################################################################################################################################################### | 100%
    gxx_impl_linux-64-7. | 18.6 MB | ############################################################################################################################################################### | 100%
    gxx_linux-64-7.2.0   |    8 KB | ############################################################################################################################################################### | 100%
    theano-1.0.2         |  3.6 MB | ############################################################################################################################################################### | 100%
    binutils_linux-64-7. |    8 KB | ############################################################################################################################################################### | 100%
    gcc_impl_linux-64-7. | 72.4 MB | ############################################################################################################################################################### | 100%
    binutils_impl_linux- | 16.1 MB | ############################################################################################################################################################### | 100%
    Preparing transaction: done
    Verifying transaction: done
    Executing transaction: done
    # theano installed successfully
    >>> import theano # verified
    >>> exit()
    

    theano and pygpu are installed: theano==1.0.2, pygpu==0.7.0

    The main event: getting theano to use the GPU

    [zoe_env@218 ~]$ THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python theano_gpu.py
    Traceback (most recent call last):
      File "theano_gpu.py", line 1, in <module>
        from theano import function, config, shared, sandbox
      File "/home/zoe_env/anaconda2/lib/python2.7/site-packages/theano/__init__.py", line 88, in <module>
        from theano.configdefaults import config
      File "/home/zoe_env/anaconda2/lib/python2.7/site-packages/theano/configdefaults.py", line 137, in <module>
        in_c_key=False)
      File "/home/zoe_env/anaconda2/lib/python2.7/site-packages/theano/configparser.py", line 287, in AddConfigVar
        configparam.__get__(root, type(root), delete_key=True)
      File "/home/zoe_env/anaconda2/lib/python2.7/site-packages/theano/configparser.py", line 335, in __get__
        self.__set__(cls, val_str)
      File "/home/zoe_env/anaconda2/lib/python2.7/site-packages/theano/configparser.py", line 346, in __set__
        self.val = self.filter(val)
      File "/home/zoe_env/anaconda2/lib/python2.7/site-packages/theano/configdefaults.py", line 116, in filter
        'You are tring to use the old GPU back-end. '
    ValueError: You are tring to use the old GPU back-end. It was removed from Theano. Use device=cuda* now. See https://github.com/Theano/Theano/wiki/Converting-to-the-new-gpu-back-end%28gpuarray%29 for more information.
    

    Cause of the error: "Use device=cuda*". Change gpu to cuda (the spelling changed when theano was updated)
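
For scripts that still carry the old spelling, the rename can be applied mechanically. This is a minimal sketch, assuming only the device=gpu / device=gpuN forms need rewriting to device=cuda / device=cudaN; the helper name migrate_theano_flags is hypothetical.

```python
import re


def migrate_theano_flags(flags):
    """Rewrite device=gpu[N] to device=cuda[N] in a THEANO_FLAGS string."""
    parts = []
    for item in flags.split(","):
        key, sep, value = item.partition("=")
        if key.strip() == "device" and re.fullmatch(r"gpu\d*", value.strip()):
            value = "cuda" + value.strip()[len("gpu"):]
        parts.append(key + sep + value)
    return ",".join(parts)


old_flags = "mode=FAST_RUN,device=gpu,floatX=float32"
print(migrate_theano_flags(old_flags))
```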

    [zoe_env@218 ~]$ THEANO_FLAGS=mode=FAST_RUN,device=cuda,floatX=float32 python theano_gpu.py
    Traceback (most recent call last):
      File "theano_gpu.py", line 1, in <module>
        from theano import function, config, shared, sandbox
      File "/home/zoe_env/anaconda2/lib/python2.7/site-packages/theano/__init__.py", line 156, in <module>
        import theano.gpuarray
      File "/home/zoe_env/anaconda2/lib/python2.7/site-packages/theano/gpuarray/__init__.py", line 33, in <module>
        from . import fft, dnn, opt, extra_ops, multinomial, reduction, sort, rng_mrg, ctc
      File "/home/zoe_env/anaconda2/lib/python2.7/site-packages/theano/gpuarray/fft.py", line 14, in <module>
        from .opt import register_opt, op_lifter, register_opt2
      File "/home/zoe_env/anaconda2/lib/python2.7/site-packages/theano/gpuarray/opt.py", line 2801, in <module>
        from .dnn import (local_abstractconv_cudnn,
      File "/home/zoe_env/anaconda2/lib/python2.7/site-packages/theano/gpuarray/dnn.py", line 339, in <module>
        handle_type = CUDNNDataType('cudnnHandle_t', 'cudnnDestroy')
      File "/home/zoe_env/anaconda2/lib/python2.7/site-packages/theano/gpuarray/dnn.py", line 259, in CUDNNDataType
        version=version(raises=False))
      File "/home/zoe_env/anaconda2/lib/python2.7/site-packages/theano/gpuarray/dnn.py", line 319, in version
        if not dnn_present():
      File "/home/zoe_env/anaconda2/lib/python2.7/site-packages/theano/gpuarray/dnn.py", line 209, in dnn_present
        dnn_present.avail, dnn_present.msg = _dnn_check_version()
      File "/home/zoe_env/anaconda2/lib/python2.7/site-packages/theano/gpuarray/dnn.py", line 180, in _dnn_check_version
        v = version()
      File "/home/zoe_env/anaconda2/lib/python2.7/site-packages/theano/gpuarray/dnn.py", line 334, in version
        "while the library is version %s." % v)
    RuntimeError: Mixed dnn version. The header is version 5110 while the library is version 6021.
    

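    (Stuck in this big pit for a long time) RuntimeError: Mixed dnn version. The header is version 5110 while the library is version 6021.

    Cause: a cuDNN version mismatch. The header file cudnn.h reports cuDNN 5.1.10, while the library actually loaded at run time is cuDNN 6.0.21

    The investigation went as follows: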
    [zoe_env@218] cd  /usr/local/cuda-8.0
    [zoe_env@218] ls
    bin  doc  extras  include  jre  lib64  libnsight  libnvvp  nvml  nvvm  pkgconfig  samples  share  src  tools  version.txt
    [zoe_env@218] cat include/cudnn.h       # check the cuDNN version
    #define CUDNN_MAJOR      5
    #define CUDNN_MINOR      1
    #define CUDNN_PATCHLEVEL 10
    #define CUDNN_VERSION    (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
    # i.e. CUDNN_VERSION = 5 * 1000 + 1 * 100 + 10 = 5110
    
    [zoe_env@218 cuda-8.0]$ cd lib64
    [zoe_env@218 lib64]$ ls
    cudnn_bak            libcudnn.so.5.1.10   libcuinj64.so.8.0      libcusparse.so.8.0.61  libnppicom.so.8.0     libnppim.so.8.0.61   libnppitc.so.8.0.61   libnvrtc-builtins.so.8.0
    libcublas_device.a   libcudnn.so.6        libcuinj64.so.8.0.61   libcusparse_static.a   libnppicom.so.8.0.61  libnppi.so           libnpps.so            libnvrtc-builtins.so.8.0.61
    libcublas.so         libcudnn.so.6.0.21   libculibos.a           libnppc.so             libnppidei.so         libnppi.so.8.0       libnpps.so.8.0        libnvrtc.so
    libcublas.so.8.0     libcudnn_static.a    libcurand.so           libnppc.so.8.0         libnppidei.so.8.0     libnppi.so.8.0.61    libnpps.so.8.0.61     libnvrtc.so.8.0
    libcublas.so.8.0.61  libcufft.so          libcurand.so.8.0       libnppc.so.8.0.61      libnppidei.so.8.0.61  libnppi_static.a     libnpps_static.a      libnvrtc.so.8.0.61
    libcublas_static.a   libcufft.so.8.0      libcurand.so.8.0.61    libnppc_static.a       libnppif.so           libnppist.so         libnvblas.so          libnvToolsExt.so
    libcudadevrt.a       libcufft.so.8.0.61   libcurand_static.a     libnppial.so           libnppif.so.8.0       libnppist.so.8.0     libnvblas.so.8.0      libnvToolsExt.so.1
    libcudart.so         libcufft_static.a    libcusolver.so         libnppial.so.8.0       libnppif.so.8.0.61    libnppist.so.8.0.61  libnvblas.so.8.0.61   libnvToolsExt.so.1.0.0
    libcudart.so.8.0     libcufftw.so         libcusolver.so.8.0     libnppial.so.8.0.61    libnppig.so           libnppisu.so         libnvgraph.so         libOpenCL.so
    libcudart.so.8.0.61  libcufftw.so.8.0     libcusolver.so.8.0.61  libnppicc.so           libnppig.so.8.0       libnppisu.so.8.0     libnvgraph.so.8.0     libOpenCL.so.1
    libcudart_static.a   libcufftw.so.8.0.61  libcusolver_static.a   libnppicc.so.8.0       libnppig.so.8.0.61    libnppisu.so.8.0.61  libnvgraph.so.8.0.61  libOpenCL.so.1.0
    libcudnn.so          libcufftw_static.a   libcusparse.so         libnppicc.so.8.0.61    libnppim.so           libnppitc.so         libnvgraph_static.a   libOpenCL.so.1.0.0
    libcudnn.so.5        libcuinj64.so        libcusparse.so.8.0     libnppicom.so          libnppim.so.8.0       libnppitc.so.8.0     libnvrtc-builtins.so  stubs
    

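    As you can see, the lib64 directory contains both libcudnn.so.5.1.10 and libcudnn.so.6.0.21, and the system loads 6.0.21 at run time. (After renaming the 6.0.21 file and running the program, the error became ImportError: ('The following error happened while compiling the node', DnnVersion(), '\n', 'libcudnn.so.6: cannot open shared object file: No such file or directory', '[DnnVersion()]'))
    That pins down the cause: the cuDNN header and the cuDNN library are two different versions.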
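
The two numbers in the error message (5110 and 6021) can be reproduced by hand from the header macros and from the library file name, using the CUDNN_VERSION formula shown in cudnn.h above. The sketch below does that comparison on sample strings; the helper names are hypothetical, and reading the version from the file name is an assumption that only holds for files named like libcudnn.so.X.Y.Z.

```python
import re


def header_version(header_text):
    """Compute CUDNN_VERSION from the MAJOR/MINOR/PATCHLEVEL macros."""
    values = {}
    for key in ("MAJOR", "MINOR", "PATCHLEVEL"):
        match = re.search(
            r"^\s*#define\s+CUDNN_%s\s+(\d+)" % key, header_text, re.MULTILINE
        )
        if match is None:
            raise ValueError("CUDNN_%s not found" % key)
        values[key] = int(match.group(1))
    return values["MAJOR"] * 1000 + values["MINOR"] * 100 + values["PATCHLEVEL"]


def library_version(filename):
    """Compute the same number from a name like libcudnn.so.6.0.21."""
    match = re.fullmatch(r"libcudnn\.so\.(\d+)\.(\d+)\.(\d+)", filename)
    if match is None:
        raise ValueError("unexpected library name: %s" % filename)
    major, minor, patch = (int(part) for part in match.groups())
    return major * 1000 + minor * 100 + patch


sample_header = """\
#define CUDNN_MAJOR      5
#define CUDNN_MINOR      1
#define CUDNN_PATCHLEVEL 10
"""
header = header_version(sample_header)
library = library_version("libcudnn.so.6.0.21")
print(header, library, "match" if header == library else "MISMATCH")
```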

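    Fix: install cuDNN 6.0 (6.0.21)

    1. Download cuDNN 6.0

    This example installs cuDNN 6.0, which pairs with cuda 8.0. First download the package from the official site: cudnn-8.0-linux-x64-v6.0.tgz
    cudnn-8.0-linux-x64-v6.0.tgz (Baidu Cloud link)
    Then unpack it: tar -zxvf cudnn-8.0-linux-x64-v6.0.tgz

    Unpacking produces a cuda folder. Enter it and copy a few files into the path where cuda was installed earlier (/usr/local/cuda-8.0/)
    First, look at what the cuda folder contains: an include folder and a lib64 folder

    2. Enter that folder and copy the files to the target directories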

    $cd cuda
    $sudo cp lib64/lib* /usr/local/cuda/lib64/
    $sudo cp include/cudnn.h /usr/local/cuda/include/

    3. Go to /usr/local/cuda/lib64/ and update the cuDNN library symlinks with these commands:

    $ sudo chmod +r libcudnn.so.6.0.21
    $ sudo ln -sf libcudnn.so.6.0.21 libcudnn.so.6
    $ sudo ln -sf libcudnn.so.6 libcudnn.so
    $ sudo ldconfig
    And that's it!
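
To confirm that the symlink chain ends at the intended file, the links can be followed one hop at a time. The sketch below builds the same three-name layout in a scratch directory and walks it; the helper name follow_links is hypothetical, and for a real check the directory argument would be the lib64 path used above.

```python
import os
import tempfile


def follow_links(directory, name, max_hops=10):
    """Return the names visited while resolving `name` inside `directory`."""
    chain = [name]
    path = os.path.join(directory, name)
    while os.path.islink(path) and len(chain) <= max_hops:
        target = os.readlink(path)
        path = os.path.join(os.path.dirname(path), target)
        chain.append(os.path.basename(target))
    return chain


# Demo in a scratch directory that mimics the layout created by the ln commands.
with tempfile.TemporaryDirectory() as libdir:
    open(os.path.join(libdir, "libcudnn.so.6.0.21"), "w").close()
    os.symlink("libcudnn.so.6.0.21", os.path.join(libdir, "libcudnn.so.6"))
    os.symlink("libcudnn.so.6", os.path.join(libdir, "libcudnn.so"))
    demo_chain = follow_links(libdir, "libcudnn.so")

print(" -> ".join(demo_chain))
```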

    4. Verify:

    (1) Check the cuDNN version again; it is now 6.0.21

    $ cat /usr/local/cuda-8.0/include/cudnn.h |grep CUDNN_MAJOR -A 2
    #define CUDNN_MAJOR      6
    #define CUDNN_MINOR      0
    #define CUDNN_PATCHLEVEL 21
    #define CUDNN_VERSION    (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
    

    (2) Verify with python theano_gpu.py

    [zoe_env@218 ~]$ THEANO_FLAGS=mode=FAST_RUN,device=cuda0,floatX=float32 python theano_gpu.py
    Using cuDNN version 6021 on context None
    Mapped name None to device cuda0: GeForce GTX 1080 Ti (0000:17:00.0)
    [GpuElemwise{exp,no_inplace}(<GpuArrayType<None>(float32, vector)>), HostFromGpu(gpuarray)(GpuElemwise{exp,no_inplace}.0)]
    Looping 1000 times took 0.24077296257 seconds
    Result is [1.2317803 1.6187935 1.5227807 ... 2.2077181 2.2996776 1.623233 ]
    Used the cpu
    

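    Cause of the "Used the cpu" line: the test script being run was not the latest version
    Fix: switch to the latest script from "Testing Theano with GPU", then run the script again.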
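
One plausible reading of why an older script prints "Used the cpu" even though the graph shows GpuElemwise: if the script only tests isinstance(op, Elemwise), a GPU op that subclasses Elemwise also matches. The sketch below illustrates that with stand-in classes; the classes are hypothetical placeholders, not Theano's, and the assumption that GpuElemwise subclasses Elemwise is mine rather than something stated in this post.

```python
class Elemwise:
    """Stand-in for the CPU elementwise op class (hypothetical)."""


class GpuElemwise(Elemwise):
    """Stand-in for the GPU op, assumed here to subclass Elemwise."""


class HostFromGpu:
    """Stand-in for the op that copies a result back to the host."""


def old_style_check(ops):
    # Any Elemwise instance counts as CPU, including GPU subclasses.
    return "cpu" if any(isinstance(op, Elemwise) for op in ops) else "gpu"


def new_style_check(ops):
    # Also require that the op's class name does not contain 'Gpu'.
    on_cpu = any(
        isinstance(op, Elemwise) and "Gpu" not in type(op).__name__ for op in ops
    )
    return "cpu" if on_cpu else "gpu"


graph_ops = [GpuElemwise(), HostFromGpu()]
print("old check says:", old_style_check(graph_ops))
print("new check says:", new_style_check(graph_ops))
```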

    $THEANO_FLAGS=mode=FAST_RUN,device=cuda0,floatX=float32 python theano_gpu.py
    Using cuDNN version 6021 on context None
    Mapped name None to device cuda0: GeForce GTX 1080 Ti (0000:17:00.0)
    [GpuElemwise{exp,no_inplace}(<GpuArrayType<None>(float64, vector)>), HostFromGpu(gpuarray)(GpuElemwise{exp,no_inplace}.0)]
    Looping 1000 times took 0.706813 seconds
    Result is [1.23178032 1.61879341 1.52278065 ... 2.20771815 2.29967753 1.62323285]
    Used the gpu
    

    Finally, one big theano-gpu pit is filled!!!!! Rejoice, and scatter the flowers 🌹🌹🌹

    II. On zoe@218, running produced this message:

    >>> import theano
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/zoe/anaconda2/lib/python2.7/site-packages/theano/__init__.py", line 80, in <module>
        from theano.scan_module import (scan, map, reduce, foldl, foldr, clone,
      File "/home/zoe/anaconda2/lib/python2.7/site-packages/theano/scan_module/__init__.py", line 41, in <module>
        from theano.scan_module import scan_opt
      File "/home/zoe/anaconda2/lib/python2.7/site-packages/theano/scan_module/scan_opt.py", line 60, in <module>
        from theano import tensor, scalar
      File "/home/zoe/anaconda2/lib/python2.7/site-packages/theano/tensor/__init__.py", line 9, in <module>
        from theano.tensor.subtensor import *
      File "/home/zoe/anaconda2/lib/python2.7/site-packages/theano/tensor/subtensor.py", line 27, in <module>
        from cutils_ext.cutils_ext import inplace_increment
    ImportError: cannot import name inplace_increment
    

    Baidu search result: the cache needs to be cleared
    Suggested fix:
    "theano-cache clear" or "theano-cache purge".
    If neither of those work, removing the cache directory with "rm -rf ~/.theano"

    Attempt 1: [zoe@218 ~]$ theano-cache purge
    Traceback (most recent call last):
      File "/home/zoe/anaconda2/bin/theano-cache", line 7, in <module>
        from bin.theano_cache import main
      File "/home/zoe/anaconda2/lib/python2.7/site-packages/bin/theano_cache.py", line 15, in <module>
        import theano
      File "/home/zoe/anaconda2/lib/python2.7/site-packages/theano/__init__.py", line 80, in <module>
        from theano.scan_module import (scan, map, reduce, foldl, foldr, clone,
      File "/home/zoe/anaconda2/lib/python2.7/site-packages/theano/scan_module/__init__.py", line 41, in <module>
        from theano.scan_module import scan_opt
      File "/home/zoe/anaconda2/lib/python2.7/site-packages/theano/scan_module/scan_opt.py", line 60, in <module>
        from theano import tensor, scalar
      File "/home/zoe/anaconda2/lib/python2.7/site-packages/theano/tensor/__init__.py", line 9, in <module>
        from theano.tensor.subtensor import *
      File "/home/zoe/anaconda2/lib/python2.7/site-packages/theano/tensor/subtensor.py", line 27, in <module>
        from cutils_ext.cutils_ext import inplace_increment
    ImportError: cannot import name inplace_increment
    

    Attempt 2: # Theano ImportError: cannot import name inplace_increment
    rm -rf ~/.theano
    After that, import theano in python still gave the same error.
    It turned out that no .theano directory was even being created at run time. Why??
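
Theano's cache location is configurable (the base_compiledir setting), so the cache is not always at ~/.theano. One way to hunt for a stray cache is to search a directory tree for folders named .theano. This is a minimal sketch; the helper name find_theano_dirs and the depth limit are choices made for illustration.

```python
import os


def find_theano_dirs(root, max_depth=3):
    """List directories named .theano at most max_depth levels below root."""
    root = os.path.abspath(root)
    base_depth = root.rstrip(os.sep).count(os.sep)
    hits = []
    for current, dirnames, _ in os.walk(root):
        depth = current.rstrip(os.sep).count(os.sep) - base_depth
        if ".theano" in dirnames:
            hits.append(os.path.join(current, ".theano"))
            dirnames.remove(".theano")  # do not descend into the cache itself
        if depth >= max_depth:
            dirnames[:] = []  # stop descending below the depth limit
    return sorted(hits)


print(find_theano_dirs(os.path.expanduser("~"), max_depth=2))
```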

    OK, a .theano directory turned up in the external folder under the account. After running rm -rf ~/.theano, this appeared:

    [zoe@218 external]$ python
    Python 2.7.15 |Anaconda, Inc.| (default, May  1 2018, 23:32:55)
    [GCC 7.2.0] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import theano
    /home/zoe/anaconda2/lib/python2.7/site-packages/theano/gpuarray/dnn.py:135: UserWarning: Your cuDNN version is more recent than Theano. If you encounter problems, try updating Theano or downgrading cuDNN to version 5.1.
      warnings.warn("Your cuDNN version is more recent than "
    ERROR (theano.gpuarray): Could not initialize pygpu, support disabled
    Traceback (most recent call last):
      File "/home/zoe/anaconda2/lib/python2.7/site-packages/theano/gpuarray/__init__.py", line 164, in <module>
        use(config.device)
      File "/home/zoe/anaconda2/lib/python2.7/site-packages/theano/gpuarray/__init__.py", line 151, in use
        init_dev(device)
      File "/home/zoe/anaconda2/lib/python2.7/site-packages/theano/gpuarray/__init__.py", line 68, in init_dev
        context.cudnn_handle = dnn._make_handle(context)
      File "/home/zoe/anaconda2/lib/python2.7/site-packages/theano/gpuarray/dnn.py", line 80, in _make_handle
        cudnn = _dnn_lib()
      File "/home/zoe/anaconda2/lib/python2.7/site-packages/theano/gpuarray/dnn.py", line 67, in _dnn_lib
        raise RuntimeError('Could not find cudnn library (looked for v5[.1])')
    RuntimeError: Could not find cudnn library (looked for v5[.1])
    >>> exit()
    [zoe@218 external]$ cat /usr/local/cuda-8.0/include/cudnn.h |grep CUDNN_MAJOR -A 2
    #define CUDNN_MAJOR      6
    #define CUDNN_MINOR      0
    #define CUDNN_PATCHLEVEL 21
    --
    #define CUDNN_VERSION    (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
    
    #include "driver_types.h"
    

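    The error says cuDNN 5.1 cannot be found, because under another account on this server, zoe_env, cuDNN was switched to 6.0.21 (that is, pitfall I above). The current cuDNN is newer than this theano (0.9.0) supports, so the two do not match.
    Upgrade theano from 0.9.0 to 1.0.2
    $ pip install theano==1.0.2
    Then import theano reports: ValueError: Your installed version of pygpu(0.6.9) is too old, please upgrade to 0.7.0 or later (but below 0.8.0)
    $ conda install -c conda-forge pygpu
    Output: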

    >>> import theano
    ERROR (theano.gpuarray): pygpu was configured but could not be imported or is too old (version 0.6 or higher required)
    Traceback (most recent call last):
      File "/home/fanyiyi/anaconda2/lib/python2.7/site-packages/theano/gpuarray/__init__.py", line 23, in <module>
        import pygpu.version
    ImportError: No module named version
    

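    Cause: the old pygpu should have been removed first, and the newer pygpu installed afterwards
    After the install finished, import theano reported that there was no theano module, although pip list still showed Theano. Reinstalled with conda install theano
    After that, import theano shows the GPU being used successfully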

    >>> import theano
    Using cuDNN version 6021 on context None
    Mapped name None to device cuda0: GeForce GTX 1080 Ti (0000:17:00.0)
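
The pygpu constraint reported earlier in this section (0.7.0 or later, but below 0.8.0) can be checked before importing theano. The sketch below compares plain dotted version strings; the helper names are hypothetical and the bounds are simply the ones quoted in that ValueError.

```python
def parse_version(text):
    """Turn '0.7.0' into (0, 7, 0); only the first three numeric parts are used."""
    return tuple(int(part) for part in text.split(".")[:3])


def pygpu_in_range(version, minimum="0.7.0", below="0.8.0"):
    """True if minimum <= version < below."""
    return parse_version(minimum) <= parse_version(version) < parse_version(below)


for candidate in ("0.6.9", "0.7.0", "0.8.0"):
    print(candidate, pygpu_in_range(candidate))
```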
    

    Another pit filled, and I am delighted!!


    III. On zoe@216:

    [zoe@216 Documents]$ THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python theano_gpu
    WARNING (theano.sandbox.cuda): The cuda backend is deprecated and will be removed in the next release (v0.10).  Please switch to the gpuarray backend. You can get more information about how to switch at this URL:
     https://github.com/Theano/Theano/wiki/Converting-to-the-new-gpu-back-end%28gpuarray%29
    
    Using gpu device 0: GeForce GTX 1080 Ti (CNMeM is disabled, cuDNN Mixed dnn version. The header is from one version, but we link with a different version (5110, 6021))
    [GpuElemwise{exp,no_inplace}(<CudaNdarrayType(float32, vector)>), HostFromGpu(GpuElemwise{exp,no_inplace}.0)]
    ('Looping 1000 times took', 0.3158299922943115, 'seconds')
    ('Result is', array([ 1.23178029,  1.61879349,  1.52278066, ...,  2.20771813,
            2.29967761,  1.62323296], dtype=float32))
    Used the gpu
    # But...
    [zoe@216 ~]$ python
    Python 2.7.13 |Anaconda, Inc.| (default, Sep 30 2017, 18:12:43)
    [GCC 7.2.0] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import pygpu
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ImportError: No module named pygpu
    >>> import theano
    WARNING (theano.sandbox.cuda): The cuda backend is deprecated and will be removed in the next release (v0.10).  Please switch to the gpuarray backend. You can get more information about how to switch at this URL:
     https://github.com/Theano/Theano/wiki/Converting-to-the-new-gpu-back-end%28gpuarray%29
    
    Using gpu device 0: GeForce GTX 1080 Ti (CNMeM is disabled, cuDNN Mixed dnn version. The header is from one version, but we link with a different version (5110, 6021))
    # Why is this? Why can the GPU still be used without pygpu?
    # Check the theano version
    >>> theano.__version__
    '0.9.0'
    

    Summary

    Theano versions and cuDNN versions have to match each other. Quoting the Theano documentation:

    cuDNN v5.1 is supported in Theano master version. So it dropped cuDNN v3 support. Theano 0.8.0 and 0.8.1 support only cuDNN v3 and v4. Theano 0.8.2 will support only v4 and v5.
    Request cuDNN 5 and Theano 0.9dev2 or more recent.
    As of theano 1.0.2, the matching version is cuDNN 6.0
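
The pairings mentioned in this summary can be kept as a small lookup table for quick checks. The table below is transcribed only from the quoted documentation and from the working combination found in this post; it is not an official or complete compatibility matrix, and the names SUPPORTED_CUDNN and is_listed_pair are hypothetical.

```python
# Theano version -> cuDNN major versions named in this post.
SUPPORTED_CUDNN = {
    "0.8.0": {3, 4},
    "0.8.1": {3, 4},
    "0.8.2": {4, 5},
    "1.0.2": {6},  # the combination that worked in this post
}


def is_listed_pair(theano_version, cudnn_major):
    """Return True/False for listed versions, None when the version is unknown."""
    majors = SUPPORTED_CUDNN.get(theano_version)
    if majors is None:
        return None
    return cudnn_major in majors


print(is_listed_pair("1.0.2", 6), is_listed_pair("0.8.2", 6))
```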

    Sigh! Theano development has already stopped; who knows what will become of it!


    References:
    Theano
    NVIDIA CuDNN installation notes
    Pitfalls encountered installing Tensorflow-GPU on CentOS 7
    I have set devices=cuda0, but the program still use cpu
