TensorFlow: GPU Installation Guide

Author: cathyxlyl | Published: 2017-04-17 23:02

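    0x00 Preface

    Installing the CPU build of TensorFlow is simple and takes only a few commands, but installing the GPU build has quite a few pitfalls. This post summarizes the whole installation procedure, along with the problems I ran into and how I solved them.

    Overview

    Installing the GPU build of TensorFlow differs somewhat from installing the CPU build. Here is a brief outline first; the detailed steps follow.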

    1. Python
    2. NVIDIA CUDA
    3. cuDNN
    4. TensorFlow
    5. Verification

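    0x01 Installing Python

    There are two ways to do this:

    • Install a basic Python environment and add packages as you need them.
    • Install Anaconda, which bundles most of the packages you are likely to need.

    I always install Anaconda because it saves effort: download the installer from the official site and run it. There is nothing more to say about it.

    The command below installs the basic Python environment instead.

    sudo apt-get install python-pip python-dev python-virtualenv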
    

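    0x02 Installing NVIDIA CUDA

    General reference: http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#runfile

    Installing CUDA involves the following steps:

    • Confirm that the graphics card supports CUDA
    • Confirm that the Linux version supports CUDA
    • Confirm that gcc is installed
    • Confirm the kernel version
    • Disable the open-source driver
    • Stop the X server
    • Download CUDA
    • Install CUDA

    The first few steps only verify prerequisites. This post is based on Ubuntu 16.04, on a machine that already has dual graphics cards, so they caused essentially no problems.

    The main pitfalls are in installing CUDA itself.

    1. Verifying the installation environment

    Tip: I skipped most of these checks because they are rarely a problem. Run them if you are curious or unsure about your system.

    • Confirm that the graphics card supports CUDA
    • Confirm that the Linux version supports CUDA
    • Confirm that gcc is installed
    • Confirm the kernel version

    The relevant sections of the NVIDIA guide are quoted below.
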
    2.1. Verify You Have a CUDA-Capable GPU
    To verify that your GPU is CUDA-capable, go to your distribution's equivalent of System Properties, or, from the command line, enter:
    
    $ lspci | grep -i nvidia
    If you do not see any settings, update the PCI hardware database that Linux maintains by entering update-pciids (generally found in /sbin) at the command line and rerun the previous lspci command.
    
    If your graphics card is from NVIDIA and it is listed in http://developer.nvidia.com/cuda-gpus, your GPU is CUDA-capable.
    
    The Release Notes for the CUDA Toolkit also contain a list of supported products.
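
The lspci check quoted above can also be scripted. The following is a minimal Python sketch, not part of the NVIDIA guide: the helper names `find_nvidia_devices` and `list_nvidia_devices` are hypothetical, and it assumes `lspci` is on the PATH (it returns an empty list otherwise).

```python
import subprocess


def find_nvidia_devices(lspci_output):
    """Return the lspci lines that mention NVIDIA (case-insensitive)."""
    return [line for line in lspci_output.splitlines()
            if "nvidia" in line.lower()]


def list_nvidia_devices():
    """Run lspci and filter its output; returns [] if lspci is unavailable."""
    try:
        output = subprocess.check_output(["lspci"]).decode("utf-8", "replace")
    except (OSError, subprocess.CalledProcessError):
        return []
    return find_nvidia_devices(output)


if __name__ == "__main__":
    # Mirrors: lspci | grep -i nvidia
    for device in list_nvidia_devices():
        print(device)
```

An empty result only means no NVIDIA device was listed; whether a listed card is CUDA-capable still has to be checked against the NVIDIA list mentioned above.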
    
    2.2. Verify You Have a Supported Version of Linux
    The CUDA Development Tools are only supported on some specific distributions of Linux. These are listed in the CUDA Toolkit release notes.
    
    To determine which distribution and release number you're running, type the following at the command line:
    
    $ uname -m && cat /etc/*release
    You should see output similar to the following, modified for your particular system:
    
    x86_64
    Red Hat Enterprise Linux Workstation release 6.0 (Santiago)
    The x86_64 line indicates you are running on a 64-bit system. The remainder gives information about your distribution.
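
As a rough Python equivalent of the command above, the sketch below reads the architecture with `platform.machine()` and parses one release file. It assumes the file is `/etc/os-release` in the usual KEY=value format, which not every distribution provides (the quoted command globs `/etc/*release` instead); the function names are hypothetical.

```python
import platform


def parse_os_release(text):
    """Parse KEY=value lines (os-release style) into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Values are often wrapped in double quotes; strip them.
        info[key] = value.strip().strip('"')
    return info


def describe_system(os_release_path="/etc/os-release"):
    """Return (architecture, distribution name); name is None if unreadable."""
    try:
        with open(os_release_path) as handle:
            info = parse_os_release(handle.read())
    except IOError:
        info = {}
    return platform.machine(), info.get("PRETTY_NAME")


if __name__ == "__main__":
    print(describe_system())
```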
    
    2.3. Verify the System Has gcc Installed
    The gcc compiler is required for development using the CUDA Toolkit. It is not required for running CUDA applications. It is generally installed as part of the Linux installation, and in most cases the version of gcc installed with a supported version of Linux will work correctly.
    
    To verify the version of gcc installed on your system, type the following on the command line:
    
    $ gcc --version
    If an error message displays, you need to install the development tools from your Linux distribution or obtain a version of gcc and its accompanying toolchain from the Web.
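
If the gcc version needs to be compared programmatically, something like the following sketch may help. It assumes the version number appears on the first line of the `gcc --version` output, outside any parenthesised vendor string, which holds for the common Ubuntu and Red Hat builds but is not guaranteed; the function names are hypothetical.

```python
import re
import subprocess


def parse_gcc_version(version_output):
    """Extract (major, minor, patch) from the first line of `gcc --version`."""
    lines = version_output.splitlines()
    if not lines:
        return None
    # Drop parenthesised vendor strings such as "(Ubuntu 5.4.0-6ubuntu1~16.04.4)".
    first_line = re.sub(r"\([^)]*\)", "", lines[0])
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", first_line)
    if match is None:
        return None
    return tuple(int(part) if part else 0 for part in match.groups())


def installed_gcc_version():
    """Return the installed gcc version tuple, or None if gcc is missing."""
    try:
        output = subprocess.check_output(["gcc", "--version"])
    except (OSError, subprocess.CalledProcessError):
        return None
    return parse_gcc_version(output.decode("utf-8", "replace"))


if __name__ == "__main__":
    print(installed_gcc_version())
```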
    
    2.4. Verify the System has the Correct Kernel Headers and Development Packages Installed
    The CUDA Driver requires that the kernel headers and development packages for the running version of the kernel be installed at the time of the driver installation, as well as whenever the driver is rebuilt. For example, if your system is running kernel version 3.17.4-301, the 3.17.4-301 kernel headers and development packages must also be installed.
    
    While the Runfile installation performs no package validation, the RPM and Deb installations of the driver will make an attempt to install the kernel header and development packages if no version of these packages is currently installed. However, it will install the latest version of these packages, which may or may not match the version of the kernel your system is using. Therefore, it is best to manually ensure the correct version of the kernel headers and development packages are installed prior to installing the CUDA Drivers, as well as whenever you change the kernel version.
    
    The version of the kernel your system is running can be found by running the following command:
    $ uname -r
    This is the version of the kernel headers and development packages that must be installed prior to installing the CUDA Drivers. This command will be used multiple times below to specify the version of the packages to install. Note that below are the common-case scenarios for kernel usage. More advanced cases, such as custom kernel branches, should ensure that their kernel headers and sources match the kernel build they are running.
    RHEL/CentOS
    The kernel headers and development packages for the currently running kernel can be installed with:
    $ sudo yum install kernel-devel-$(uname -r) kernel-headers-$(uname -r)
    Fedora
    The kernel headers and development packages for the currently running kernel can be installed with:
    $ sudo dnf install kernel-devel-$(uname -r) kernel-headers-$(uname -r)
    OpenSUSE/SLES
    Use the output of the uname command to determine the running kernel's version and variant:
    $ uname -r
    3.16.6-2-default
    In this example, the version is 3.16.6-2 and the variant is default. The kernel headers and development packages can then be installed with the following command, replacing <variant> and <version> with the variant and version discovered from the previous uname command:
    $ sudo zypper install kernel-<variant>-devel=<version>
    Ubuntu
    The kernel headers and development packages for the currently running kernel can be installed with:
    $ sudo apt-get install linux-headers-$(uname -r)
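
The per-distribution commands above all derive the package names from the output of `uname -r`. The sketch below shows that mapping in Python. The distro keys ("rhel", "fedora", "opensuse", "ubuntu") and the function name are hypothetical labels chosen for this example, and the openSUSE branch assumes the variant is the part after the last dash, as in the 3.16.6-2-default example.

```python
import platform


def kernel_header_command(distro, kernel_release):
    """Return the install command from the guide as an argument list."""
    if distro == "rhel":
        return ["sudo", "yum", "install",
                "kernel-devel-" + kernel_release,
                "kernel-headers-" + kernel_release]
    if distro == "fedora":
        return ["sudo", "dnf", "install",
                "kernel-devel-" + kernel_release,
                "kernel-headers-" + kernel_release]
    if distro == "opensuse":
        # "3.16.6-2-default" -> version "3.16.6-2", variant "default"
        version, variant = kernel_release.rsplit("-", 1)
        return ["sudo", "zypper", "install",
                "kernel-%s-devel=%s" % (variant, version)]
    if distro == "ubuntu":
        return ["sudo", "apt-get", "install",
                "linux-headers-" + kernel_release]
    raise ValueError("unsupported distro key: %r" % distro)


if __name__ == "__main__":
    # platform.release() reports the same string as `uname -r`.
    print(" ".join(kernel_header_command("ubuntu", platform.release())))
```

The sketch only prints the command; running it is left to the reader.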
    
    

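    2. Disabling the open-source driver

    Note: some tutorials blacklist additional drivers. Here I follow the official instructions, which worked without problems.

    Create a new file:

    sudo vim /etc/modprobe.d/blacklist-nouveau.conf
    

    with the following contents: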

    blacklist nouveau
    options nouveau modeset=0
    

    Then regenerate the initramfs:

    sudo update-initramfs -u
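
To confirm that the blacklist has taken effect (typically only after a reboot, once the regenerated initramfs is in use), one option is to check whether the nouveau module is still loaded. This is a minimal sketch, assuming the standard `/proc/modules` layout in which each line begins with the module name; the function names are hypothetical.

```python
def is_module_loaded(proc_modules_text, module_name):
    """Return True if module_name is listed as a loaded module.

    Only the first field of each line is compared, so a module that merely
    appears in another module's dependency list does not count.
    """
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields and fields[0] == module_name:
            return True
    return False


def nouveau_is_loaded(path="/proc/modules"):
    """Check the running kernel; returns False if the file cannot be read."""
    try:
        with open(path) as handle:
            return is_module_loaded(handle.read(), "nouveau")
    except IOError:
        return False


if __name__ == "__main__":
    print("nouveau loaded:", nouveau_is_loaded())
```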
    

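    3. Stopping the X server

    The X server must be stopped before the NVIDIA driver is installed. There are two ways to stop it, depending on which display manager the system runs (Ubuntu 16.04 uses lightdm by default):

    • Stop gdm
    • Stop lightdm

    If the first does not work, try the second. This post uses the second.

    Method 1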
    sudo /etc/init.d/gdm stop
    sudo /etc/init.d/gdm status
    
    Method 2
    sudo /etc/init.d/lightdm stop
    sudo /etc/init.d/lightdm status
    

    Note: once the graphics driver has been installed, restart gdm or lightdm first.

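    4. Downloading CUDA

    Download it from the official site; this is currently the most reliable approach.
    https://developer.nvidia.com/cuda-downloads

    Note: when choosing the download, pick the file with the .run suffix. I ran into problems with the other installer types and found this one to be the most stable.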


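    5. Installing CUDA

    Note: there are pitfalls in this step. Take the instructions below seriously, otherwise Ubuntu may fail to enter the graphical interface after installation. Because of this I spent a whole night reinstalling the operating system, trying three Ubuntu versions and two CentOS versions.

    For the other steps, follow the installer's prompts; pay particular attention to the part boxed in the screenshot.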

    Paste_Image.png

    Do not install OpenGL when installing CUDA. Remember this, otherwise the graphical desktop may fail to start after installation.


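    After the graphics driver has been installed, restart gdm or lightdm first.

    6. Adding environment variables

    The official documentation requires a few environment variables to be configured.

    In a terminal, open ~/.bash_profile in an editor (sudo is not needed for a file in your own home directory):
    $ vim ~/.bash_profile
    
    Append the following to the end of the file: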
    export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64"
    export CUDA_HOME=/usr/local/cuda
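
After opening a new shell, a quick way to confirm the variables are visible is to inspect them from Python. The following is a small sketch under the assumption that the two library directories are exactly the ones exported above; `missing_cuda_dirs` and `EXPECTED_DIRS` are hypothetical names.

```python
import os

# The two directories exported in the snippet above.
EXPECTED_DIRS = [
    "/usr/local/cuda/lib64",
    "/usr/local/cuda/extras/CUPTI/lib64",
]


def missing_cuda_dirs(ld_library_path, expected=EXPECTED_DIRS):
    """Return the expected directories that are absent from the path string."""
    entries = [entry.rstrip("/") for entry in ld_library_path.split(":") if entry]
    return [directory for directory in expected if directory not in entries]


if __name__ == "__main__":
    missing = missing_cuda_dirs(os.environ.get("LD_LIBRARY_PATH", ""))
    print("CUDA_HOME =", os.environ.get("CUDA_HOME"))
    print("missing from LD_LIBRARY_PATH:", missing)
```

If the list of missing directories is not empty, the profile file was probably not read by the current shell.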
    

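    0x03 Installing cuDNN

    The download link is below. You need to register first, which is straightforward.

    Pay attention to the version you download: it has to match your CUDA version and the one your TensorFlow build expects.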

    https://developer.nvidia.com/rdp/cudnn-download


    The downloaded file is a deb package; install it directly with sudo dpkg -i.
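
Since the cuDNN version matters, it can be useful to read it back after installation. The sketch below parses the `CUDNN_MAJOR`, `CUDNN_MINOR` and `CUDNN_PATCHLEVEL` macros from the cuDNN header. The header path `/usr/include/cudnn.h` is an assumption about where the deb package places it and may differ on your system; the function names are hypothetical.

```python
import re


def parse_cudnn_version(header_text):
    """Return (major, minor, patch) from the CUDNN_* #define lines, or None."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        pattern = r"^\s*#define\s+%s\s+(\d+)" % name
        match = re.search(pattern, header_text, re.MULTILINE)
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)


def installed_cudnn_version(header_path="/usr/include/cudnn.h"):
    """Read the header at header_path (assumed location); None if unreadable."""
    try:
        with open(header_path) as handle:
            return parse_cudnn_version(handle.read())
    except IOError:
        return None


if __name__ == "__main__":
    print(installed_cudnn_version())
```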

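    0x04 Installing TensorFlow

    The TensorFlow website has very detailed instructions. Whether you use Anaconda or a plain Python environment, I recommend installing with pip: it is the simplest route and the packages are recent. Note that for TensorFlow 1.x the GPU build is published on PyPI as tensorflow-gpu; the plain tensorflow package is CPU-only.

    pip install tensorflow-gpu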
    

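    Alternatively, if you use Anaconda, conda lets you create a dedicated virtual environment for TensorFlow. In that case, make sure you give pip the correct TensorFlow package URL (GPU or CPU build, operating system, Python version, and so on).
    https://storage.googleapis.com/tensorflow/ provides a listing of all TensorFlow packages (in XML format).

    Create a virtual environment (Python 2.7 here, to match the cp27 wheel used below)
    $ conda create -n tensorflow python=2.7
    
    Activate the virtual environment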
    $ source activate tensorflow
     (tensorflow)$  # Your prompt should change 
    
    Install TensorFlow
     (tensorflow)$ pip install --ignore-installed --upgrade \
     https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-1.0.1-cp27-cp27m-linux_x86_64.whl
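
Picking the right package URL comes down to reading the tags in the wheel filename (name, version, Python tag, ABI tag, platform tag). The sketch below splits a filename such as the one above and compares its Python tag with the running interpreter. It assumes a five-part filename without the optional build tag and only handles CPython tags like cp27; the function names are hypothetical.

```python
import sys


def parse_wheel_filename(filename):
    """Split a wheel filename into its dash-separated tags (no build tag)."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel: %r" % filename)
    parts = filename[:-len(".whl")].split("-")
    if len(parts) != 5:
        raise ValueError("expected name-version-python-abi-platform: %r" % filename)
    keys = ("name", "version", "python_tag", "abi_tag", "platform_tag")
    return dict(zip(keys, parts))


def matches_interpreter(python_tag, version_info=None):
    """True if a CPython tag such as 'cp27' matches the given interpreter."""
    if version_info is None:
        version_info = sys.version_info
    return python_tag == "cp%d%d" % (version_info[0], version_info[1])


if __name__ == "__main__":
    tags = parse_wheel_filename("tensorflow-1.0.1-cp27-cp27m-linux_x86_64.whl")
    print(tags)
    print("matches this interpreter:", matches_interpreter(tags["python_tag"]))
```

A mismatch here is a likely reason for pip rejecting the wheel as unsupported on the current platform.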
    
    

    0x05 Verifying the installation

    Run a small example to verify the installation.

    $ python
    Then, enter the following short program inside the python interactive shell:
    
    >>> import tensorflow as tf
    >>> hello = tf.constant('Hello, TensorFlow!')
    >>> sess = tf.Session()
    >>> print(sess.run(hello))
    If the system outputs the following, then you are ready to begin running TensorFlow programs:
    
    Hello, TensorFlow!
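
The example above confirms that TensorFlow imports and runs, but not by itself that a GPU is being used. One way to check is to list the devices TensorFlow can see. The sketch below relies on `device_lib.list_local_devices()` from `tensorflow.python.client`; the wrapper names are hypothetical, and the function returns None when TensorFlow is not installed.

```python
def gpu_device_names(devices):
    """Return the names of devices whose device_type is 'GPU'."""
    return [device.name for device in devices if device.device_type == "GPU"]


def list_local_gpus():
    """List GPUs visible to TensorFlow; returns None if TensorFlow is missing."""
    try:
        from tensorflow.python.client import device_lib
    except ImportError:
        return None
    return gpu_device_names(device_lib.list_local_devices())


if __name__ == "__main__":
    print(list_local_gpus())
```

An empty list suggests that the CPU-only package was installed or that the CUDA and cuDNN libraries could not be loaded.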
    

    Personal homepage: http://cathyxlyl.github.io/
    This article may be reposted, provided the original source and author are credited with a hyperlink.

      Comments

      • yscyh: When installing cuDNN, do all three files need to be installed? I am installing cudnn 9.2 at the moment.
        This is the file list:
        cuDNN v7.1.4 Library for Linux
        cuDNN v7.1.4 Library for Linux (Power8/Power9)
        cuDNN v7.1.4 Runtime Library for Ubuntu16.04 & Power8 (Deb)
        cuDNN v7.1.4 Developer Library for Ubuntu16.04 & Power8 (Deb)
        cuDNN v7.1.4 Runtime Library for Ubuntu14.04 (Deb)
        cuDNN v7.1.4 Developer Library for Ubuntu14.04 (Deb)
        Which should I choose? Sorry for the trouble, and thanks.
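      • eaa618494701: Thanks! In my case, not installing OpenGL made the interface very laggy, and dragging windows was not smooth. After installing it, everything was much smoother.
      • cy飞翔的但丁cy: Very impressive!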

      Article link: https://www.haomeiwen.com/subject/cupxzttx.html