AI - Ubuntu Machine Learning Environment (TensorFlow G

Author: CatchZeng | Published: 2021-06-09 09:21

    Original: https://makeoptim.com/deep-learning/tensorflow-gpu-on-ubuntu

    Introduction

    • Ubuntu 18.04.5 LTS
    • GTX 1070
    • TensorFlow 2.4.1

    Required software

    Before installing

    GCC

    $ gcc --version
    Command 'gcc' not found, but can be installed with:
    sudo apt install gcc
    $ sudo apt install gcc
    $ gcc --version
    gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
    Copyright (C) 2017 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions.  There is NO
    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
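
    The same check can be scripted. The following is a minimal sketch, not part of the original setup: `tool_version` is a hypothetical helper that reports whether a command is on PATH and, if so, returns the first line of its `--version` output.

```python
import shutil
import subprocess


def tool_version(command):
    """Return the first line of `<command> --version`, or None if the tool is not on PATH."""
    if shutil.which(command) is None:
        return None
    result = subprocess.run(
        [command, "--version"],
        capture_output=True,
        text=True,
        check=False,
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else ""


# gcc and wget are the two command-line tools the following steps rely on.
for tool in ("gcc", "wget"):
    version = tool_version(tool)
    print(f"{tool}: {version if version is not None else 'not found, install it with apt'}")
```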
    

    NVIDIA package repositories

    $ wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
    $ sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
    $ sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
    $ sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/ /"
    $ sudo apt-get update
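
    The three URLs above share one directory layout. As an illustration only, the hypothetical helper `cuda_repo_urls` below rebuilds them from the distro and architecture names. Whether a directory exists for another distro, and whether the key file name is the same there, is an assumption to verify on NVIDIA's site before use.

```python
BASE = "https://developer.download.nvidia.com/compute"


def cuda_repo_urls(distro="ubuntu1804", arch="x86_64"):
    """Build the pin file, key, and apt source strings used in the commands above."""
    repo = f"{BASE}/cuda/repos/{distro}/{arch}"
    return {
        "pin": f"{repo}/cuda-{distro}.pin",
        # Key file name copied from this article; it may differ for other setups.
        "key": f"{repo}/7fa2af80.pub",
        "apt_source": f"deb {repo}/ /",
    }


for label, value in cuda_repo_urls().items():
    print(f"{label}: {value}")
```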
    

    NVIDIA machine learning

    $ wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
    
    $ sudo apt install ./nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
    $ sudo apt-get update
    

    NVIDIA GPU driver

    $ sudo apt-get install --no-install-recommends nvidia-driver-460
    

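    Note: install driver version 460 here. The TensorFlow website specifies 450, but that version failed in my testing.

    Reboot, then run the following command to check that the GPU is visible.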

    $ nvidia-smi
    Mon Apr  5 16:17:17 2021
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  GeForce GTX 1070    On   | 00000000:01:00.0  On |                  N/A |
    |  0%   48C    P8     9W / 180W |    351MiB /  8111MiB |      1%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    
    +-----------------------------------------------------------------------------+
    | Processes:                                                                  |
    |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
    |        ID   ID                                                   Usage      |
    |=============================================================================|
    |    0   N/A  N/A       997      G   /usr/lib/xorg/Xorg                 18MiB |
    |    0   N/A  N/A      1145      G   /usr/bin/gnome-shell               53MiB |
    |    0   N/A  N/A      1353      G   /usr/lib/xorg/Xorg                108MiB |
    |    0   N/A  N/A      1495      G   /usr/bin/gnome-shell               83MiB |
    |    0   N/A  N/A      1862      G   ...AAAAAAAAA= --shared-files       82MiB |
    +-----------------------------------------------------------------------------+
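
    Note that the "CUDA Version" in the header is the highest CUDA version the driver supports, not the toolkit installed on the machine. To check the driver requirement from a script, the sketch below parses the header line; `driver_version` is a hypothetical helper and the sample string is copied from the output above.

```python
import re

HEADER_PATTERN = re.compile(r"Driver Version:\s*(\d+)\.(\d+)(?:\.(\d+))?")


def driver_version(smi_output):
    """Extract the driver version from nvidia-smi output as a tuple of ints."""
    match = HEADER_PATTERN.search(smi_output)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups() if part is not None)


sample = "| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |"
version = driver_version(sample)
# The note above asks for the 460 driver series, so compare the major number.
print(version, "OK" if version and version[0] >= 460 else "driver older than 460")
```

    In practice the text would come from running nvidia-smi (for example through subprocess.run) rather than from a hard-coded sample.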
    

    CUDA Toolkit and cuDNN

    $ wget https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/libnvinfer7_7.1.3-1+cuda11.0_amd64.deb
    $ sudo apt install ./libnvinfer7_7.1.3-1+cuda11.0_amd64.deb
    $ sudo apt-get update
    
    # Install development and runtime libraries (~4GB)
    $ sudo apt-get install --no-install-recommends \
        cuda-11-0 \
        libcudnn8=8.0.4.30-1+cuda11.0  \
        libcudnn8-dev=8.0.4.30-1+cuda11.0
    

    TensorRT

    $ sudo apt-get install -y --no-install-recommends libnvinfer7=7.1.3-1+cuda11.0 \
        libnvinfer-dev=7.1.3-1+cuda11.0 \
        libnvinfer-plugin7=7.1.3-1+cuda11.0
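
    All pinned cuDNN and TensorRT packages carry the same +cuda11.0 suffix, which matches the cuda-11-0 toolkit. A quick way to catch a mismatched pin before running apt is sketched below; `cuda_suffix` is a hypothetical helper and the pin list is copied from the commands above.

```python
def cuda_suffix(pinned):
    """Return the CUDA tag after '+' in a pin like 'libcudnn8=8.0.4.30-1+cuda11.0'."""
    _, _, version = pinned.partition("=")
    _, _, suffix = version.partition("+")
    return suffix or None


pins = [
    "libcudnn8=8.0.4.30-1+cuda11.0",
    "libcudnn8-dev=8.0.4.30-1+cuda11.0",
    "libnvinfer7=7.1.3-1+cuda11.0",
    "libnvinfer-dev=7.1.3-1+cuda11.0",
    "libnvinfer-plugin7=7.1.3-1+cuda11.0",
]
suffixes = {cuda_suffix(pin) for pin in pins}
if len(suffixes) == 1:
    print("all pins target", suffixes.pop())
else:
    print("mixed CUDA targets:", sorted(map(str, suffixes)))
```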
    

    Miniconda

    Download the Python 3.8 installer script from https://docs.conda.io/en/latest/miniconda.html.

    Make the script executable

    $ chmod +x Miniconda3-latest-Linux-x86_64.sh
    

    Run the installer script

    $ ./Miniconda3-latest-Linux-x86_64.sh
    

    Restart the terminal to activate conda.

    Virtual environment

    Create a virtual environment named tensorflow.

    $ conda create -n tensorflow python=3.8.5
    $ conda activate tensorflow
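
    Before installing anything with pip, it helps to confirm that the shell is really inside the new environment. The sketch below relies on the CONDA_DEFAULT_ENV variable that conda sets on activation; the helper names are hypothetical.

```python
import os
import sys


def active_conda_env(environ=os.environ):
    """Return the active conda environment name, or None outside conda."""
    return environ.get("CONDA_DEFAULT_ENV")


def in_expected_env(expected="tensorflow", environ=os.environ):
    """True when the active conda environment has the expected name."""
    return active_conda_env(environ) == expected


print("interpreter:", sys.executable)
print("conda env:", active_conda_env())
print("python:", ".".join(str(part) for part in sys.version_info[:3]))
```

    Inside the environment created above, the interpreter path should point into the tensorflow environment directory.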
    

    Install TensorFlow

    $ pip install tensorflow==2.4.1
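
    To confirm which version pip actually installed without importing TensorFlow (which is slow and prints GPU logs), the package metadata can be queried. This is a small sketch; `installed_version` is a hypothetical helper built on the standard importlib.metadata module.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# 2.4.1 is the version pinned in the pip command above.
for name, expected in (("tensorflow", "2.4.1"),):
    found = installed_version(name)
    status = "ok" if found == expected else f"expected {expected}"
    print(f"{name}: {found} ({status})")
```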
    

    Verify the installation

    $ python -c "import tensorflow as tf;print('Num GPUs Available: ', len(tf.config.list_physical_devices('GPU')))"
    2021-04-05 16:20:00.426536: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
    2021-04-05 16:20:01.170305: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
    2021-04-05 16:20:01.170830: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
    2021-04-05 16:20:01.198917: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
    2021-04-05 16:20:01.199497: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
    pciBusID: 0000:01:00.0 name: GeForce GTX 1070 computeCapability: 6.1
    coreClock: 1.7845GHz coreCount: 15 deviceMemorySize: 7.92GiB deviceMemoryBandwidth: 238.66GiB/s
    2021-04-05 16:20:01.199519: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
    2021-04-05 16:20:01.201250: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
    2021-04-05 16:20:01.201278: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
    2021-04-05 16:20:01.201995: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
    2021-04-05 16:20:01.202159: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
    2021-04-05 16:20:01.203993: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
    2021-04-05 16:20:01.204412: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
    2021-04-05 16:20:01.204499: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
    2021-04-05 16:20:01.204566: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
    2021-04-05 16:20:01.204897: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
    2021-04-05 16:20:01.205168: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
    Num GPUs Available:  1
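
    The "Successfully opened dynamic library" lines are the useful part of this log: each one confirms that a CUDA or cuDNN library was found. If the GPU count comes back as 0, checking which libraries are absent from the log is a reasonable first step. The sketch below does that for saved log text; `missing_libraries` and the EXPECTED list are hypothetical, with the library names taken from the log above.

```python
import re

OPENED = re.compile(r"Successfully opened dynamic library (\S+)")

# Library names taken from the log above, without the .so version suffix.
EXPECTED = ("libcudart", "libcuda", "libcublas", "libcufft",
            "libcurand", "libcusolver", "libcusparse", "libcudnn")


def missing_libraries(log_text, expected=EXPECTED):
    """Return the expected libraries that no 'Successfully opened' line mentions."""
    opened = {name.split(".so")[0] for name in OPENED.findall(log_text)}
    return [library for library in expected if library not in opened]


sample_log = """\
I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
"""
print("missing:", missing_libraries(sample_log))
```

    The sample log here is deliberately shortened, so the sketch reports several libraries as missing. With the full log above, the list would be empty.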
    

    Install JupyterLab and matplotlib

    $ pip install jupyterlab matplotlib
    

    Run TensorFlow in JupyterLab

    $ jupyter lab
    

    JupyterLab opens in the browser automatically.

    Download the CNN notebook from https://www.tensorflow.org/tutorials/images/cnn and import it.

    Choose Restart Kernel and Run All Cells.

    Once training starts, check the GPU processes. The entry ...nvs/tensorflow/bin/python shows that the model is being trained on the GPU.

    $ nvidia-smi
    Mon Apr  5 16:36:28 2021
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  GeForce GTX 1070    On   | 00000000:01:00.0  On |                  N/A |
    | 23%   54C    P2    72W / 180W |   7896MiB /  8111MiB |     55%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    
    +-----------------------------------------------------------------------------+
    | Processes:                                                                  |
    |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
    |        ID   ID                                                   Usage      |
    |=============================================================================|
    |    0   N/A  N/A       997      G   /usr/lib/xorg/Xorg                 18MiB |
    |    0   N/A  N/A      1145      G   /usr/bin/gnome-shell               73MiB |
    |    0   N/A  N/A      1353      G   /usr/lib/xorg/Xorg                136MiB |
    |    0   N/A  N/A      1495      G   /usr/bin/gnome-shell               53MiB |
    |    0   N/A  N/A      1862      G   ...AAAAAAAAA= --shared-files       99MiB |
    |    0   N/A  N/A      3181      C   ...nvs/tensorflow/bin/python     7507MiB |
    +-----------------------------------------------------------------------------+
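
    In the Processes table, type G marks graphics clients such as Xorg and type C marks compute clients, which is where a training process shows up. The large memory figure is expected, since TensorFlow reserves most of the GPU memory by default. As a sketch, the hypothetical helper `compute_processes` below picks the type C rows out of nvidia-smi output; the sample rows are copied from the table above.

```python
def compute_processes(smi_output):
    """Return (pid, process name, memory) for rows of type C in the Processes table."""
    rows = []
    for line in smi_output.splitlines():
        fields = line.strip().strip("|").split()
        # Data rows read: GPU, GI ID, CI ID, PID, Type, process name..., memory
        if len(fields) >= 7 and fields[3].isdigit() and fields[4] == "C":
            rows.append((int(fields[3]), " ".join(fields[5:-1]), fields[-1]))
    return rows


sample_table = """\
|    0   N/A  N/A      1495      G   /usr/bin/gnome-shell               53MiB |
|    0   N/A  N/A      1862      G   ...AAAAAAAAA= --shared-files       99MiB |
|    0   N/A  N/A      3181      C   ...nvs/tensorflow/bin/python     7507MiB |
"""
for pid, name, memory in compute_processes(sample_table):
    print(pid, name, memory)
```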
    

    Install VSCode

    Download VSCode from the official website and install it.

    Open VSCode and install the Python extension.

    Choose a folder (~/tensorflow-notebook/01-hello is used here as an example) and create a new file named hello.ipynb with the following content:

    import tensorflow as tf
    hello = tf.constant('Hello, TensorFlow!')
    hello.numpy()
    

    Open the newly created ~/tensorflow-notebook/01-hello/hello.ipynb in VSCode and select the virtual environment created earlier as the Python interpreter.

    Run TensorFlow in VSCode

    [Screenshot: TensorFlow running in VSCode]
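
    The hello cell only builds a string constant, so it does not show whether the GPU is used. One possible follow-up cell, sketched here and not part of the original notebook, multiplies two random matrices and reports the device that holds the result. `matmul_device` is a hypothetical helper; the import guard only exists so that the snippet also loads on a machine without TensorFlow.

```python
try:
    import tensorflow as tf
except ImportError:  # keeps the sketch loadable where TensorFlow is absent
    tf = None


def matmul_device(size=256):
    """Multiply two random matrices and return the device string of the result."""
    if tf is None:
        return None
    a = tf.random.uniform((size, size))
    b = tf.random.uniform((size, size))
    product = tf.matmul(a, b)
    # Eager tensors expose the device they were placed on.
    return product.device


print("result placed on:", matmul_device())
```

    With a working GPU setup the device string should end in GPU:0. A string ending in CPU:0 suggests that TensorFlow fell back to the CPU.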

    Summary

    The development environment is now set up. You can work from the command line, JupyterLab, or VSCode, whichever suits your workflow.
