From Hardware Configuration to Software Installation: A Guide to Setting Up a Deep Learning Machine

Author: RocWay | Published 2017-06-27 10:15

    Reposted from 炼数成金 (Dataguru)

    2016-9-23 12:23 | Posted by: 炼数成金_小数 | Original author: Roelof Pieters | Source: 机器之心

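    Summary: Once I had decided to build my own GPU system, my first thought was: why go to the trouble of building one myself? Hadn't Nvidia just released its powerful DevBox, and weren't other vendors probably doing the same for deep learning applications? Browsing the web, I found Tim De ...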

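    Tags: deep learning, hardware, GPU, Caffe, CUDA
    The author, Roelof Pieters, is a PhD student researching deep learning at KTH Royal Institute of Technology in Sweden and a consultant for Graph-Technologies; he also runs his own client-facing deep learning products. On his motivation for writing this series, he says: "I'm used to working in the cloud, and I will keep developing production-oriented systems and algorithms there. For more research-oriented tasks, though, cloud-based systems have drawbacks, because research mostly means trying out all kinds of algorithms and architectures and improving and iterating quickly. To be able to do that, I decided to design and build my own custom deep learning system with GPUs. In some respects this was easier than I expected, in others harder. In the coming posts I will share my 'adventure' with you; whether you are new to deep learning practice or an old hand, I hope it is useful to you." The series currently has two installments, which 机器之心 has compiled into this single article.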

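    Part 1: Building the Hardware Platform

    Left: the system under construction. You can see the plastic tubing for the water cooling running through the holes that the Carbide Air 540 case already has. The motherboard is mounted vertically.

    Middle and right: the finished system. Note the reservoir, which is visible from outside. You can also see the red plastic tubing running from top to bottom: it connects to the fill port at the top and the pump at the bottom, passing through the cooling blocks mounted on the GPUs. The CPU has a similar arrangement.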

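    DIY or Getting Help

    Option A: DIY

    If you have the time and the inclination to build everything yourself, this is an excellent way to understand how each component works and which hardware fits together well. You are also likely to understand better what to do when a component fails, and to fix it more easily.

    Option B: Outside help

    The alternative is to find a specialist company that orders the parts and assembles the whole system for you. The kind of company to look for builds custom gaming PCs; they routinely put together custom systems for gamers. They even have experience with water cooling: gaming PCs usually only need a water-cooled CPU, but these builders have good tool kits for the job. Installing a fully water-cooled system does mean opening up the GPU housing, exposing the chip, mounting the cooling block, and then fitting the tubing, compression fittings, and all the other components required. Water cooling also has a downside: if it ever leaks, your GPUs and other components will be destroyed.

    I did not think I could put all of this together and install the water-cooling system correctly, and I did not have much time to read manuals, so I went with the second option: I found a very skilled hardware builder who assembled the first version of my deep learning machine for me.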

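    Part 2: Installing Software and Libraries

    Contents

    Software and libraries
    Installing CUDA
    Testing CUDA
    Deep learning libraries

    Software and Libraries

    We now have a bare machine, so it is time to install software. There are already some good blog posts online that walk through installing deep learning tools and libraries. To keep things simple, I have put together a few gists for now. This post will help you install the Nvidia CUDA drivers along with the deep learning tools and libraries I favor. I also assume you have already installed Ubuntu 14.04.3 as your operating system.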

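    1. Installing CUDA

    Getting the graphics drivers to work properly is a pain. My problem at the time was that the Titan X GPUs were only supported by the Nvidia 346 drivers, and those drivers would not work with my particular monitor. After some xconfig tinkering I finally got it working at resolutions above 800×600, using the Linux X64 (AMD64/EM64T) DISPLAY DRIVER version 352.30 as the graphics driver.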
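
    One way to confirm which driver actually ended up loaded is to ask nvidia-smi and compare the result with the version you expect. The following is a minimal sketch, not part of the author's scripts: it assumes nvidia-smi is on PATH and accepts the --query-gpu=driver_version option, and driver_major and driver_at_least are hypothetical helper names introduced here for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not from the original scripts).

# Print the major part of a driver version string, e.g. 352.30 -> 352.
driver_major() {
  printf '%s\n' "${1%%.*}"
}

# Succeed if the major version of $1 is numeric and at least $2.
driver_at_least() {
  local major
  major="$(driver_major "$1")"
  case "$major" in
    ''|*[!0-9]*) return 1 ;;   # empty or non-numeric: treat as "too old"
  esac
  [ "$major" -ge "$2" ]
}

# Query the installed driver only when nvidia-smi is available.
if command -v nvidia-smi >/dev/null 2>&1; then
  installed="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader 2>/dev/null | head -n 1)"
  if driver_at_least "$installed" 352; then
    echo "driver $installed is 352 or newer"
  else
    echo "driver version '$installed' is missing or older than 352"
  fi
fi
```

    The threshold of 352 simply mirrors the driver series used in this post; adjust it to whatever your GPU requires.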

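    The setup script below installs CUDA 7.0. I chose to install the newest release, CUDA 7.5, instead. It does bring improvements, but it is also hard to get working with some libraries. If you want to be up and running quickly, try version 7.0.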

    #!/usr/bin/env bash
    # Installation script for Cuda and drivers on Ubuntu 14.04, by Roelof Pieters (@graphific)
    # BSD License
    if [ "$(whoami)" == "root" ]; then
      echo "running as root, please run as user you want to have stuff installed as"
      exit 1
    fi
    ###################################
    #   Ubuntu 14.04 Install script for:
    # - Nvidia graphic drivers for Titan X: 352
    # - Cuda 7.0 (7.5 gives "out of memory" issues)
    # - CuDNN3
    # - Theano (bleeding edge)
    # - Torch7
    # - ipython notebook (running as service with circus auto(re)boot on port 8888)
    # - itorch notebook (running as service with circus auto(re)boot on port 8889)
    # - Caffe 
    # - OpenCV 3.0 gold release (vs. 2015-06-04)
    # - Digits
    # - Lasagne
    # - Nolearn
    # - Keras
    ###################################
    
    # started with a bare ubuntu 14.04.3 LTS install, with only ubuntu-desktop installed
    # script will install the bare minimum, with all "extras" in a separate venv
    
    export DEBIAN_FRONTEND=noninteractive
    
    sudo apt-get update -y
    sudo apt-get install -y git wget linux-image-generic build-essential unzip
    
    # manual driver install with:
    # sudo service lightdm stop
    # (login on non graphical terminal)
    # wget http://uk.download.nvidia.com/XFree86/Linux-x86_64/352.30/NVIDIA-Linux-x86_64-352.30.run
    # chmod +x ./NVIDIA-Linux-x86_64-352.30.run
    # sudo ./NVIDIA-Linux-x86_64-352.30.run
    
    # Cuda 7.0
    # instead we install the nvidia driver 352 from the cuda repo
    # which makes it easier than stopping lightdm and installing in terminal
    cd /tmp
    wget http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1404/x86_64/cuda-repo-ubuntu1404_7.0-28_amd64.deb
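    sudo dpkg -i cuda-repo-ubuntu1404_7.0-28_amd64.deb
    # The repo package only registers the apt source; the toolkit and driver
    # still have to be installed from it (meta-package name assumed: cuda-7-0)
    sudo apt-get update -y
    sudo apt-get install -y cuda-7-0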
    
    echo -e "\nexport CUDA_HOME=/usr/local/cuda\nexport CUDA_ROOT=/usr/local/cuda" >> ~/.bashrc
    echo -e "\nexport PATH=/usr/local/cuda/bin:\$PATH\nexport LD_LIBRARY_PATH=/usr/local/cuda/lib64:\$LD_LIBRARY_PATH" >> ~/.bashrc
    
    echo "CUDA installation complete: please reboot your machine and continue with script #2"
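
    The script above appends its export lines to ~/.bashrc every time it runs, so running it twice leaves duplicate entries. A minimal sketch of an idempotent alternative follows; append_once and add_cuda_env are hypothetical helper names, not part of the original script, and the exported values are copied from the script above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not from the original script).

# Append a line to a file only if that exact line is not already there.
append_once() {
  local line="$1" file="$2"
  touch "$file"
  # -x matches whole lines, -F treats the pattern as a fixed string
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Write the same CUDA environment variables as the install script,
# to the file given as $1 (default: ~/.bashrc).
add_cuda_env() {
  local rc="${1:-$HOME/.bashrc}"
  append_once 'export CUDA_HOME=/usr/local/cuda' "$rc"
  append_once 'export CUDA_ROOT=/usr/local/cuda' "$rc"
  append_once 'export PATH=/usr/local/cuda/bin:$PATH' "$rc"
  append_once 'export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH' "$rc"
}
```

    Calling add_cuda_env with no argument targets ~/.bashrc; passing a scratch file first is an easy way to preview what would be written.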
    

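    2. Testing CUDA

    Done with the installation? Good. Now let's check whether the CUDA drivers work properly. Go to the CUDA samples directory and run ./deviceQuery; your GPU(s) should be listed in the output. The following script performs this check: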

    #!/usr/bin/env bash
    # Test script for checking if Cuda and Drivers correctly installed on Ubuntu 14.04, by Roelof Pieters (@graphific)
    # BSD License
    
    if [ "$(whoami)" == "root" ]; then
      echo "running as root, please run as user you want to have stuff installed as"
      exit 1
    fi
    ###################################
    #   Ubuntu 14.04 Install script for:
    # - Nvidia graphic drivers for Titan X: 352
    # - Cuda 7.0 (7.5 gives "out of memory" issues)
    # - CuDNN3
    # - Theano (bleeding edge)
    # - Torch7
    # - ipython notebook (running as service with circus auto(re)boot on port 8888)
    # - itorch notebook (running as service with circus auto(re)boot on port 8889)
    # - Caffe 
    # - OpenCV 3.0 gold release (vs. 2015-06-04)
    # - Digits
    # - Lasagne
    # - Nolearn
    # - Keras
    ###################################
    
    # started with a bare ubuntu 14.04.3 LTS install, with only ubuntu-desktop installed
    # script will install the bare minimum, with all "extras" in a separate venv
    
    export DEBIAN_FRONTEND=noninteractive
    
    # Checking cuda installation
    # installing the samples and checking the GPU
    cuda-install-samples-7.0.sh ~/
    cd ~/NVIDIA_CUDA-7.0_Samples/1_Utilities/deviceQuery
    make
    
    #Samples installed and GPU(s) Found ?
    ./deviceQuery  | grep "Result = PASS"
    greprc=$?
    if [[ $greprc -eq 0 ]] ; then
        echo "Cuda Samples installed and GPU found"
        echo "you can also check usage and temperature of gpus with nvidia-smi"
    else
        if [[ $greprc -eq 1 ]] ; then
            echo "Cuda Samples not installed, exiting..."
            exit 1
        else
            echo "Some sort of error, exiting..."
            exit 1
        fi
    fi
    
    echo "now would be time to install cudnn for a speedup"
    echo "unfortunately only available by registering on nvidias website:"
    echo "https://developer.nvidia.com/cudnn"
    echo "deep learning libraries can be installed with final script #3"
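
    The test script assumes the CUDA tools are already on PATH, which only holds in a shell started after script #1 has updated ~/.bashrc. A small pre-flight check can make that failure mode explicit. This is a sketch under that assumption; missing_cmds is a hypothetical helper name, not part of the original script.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original script).
# Prints each named command that is not found on PATH and
# returns non-zero if at least one is missing.
missing_cmds() {
  local cmd rc=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd"
      rc=1
    fi
  done
  return "$rc"
}
```

    A typical call before running the test script would be: missing_cmds nvcc nvidia-smi cuda-install-samples-7.0.sh make || echo "open a new shell or recheck script #1".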
    

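    3. Deep Learning Libraries

    Now for the final step, and a fun one: choosing the deep learning libraries you prefer, which also depends on your field.

    As a researcher, Theano gives you the most freedom to do exactly what you want. You implement many things yourself, and as a result you gain a deeper understanding of how DNNs work. For a beginner who just wants to try things out first, though, it may not be the right fit.

    Personally I am a fan of Keras (main contributor: François Chollet, who has since joined Google) and Lasagne (a team of eight, with Sander Dieleman as the main contributor; he recently finished his PhD and now works at Google DeepMind). Both libraries offer a good level of abstraction, are actively developed, and make it easy to plug in your own modules or code.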

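    If you are used to Python, working with Torch can be challenging because you need to learn Lua. Having used Torch for a while, I can say Lua is a pleasant language to work in. The one problem is that interfacing with Lua from other languages is hard. For research purposes Torch performs well, but for production-grade pipelines it is hard to test and seems to lack any kind of error handling. On the positive side, Torch supports CUDA and has many packages available. It also appears to be the most widely used library in industry. People at Facebook (Ronan Collobert and Soumith Chintala), DeepMind (Koray Kavukçuoğlu), and Twitter (Clement Farabet) are among its main contributors.

    Caffe was the dominant deep learning framework before (mostly for ConvNets) and is still widely used; it is also a good framework to start with. The separation between the training regime (solver.prototxt) and the architecture (train_val.prototxt) files makes experimenting easier. I also found Caffe to be the only framework that supports multiple GPUs out of the box: you can pass a GPU argument (all GPUs or specific GPU ids) to use the available GPUs.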
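
    To make the solver/architecture split and the GPU argument concrete, here is a minimal sketch of how a training command is put together. It assumes the caffe binary built later in this post (under ~/caffe/build/tools) is on PATH; solver.prototxt is a placeholder path, and caffe_train_cmd is a hypothetical helper that only prints the command line without running Caffe.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original post).
# $1: path to the solver file (which in turn names the network definition)
# $2: GPU spec, either "all" or a comma-separated list of ids (default: all)
caffe_train_cmd() {
  local solver="$1" gpus="${2:-all}"
  printf 'caffe train --solver=%s --gpu=%s\n' "$solver" "$gpus"
}

caffe_train_cmd solver.prototxt        # every available GPU
caffe_train_cmd solver.prototxt 0,1    # only GPUs 0 and 1
```

    Because the solver file points at the network definition, swapping architectures means editing one line in solver.prototxt while the command itself stays the same.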

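    Blocks is a more recent Python-based framework with a clean separation between modules you write yourself and the modules it calls Bricks. Its partner, Fuel, is a particularly nice way of handling data. Fuel is a wrapper around many existing datasets as well as your own. It streams data into the model using "iteration schemes" and can apply "transformers" for all kinds of data transformation and preprocessing steps.

    Neon is the Python-based deep learning framework from Nervana Systems, built on top of Nervana's GPU kernels (an alternative to Nvidia's cuDNN). Neon is the only framework that runs these special kernels, and the latest benchmarks show it to be the fastest on some specific tasks.

    Another way of laying out the (Python-oriented) deep learning libraries: from lower-level DIY to higher-level, more functional frameworks.

    Ready? The script below installs Theano, Torch, Caffe, Digits, Lasagne, and Keras. Digits, which we have used before, is a graphical web interface built on top of Caffe. It is fairly basic, but if you are just getting started it is an easy way to train some ConvNets and build a few image classifiers.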

    #!/usr/bin/env bash
    # Installation script for Deep Learning Libraries on Ubuntu 14.04, by Roelof Pieters (@graphific)
    # BSD License
    
    orig_executor="$(whoami)"
    if [ "$(whoami)" == "root" ]; then
      echo "running as root, please run as user you want to have stuff installed as"
      exit 1
    fi
    ###################################
    #   Ubuntu 14.04 Install script for:
    # - Nvidia graphic drivers for Titan X: 352
    # - Cuda 7.0 (7.5 gives "out of memory" issues)
    # - CuDNN3
    # - Theano (bleeding edge)
    # - Torch7
    # - ipython notebook (running as service with circus auto(re)boot on port 8888)
    # - itorch notebook (running as service with circus auto(re)boot on port 8889)
    # - Caffe 
    # - OpenCV 3.0 gold release (vs. 2015-06-04)
    # - Digits
    # - Lasagne
    # - Nolearn
    # - Keras
    ###################################
    
    export DEBIAN_FRONTEND=noninteractive
    sudo apt-get install -y libncurses-dev
    
    # next part copied from (check there for newest version): 
    # https://github.com/deeplearningparis/dl-machine/blob/master/scripts/install-deeplearning-libraries.sh
    
    ####################################
    # Dependencies
    ####################################
    
    # Build latest stable release of OpenBLAS without OPENMP to make it possible
    # to use Python multiprocessing and forks without crash
    # The torch install script will install OpenBLAS with OPENMP enabled in
    # /opt/OpenBLAS so we need to install the OpenBLAS used by Python in a
    # distinct folder.
    # Note: the master branch only has the release tags in it
    sudo apt-get install -y gfortran
    export OPENBLAS_ROOT=/opt/OpenBLAS-no-openmp
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$OPENBLAS_ROOT/lib
    if [ ! -d "OpenBLAS" ]; then
        # GitHub no longer serves the unauthenticated git:// protocol; use https
        git clone -q --branch=master https://github.com/xianyi/OpenBLAS.git
        (cd OpenBLAS \
          && make FC=gfortran USE_OPENMP=0 NO_AFFINITY=1 NUM_THREADS=$(nproc) \
          && sudo make install PREFIX=$OPENBLAS_ROOT)
        echo "export LD_LIBRARY_PATH=$LD_LIBRARY_PATH" >> ~/.bashrc
    fi
    sudo ldconfig
    
    # Python basics: update pip and setup a virtualenv to avoid mixing packages
    # installed from source with system packages
    sudo apt-get update -y 
    sudo apt-get install -y python-dev python-pip htop
    sudo pip install -U pip virtualenv
    if [ ! -d "venv" ]; then
        virtualenv venv
        echo "source ~/venv/bin/activate" >> ~/.bashrc
    fi
    source venv/bin/activate
    pip install -U pip
    pip install -U circus circus-web Cython Pillow
    
    # Checkout this project to access installation script and additional resources
    if [ ! -d "dl-machine" ]; then
        git clone git@github.com:deeplearningparis/dl-machine.git
        (cd dl-machine && git remote add http https://github.com/deeplearningparis/dl-machine.git)
    else
        if  [ "$1" == "reset" ]; then
            (cd dl-machine && git reset --hard && git checkout master && git pull --rebase $REMOTE master)
        fi
    fi
    
    # Build numpy from source against OpenBLAS
    # You might need to install liblapack-dev package as well
    # sudo apt-get install -y liblapack-dev
    rm -f ~/.numpy-site.cfg
    ln -s dl-machine/numpy-site.cfg ~/.numpy-site.cfg
    pip install -U numpy
    
    # Build scipy from source against OpenBLAS
    rm -f ~/.scipy-site.cfg
    ln -s dl-machine/scipy-site.cfg ~/.scipy-site.cfg
    pip install -U scipy
    
    # Install common tools from the scipy stack
    sudo apt-get install -y libfreetype6-dev libpng12-dev
    pip install -U matplotlib ipython[all] pandas scikit-image
    
    # Scikit-learn (generic machine learning utilities)
    pip install -e git+https://github.com/scikit-learn/scikit-learn.git#egg=scikit-learn
    
    ####################################
    # OPENCV 3
    ####################################
    # from http://rodrigoberriel.com/2014/10/installing-opencv-3-0-0-on-ubuntu-14-04/
    # for 2.9 see http://www.samontab.com/web/2014/06/installing-opencv-2-4-9-in-ubuntu-14-04-lts/ 
    cd ~/
    sudo apt-get -y install libopencv-dev build-essential cmake git libgtk2.0-dev \
       pkg-config python-dev python-numpy libdc1394-22 libdc1394-22-dev libjpeg-dev \
       libpng12-dev libtiff4-dev libjasper-dev libavcodec-dev libavformat-dev \
       libswscale-dev libxine-dev libgstreamer0.10-dev libgstreamer-plugins-base0.10-dev \
       libv4l-dev libtbb-dev libqt4-dev libfaac-dev libmp3lame-dev libopencore-amrnb-dev \
       libopencore-amrwb-dev libtheora-dev libvorbis-dev libxvidcore-dev x264 v4l-utils unzip
    
    wget https://github.com/Itseez/opencv/archive/3.0.0.tar.gz -O opencv-3.0.0.tar.gz
    tar -zxvf  opencv-3.0.0.tar.gz
    
    cd opencv-3.0.0
    mkdir build
    cd build
    
    cmake -D CMAKE_BUILD_TYPE=RELEASE -D CMAKE_INSTALL_PREFIX=/usr/local -D WITH_TBB=ON -D BUILD_NEW_PYTHON_SUPPORT=ON -D WITH_V4L=ON -D WITH_QT=ON -D WITH_OPENGL=ON ..
    make -j $(nproc)
    sudo make install
    
    sudo /bin/bash -c 'echo "/usr/local/lib" > /etc/ld.so.conf.d/opencv.conf'
    sudo ldconfig
    ln -s /usr/lib/python2.7/dist-packages/cv2.so /home/$orig_executor/venv/lib/python2.7/site-packages/cv2.so
    
    echo "opencv 3.0 installed"
    
    ####################################
    # Theano
    ####################################
    # installing theano
    # By default, Theano will detect if it can use cuDNN. If so, it will use it. 
    # To get an error if Theano can not use cuDNN, use this Theano flag: optimizer_including=cudnn.
    
    pip install -e git+https://github.com/Theano/Theano.git#egg=Theano
    if [ ! -f ~/.theanorc ]; then
        ln -s ~/dl-machine/theanorc ~/.theanorc
    fi
    
    echo "Installed Theano"
    
    # Tutorial files
    # (return to the home directory first; the OpenCV build left us in opencv-3.0.0/build)
    cd ~/
    if [ ! -d "DL4H" ]; then
        git clone git@github.com:SnippyHolloW/DL4H.git
        (cd DL4H && git remote add http https://github.com/SnippyHolloW/DL4H.git)
    else
        if  [ "$1" == "reset" ]; then
            (cd DL4H && git reset --hard && git checkout master && git pull --rebase $REMOTE master)
        fi
    fi
    
    ####################################
    # Torch
    ####################################
    
    if [ ! -d "torch" ]; then
        curl -sk https://raw.githubusercontent.com/torch/ezinstall/master/install-deps | bash
        git clone https://github.com/torch/distro.git ~/torch --recursive
        (cd ~/torch && yes | ./install.sh)
    fi
    . ~/torch/install/bin/torch-activate
    
    if [ ! -d "iTorch" ]; then
        git clone git@github.com:facebook/iTorch.git
        (cd iTorch && git remote add http https://github.com/facebook/iTorch.git)
    else
        if  [ "$1" == "reset" ]; then
            (cd iTorch && git reset --hard && git checkout master && git pull --rebase $REMOTE master)
        fi
    fi
    (cd iTorch && luarocks make)
    
    cd ~/
    git clone https://github.com/torch/demos.git torch-demos
    
    #qt dependency
    sudo apt-get install -y qt4-dev-tools libqt4-dev libqt4-core libqt4-gui
    
    #main luarocks libs:
    luarocks install image    # an image library for Torch7
    luarocks install nnx      # lots of extra neural-net modules
    luarocks install unsup    # unsupervised learning modules
    
    echo "Installed Torch (demos in $HOME/torch-demos)"
    
    # Register the circus daemon with Upstart
    if [ ! -f "/etc/init/circus.conf" ]; then
        sudo ln -s $HOME/dl-machine/circus.conf /etc/init/circus.conf
        sudo initctl reload-configuration
    fi
    sudo service circus restart
    
    cd ~/
    
    ## Next part ...
    ####################################
    # Caffe
    ####################################
    
    sudo apt-get install -y libprotobuf-dev libleveldb-dev \
      libsnappy-dev libopencv-dev libboost-all-dev libhdf5-serial-dev \
      libgflags-dev libgoogle-glog-dev liblmdb-dev protobuf-compiler \
      libatlas-base-dev libyaml-dev 
      
    git clone https://github.com/BVLC/caffe.git
    cd caffe
    for req in $(cat python/requirements.txt); do pip install $req -U; done
    
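    # Caffe's Makefile requires a Makefile.config; start from the bundled example
    cp Makefile.config.example Makefile.config
    make all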
    make pycaffe
    
    cd python
    pip install networkx -U
    pip install pillow -U
    pip install -r requirements.txt
    
    ln -s ~/caffe/python/caffe ~/venv/lib/python2.7/site-packages/caffe
    echo -e "\nexport CAFFE_HOME=/home/$orig_executor/caffe" >> ~/.bashrc
    
    echo "Installed Caffe"
    
    ####################################
    # Digits
    ####################################
    
    # Nvidia Digits needs a specific version of caffe
    # so you can install the venv version by Nvidia if you register
    # with cudnn, cuda, and caffe already packaged
    # instead we will install from scratch
    cd ~/
    
    git clone https://github.com/NVIDIA/DIGITS.git digits
    
    cd digits
    pip install -r requirements.txt
    
    sudo apt-get install -y graphviz
    
    echo "digits installed, run with ./digits-devserver or ./digits-server"
    
    ####################################
    # Lasagne
    # https://github.com/Lasagne/Lasagne
    ####################################
    cd ~/
    git clone https://github.com/Lasagne/Lasagne.git
    cd Lasagne
    python setup.py install
    
    echo "Lasagne installed"
    
    ####################################
    # Nolearn
    # abstractions, mainly around Lasagne
    # https://github.com/dnouri/nolearn
    ####################################
    cd ~/
    git clone https://github.com/dnouri/nolearn
    cd nolearn
    pip install -r requirements.txt
    python setup.py install
    
    echo "nolearn wrapper installed"
    
    ####################################
    # Keras
    # https://github.com/fchollet/keras
    # http://keras.io/
    ####################################
    cd ~/
    git clone https://github.com/fchollet/keras.git
    cd keras
    python setup.py install
    
    echo "Keras installed"
    
    echo "all done, please restart your machine..."
    
    #   possible issues & fixes:
    # - skimage: issue with "not finding jpeg decoder?" 
    # "PIL: IOError: decoder zip not available"
    # (https://github.com/python-pillow/Pillow/issues/174)
    # sudo apt-get install libtiff5-dev libjpeg8-dev zlib1g-dev \
    #     libfreetype6-dev liblcms2-dev libwebp-dev tcl8.6-dev tk8.6-dev python-tk
    # next try:
    # pip uninstall pillow
    # git clone https://github.com/python-pillow/Pillow.git
    # cd Pillow 
    # python setup.py install
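
    Once the install script has finished, a quick import check shows which libraries are actually usable from the virtualenv. The following is a sketch, not part of the original script: check_py_modules is a hypothetical helper, and it assumes the interpreter named by PYTHON_BIN (default: python, i.e. the one from the activated venv) is the one the libraries were installed into.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original script).
# Tries to import each named module with the interpreter in PYTHON_BIN
# (default: python) and prints one status line per module.
# Returns non-zero if any import fails.
check_py_modules() {
  local py="${PYTHON_BIN:-python}" mod rc=0
  for mod in "$@"; do
    if "$py" -c "import $mod" >/dev/null 2>&1; then
      echo "ok:      $mod"
    else
      echo "missing: $mod"
      rc=1
    fi
  done
  return "$rc"
}
```

    With the venv activated, a call such as check_py_modules theano lasagne nolearn keras caffe cv2 covers the Python libraries installed above.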
    

    Original article: http://graphific.github.io/posts/building-a-deep-learning-dream-machine/


        Article link: https://www.haomeiwen.com/subject/prwscxtx.html