From hardware configuration to software installation: a guide to equipping a deep learning machine

Author: RocWay | Published 2017-06-27 10:15


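Reposted from 炼数成金

Abstract: Once I decided to build my own GPU system, my first thought was: why go to the trouble of building one myself? Hasn't NVIDIA just released its powerful DevBox, and aren't other vendors possibly doing the same for deep learning applications? Browsing the web, I found Tim De ...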

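Tags: deep learning, hardware, GPU, Caffe, CUDA

The author, Roelof Pieters, is a PhD student researching deep learning at the KTH Royal Institute of Technology in Sweden and a consultant for Graph-Technologies; he also runs his own client-facing deep learning products. On his motivation for writing this series, he says: "I am used to working in the cloud, and I will keep developing product-oriented systems and algorithms there. For more research-oriented tasks, though, cloud-based systems have some drawbacks, because research mostly means trying out all kinds of algorithms and architectures and improving and iterating quickly. To be able to do that, I decided to design and build my own custom deep learning system with GPUs. In some ways this was easier than I expected, in others harder. In the following posts I will share my 'adventure' with you; whether you are new to deep learning practice or a veteran, I hope it is useful to you." Two posts in the series have been published so far, and 机器之心 has compiled them into this single article.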

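Part 1: Building the hardware platform

Left: the system under construction. You can see the plastic tubing for the water cooling running through the holes that the Carbide Air 540 case already has. The motherboard is mounted vertically.

Middle and right: the finished system. Note the reservoir, which is visible from the outside. You can also see the red plastic tubing running from top to bottom: it connects to the fill port at the top and the pump at the bottom, passing through the cooling blocks mounted on the GPUs. A similar arrangement is visible on the CPU.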

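DIY or getting help

Option A: DIY

Of course, if you have the time and the will to build everything yourself, this is an excellent way to understand exactly how each component works and which hardware fits together well. You will probably also have a better idea of what to do when a component fails, and find it easier to fix.

Option B: outside help

The other option is to find a specialist company, order the parts through them, and have them assemble the whole system. The kind of company to look for is one that builds custom gaming PCs for gamers. Such builders often have water-cooling experience; gaming PCs usually only water-cool the CPU, but these companies have good tool kits. For a fully water-cooled system you need to open up the GPU casing, expose the chip, fit the cooling block, and then add the tubing, compression fittings and all the other required parts. Water cooling also has a downside: if it ever leaks, your GPUs and other components will be ruined.

Because I did not feel able to put all of this together and install the water-cooling system correctly, and had little time to read manuals, I chose the second option: I found a very skilled hardware builder who assembled the first version of my deep learning machine for me.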

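Part 2: Installing software and libraries

Software and libraries
Installing CUDA
Testing CUDA
Deep learning libraries

Software and libraries

Now that we have a bare machine, it is time to install software. There are already some good blog posts on installing deep learning tools and libraries. To keep things simple, I have hastily put a few gists together. This post will help you install the NVIDIA CUDA drivers as well as the deep learning tools and libraries I favor. I also assume that Ubuntu 14.04.3 is already installed as the operating system.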
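Since everything that follows assumes Ubuntu 14.04, it can be worth checking the release before running any of the scripts. The helper below is only a sketch and not part of the original gists: the function name os_version_id is made up for illustration, and it assumes the release is recorded as a VERSION_ID line in /etc/os-release.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): print the VERSION_ID value from an
# os-release style file. $1 = file to read (defaults to /etc/os-release).
os_version_id() {
  # Accept both VERSION_ID="14.04" and VERSION_ID=14.04
  sed -n 's/^VERSION_ID="\{0,1\}\([^"]*\)"\{0,1\}$/\1/p' "${1:-/etc/os-release}"
}

# Hypothetical usage before starting the installation:
#   [ "$(os_version_id)" = "14.04" ] || echo "this guide assumes Ubuntu 14.04"
```

The check only warns; whether the scripts also work on other releases is not something this article covers.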

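1. Installing CUDA

Getting the graphics drivers to work properly is a pain. My problem was that the Titan X GPUs were supported only by the NVIDIA 346 drivers, and those drivers would not work with my particular monitor. After some xconfig tweaking I finally got the display to work at resolutions above 800×600, and I used the Linux X64 (AMD64/EM64T) DISPLAY DRIVER version 352.30 as the graphics driver.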
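To check whether an installed driver is at least a given version (346 in the story above), a version-aware comparison helps, because a plain string comparison gets multi-part versions wrong. This is a minimal sketch: version_ge is a made-up helper name, and it assumes GNU sort with the -V option, which Ubuntu's coreutils provide.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): succeed when version $1 is greater than or
# equal to version $2. Uses GNU sort's -V (version sort).
version_ge() {
  # The smaller of the two versions sorts first; $1 >= $2 iff that is $2.
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Hypothetical usage on a machine where the NVIDIA driver is loaded:
#   installed="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n1)"
#   version_ge "$installed" 346 || echo "driver $installed is older than 346"
```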

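The setup script shown here installs CUDA 7.0, while I chose to install the newer CUDA 7.5. That version does bring improvements, but it also has trouble working with some libraries. If you want to get up and running quickly, try 7.0.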
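Because the 7.0 versus 7.5 choice matters for library compatibility, it is handy to confirm which toolkit release actually ended up on the PATH. The sketch below parses the output of nvcc --version; cuda_release is a hypothetical helper name, and it assumes the usual output line of the form "Cuda compilation tools, release 7.0, V7.0.27".

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): read `nvcc --version` output on stdin and
# print just the release number (for example 7.0).
cuda_release() {
  sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}

# Hypothetical usage once /usr/local/cuda/bin is on PATH:
#   nvcc --version | cuda_release
```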

#!/usr/bin/env bash
# Installation script for Cuda and drivers on Ubuntu 14.04, by Roelof Pieters (@graphific)
# BSD License
if [ "$(whoami)" == "root" ]; then
  echo "running as root, please run as user you want to have stuff installed as"
  exit 1
fi
###################################
#   Ubuntu 14.04 Install script for:
# - Nvidia graphic drivers for Titan X: 352
# - Cuda 7.0 (7.5 gives "out of memory" issues)
# - CuDNN3
# - Theano (bleeding edge)
# - Torch7
# - ipython notebook (running as service with circus auto(re)boot on port 8888)
# - itorch notebook (running as service with circus auto(re)boot on port 8889)
# - Caffe 
# - OpenCV 3.0 gold release (vs. 2015-06-04)
# - Digits
# - Lasagne
# - Nolearn
# - Keras
###################################

# started with a bare ubuntu 14.04.3 LTS install, with only ubuntu-desktop installed
# script will install the bare minimum, with all "extras" in a separate venv

export DEBIAN_FRONTEND=noninteractive

sudo apt-get update -y
sudo apt-get install -y git wget linux-image-generic build-essential unzip

# manual driver install with:
# sudo service lightdm stop
# (login on non graphical terminal)
# wget http://uk.download.nvidia.com/XFree86/Linux-x86_64/352.30/NVIDIA-Linux-x86_64-352.30.run
# chmod +x ./NVIDIA-Linux-x86_64-352.30.run
# sudo ./NVIDIA-Linux-x86_64-352.30.run

# Cuda 7.0
# instead we install the nvidia driver 352 from the cuda repo
# which makes it easier than stopping lightdm and installing in terminal
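cd /tmp
wget http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1404/x86_64/cuda-repo-ubuntu1404_7.0-28_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1404_7.0-28_amd64.deb

# the .deb only registers the repository; the toolkit and driver still have to be installed
# (cuda-7-0 pins the 7.0 toolkit; the plain "cuda" package pulls the newest release in the repo)
sudo apt-get update -y
sudo apt-get install -y cuda-7-0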

echo -e "\nexport CUDA_HOME=/usr/local/cuda\nexport CUDA_ROOT=/usr/local/cuda" >> ~/.bashrc
echo -e "\nexport PATH=/usr/local/cuda/bin:\$PATH\nexport LD_LIBRARY_PATH=/usr/local/cuda/lib64:\$LD_LIBRARY_PATH" >> ~/.bashrc

echo "CUDA installation complete: please reboot your machine and continue with script #2"
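One side effect of the script above is that every run appends the same export lines to ~/.bashrc again. A small guard avoids the duplicates. This is a sketch rather than part of the original gist, and append_once is a made-up helper name.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): append a line to a file only if that exact
# line is not already present. $1 = line, $2 = file.
append_once() {
  local line="$1" file="$2"
  touch "$file"
  # -x matches whole lines, -F treats the line as a fixed string
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Hypothetical usage, mirroring the exports written by the install script:
#   append_once 'export CUDA_HOME=/usr/local/cuda' ~/.bashrc
#   append_once 'export PATH=/usr/local/cuda/bin:$PATH' ~/.bashrc
```

Single quotes keep $PATH unexpanded, so the line is evaluated each time the shell starts rather than frozen at install time.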

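2. Testing CUDA

Finished installing? Good. Now let's check whether the CUDA driver works properly. Go to the CUDA samples directory and run ./deviceQuery; your GPU should show up in the output. The following script automates the check: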

#!/usr/bin/env bash
# Test script for checking if Cuda and Drivers correctly installed on Ubuntu 14.04, by Roelof Pieters (@graphific)
# BSD License

if [ "$(whoami)" == "root" ]; then
  echo "running as root, please run as user you want to have stuff installed as"
  exit 1
fi
###################################
#   Ubuntu 14.04 Install script for:
# - Nvidia graphic drivers for Titan X: 352
# - Cuda 7.0 (7.5 gives "out of memory" issues)
# - CuDNN3
# - Theano (bleeding edge)
# - Torch7
# - ipython notebook (running as service with circus auto(re)boot on port 8888)
# - itorch notebook (running as service with circus auto(re)boot on port 8889)
# - Caffe 
# - OpenCV 3.0 gold release (vs. 2015-06-04)
# - Digits
# - Lasagne
# - Nolearn
# - Keras
###################################

# started with a bare ubuntu 14.04.3 LTS install, with only ubuntu-desktop installed
# script will install the bare minimum, with all "extras" in a separate venv

export DEBIAN_FRONTEND=noninteractive

# Checking cuda installation
# installing the samples and checking the GPU
cuda-install-samples-7.0.sh ~/
# the samples were copied to the home directory, so use an absolute path
cd ~/NVIDIA_CUDA-7.0_Samples/1_Utilities/deviceQuery
make

#Samples installed and GPU(s) Found ?
./deviceQuery  | grep "Result = PASS"
greprc=$?
if [[ $greprc -eq 0 ]] ; then
    echo "Cuda Samples installed and GPU found"
    echo "you can also check usage and temperature of gpus with nvidia-smi"
else
    if [[ $greprc -eq 1 ]] ; then
        echo "Cuda Samples not installed, exiting..."
        exit 1
    else
        echo "Some sort of error, exiting..."
        exit 1
    fi
fi

echo "now would be time to install cudnn for a speedup"
echo "unfortunately only available by registering on nvidias website:"
echo "https://developer.nvidia.com/cudnn"
echo "deep learning libraries can be installed with final script #3"
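The test script mentions nvidia-smi for watching usage and temperature. As a rough sketch of how that could be scripted, the helper below filters the CSV form of the nvidia-smi output for GPUs at or above a temperature threshold. The name hot_gpus and the threshold of 85 degrees Celsius are assumptions made for illustration, not values from the original post.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): read "index, temperature" CSV rows on stdin
# and print the index of every GPU at or above the threshold in $1.
hot_gpus() {
  awk -F', *' -v limit="$1" '$2 + 0 >= limit + 0 { print $1 }'
}

# Hypothetical usage (85 is an arbitrary example threshold):
#   nvidia-smi --query-gpu=index,temperature.gpu --format=csv,noheader | hot_gpus 85
```

On a water-cooled build this kind of check is a cheap way to notice a pump or flow problem early.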

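3. Deep learning libraries

Now for the last step, which is also the fun part: choosing the deep learning libraries you prefer, which also depends on the field you work in.

As a researcher, Theano gives you the most freedom to do what you want. You implement many things yourself, and as a result gain a deeper understanding of how DNNs work. For a beginner who just wants to try things out first, it may not be the right fit.

Personally I am a fan of Keras (main contributor: François Chollet, who has since joined Google) and Lasagne (a team of eight, but the main contributor is Sander Dieleman, who recently finished his PhD and now works at Google DeepMind). Both libraries offer a good level of abstraction, are actively developed, and provide easy ways to plug in your own modules or code.

If you are used to Python, working with Torch will be challenging because you need to learn Lua. After using Torch for a while, I can say Lua is a pleasant language to work with. The one problem is that interfacing with Lua from other languages is hard. For research purposes Torch works well too. For production-level pipelines, however, Torch is hard to test and seems to lack any kind of error handling altogether. On the plus side, Torch supports CUDA and has many packages available. Torch also seems to be the most widely used library in industry: people at Facebook (Ronan Collobert & Soumith Chintala), DeepMind (Koray Kavukçuoğlu) and Twitter (Clement Farabet) are all main contributors.

Caffe was previously the dominant deep learning framework (mainly for ConvNets); it is still widely used and is a good framework to start with. The separation between the training regime (solver.prototxt) and the architecture (train_val.prototxt) files makes experimenting easier. I also found Caffe to be the only framework that supports multiple GPUs out of the box: you pass the -gpu argument, either "all" or specific GPU ids, to use all available GPUs.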
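As an illustration of the multi-GPU point, the sketch below only assembles and prints the caffe command line instead of running it, so it can be inspected first. The helper name caffe_train_cmd and the file name solver.prototxt are placeholders, and it assumes the caffe binary is on the PATH (in a source build it lives under build/tools/).

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): print the caffe command line for training.
# $1 = solver file, $2 = "all" or a comma-separated list of GPU ids
# (defaults to all).
caffe_train_cmd() {
  local solver="$1" gpus="${2:-all}"
  printf 'caffe train -solver %s -gpu %s\n' "$solver" "$gpus"
}

# Hypothetical usage (solver.prototxt is a placeholder path):
#   caffe_train_cmd solver.prototxt        # all available GPUs
#   caffe_train_cmd solver.prototxt 0,1    # only GPUs 0 and 1
```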

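Blocks is a more recent Python-based framework that cleanly separates modules you write yourself from its own modules, called Bricks. Its partner, Fuel, is a particularly nice way to handle data. Fuel is a wrapper around many existing datasets as well as your own. It uses "iteration schemes" to stream data into the model, and "transformers" for all kinds of data transformation and preprocessing steps.

Neon is the Python-based deep learning framework from Nervana Systems, built on top of Nervana's GPU kernels (an alternative to NVIDIA's cuDNN). Neon is the only framework that runs these special kernels, and the latest benchmarks show it to be the fastest on some specific tasks.

Another way to look at the deep learning libraries (for Python): from lower-level DIY to higher-level, more fully featured frameworks.

Ready? The script below installs Theano, Torch, Caffe, Digits, Lasagne and Keras. Digits was not covered above: it is a graphical web interface built on top of Caffe. It is fairly basic, but if you are just starting out it is an easy way to train some ConvNets and build some image classifiers.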

#!/usr/bin/env bash
# Installation script for Deep Learning Libraries on Ubuntu 14.04, by Roelof Pieters (@graphific)
# BSD License

orig_executor="$(whoami)"
if [ "$(whoami)" == "root" ]; then
  echo "running as root, please run as user you want to have stuff installed as"
  exit 1
fi
###################################
#   Ubuntu 14.04 Install script for:
# - Nvidia graphic drivers for Titan X: 352
# - Cuda 7.0 (7.5 gives "out of memory" issues)
# - CuDNN3
# - Theano (bleeding edge)
# - Torch7
# - ipython notebook (running as service with circus auto(re)boot on port 8888)
# - itorch notebook (running as service with circus auto(re)boot on port 8889)
# - Caffe 
# - OpenCV 3.0 gold release (vs. 2015-06-04)
# - Digits
# - Lasagne
# - Nolearn
# - Keras
###################################

export DEBIAN_FRONTEND=noninteractive
sudo apt-get install -y libncurses-dev

# next part copied from (check there for newest version): 
# https://github.com/deeplearningparis/dl-machine/blob/master/scripts/install-deeplearning-libraries.sh

####################################
# Dependencies
####################################

# Build latest stable release of OpenBLAS without OPENMP to make it possible
# to use Python multiprocessing and forks without crash
# The torch install script will install OpenBLAS with OPENMP enabled in
# /opt/OpenBLAS so we need to install the OpenBLAS used by Python in a
# distinct folder.
# Note: the master branch only has the release tags in it
sudo apt-get install -y gfortran
export OPENBLAS_ROOT=/opt/OpenBLAS-no-openmp
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$OPENBLAS_ROOT/lib
if [ ! -d "OpenBLAS" ]; then
    # GitHub no longer serves the unauthenticated git:// protocol, so clone over HTTPS
    git clone -q --branch=master https://github.com/xianyi/OpenBLAS.git
    (cd OpenBLAS \
      && make FC=gfortran USE_OPENMP=0 NO_AFFINITY=1 NUM_THREADS=$(nproc) \
      && sudo make install PREFIX=$OPENBLAS_ROOT)
    echo "export LD_LIBRARY_PATH=$LD_LIBRARY_PATH" >> ~/.bashrc
fi
sudo ldconfig

# Python basics: update pip and setup a virtualenv to avoid mixing packages
# installed from source with system packages
sudo apt-get update -y 
sudo apt-get install -y python-dev python-pip htop
sudo pip install -U pip virtualenv
if [ ! -d "venv" ]; then
    virtualenv venv
    echo "source ~/venv/bin/activate" >> ~/.bashrc
fi
source venv/bin/activate
pip install -U pip
pip install -U circus circus-web Cython Pillow

# Checkout this project to access installation script and additional resources
if [ ! -d "dl-machine" ]; then
    # clone over HTTPS so that no SSH key is required on a fresh machine
    git clone https://github.com/deeplearningparis/dl-machine.git
    (cd dl-machine && git remote add http https://github.com/deeplearningparis/dl-machine.git)
else
    if  [ "$1" == "reset" ]; then
        # fall back to "origin" when REMOTE is not set in the environment
        (cd dl-machine && git reset --hard && git checkout master && git pull --rebase "${REMOTE:-origin}" master)
    fi
fi

# Build numpy from source against OpenBLAS
# You might need to install liblapack-dev package as well
# sudo apt-get install -y liblapack-dev
rm -f ~/.numpy-site.cfg
ln -s dl-machine/numpy-site.cfg ~/.numpy-site.cfg
pip install -U numpy

# Build scipy from source against OpenBLAS
rm -f ~/.scipy-site.cfg
ln -s dl-machine/scipy-site.cfg ~/.scipy-site.cfg
pip install -U scipy

# Install common tools from the scipy stack
sudo apt-get install -y libfreetype6-dev libpng12-dev
pip install -U matplotlib ipython[all] pandas scikit-image

# Scikit-learn (generic machine learning utilities)
pip install -e git+https://github.com/scikit-learn/scikit-learn.git#egg=scikit-learn

####################################
# OPENCV 3
####################################
# from http://rodrigoberriel.com/2014/10/installing-opencv-3-0-0-on-ubuntu-14-04/
# for 2.4.9 see http://www.samontab.com/web/2014/06/installing-opencv-2-4-9-in-ubuntu-14-04-lts/
cd ~/
sudo apt-get -y install libopencv-dev build-essential cmake git libgtk2.0-dev \
   pkg-config python-dev python-numpy libdc1394-22 libdc1394-22-dev libjpeg-dev \
   libpng12-dev libtiff4-dev libjasper-dev libavcodec-dev libavformat-dev \
   libswscale-dev libxine-dev libgstreamer0.10-dev libgstreamer-plugins-base0.10-dev \
   libv4l-dev libtbb-dev libqt4-dev libfaac-dev libmp3lame-dev libopencore-amrnb-dev \
   libopencore-amrwb-dev libtheora-dev libvorbis-dev libxvidcore-dev x264 v4l-utils unzip

wget https://github.com/Itseez/opencv/archive/3.0.0.tar.gz -O opencv-3.0.0.tar.gz
tar -zxvf  opencv-3.0.0.tar.gz

cd opencv-3.0.0
mkdir build
cd build

cmake -D CMAKE_BUILD_TYPE=RELEASE -D CMAKE_INSTALL_PREFIX=/usr/local -D WITH_TBB=ON -D BUILD_NEW_PYTHON_SUPPORT=ON -D WITH_V4L=ON -D WITH_QT=ON -D WITH_OPENGL=ON ..
make -j $(nproc)
sudo make install

sudo /bin/bash -c 'echo "/usr/local/lib" > /etc/ld.so.conf.d/opencv.conf'
sudo ldconfig
ln -s /usr/lib/python2.7/dist-packages/cv2.so /home/$orig_executor/venv/lib/python2.7/site-packages/cv2.so

echo "opencv 3.0 installed"

####################################
# Theano
####################################
# installing theano
# By default, Theano will detect if it can use cuDNN. If so, it will use it. 
# To get an error if Theano can not use cuDNN, use this Theano flag: optimizer_including=cudnn.

pip install -e git+https://github.com/Theano/Theano.git#egg=Theano
if [ ! -f ".theanorc" ]; then
    ln -s ~/dl-machine/theanorc ~/.theanorc
fi

echo "Installed Theano"

# Tutorial files
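if [ ! -d "DL4H" ]; then
    # clone over HTTPS so that no SSH key is required on a fresh machine
    git clone https://github.com/SnippyHolloW/DL4H.git
    (cd DL4H && git remote add http https://github.com/SnippyHolloW/DL4H.git)
else
    if  [ "$1" == "reset" ]; then
        # fall back to "origin" when REMOTE is not set in the environment
        (cd DL4H && git reset --hard && git checkout master && git pull --rebase "${REMOTE:-origin}" master)
    fi
fi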

####################################
# Torch
####################################

if [ ! -d "torch" ]; then
    curl -sk https://raw.githubusercontent.com/torch/ezinstall/master/install-deps | bash
    git clone https://github.com/torch/distro.git ~/torch --recursive
    (cd ~/torch && yes | ./install.sh)
fi
. ~/torch/install/bin/torch-activate

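if [ ! -d "iTorch" ]; then
    # clone over HTTPS so that no SSH key is required on a fresh machine
    git clone https://github.com/facebook/iTorch.git
    (cd iTorch && git remote add http https://github.com/facebook/iTorch.git)
else
    if  [ "$1" == "reset" ]; then
        # fall back to "origin" when REMOTE is not set in the environment
        (cd iTorch && git reset --hard && git checkout master && git pull --rebase "${REMOTE:-origin}" master)
    fi
fi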
(cd iTorch && luarocks make)

cd ~/
git clone https://github.com/torch/demos.git torch-demos

#qt dependency
sudo apt-get install -y qt4-dev-tools libqt4-dev libqt4-core libqt4-gui

#main luarocks libs:
luarocks install image    # an image library for Torch7
luarocks install nnx      # lots of extra neural-net modules
luarocks install unsup    # unsupervised learning modules

echo "Installed Torch (demos in $HOME/torch-demos)"

# Register the circus daemon with Upstart
if [ ! -f "/etc/init/circus.conf" ]; then
    sudo ln -s $HOME/dl-machine/circus.conf /etc/init/circus.conf
    sudo initctl reload-configuration
fi
sudo service circus restart

cd ~/

## Next part ...
####################################
# Caffe
####################################

sudo apt-get install -y libprotobuf-dev libleveldb-dev \
  libsnappy-dev libopencv-dev libboost-all-dev libhdf5-serial-dev \
  libgflags-dev libgoogle-glog-dev liblmdb-dev protobuf-compiler \
  libatlas-base-dev libyaml-dev 
  
git clone https://github.com/BVLC/caffe.git
cd caffe
for req in $(cat python/requirements.txt); do pip install $req -U; done

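# Caffe's Makefile refuses to run without a Makefile.config; start from the
# shipped example (edit it first if you need cuDNN or non-default paths)
cp Makefile.config.example Makefile.config
make all
make pycaffe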

cd python
pip install networkx -U
pip install pillow -U
pip install -r requirements.txt

ln -s ~/caffe/python/caffe ~/venv/lib/python2.7/site-packages/caffe
echo -e "\nexport CAFFE_HOME=/home/$orig_executor/caffe" >> ~/.bashrc

echo "Installed Caffe"

####################################
# Digits
####################################

# Nvidia Digits needs a specific version of caffe
# so you can install the venv version by Nvidia if you register
# with cudnn, cuda, and caffe already packaged
# instead we will install from scratch
cd ~/

git clone https://github.com/NVIDIA/DIGITS.git digits

cd digits
pip install -r requirements.txt

sudo apt-get install -y graphviz

echo "digits installed, run with ./digits-devserver or ./digits-server"

####################################
# Lasagne
# https://github.com/Lasagne/Lasagne
####################################
# go back home first, otherwise the clone ends up inside the previous checkout
cd ~/
git clone https://github.com/Lasagne/Lasagne.git
cd Lasagne
python setup.py install

echo "Lasagne installed"

####################################
# Nolearn
# abstractions, mainly around Lasagne
# https://github.com/dnouri/nolearn
####################################
# go back home first, otherwise the clone ends up inside the previous checkout
cd ~/
git clone https://github.com/dnouri/nolearn
cd nolearn
pip install -r requirements.txt
python setup.py install

echo "nolearn wrapper installed"

####################################
# Keras
# https://github.com/fchollet/keras
# http://keras.io/
####################################
# go back home first, otherwise the clone ends up inside the previous checkout
cd ~/
git clone https://github.com/fchollet/keras.git
cd keras
python setup.py install

echo "Keras installed"

echo "all done, please restart your machine..."

#   possible issues & fixes:
# - skimage: issue with "not finding jpeg decoder?" 
# "PIL: IOError: decoder zip not available"
# (https://github.com/python-pillow/Pillow/issues/174)
# sudo apt-get install libtiff5-dev libjpeg8-dev zlib1g-dev \
#     libfreetype6-dev liblcms2-dev libwebp-dev tcl8.6-dev tk8.6-dev python-tk
# next try:
# pip uninstall pillow
# git clone https://github.com/python-pillow/Pillow.git
# cd Pillow 
# python setup.py install
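The installer clones several repositories and guards some of those clones with a directory check so that a second run skips them. That pattern can be sketched as one small helper that always takes an explicit target directory. The name clone_once is made up for illustration and is not part of the original script.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): clone a repository into an explicit target
# directory unless that directory already exists, so that re-running an
# installer skips the steps that are already done. $1 = URL, $2 = directory.
clone_once() {
  local url="$1" dir="$2"
  if [ -d "$dir" ]; then
    echo "skipping $url: $dir already exists"
  else
    git clone "$url" "$dir"
  fi
}

# Hypothetical usage with repositories from the script above:
#   clone_once https://github.com/Lasagne/Lasagne.git "$HOME/Lasagne"
#   clone_once https://github.com/fchollet/keras.git  "$HOME/keras"
```

Passing an absolute target directory also makes the result independent of whatever directory the script happens to be in at that point.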

Original article: http://graphific.github.io/posts/building-a-deep-learning-dream-machine/
