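First attempt: failed
The two graphics workstations already have SSH access to each other, and their hostnames are npuheart0 and npuheart1. I first log in as mpiuser on the host named npuheart0, whose compute card is a K20c, and from there connect to the other host with ssh npuheart1. Nothing is installed on that second host yet. I ran su npuheart1 to get the needed privileges, then installed CUDA directly with sudo apt-get install nvidia-cuda-toolkit, which pulls in all dependencies automatically. I also installed the graphics driver directly with apt, and that failed.
Second attempt: failed
I found the following commands on the official NVIDIA site and ran them as-is.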
wget http://developer.download.nvidia.com/compute/cuda/10.2/Prod/local_installers/cuda_10.2.89_440.33.01_linux.run
sudo sh cuda_10.2.89_440.33.01_linux.run
The driver installation failed.
Third attempt: partial success
Install all drivers automatically:
sudo ubuntu-drivers autoinstall
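Reboot, then run the CUDA installer again without selecting the driver. CUDA installed successfully, but the compiled program reported at run time that the driver version was too low.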
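The "driver version too low" situation can be detected before compiling anything. The sketch below is one way to do that, assuming GNU sort (for its -V version ordering) and the minimum driver version 440.00 that the CUDA 10.2 installer summary later in this post asks for. version_ge is a helper name made up for this example, not part of any NVIDIA tool.

```shell
# Hypothetical helper: true when version $1 >= version $2 (relies on GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Minimum driver version taken from the CUDA 10.2 installer warning.
required="440.00"

if command -v nvidia-smi >/dev/null 2>&1; then
    # Ask the driver for its own version; take the first GPU's answer.
    installed="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader 2>/dev/null | head -n 1)"
    if [ -z "$installed" ]; then
        echo "nvidia-smi is present but could not talk to the driver."
    elif version_ge "$installed" "$required"; then
        echo "Driver $installed is new enough for CUDA 10.2."
    else
        echo "Driver $installed is older than $required."
    fi
else
    echo "nvidia-smi not found; the NVIDIA driver is probably not installed."
fi
```

If the last message or the "older than" message appears, the driver has to be replaced before CUDA 10.2 programs will run.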
Fourth attempt:
Disable the graphical target:
sudo systemctl isolate multi-user.target
Unload the NVIDIA driver module:
sudo modprobe -r nvidia-drm
Once the driver is installed, the graphical environment can be started again with this command:
sudo systemctl start graphical.target
Original link: https://blog.csdn.net/xcls2010/article/details/89641853
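Before running a driver installer in multi-user mode, it can be worth confirming that no nvidia kernel modules are still loaded. The following is a minimal sketch, assuming a Linux system that exposes /proc/modules; loaded_nvidia_modules is a hypothetical helper name. Note that the module unloaded with modprobe -r nvidia-drm is listed by the kernel as nvidia_drm, with an underscore.

```shell
# Hypothetical helper: print the names of loaded modules that start with "nvidia".
# $1 is the module list to read; it defaults to /proc/modules.
loaded_nvidia_modules() {
    awk '$1 ~ /^nvidia/ { print $1 }' "${1:-/proc/modules}"
}

if [ -r /proc/modules ]; then
    still_loaded="$(loaded_nvidia_modules)"
    if [ -n "$still_loaded" ]; then
        # Word splitting is intentional here: print the names on one line.
        echo "Still loaded:" $still_loaded
    else
        echo "No nvidia modules are loaded."
    fi
fi
```

If any modules are reported, they most likely need to be unloaded (or the machine rebooted into multi-user mode) before the installer can replace the driver.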
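Download the driver manually from https://us.download.nvidia.cn/XFree86/Linux-x86_64/440.64/NVIDIA-Linux-x86_64-440.64.run and then run the commands below, choosing to overwrite or replace existing files when the installer asks.
sudo chmod +x NVIDIA-Linux-x86_64-440.64.run
sudo ./NVIDIA-Linux-x86_64-440.64.run
With that, the driver installed successfully.
Finally, add the paths: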
export PATH=$PATH:/usr/local/cuda-10.2/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-10.2/lib64
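The two export lines only affect the current shell, and running them repeatedly keeps appending the same directories. A possible variant is sketched below. append_once is a hypothetical helper name, and the CUDA_INSTALL_PATH default is an assumption taken from the install location reported by the installer. It also avoids the leading ":" that appears when LD_LIBRARY_PATH starts out empty, which the dynamic loader treats as the current directory.

```shell
# Hypothetical helper: print list $1 with directory $2 appended, unless already present.
append_once() {
    case ":$1:" in
        *":$2:"*) printf '%s\n' "$1" ;;            # already listed, leave unchanged
        *)        printf '%s\n' "${1:+$1:}$2" ;;   # add a separator only if $1 is non-empty
    esac
}

# Assumed toolkit location; override by setting CUDA_INSTALL_PATH beforehand.
CUDA_INSTALL_PATH="${CUDA_INSTALL_PATH:-/usr/local/cuda-10.2}"

PATH="$(append_once "$PATH" "$CUDA_INSTALL_PATH/bin")"
LD_LIBRARY_PATH="$(append_once "${LD_LIBRARY_PATH:-}" "$CUDA_INSTALL_PATH/lib64")"
export CUDA_INSTALL_PATH PATH LD_LIBRARY_PATH
```

Placing these lines in ~/.bashrc is one way to make the settings survive a new login; afterwards nvcc --version should report release 10.2.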
The CUDA installer printed the following summary:
Driver: Not Selected
Toolkit: Installed in /usr/local/cuda-10.2/
Samples: Installed in /home/npuheart0/, but missing recommended libraries
Please make sure that
- PATH includes /usr/local/cuda-10.2/bin
- LD_LIBRARY_PATH includes /usr/local/cuda-10.2/lib64, or, add /usr/local/cuda-10.2/lib64 to /etc/ld.so.conf and run ldconfig as root
To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-10.2/bin
Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-10.2/doc/pdf for detailed information on setting up CUDA.
***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 440.00 is required for CUDA 10.2 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
sudo <CudaInstaller>.run --silent --driver
Logfile is /var/log/cuda-installer.log
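Compiling the CUDA program
Suppose the whole program consists of two source files, test.c and test_cuda.cu. test.c is the MPI program, test_cuda.cu is the CUDA program, and test.c calls functions defined in test_cuda.cu. Because nvcc compiles .cu files as C++, the functions called from test.c need to be declared extern "C" in test_cuda.cu so that the C code can link against them.
The Makefile for this program is as follows: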
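NVCC = nvcc
MPICC = mpicc
# Toolkit location; an existing CUDA_INSTALL_PATH environment variable takes precedence.
CUDA_INSTALL_PATH ?= /usr/local/cuda-10.2
LIBS = -L$(CUDA_INSTALL_PATH)/lib64 -lcudart -lcurand
CFILES = test.c
CUFILES = test_cuda.cu
OBJECTS = test.o test_cuda.o
EXECNAME = test
.PHONY: all clean
# Recipe lines must start with a tab character.
# In the link step the objects come before the libraries so their symbols get resolved.
all:
	$(MPICC) -c $(CFILES)
	$(NVCC) -c $(CUFILES)
	$(MPICC) -o $(EXECNAME) $(OBJECTS) $(LIBS)
clean:
	rm -f *.o $(EXECNAME)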
Original link: https://blog.csdn.net/warren912/article/details/19968419
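Before running make, a quick check that the three tools the Makefile relies on are on PATH can save a confusing error. This is only a sketch; missing_tools is a hypothetical helper name.

```shell
# Hypothetical helper: print each named command that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

missing="$(missing_tools make mpicc nvcc)"
if [ -n "$missing" ]; then
    # Word splitting is intentional here: print the names on one line.
    echo "Missing build tools:" $missing
else
    echo "All build tools found; run make next."
fi
```

Once the build succeeds, the program can typically be started with something like mpirun -np 2 ./test. How the two hosts are passed to mpirun depends on the MPI implementation in use, so that part is left out here.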