Efficient Code Course: Table of Contents

Chapter 1. Benchmarking
Chapter 2. Fundamentals of Efficient R
Chapter 3. Looking Inside Your Code
Chapter 4. Parallel Computing

Checking the number of CPU cores

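This chapter uses the parallel package, which spreads work across several R worker processes. The server image used here reports 8 logical cores.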

# Load the parallel package
library(parallel)

# Store the number of cores in the object no_of_cores
no_of_cores <- detectCores()

# Print no_of_cores
no_of_cores

[1] 8
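
The count returned by detectCores() is the number of logical CPUs, which can be higher than the number of physical cores. A minimal sketch of picking a worker count from it follows. The "leave one CPU free" rule and the names n_logical, n_physical and n_workers are assumptions for illustration, not part of the course material.

```r
library(parallel)

# Logical CPUs (includes hyper-threads); this is what detectCores() reports by default
n_logical <- detectCores()

# Physical cores; may be NA or equal to n_logical where the platform cannot tell them apart
n_physical <- detectCores(logical = FALSE)

# Assumed rule of thumb (not from the course): leave one CPU free, never go below 1
n_workers <- max(1L, n_logical - 1L, na.rm = TRUE)
n_workers
```

The na.rm = TRUE guard matters because detectCores() is documented to return NA when the count cannot be determined.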

The usual parallel workflow

detectCores()
[1] 8
# Create a cluster via makeCluster
cl <- makeCluster(2)
# Parallelize this code (dd is a numeric matrix or data frame created earlier, not shown here)
parApply(cl, dd, 2, median)
 [1] -0.053946179 -0.168234607 -0.056308656 -0.103888726  0.202869314
 [6]  0.019541928 -0.258089759 -0.006198904 -0.054646615  0.094430957
# Stop the cluster
stopCluster(cl)

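Decide how many cores to use
Create a cluster sized to the core count or to what the job actually needs
Use the parallel counterparts of the apply functions, such as parApply
Stop the cluster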
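
The four steps can be sketched end to end as below. Because the original dd is not shown, a random 100 x 10 matrix stands in for it here (an assumption), and the tryCatch(..., finally = ...) wrapper is an added safeguard so the cluster is stopped even if the computation fails.

```r
library(parallel)

# Hypothetical stand-in for dd: 100 rows, 10 columns of standard normal draws
set.seed(1)
dd <- matrix(rnorm(1000), ncol = 10)

# Steps 1-2: choose a worker count and create the cluster
cl <- makeCluster(2)

# Step 3: column medians computed on the workers;
# step 4: stopCluster() runs in 'finally', whether or not parApply() succeeds
par_medians <- tryCatch(
  parApply(cl, dd, 2, median),
  finally = stopCluster(cl)
)

# The serial apply() call should give the same answer
serial_medians <- apply(dd, 2, median)
all.equal(par_medians, serial_medians)
```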

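If the code calls a function you wrote yourself, one extra step is needed: export that function to the cluster with clusterExport(), because the workers start as fresh R sessions that do not know it.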

library("parallel")
# Create a cluster via makeCluster (2 cores)
cl <- makeCluster(2)
# Export the play() function to the cluster
clusterExport(cl,"play")
# Re-write sapply as parSapply
res <- parSapply(cl, 1:100, function(i) play())
# Stop the cluster
stopCluster(cl)
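
A self-contained version of the same pattern is sketched below. The course's play() is not shown in these notes, so a hypothetical stand-in that rolls two dice is defined here. The clusterSetRNGStream() call and the seed value are also additions, included because play() is random and each worker otherwise seeds itself.

```r
library(parallel)

# Hypothetical stand-in for play(): roll two dice and return their total
play <- function() sum(sample(1:6, size = 2, replace = TRUE))

cl <- makeCluster(2)

# Copy play() to every worker; envir = environment() is the global
# environment when this runs at top level, which matches the default
clusterExport(cl, "play", envir = environment())

# Give the workers reproducible, independent random-number streams
clusterSetRNGStream(cl, iseed = 42)

res <- parSapply(cl, 1:100, function(i) play())
stopCluster(cl)

length(res)
range(res)
```

Without the clusterExport() call, a play() defined in the global environment is not available on the workers and the parSapply() call stops with an error.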

Finally, an example that measures how much the parallel and serial versions actually differ.

# Set the number of games to play
no_of_games <- 1e5
## Time serial version
system.time(serial <- sapply(1:no_of_games, function(i) play()))
   user  system elapsed 
  9.370   0.016   9.512 
## Set up cluster
cl <- makeCluster(4)
clusterExport(cl, "play")
## Time parallel version
system.time(par <- parSapply(cl,1:no_of_games, function(i) play()))
   user  system elapsed 
  0.064   0.008   3.216 
## Stop cluster
stopCluster(cl)

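With 4 workers the run took 3.2 seconds of elapsed time against 9.5 seconds serially, about 3 times as fast. Compare the elapsed column only: user and system time in the parallel run count just the master process, not the workers.
Parallel code is not always faster, of course. The gain is clearest for loop-style work (for loops, sapply and similar) whose iterations are independent of each other and expensive enough to outweigh the cost of starting workers and shipping data to them.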
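
To see the overhead side of that trade-off, the sketch below times a deliberately cheap task (sqrt on 10,000 integers, chosen here for illustration) both ways. On many machines the serial call wins for work this small, but timings vary by system, so no expected numbers are given.

```r
library(parallel)

x <- 1:1e4
cl <- makeCluster(2)

# Elapsed time for the serial and the parallel version of the same cheap task
t_serial <- system.time(s <- sapply(x, sqrt))[["elapsed"]]
t_par    <- system.time(p <- parSapply(cl, x, sqrt))[["elapsed"]]

stopCluster(cl)

# Same results either way; only the timing differs
all.equal(s, p)
c(serial = t_serial, parallel = t_par)
```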