安装parallel (Ubuntu):
sudo apt-get install parallel
例如,使用parallel并行把本地数据put到HDFS上:
parallel -j 20 hadoop fs -put {} $DATA_DIR_HDFS ::: $DATA_DIR_LOCAL/*
其中j
表示并行job数:
--jobs N
-j N
--max-procs N
-P N Number of jobslots. Run up to N jobs in parallel. 0 means as many as possible.
Default is 100% which will run one job per CPU core.
If --semaphore is set default is 1 thus making a mutex.
其中:::
表示多源输入
parallel cat {} ::: abc.txt def.txt
结果为
a
b
c
d
e
f
网友评论