1. Extract oozie-examples.tar.gz in the oozie directory
Step 1.
$ cd oozie
$ sudo tar -zxvf oozie-examples.tar.gz
Step 2. Copy oozie/examples to your HDFS home directory; the path /user/hadoop/examples must not already exist on HDFS.
$ cd /opt/cloudera/hadoop
$ ./bin/hadoop fs -put /opt/cloudera/oozie/examples/ examples
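Since `-put` fails when the target already exists, the upload can be wrapped in a small guard that clears any stale copy first. This is a sketch using the paths from this guide; it only echoes the commands (dry run) so they can be reviewed before running against a real cluster.

```shell
#!/bin/sh
# Sketch of the upload step with a guard; paths taken from this guide.
LOCAL_SRC=/opt/cloudera/oozie/examples
HDFS_TARGET=/user/hadoop/examples

# Dry-run helper: swap 'echo' for direct execution on a real cluster.
run() { echo "would run: $*"; }

run hadoop fs -rm -r -skipTrash "$HDFS_TARGET"   # remove a stale copy, if any
run hadoop fs -put "$LOCAL_SRC" "$HDFS_TARGET"   # upload the examples directory
```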
2. Run the example
Oozie submits MapReduce jobs, so the relevant connection details must be configured. Here the MapReduce jobs run on YARN, i.e., they are submitted through the ResourceManager, so we need to know its port.
Find the ResourceManager port by submitting a test job and reading its log:
$ cd /opt/cloudera/hadoop
$ ./bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.14.0.jar pi 1 30
Output:
18/08/15 23:02:38 INFO client.RMProxy: Connecting to ResourceManager at Master/192.168.1.187:8032
18/08/15 23:02:39 INFO input.FileInputFormat: Total input paths to process : 1
18/08/15 23:02:39 INFO mapreduce.JobSubmitter: number of splits:1
18/08/15 23:02:40 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1534386415651_0001
18/08/15 23:02:41 INFO impl.YarnClientImpl: Submitted application application_1534386415651_0001
18/08/15 23:02:41 INFO mapreduce.Job: The url to track the job: http://Master:8088/proxy/application_1534386415651_0001/
18/08/15 23:02:41 INFO mapreduce.Job: Running job: job_1534386415651_0001
18/08/15 23:02:53 INFO mapreduce.Job: Job job_1534386415651_0001 running in uber mode : false
18/08/15 23:02:53 INFO mapreduce.Job: map 0% reduce 0%
18/08/15 23:02:59 INFO mapreduce.Job: map 100% reduce 0%
18/08/15 23:03:06 INFO mapreduce.Job: map 100% reduce 100%
18/08/15 23:03:06 INFO mapreduce.Job: Job job_1534386415651_0001 completed successfully
18/08/15 23:03:06 INFO mapreduce.Job: Counters: 49
The port is 8032, as shown in the `Connecting to ResourceManager at Master/192.168.1.187:8032` line above.
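Running a throwaway job just to read the port from its log works, but the same address can also be read directly from the cluster configuration. As a sketch (assuming a standard yarn-site.xml; the value mirrors the log line above, and 8032 is also the default port for this property), the client-facing ResourceManager address is:

```xml
<!-- yarn-site.xml (sketch): the address job clients connect to -->
<property>
  <name>yarn.resourcemanager.address</name>
  <value>Master:8032</value>
</property>
```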
3. Directory structure of the map-reduce example
$ cd oozie/examples/apps/map-reduce
$ ll
-rw-r--r-- 1 1106 4001 1012 Jan 6 2018 job.properties
-rw-r--r-- 1 1106 4001 1028 Jan 6 2018 job-with-config-class.properties
drwxr-xr-x 2 root root 4096 Aug 15 20:32 lib/
-rw-r--r-- 1 1106 4001 2274 Jan 6 2018 workflow-with-config-class.xml
-rw-r--r-- 1 1106 4001 2559 Jan 6 2018 workflow.xml
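For context, workflow.xml defines the workflow that job.properties parameterizes. An abridged sketch of the map-reduce example's workflow follows (element names follow the Oozie workflow schema; the property list in the shipped file is longer):

```xml
<!-- Abridged sketch of the shipped workflow.xml (property list shortened) -->
<workflow-app xmlns="uri:oozie:workflow:0.2" name="map-reduce-wf">
    <start to="mr-node"/>
    <action name="mr-node">
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data/${outputDir}"/>
            </prepare>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
                <!-- mapper/reducer classes and input/output dirs follow here -->
            </configuration>
        </map-reduce>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Map/Reduce failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
```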
Edit the job.properties configuration file. The defaults are:
nameNode=hdfs://localhost:8020
jobTracker=localhost:8021
queueName=default
examplesRoot=examples
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/apps/map-reduce/workflow.xml
outputDir=map-reduce
Change it to the following:
nameNode=hdfs://Master:9000
jobTracker=Master:8032
queueName=default
examplesRoot=examples
# HDFS location of the workflow definition
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/apps/map-reduce/workflow.xml
# directory for the MapReduce output
outputDir=map-reduce
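To see what these placeholders resolve to, here is a small local sketch that performs the same ${...} substitution by hand (the submitting user `hadoop` is an assumption; that is where Oozie takes ${user.name} from):

```shell
#!/bin/sh
# Resolve the job.properties placeholders locally (user 'hadoop' is an assumption).
nameNode="hdfs://Master:9000"
user_name="hadoop"
examplesRoot="examples"

app_path="${nameNode}/user/${user_name}/${examplesRoot}/apps/map-reduce/workflow.xml"
echo "$app_path"
# → hdfs://Master:9000/user/hadoop/examples/apps/map-reduce/workflow.xml
```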
Input path (HDFS):
examples/input-data/
Output path (HDFS):
examples/output-data/${EXAMPLE_NAME}
Note: the job.properties file needs to be a local file at submission time, not an HDFS path. In other words, submission reads this local configuration file; even if we change the local job.properties without pushing the update to HDFS, that is fine.
4. How to run an application
$ ./bin/oozie job -oozie http://192.168.1.187:11000/oozie -config examples/apps/map-reduce/job.properties -run
Output:
job: 0000001-180815233809319-oozie-hado-W
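The returned workflow ID can be captured and fed back into `oozie job -info` to poll the job's status from the command line. A sketch (the ID is the one from this run; the status query itself is commented out since it needs a live Oozie server):

```shell
#!/bin/sh
# Capture the workflow ID from the submit output and build the status query.
OOZIE_URL=http://192.168.1.187:11000/oozie
submit_output="job: 0000001-180815233809319-oozie-hado-W"

job_id=${submit_output#job: }          # strip the "job: " prefix
echo "$job_id"
# → 0000001-180815233809319-oozie-hado-W

# On a live cluster, check the workflow status with:
# oozie job -oozie "$OOZIE_URL" -info "$job_id"
```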
Open the YARN monitoring page at Master:8088:
(screenshot: yarn.png)
Open the Oozie monitoring page at Master:11000:
(screenshot: oozie.png)
That concludes this walkthrough of running a MapReduce workflow on Oozie!