Main contents of this section:
Pig installation and usage
Pig is a large-scale data analysis tool built on Hadoop. It provides a SQL-like language called Pig Latin, and its compiler turns these SQL-like data analysis requests into a series of optimized MapReduce jobs.
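For example, a few lines of Pig Latin like the following (the input path and field names are purely illustrative) are compiled by Pig into one or more MapReduce jobs:
-- load a tab-delimited log file and count records per uid
logs = LOAD '/data/access_log' USING PigStorage('\t') AS (uid:chararray, url:chararray);
grouped = GROUP logs BY uid;
counts = FOREACH grouped GENERATE group, COUNT(logs);
DUMP counts;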
1. System environment:
OS: CentOS Linux release 7.5.1804 (Core)
CPU: 2 cores
Memory: 1GB
Run as user: root
JDK version: 1.8.0_252
Hadoop version: cdh5.16.2
2. Cluster node role assignments:
172.26.37.245 node1.hadoop.com---->namenode,zookeeper,journalnode,hadoop-hdfs-zkfc,resourcemanager,historyserver,hbase,hbase-master,hive,hive-metastore,hive-server2,hive-hbase,sqoop,impala,impala-server,impala-state-store,impala-catalog,pig
172.26.37.246 node2.hadoop.com---->datanode,zookeeper,journalnode,nodemanager,hadoop-client,mapreduce,hbase-regionserver,impala,impala-server,hive
172.26.37.247 node3.hadoop.com---->datanode,nodemanager,hadoop-client,mapreduce,hive,mysql-server,impala,impala-server
172.26.37.248 node4.hadoop.com---->namenode,zookeeper,journalnode,hadoop-hdfs-zkfc,hive,hive-server2,impala-shell
3. Environment notes:
This deployment adds the following component:
172.26.37.245 node1.hadoop.com---->pig
I. Installation
On node1:
# yum install pig
II. Setting environment variables
# cp -p /etc/profile /etc/profile.20200705
# vi /etc/profile
Add the following line:
export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce
If you are using YARN, set HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce; if you are using MRv1, set it to /usr/lib/hadoop-0.20-mapreduce.
# source /etc/profile
# echo $HADOOP_MAPRED_HOME
/usr/lib/hadoop-mapreduce
III. Entering interactive mode
# sudo -u hdfs pig
Common shell commands such as cd, ls and pwd are supported:
grunt> ls
hdfs://cluster1/user/.staging <dir>
grunt> pwd
hdfs://cluster1/user/hdfs
grunt> quit
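Besides cd, ls and pwd, Grunt also provides other HDFS file commands such as cat, mkdir, copyFromLocal, copyToLocal and rmf; a brief sketch with hypothetical paths:
grunt> mkdir /user/hdfs/testdir
grunt> copyToLocal /user/hdfs/somefile /tmp/somefile
grunt> cat /user/hdfs/somefile
grunt> rmf /user/hdfs/testdir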
IV. Interacting with HBase
# hbase shell
Create the HBase table to import into:
hbase(main):001:0> create 'customers', 'customers_data'
hbase(main):002:0> scan 'customers'
ROW COLUMN+CELL
0 row(s) in 0.1620 seconds
hbase(main):003:0> quit
Create a local data file:
# vi /customers
Insert the following data:
01,zhang,san,11,teacher
02,li,si,12,farmer
03,wang,wu,13,doctor
04,zhao,liu,14,driver
05,tian,qi,15,police
06,wang,ba,16,cleaner
07,mi,jiu,17,student
Upload it to the /user/pig directory on HDFS:
# sudo -u hdfs hdfs dfs -mkdir /user/pig
# sudo -u hdfs pig
grunt > cd /user/pig
grunt > copyFromLocal /customers ./customers
grunt > ls
# sudo -u hdfs hdfs dfs -ls /user/pig
-rw-r--r-- - mapred hadoop 0 2020-07-06 11:56 /user/pig/customers
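Before running the load script, the uploaded file can also be checked from Grunt; a quick sketch:
grunt> cat /user/pig/customers
grunt> check = LOAD '/user/pig/customers' USING PigStorage(',');
grunt> DUMP check;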
Create a Pig script (on node1):
# vi /Load_HBase_Customers.pig
Insert the following content:
raw_data = LOAD 'hdfs:/user/pig/customers' USING PigStorage(',') AS ( -- where the source data lives, how it is delimited, and the names/types of the resulting fields
id:chararray,
firstname:chararray,
lastname:chararray,
age:int,
job:chararray
);
STORE raw_data INTO 'hbase://customers' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage( -- where to store, which storage class to use, and how each value maps to an HBase column
'customers_data:firstname
customers_data:lastname
customers_data:age
customers_data:job'
);
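Note that HBaseStorage treats the first field of each tuple (id here) as the HBase row key, which is why only the remaining four fields appear in the column mapping; in Grunt, DESCRIBE raw_data; can be used to confirm the schema before storing.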
Execute the script:
# sudo -u hdfs PIG_CLASSPATH=/usr/lib/hbase/hbase-client-1.2.0-cdh5.16.2.jar:/usr/lib/zookeeper/zookeeper-3.4.5-cdh5.16.2.jar /usr/bin/pig /Load_HBase_Customers.pig
The PIG_CLASSPATH environment variable is declared so that the HBase and ZooKeeper jars can be found, and then the pig command is run against the script.
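As an alternative sketch, instead of setting PIG_CLASSPATH on the command line, the jars could be registered at the top of the script with REGISTER statements; the paths below assume the CDH layout used here, and further HBase dependency jars may also be required:
REGISTER /usr/lib/zookeeper/zookeeper-3.4.5-cdh5.16.2.jar;
REGISTER /usr/lib/hbase/hbase-client-1.2.0-cdh5.16.2.jar;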
Verify:
# hbase shell
hbase(main):001:0> scan 'customers'
ROW COLUMN+CELL
01 column=customers_data:age, timestamp=1556119952730, value=11
01 column=customers_data:firstname, timestamp=1556119952730, value=zhang
01 column=customers_data:job, timestamp=1556119952730, value=teacher
01 column=customers_data:lastname, timestamp=1556119952730, value=san
02 column=customers_data:age, timestamp=1556119952741, value=12
02 column=customers_data:firstname, timestamp=1556119952741, value=li
02 column=customers_data:job, timestamp=1556119952741, value=farmer
02 column=customers_data:lastname, timestamp=1556119952741, value=si
03 column=customers_data:age, timestamp=1556119952741, value=13
03 column=customers_data:firstname, timestamp=1556119952741, value=wang
03 column=customers_data:job, timestamp=1556119952741, value=doctor
03 column=customers_data:lastname, timestamp=1556119952741, value=wu
04 column=customers_data:age, timestamp=1556119952742, value=14
04 column=customers_data:firstname, timestamp=1556119952742, value=zhao
04 column=customers_data:job, timestamp=1556119952742, value=driver
04 column=customers_data:lastname, timestamp=1556119952742, value=liu
05 column=customers_data:age, timestamp=1556119952742, value=15
05 column=customers_data:firstname, timestamp=1556119952742, value=tian
05 column=customers_data:job, timestamp=1556119952742, value=police
05 column=customers_data:lastname, timestamp=1556119952742, value=qi
06 column=customers_data:age, timestamp=1556119952743, value=16
06 column=customers_data:firstname, timestamp=1556119952743, value=wang
06 column=customers_data:job, timestamp=1556119952743, value=cleaner
06 column=customers_data:lastname, timestamp=1556119952743, value=ba
07 column=customers_data:age, timestamp=1556119952744, value=17
07 column=customers_data:firstname, timestamp=1556119952744, value=mi
07 column=customers_data:job, timestamp=1556119952744, value=student
07 column=customers_data:lastname, timestamp=1556119952744, value=jiu
7 row(s) in 0.9590 seconds
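The result can also be read back through Pig itself. A minimal sketch, assuming the same table, column family and PIG_CLASSPATH as above; the '-loadKey true' option makes HBaseStorage return the row key as the first field:
customers_back = LOAD 'hbase://customers' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
'customers_data:firstname customers_data:lastname customers_data:age customers_data:job',
'-loadKey true')
AS (id:chararray, firstname:chararray, lastname:chararray, age:int, job:chararray);
DUMP customers_back;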