Note 22: LZO Compression Support in Hadoop 2.7.2
Build Preparation
Hadoop itself does not ship with LZO compression support, so we use the open-source hadoop-lzo component provided by Twitter.
hadoop-lzo must be compiled against both Hadoop and LZO. The build steps are as follows:
- Download the required files first
  - LZO: http://www.oberhumer.com/opensource/lzo/download/lzo-2.10.tar.gz
  - hadoop-lzo source: https://github.com/twitter/hadoop-lzo/archive/master.zip
  - JDK
  - Maven
- Environment setup
- Install the JDK
[root@hadoop115 software]# tar -zxvf jdk-8u241-linux-x64.tar.gz -C /opt/module/
[root@hadoop115 software]# vim /etc/profile
#JAVA_HOME:
export JAVA_HOME=/opt/module/jdk1.8.0_241
export PATH=$PATH:$JAVA_HOME/bin
[root@hadoop115 software]# source /etc/profile
Verify with: java -version
- Install Maven
[root@hadoop115 software]# tar -zxvf apache-maven-3.6.3-bin.tar.gz -C /opt/module/
[root@hadoop115 apache-maven-3.6.3]# vim /etc/profile
#MAVEN_HOME
export MAVEN_HOME=/opt/module/apache-maven-3.6.3
export PATH=$PATH:$MAVEN_HOME/bin
[root@hadoop115 software]# source /etc/profile
Verify with: mvn -version
Edit settings.xml to configure the Aliyun mirror and a local repository path
[root@hadoop115 apache-maven-3.6.3]# vi conf/settings.xml
# Find the appropriate places in the file and add the following:
<localRepository>/opt/module/apache-maven-3.6.3/Local_Repository</localRepository>
<mirrors>
  <mirror>
    <id>nexus-aliyun</id>
    <mirrorOf>central</mirrorOf>
    <name>Nexus aliyun</name>
    <url>http://maven.aliyun.com/nexus/content/groups/public</url>
  </mirror>
</mirrors>
- Install the other build dependencies
[root@hadoop115 apache-maven-3.6.3]# yum -y install lzo-devel zlib-devel gcc autoconf automake libtool
Compile LZO
- Extract
[root@hadoop115 software]# tar -zxvf lzo-2.10.tar.gz
- Enter the extracted directory
[root@hadoop115 software]# cd lzo-2.10/
- Run
[root@hadoop115 lzo-2.10]# ./configure --prefix=/usr/local/hadoop/lzo/
[root@hadoop115 lzo-2.10]# make
[root@hadoop115 lzo-2.10]# make install
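If configure, make, and make install all succeed, the LZO headers and libraries end up under the prefix given above; a quick sanity check (paths assume the configure line above):
[root@hadoop115 lzo-2.10]# ls /usr/local/hadoop/lzo/
It should list the include and lib directories.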
Compile the hadoop-lzo source
- Extract
[root@hadoop115 software]# unzip hadoop-lzo-master.zip
- Enter the extracted directory and edit pom.xml, setting the Hadoop version to match your cluster
[root@hadoop115 software]# cd hadoop-lzo-master
[root@hadoop115 hadoop-lzo-master]# vim pom.xml
<hadoop.current.version>2.7.2</hadoop.current.version>
- Export temporary variables so the native build can find the LZO headers and libraries installed above
[root@hadoop115 hadoop-lzo-master]# export C_INCLUDE_PATH=/usr/local/hadoop/lzo/include
[root@hadoop115 hadoop-lzo-master]# export LIBRARY_PATH=/usr/local/hadoop/lzo/lib
- Build
[root@hadoop115 hadoop-lzo-master]# mvn package -Dmaven.test.skip=true
- In the target directory, hadoop-lzo-0.4.21-SNAPSHOT.jar is the successfully built hadoop-lzo component
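To confirm the artifact is actually there before copying it around (a simple check, assuming the default Maven build output location):
[root@hadoop115 hadoop-lzo-master]# ls -l target/hadoop-lzo-0.4.21-SNAPSHOT.jar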
Using LZO Compression
- Upload and distribute
Place the compiled hadoop-lzo-0.4.21-SNAPSHOT.jar into /opt/module/hadoop-2.7.2/share/hadoop/common/
Then distribute it:
[kevin@hadoop112 module]$ cd /opt/module/hadoop-2.7.2/share/hadoop/common/
[kevin@hadoop112 common]$ xsync.sh hadoop-lzo-0.4.21-SNAPSHOT.jar
- Add configuration to enable LZO compression
[kevin@hadoop112 module]$ cd /opt/module/hadoop-2.7.2/etc/hadoop/
[kevin@hadoop112 hadoop]$ vim core-site.xml
Contents:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <!-- Address of the NameNode in HDFS -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop101:9000</value>
  </property>
  <!-- Storage directory for files generated while Hadoop is running -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/module/hadoop-2.7.2/data/tmp</value>
  </property>
  <!-- Register the compression codecs, including LZO -->
  <property>
    <name>io.compression.codecs</name>
    <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.SnappyCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec</value>
  </property>
  <property>
    <name>io.compression.codec.lzo.class</name>
    <value>com.hadoop.compression.lzo.LzoCodec</value>
  </property>
</configuration>
- Distribute
[kevin@hadoop112 hadoop]$ xsync.sh core-site.xml
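Once the configuration is in place on every node, a simple way to verify that the codec is picked up is to run a MapReduce job with LZO output compression enabled. This is only a sketch: /input and /output are assumed example HDFS paths, not paths defined in this note.
[kevin@hadoop112 hadoop-2.7.2]$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount -Dmapreduce.output.fileoutputformat.compress=true -Dmapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzopCodec /input /output
If everything is wired up correctly, the part files under /output end with the .lzo extension; if the native library is missing, the job fails with the LzoCodec error mentioned below.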
Create the Index
- Before using LZO, run this on every machine first:
[kevin@hadoop112 module]$ sudo yum -y install lzo-devel zlib-devel gcc autoconf automake libtool
Otherwise you will get the error: ERROR lzo.LzoCodec: Failed to load/initialize native-lzo library
- Create an index for the LZO file. The splittability of an LZO-compressed file depends on its index, so we have to build the index manually; without one, the LZO file yields only a single split (see the sketch after this list)
[kevin@hadoop112 module]$ hadoop jar /opt/module/hadoop-2.7.2/share/hadoop/common/hadoop-lzo-0.4.21-SNAPSHOT.jar com.hadoop.compression.lzo.DistributedLzoIndexer /input/bigtable.lzo
- Having to build an index for every single file is a bit tedious
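To actually benefit from the index, a job must also use hadoop-lzo's LZO-aware input format; with the default TextInputFormat the file is still read as one split. A sketch using the bundled wordcount example (the input reuses /input/bigtable.lzo from above; /output2 is an assumed, not-yet-existing output path):
[kevin@hadoop112 hadoop-2.7.2]$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount -Dmapreduce.job.inputformat.class=com.hadoop.mapreduce.LzoTextInputFormat /input/bigtable.lzo /output2
With the index present, a file larger than one HDFS block should be processed with multiple splits; without it, the whole file goes to a single mapper.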