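Prerequisites
Install the JDK (search online for instructions)
Set up passwordless SSH login; see this separate article: https://www.jianshu.com/p/fa06f3d77094
Install ZooKeeper; see this separate article: https://www.jianshu.com/p/d6967310777c
1. Download the Hadoop package
Apache release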
https://hadoop.apache.org/releases.html
CDH release
http://archive.cloudera.com/cdh5/cdh/5/
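2. Configuration files
File | Format | Purpose |
---|---|---|
hadoop-env.sh | Bash script | Environment variables for the Hadoop runtime |
core-site.xml | XML | Hadoop core settings, such as I/O |
hdfs-site.xml | XML | HDFS daemon settings: NameNode (NN), JournalNode (JN), DataNode (DN) |
yarn-env.sh | Bash script | Environment variables for the YARN runtime |
yarn-site.xml | XML | YARN framework settings |
mapred-site.xml | XML | MapReduce properties |
capacity-scheduler.xml | XML | YARN scheduler properties |
container-executor.cfg | cfg | YARN container settings |
mapred-queues.xml | XML | MapReduce queue settings |
hadoop-metrics.properties | Java properties | Hadoop Metrics settings |
hadoop-metrics2.properties | Java properties | Hadoop Metrics2 settings |
slaves | Plain text | List of DataNode hosts |
exclude | Plain text | List of DataNodes to remove (decommission) |
log4j.properties | Java properties | System logging settings |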
3. hadoop-env.sh settings
# Java home
export JAVA_HOME=~/jdk1.8.0_101
# Hadoop configuration directory
export HADOOP_CONF_DIR=~/hadoop-2.5.0-cdh5.2.1-och4.0.1/etc/hadoop
# Hadoop home
export HADOOP_HOME=~/hadoop-2.5.0-cdh5.2.1-och4.0.1
# Directory for PID files
export HADOOP_PID_DIR=~/data/hadoop/pids
# Hadoop gives every daemon
# (namenode, secondarynamenode, jobtracker, datanode, tasktracker)
# the same heap size, in MB, which is set here in hadoop-env.sh
export HADOOP_HEAPSIZE=8192
# NameNode heap (raise to 16384 MB when the system has enough memory)
export HADOOP_NAMENODE_OPTS="-Xmx4096m -Xms4096m -Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_NAMENODE_OPTS"
# DataNode heap (use 2-4 GB when the system has enough memory)
export HADOOP_DATANODE_OPTS="-Xmx3072m -Xms3072m -Dhadoop.security.logger=ERROR,RFAS $HADOOP_DATANODE_OPTS"
# SecondaryNameNode heap; keep it the same as the NameNode
export HADOOP_SECONDARYNAMENODE_OPTS="-Xmx4096m -Xms4096m -Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_SECONDARYNAMENODE_OPTS"
# Heap for client-side commands
export HADOOP_CLIENT_OPTS="-Xmx2048m $HADOOP_CLIENT_OPTS"
# Hadoop logging settings
export HADOOP_LOGFILE=hadoop-${HADOOP_IDENT_STRING}-${command}-${HOSTNAME}.log
export HADOOP_ROOT_LOGGER=${HADOOP_ROOT_LOGGER:-"INFO,console"}
export HADOOP_SECURITY_LOGGER=${HADOOP_SECURITY_LOGGER:-"WARN,RFAS"}
export HDFS_AUDIT_LOGGER=${HDFS_AUDIT_LOGGER:-"WARN,NullAppender"}
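JVM options pasted from a web page sometimes carry a typographic dash (for example `–Xms` rather than `-Xms`), and the JVM will not read that as an option. Below is a minimal sketch for spotting such characters before starting any daemon. `find_non_ascii` is a hypothetical helper name, not something Hadoop ships.

```shell
# List every line (with its line number) that contains a byte outside
# printable ASCII and whitespace. Reads the named files, or stdin if none.
# find_non_ascii is an illustrative name, not part of Hadoop.
find_non_ascii() {
    # -a forces text mode so grep prints the line instead of "Binary file matches"
    LC_ALL=C grep -an '[^[:print:][:space:]]' "$@"
}
```

For example, `find_non_ascii "$HADOOP_CONF_DIR/hadoop-env.sh"` prints nothing when the file contains only plain ASCII.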
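4. core-site.xml settings
<!-- The default port is 8020. This is the RPC port that accepts client connections, so if the RPC port is set to 9000 in hdfs-site.xml, the port in fs.defaultFS must be 9000 as well. fs.default.name is the deprecated name of fs.defaultFS. -->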
<property>
<name>fs.default.name</name>
<value>hdfs://master:9000</value>
</property>
<!-- Adjust this path for your environment -->
<property>
<name>hadoop.tmp.dir</name>
<value>/home/ocetl/data/hadoop/hadoop-${user.name}</value>
</property>
5. hdfs-site.xml settings
<!-- Adjust this path for your environment -->
<!-- DataNode data storage directories -->
<property>
<name>dfs.datanode.data.dir</name>
<value>/data1,/data2,/data3,/data4,/data5,/data6</value>
<final>true</final>
</property>
<!-- Local filesystem path where the NameNode persistently stores the namespace and transaction logs -->
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///home/ocetl/data/hadoop/hdfs/name</value>
<final>true</final>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/home/ocetl/data/hadoop/journal</value>
</property>
<property>
<name>dfs.hosts.exclude</name>
<value>/home/ocetl/app/hadoop-2.5.0-cdh5.2.1-och4.0.1/etc/hadoop/excludes</value>
</property>
<!-- Update the hostnames: einvoice243 is the primary NameNode, einvoice244 is the standby -->
<property>
<name>dfs.namenode.rpc-address.ocetl.nn1</name>
<value>einvoice243:8030</value>
</property>
<property>
<name>dfs.namenode.rpc-address.ocetl.nn2</name>
<value>einvoice244:8030</value>
</property>
<property>
<name>dfs.namenode.http-address.ocetl.nn1</name>
<value>einvoice243:50082</value>
</property>
<property>
<name>dfs.namenode.http-address.ocetl.nn2</name>
<value>einvoice244:50082</value>
</property>
<!-- einvoice243/einvoice244/einvoice247 are the ZooKeeper nodes and also run the JournalNodes -->
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://einvoice243:8488;einvoice244:8488;einvoice247:8488/ocetl</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>einvoice243:21810,einvoice244:21810,einvoice247:21810</value>
</property>
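The `dfs.namenode.shared.edits.dir` value above packs the JournalNode list and a journal ID into one URI of the form `qjournal://host1:port;host2:port;host3:port/journalId`. The sketch below splits such a value into hostnames, which can help when checking that a JournalNode runs on each of them. `journal_hosts` is a hypothetical helper name.

```shell
# Print the hostnames found in a qjournal:// URI, one per line.
# journal_hosts is an illustrative name, not a Hadoop command.
# The function body is a subshell, so its variables do not leak.
journal_hosts() (
    uri=$1
    rest=${uri#qjournal://}     # drop the scheme
    hostlist=${rest%%/*}        # drop the trailing /journalId
    printf '%s\n' "$hostlist" | tr ';' '\n' | cut -d: -f1
)

journal_hosts 'qjournal://einvoice243:8488;einvoice244:8488;einvoice247:8488/ocetl'
```

On a configured node, the value can be read with `hdfs getconf -confKey dfs.namenode.shared.edits.dir` and passed to the function.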
6. yarn-site.xml settings
<property>
<name>yarn.resourcemanager.zk.state-store.address</name>
<value>einvoice243:21810,einvoice244:21810,einvoice247:21810</value>
</property>
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>einvoice243:21810,einvoice244:21810,einvoice247:21810</value>
</property>
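<!-- In this file, einvoice243 is the primary ResourceManager and einvoice244 is the standby -->
<!-- Under "RM1 configs", use the hostname of the primary ResourceManager node -->
<!-- Under "RM2 configs", use the hostname of the standby ResourceManager node -->
<!-- Check the hostnames in the other properties and update them as well -->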
<!-- RM1 configs -->
<property>
<name>yarn.resourcemanager.address.rm1</name>
<value>einvoice243:23140</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address.rm1</name>
<value>einvoice243:23130</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm1</name>
<value>einvoice243:23188</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address.rm1</name>
<value>einvoice243:23125</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address.rm1</name>
<value>einvoice243:23141</value>
</property>
<property>
<name>yarn.resourcemanager.ha.admin.address.rm1</name>
<value>einvoice243:23142</value>
</property>
<!-- RM2 configs -->
<property>
<name>yarn.resourcemanager.address.rm2</name>
<value>einvoice244:23140</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address.rm2</name>
<value>einvoice244:23130</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm2</name>
<value>einvoice244:23188</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address.rm2</name>
<value>einvoice244:23125</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address.rm2</name>
<value>einvoice244:23141</value>
</property>
<property>
<name>yarn.resourcemanager.ha.admin.address.rm2</name>
<value>einvoice244:23142</value>
</property>
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<property>
<name>yarn.resourcemanager.ha.id</name>
<!-- on rm1 set to rm1, on rm2 set to rm2 -->
<value>rm1</value>
</property>
<!-- Adjust these paths for your environment -->
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>/data1/yarn/local,/data2/yarn/local,/data3/yarn/local,/data4/yarn/local,/data5/yarn/local,/data6/yarn/local</value>
</property>
<property>
<name>yarn.nodemanager.log-dirs</name>
<value>/data1/yarn/log,/data2/yarn/log,/data3/yarn/log,/data4/yarn/log,/data5/yarn/log,/data6/yarn/log</value>
</property>
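The two NodeManager values above follow one pattern per disk, so they can be generated rather than typed by hand. This is a small sketch that assumes the disks are mounted as `/data1` to `/dataN`, as in this article. `yarn_dirs` is a hypothetical helper name.

```shell
# Build the comma-separated directory list for yarn.nodemanager.local-dirs
# (kind=local) or yarn.nodemanager.log-dirs (kind=log) across N disks.
# yarn_dirs is an illustrative name; the /dataN mount points are assumed.
yarn_dirs() (
    kind=$1
    count=$2
    out=""
    i=1
    while [ "$i" -le "$count" ]; do
        # prepend a comma only when the list is already non-empty
        out="${out}${out:+,}/data${i}/yarn/${kind}"
        i=$((i + 1))
    done
    printf '%s\n' "$out"
)

yarn_dirs local 6
```

The same list can be used to create the directories on each NodeManager host, for example `mkdir -p $(yarn_dirs local 6 | tr ',' ' ')`.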
7. mapred-env.sh settings
export HADOOP_MAPRED_PID_DIR=~/data/hadoop/pids
8. mapred-site.xml settings
<!-- jobhistory properties -->
<property>
<name>mapreduce.jobhistory.address</name>
<value>0.0.0.0:10120</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>0.0.0.0:19988</value>
</property>
9. slaves file
Write the IP address or hostname of each DataNode into the slaves file, one per line.
einvoice247
einvoice248
einvoice249
einvoice250
10. Set the Hadoop environment variables
vi ~/.bash_profile
export HADOOP_HOME=~/hadoop-2.5.0-cdh5.2.1-och4.0.1
export PATH=${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:$PATH
export HADOOP_MAPRED_HOME=${HADOOP_HOME}
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export HADOOP_HDFS_HOME=${HADOOP_HOME}
export HADOOP_PREFIX=${HADOOP_HOME}
export HADOOP_COMMON_HOME=${HADOOP_HOME}
export HADOOP_YARN_HOME=${HADOOP_HOME}
export YARN_HOME=${HADOOP_HOME}
export YARN_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export YARN_LOG_DIR=${HADOOP_HOME}/logs
export YARN_PREFIX=${HADOOP_HOME}
Apply the environment variables
source ~/.bash_profile
11. Distribute to the other hosts
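The article gives no commands for this step. One possible approach is sketched below. It assumes `rsync` and passwordless SSH are available and that the install path is the same on every host. `dist_cmds` is a hypothetical helper that only prints the commands so they can be reviewed first.

```shell
# Print (do not run) one rsync command per host listed in a slaves-style file.
# dist_cmds is an illustrative name; identical paths on all hosts are assumed.
dist_cmds() (
    src=$1
    hosts_file=$2
    # skip blank lines and comment lines
    grep -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' "$hosts_file" |
    while IFS= read -r host; do
        printf 'rsync -a %s/ %s:%s/\n' "$src" "$host" "$src"
    done
)
```

Run `dist_cmds "$HADOOP_HOME" "$HADOOP_CONF_DIR/slaves"` to preview, then pipe the output to `sh` to execute it. The slaves file lists only the DataNodes, so the standby node (einvoice244) needs the same copy, and on that node `yarn.resourcemanager.ha.id` must then be changed to `rm2`.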
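12. Preparation before the first start
Note: run the steps in this section only on a first-time installation. Afterwards, do not use these commands, or use them with great care.
Clear the data directories
Before starting the JournalNodes and before formatting, delete the contents of the data directories. Everything except the ZooKeeper data can be removed.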
rm -r ~/data/hadoop/hdfs/name/*
rm -r ~/data/hadoop/journal/*
rm -r ~/data/hadoop/pids/*
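Start ZooKeeper
zkServer.sh start
# Format the HA state in ZooKeeper (creates the namespace); run on one NameNode only [einvoice243]
hdfs zkfc -formatZK
Start the JournalNodes
# Run an odd number of JournalNodes, on the same hosts as ZooKeeper
hadoop-daemon.sh start journalnode
Note: run this on all three hosts: einvoice243, einvoice244, einvoice247
Format the NameNode
# Run on the primary NameNode [einvoice243]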
hdfs namenode -format
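Prepare the DataNodes
# DataNodes need no format step (`hdfs datanode` has no -format option); they initialize their storage on first start
# On every host listed in slaves [einvoice247-250], make sure the directories in dfs.datanode.data.dir exist and are writable by the Hadoop user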
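Start the Hadoop daemons
# Run on the primary NameNode [einvoice243]
hadoop-daemon.sh start namenode
# Run on the standby NameNode [einvoice244]
hdfs namenode -bootstrapStandby # copies the metadata only; it does not start the NameNode
# On the primary and the standby, start all daemons [einvoice243, einvoice244]
start-all.sh
# Check that the native compression libraries load correctly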
hadoop org.apache.hadoop.util.NativeLibraryChecker
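13. Starting Hadoop
Note: follow the steps above only for the first start. After that, to start or stop Hadoop, run the following directly on the primary or the standby node: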
start-all.sh
stop-all.sh
14. Verify that Hadoop started successfully
# Test the map phase
hadoop jar ${HADOOP_HOME}/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0-cdh5.2.1.jar randomwriter rand
# Test the reduce phase
hadoop jar ${HADOOP_HOME}/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0-cdh5.2.1.jar sort rand sort-rand
# Test YARN
hadoop jar ${HADOOP_HOME}/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0-cdh5.2.1.jar pi 10 100
# When verification is done, delete the temporary files it left behind
hadoop fs -rm -r /user/ocetl/*
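Besides running sample jobs, it can help to confirm that exactly one NameNode and one ResourceManager report `active`. A minimal sketch follows, assuming the IDs `nn1`/`nn2` and `rm1`/`rm2` from the configuration above. `ha_status` is a hypothetical wrapper around the stock `hdfs haadmin` and `yarn rmadmin` commands.

```shell
# Print the HA role (active/standby) of each NameNode and ResourceManager.
# ha_status is an illustrative name; the IDs come from hdfs-site.xml and
# yarn-site.xml in this article.
ha_status() {
    for nn in nn1 nn2; do
        printf '%s: %s\n' "$nn" "$(hdfs haadmin -getServiceState "$nn" 2>&1)"
    done
    for rm in rm1 rm2; do
        printf '%s: %s\n' "$rm" "$(yarn rmadmin -getServiceState "$rm" 2>&1)"
    done
}
```

Call `ha_status` on any node with the Hadoop environment loaded. Running `jps` on each host additionally shows which daemon processes are alive.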