Setting Up a Hadoop 3.3 Pseudo-Distributed Environment
Environment
One Alibaba Cloud server running Ubuntu 20.04 (Linux).
Installing Java
Install the JDK:
root@ALBB:~# apt-get install openjdk-11-jdk
Check whether the java command works; output like the following means Java was installed successfully:
root@ALBB:~# java
Usage: java [options] <mainclass> [args...]
(to execute a class)
or java [options] -jar <jarfile> [args...]
(to execute a jar file)
or java [options] -m <module>[/<mainclass>] [args...]
java [options] --module <module>[/<mainclass>] [args...]
(to execute the main class in a module)
or java [options] <sourcefile> [args]
(to execute a single source-file program)
Arguments following the main class, source file, -jar <jarfile>,
-m or --module <module>/<mainclass> are passed as the arguments to
main class.
Locate the JDK path. The actual installation lives under /usr/lib/jvm/java-11-openjdk-amd64; what sits in /usr/bin/ is just a chain of symlinks:
root@ALBB:~# which java
/usr/bin/java
root@ALBB:~# ls -lrt /usr/bin/java
lrwxrwxrwx 1 root root 22 May 9 22:42 /usr/bin/java -> /etc/alternatives/java
root@ALBB:~# ls -lrt /etc/alternatives/java
lrwxrwxrwx 1 root root 43 May 9 22:42 /etc/alternatives/java -> /usr/lib/jvm/java-11-openjdk-amd64/bin/java
Open /etc/profile with vim and add the Java environment variables (substitute your own JDK install path; note that OpenJDK 11 ships no separate jre/ directory), then run source /etc/profile to apply them:
#set java environment
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64/
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib
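After sourcing /etc/profile, it is worth sanity-checking that JAVA_HOME really points at a JDK before moving on to Hadoop. A minimal sketch, assuming the install path from above (adjust it if yours differs):

```shell
# Check that a candidate JAVA_HOME actually contains a java launcher.
check_java_home() {
    # Returns 0 if $1 looks like a JDK root (has an executable bin/java), 1 otherwise.
    [ -x "$1/bin/java" ]
}

JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64   # assumed path from the steps above
if check_java_home "$JAVA_HOME"; then
    echo "JAVA_HOME ok: $JAVA_HOME"
else
    echo "JAVA_HOME does not point at a JDK: $JAVA_HOME" >&2
fi
```

Catching a wrong JAVA_HOME here is much cheaper than debugging a failed Hadoop start later.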
Set up an SSH trust relationship
ssh-keygen -t rsa #press Enter at every prompt
This command generates a public/private key pair and creates the .ssh directory under the user's home directory. -t selects the key type; rsa is one of the supported algorithms.
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
This appends the generated public key to the current user's authorized keys file. If ssh localhost no longer asks for a password, the setup succeeded.
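The two steps above can be combined into one guarded script that is safe to re-run; it also tightens the permissions that sshd insists on. A sketch, assuming the default key path ~/.ssh/id_rsa:

```shell
# Idempotent version of the trust setup: generate a key only if none exists,
# append it to authorized_keys, and fix the permissions sshd requires.
mkdir -p ~/.ssh
if [ ! -f ~/.ssh/id_rsa ]; then
    ssh-keygen -t rsa -N "" -q -f ~/.ssh/id_rsa   # -N "" skips the passphrase prompt
fi
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys   # sshd rejects group/world-writable auth files
ssh -o BatchMode=yes -o StrictHostKeyChecking=no localhost true 2>/dev/null \
    && echo "passwordless ssh OK" \
    || echo "ssh to localhost still needs setup"
```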
Installing Hadoop
Download from: https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/
tar -zxvf hadoop-3.3.0.tar.gz
mv hadoop-3.3.0 /usr/local/
cd /usr/local/
mv hadoop-3.3.0 hadoop
cd hadoop
Open the /etc/profile file and add the Hadoop environment variables:
#set hadoop environment
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
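After running source /etc/profile, a quick structural check confirms that the unpacked tree actually sits where the variables above expect it. A sketch, assuming the /usr/local/hadoop layout from the steps above:

```shell
# Verify the unpacked Hadoop tree has the directories the PATH entries expect.
check_layout() {
    for d in bin sbin etc/hadoop; do
        if [ ! -d "$1/$d" ]; then
            echo "missing: $1/$d" >&2
            return 1
        fi
    done
    return 0
}

if check_layout /usr/local/hadoop; then
    echo "hadoop layout OK"
else
    echo "hadoop tree incomplete; re-check the tar/mv steps"
fi
```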
Set the Java path in Hadoop's own configuration as well; startup fails without it. Replace JAVA_HOME below with your machine's actual JDK path:
vim ./etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
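If you prefer not to edit the file interactively, the same change can be scripted. A sketch, assuming the paths used throughout this post:

```shell
# Non-interactive alternative to editing hadoop-env.sh by hand.
set_java_home() {
    # Append an explicit JAVA_HOME export to the file given as $1.
    echo "export JAVA_HOME=$2" >> "$1"
}

env_sh=/usr/local/hadoop/etc/hadoop/hadoop-env.sh
if [ -f "$env_sh" ]; then
    set_java_home "$env_sh" /usr/lib/jvm/java-11-openjdk-amd64
else
    echo "hadoop-env.sh not found at $env_sh"
fi
```

Because later lines win in shell, appending the export overrides any commented-out default above it.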
Next, configure HDFS and YARN. First change into Hadoop's root directory:
cd /usr/local/hadoop/
vim ./etc/hadoop/core-site.xml
vim ./etc/hadoop/hdfs-site.xml
vim ./etc/hadoop/yarn-site.xml
vim ./etc/hadoop/mapred-site.xml
The core-site.xml configuration file should read:
<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/usr/local/hadoop/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
        <description>NameNode URI</description>
    </property>
</configuration>
The hdfs-site.xml configuration file:
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/usr/local/hadoop/tmp/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/usr/local/hadoop/tmp/dfs/data</value>
    </property>
</configuration>
The yarn-site.xml configuration file:
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
The mapred-site.xml configuration file:
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
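Before the very first start, the NameNode must be formatted once; re-formatting an existing cluster erases all HDFS metadata, so it helps to guard the step. A sketch, assuming the dfs.namenode.name.dir value configured above:

```shell
# Format the NameNode exactly once, guarding against accidental re-runs.
needs_format() {
    # A formatted NameNode directory contains a current/ subdirectory.
    [ ! -d "$1/current" ]
}

name_dir=/usr/local/hadoop/tmp/dfs/name   # matches dfs.namenode.name.dir above
if command -v hdfs >/dev/null 2>&1; then
    if needs_format "$name_dir"; then
        hdfs namenode -format
    else
        echo "NameNode already formatted; skipping (re-formatting erases HDFS metadata)"
    fi
else
    echo "hdfs not on PATH; run 'source /etc/profile' first"
fi
```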
After configuration is done and the services have been started (see the next section), run jps. All six of the following Java processes must be present (Jps itself is one of them):
8002 NameNode
15624 Jps
7449 ResourceManager
7546 NodeManager
7306 SecondaryNameNode
8415 DataNode
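Besides jps, the web UIs give a quick health check: in Hadoop 3.x the NameNode UI listens on port 9870 by default and the YARN ResourceManager on 8088. A sketch that probes both from the server itself:

```shell
# Probe the default Hadoop 3.x web UI ports on localhost.
# 9870 = NameNode UI, 8088 = YARN ResourceManager UI.
for port in 9870 8088; do
    if curl -sf -o /dev/null --max-time 2 "http://localhost:$port/"; then
        echo "port $port: responding"
    else
        echo "port $port: not responding"
    fi
done
```

On a cloud server, reaching these UIs from a browser also requires opening the ports in the security group.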
Starting and stopping Hadoop
start-dfs.sh #start HDFS
start-yarn.sh #start YARN
start-all.sh #start all services
stop-all.sh #stop all services
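Once everything is up, a small end-to-end smoke test confirms HDFS actually accepts writes. A sketch using hypothetical file and directory names (/tmp/smoke.txt, /smoke):

```shell
# HDFS smoke test: write a local file into HDFS and read it back.
if command -v hdfs >/dev/null 2>&1; then
    echo "hello hadoop" > /tmp/smoke.txt
    hdfs dfs -mkdir -p /smoke
    hdfs dfs -put -f /tmp/smoke.txt /smoke/   # -f overwrites on re-runs
    hdfs dfs -cat /smoke/smoke.txt
else
    echo "hdfs not on PATH; run 'source /etc/profile' first"
fi
```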
Problems encountered
Starting namenodes on [localhost] localhost: Error: JAVA_HOME is not set and could not be found.
Fix: add the following to hadoop-config.sh, substituting your own JAVA_HOME. It must go near the top of the file; appending it at the end does not work.
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64/
ERROR: Attempting to operate on hdfs namenode as root
ERROR: but there is no HDFS_NAMENODE_USER defined. Aborting operation.
Fix: go to the sbin folder under the Hadoop installation directory and modify four files.
In start-dfs.sh and stop-dfs.sh, add the following parameters (just below the shebang line):
#!/usr/bin/env bash
HDFS_DATANODE_USER=root
HADOOP_SECURE_DN_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
In start-yarn.sh and stop-yarn.sh, add the following parameters:
#!/usr/bin/env bash
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root
Run the start scripts again and everything comes up.
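An alternative to patching the four sbin scripts individually is to put the same run-as user variables into hadoop-env.sh, which the start/stop scripts all read. A sketch, assuming the /usr/local/hadoop layout from above:

```shell
# Append the run-as user variables once to hadoop-env.sh instead of
# editing start-dfs.sh, stop-dfs.sh, start-yarn.sh, and stop-yarn.sh.
add_run_users() {
    cat >> "$1" <<'EOF'
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
EOF
}

env_sh=/usr/local/hadoop/etc/hadoop/hadoop-env.sh
if [ -f "$env_sh" ] && ! grep -q HDFS_NAMENODE_USER "$env_sh"; then
    add_run_users "$env_sh"
else
    echo "skipped: $env_sh missing or already configured"
fi
```

Keeping the settings in one config file also survives Hadoop upgrades that replace the sbin scripts.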