Hadoop Environment Setup Tutorial

Author: _酒酿芋圆 | Published 2018-07-28 03:07

Hadoop is a distributed systems framework developed under the Apache Foundation. It allows users to develop distributed programs without understanding the low-level details of distribution, harnessing the power of a cluster for high-speed computation and storage.
Hadoop implements a distributed file system, the Hadoop Distributed File System (HDFS). HDFS is highly fault-tolerant, is designed to be deployed on low-cost hardware, and provides high-throughput access to application data, which makes it well suited to applications with very large data sets. HDFS relaxes some POSIX requirements so that data in the file system can be accessed as streams (streaming access).
The core of the Hadoop framework is the pair HDFS and MapReduce: HDFS provides storage for massive amounts of data, while MapReduce provides the computation over that data.


1. Operating Environment

1.1 Operating system: Windows 10 64-bit

1.2 Virtual machine: VMware Workstation Pro 14.1.2

2. Installation Packages

2.1 Ubuntu Server image: ubuntu-18.04-live-server-amd64.iso

2.2 Java: jdk-8u181-linux-x64.tar.gz

2.3 Hadoop: hadoop-2.8.4.tar.gz

3. Create the Virtual Machines

3.1 Create master


3.2 Create slave1


3.3 Create slave2


4. Configuration

4.1 Install openssh-server

sudo apt-get install openssh-server


The Ubuntu 18.04 Server image already ships with a recent version of openssh-server, so this step is usually unnecessary. If openssh-server is not present, install it with the command above.
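Before connecting, you can confirm that the SSH daemon is up (on Ubuntu 18.04 the service is named ssh):

sudo systemctl status ssh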

4.2 Connect to the virtual machines with WinSCP or another SSH client

Connect using each virtual machine's IP address; the default port is 22.
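If you do not know a virtual machine's IP address yet, either of the following commands shows it from inside the guest:

hostname -I
ip addr show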



Likewise, connect to slave1 and slave2 with WinSCP.

4.3 Edit /etc/sudoers with vi

sudo vi /etc/sudoers
Use the arrow keys to move the cursor to the line root ALL=(ALL:ALL) ALL.
On the line below it, press i to enter insert mode and add username ALL=(ALL:ALL) ALL, replacing username with your own user name (hadoop in this tutorial).


When finished, press ESC to return to command mode, then press : to enter last-line mode, and type wq! to save and quit (the ! is needed because sudoers is read-only). Do the same on the other machines.
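After the edit, the relevant part of /etc/sudoers should look like this (assuming the user is named hadoop, as in the rest of this tutorial):

root    ALL=(ALL:ALL) ALL
hadoop  ALL=(ALL:ALL) ALL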

4.4 Edit /etc/hosts with vi

sudo vi /etc/hosts
Add the IP addresses and host names of all three machines to the hosts file, and do the same on the other machines.

sudo reboot
Reboot after the changes are made.
cat /etc/hosts
After the reboot, check that the hosts file was modified successfully.
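The entries must match your own VMs' addresses; with the addresses used later in this tutorial, they look like:

192.168.125.129     master
192.168.125.136     slave1
192.168.125.133     slave2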

4.5 Passwordless login setup

ssh-keygen -t rsa -P ""
Run the command above on master, slave1, and slave2, pressing Enter to accept the defaults.


On master:

cd .ssh/
scp slave1:~/.ssh/id_rsa.pub id_rsa.pub1
scp slave2:~/.ssh/id_rsa.pub id_rsa.pub2


Use ll to list the files in the current directory.
cat id_rsa.pub >> authorized_keys
cat id_rsa.pub1 >> authorized_keys
cat id_rsa.pub2 >> authorized_keys
Use ll to list the files in the current directory.
Use cat authorized_keys to inspect the merged key file.
scp authorized_keys slave1:~/.ssh/
scp authorized_keys slave2:~/.ssh/
This copies authorized_keys to slave1 and slave2.

On slave1 and slave2:
cd .ssh/
ll
You can see that authorized_keys now exists on slave1 and slave2.
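As a side note, the same result can usually be achieved with ssh-copy-id, which appends the local public key to the remote authorized_keys in one step (a convenience alternative, not the method used above):

# run on each machine, once for every other host (skip the machine you are on)
ssh-copy-id hadoop@master
ssh-copy-id hadoop@slave1
ssh-copy-id hadoop@slave2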


4.6 Verify passwordless login

On master:
ssh slave1


ssh slave2

On slave1:
ssh master


ssh slave2
On slave2:
ssh master
ssh slave1

4.7 Install the Java and Hadoop packages

4.7.1 Copy the files into the virtual machine

Create a new folder named softs under /home/hadoop.


Upload jdk-8u181-linux-x64.tar.gz and hadoop-2.8.4.tar.gz into the softs folder.


cd softs
ll

4.7.2 Extract the files

tar -zxvf jdk-8u181-linux-x64.tar.gz
tar -zxvf hadoop-2.8.4.tar.gz
ll

sudo mv jdk1.8.0_181/ /usr/local/jdk1.8
sudo mv hadoop-2.8.4/ /usr/local/hadoop2

Run cd /usr/local && ll to check that the move succeeded.

Run sudo chmod -R 777 hadoop2/ jdk1.8/ to change the permissions.


Run sudo chown -R hadoop:hadoop hadoop2/ jdk1.8/ to change the owner.


4.7.3 Add environment variables

4.7.3.1 Add the Java environment variables

cd ~
ls -all


Run cat .profile to view the current environment settings.

vi .bashrc
Append the following environment variables at the bottom:

#JAVA VARIABLES
export JAVA_HOME=/usr/local/jdk1.8/
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib
export PATH=$JAVA_HOME/bin:$PATH

Type :wq to save and quit.


source .bashrc
Source .bashrc so the environment variables take effect immediately.
echo $PATH
java -version
javac -version

You can see that the Java environment variables have taken effect.

4.7.3.2 Add the Hadoop environment variables

vi .bashrc
Append the following environment variables at the bottom:

#HADOOP VARIABLES
export HADOOP_HOME=/usr/local/hadoop2
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"

Type :wq to save and quit.

source .bashrc
Source .bashrc so the environment variables take effect immediately.
echo $PATH

Apply the same configuration to slave1 and slave2.

4.7.4 Hadoop configuration

vi /usr/local/hadoop2/etc/hadoop/hadoop-env.sh
In this file, replace ${JAVA_HOME} with the absolute JDK path, e.g. /usr/local/jdk1.8.

4.7.5 Verify Hadoop (master)

Run the standalone WordCount test:
cd /usr/local/hadoop2
cp README.txt input
./bin/hadoop jar share/hadoop/mapreduce/sources/hadoop-mapreduce-examples-2.8.4-sources.jar org.apache.hadoop.examples.WordCount input output
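If the sources jar above does not run on your build, the precompiled examples jar that ships with Hadoop 2.8.4 performs the same test:

./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.4.jar wordcount input output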


Run cat output/* to view the results:
(BIS),  1
(ECCN)  1
(TSU)   1
(see    1
5D002.C.1,      1
740.13) 1
<http://www.wassenaar.org/>     1
Administration  1
Apache  1
BEFORE  1
BIS     1
Bureau  1
Commerce,       1
Commodity       1
Control 1
Core    1
Department      1
ENC     1
Exception       1
Export  2
For     1
Foundation      1
Government      1
Hadoop  1
Hadoop, 1
Industry        1
Jetty   1
License 1
Number  1
Regulations,    1
SSL     1
Section 1
Security        1
See     1
Software        2
Technology      1
The     4
This    1
U.S.    1
Unrestricted    1
about   1
algorithms.     1
and     6
and/or  1
another 1
any     1
as      1
asymmetric      1
at:     2
both    1
by      1
check   1
classified      1
code    1
code.   1
concerning      1
country 1
country's       1
country,        1
cryptographic   3
currently       1
details 1
distribution    2
eligible        1
encryption      3
exception       1
export  1
following       1
for     3
form    1
from    1
functions       1
has     1
have    1
http://hadoop.apache.org/core/  1
http://wiki.apache.org/hadoop/  1
if      1
import, 2
in      1
included        1
includes        2
information     2
information.    1
is      1
it      1
latest  1
laws,   1
libraries       1
makes   1
manner  1
may     1
more    2
mortbay.org.    1
object  1
of      5
on      2
or      2
our     2
performing      1
permitted.      1
please  2
policies        1
possession,     2
project 1
provides        1
re-export       2
regulations     1
reside  1
restrictions    1
security        1
see     1
software        2
software,       2
software.       2
software:       1
source  1
the     8
this    3
to      2
under   1
use,    2
uses    1
using   2
visit   1
website 1
which   2
wiki,   1
with    1
written 1
you     1
your    1

4.8 Distributed Hadoop installation and deployment

4.8.1 Edit the Hadoop cluster configuration files

4.8.1.1 Edit hadoop-env.sh

cd /usr/local/hadoop2/etc/hadoop
In hadoop-env.sh, change:

# The java implementation to use.
export JAVA_HOME=/usr/local/jdk8

to:

# The java implementation to use.
export JAVA_HOME=/usr/local/jdk1.8

4.8.1.2 Edit mapred-env.sh

In mapred-env.sh, change:

export JAVA_HOME=/usr/local/jdk8

to:

export JAVA_HOME=/usr/local/jdk1.8

4.8.1.3 Edit yarn-env.sh

In yarn-env.sh, change:

# some Java parameters
export JAVA_HOME=/usr/local/jdk8

to:

# some Java parameters
export JAVA_HOME=/usr/local/jdk1.8

4.8.1.4 Edit core-site.xml

Copy the following into core-site.xml:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/local/hadoop2/tmp</value>
        <description>base directory for temporary files</description>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <!-- 1.x name>fs.default.name</name -->
        <value>hdfs://master:9000</value>
        <description>HDFS NameNode address</description>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>102400</value>
        <description>I/O buffer size in bytes</description>
    </property>

</configuration>

4.8.1.5 Edit hdfs-site.xml

Copy the following into hdfs-site.xml:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

    <!-- property>
        <name>dfs.http.address</name>
        <value>master:50070</value>
    </property -->
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>slave1:50080</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
        <description>number of replicas per file block</description>
    </property>
    <property>
        <name>dfs.name.dir</name>
        <value>/usr/local/hadoop2/hdfs/name</value>
        <description>NameNode directory</description>
    </property>
    <property>
        <name>dfs.data.dir</name>
        <value>/usr/local/hadoop2/hdfs/data</value>
        <description>DataNode directory</description>
    </property>
    <!-- property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property -->

</configuration>
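The tmp, name, and data directories configured in core-site.xml and hdfs-site.xml are generally created by Hadoop itself when formatting and starting, but they can also be created up front on each node to make the layout explicit (a minimal sketch, assuming the paths configured above):

mkdir -p /usr/local/hadoop2/tmp /usr/local/hadoop2/hdfs/name /usr/local/hadoop2/hdfs/data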

4.8.1.6 Edit mapred-site.xml

Copy the following into mapred-site.xml (in Hadoop 2.8.4 this file may exist only as mapred-site.xml.template; if so, copy the template to mapred-site.xml first):

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <!-- property>
        <name>mapred.map.tasks</name>
        <value>20</value>
    </property>
    <property>
        <name>mapred.reduce.tasks</name>
        <value>4</value>
    </property>
    <property>
        <name>mapred.job.tracker</name>
        <value>master:9000</value>
        <description>JobTracker address</description>
    </property -->
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>master:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>master:19888</value>
    </property>

</configuration>

4.8.1.7 Edit yarn-site.xml

In yarn-site.xml, replace every occurrence of bdm1 with master, giving:

<?xml version="1.0"?>

<configuration>

    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>master</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>master:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>master:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>master:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>master:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>master:8088</value>
    </property>

</configuration>

4.8.1.8 Edit the slaves file

In the slaves file, change

bds1
bds2
bds3
bds4
bds5

to

slave1
slave2

4.8.2 Distribute the configuration files to slave1 and slave2

cd /usr/local/hadoop2
scp etc/hadoop/* slave1:/usr/local/hadoop2/etc/hadoop


scp etc/hadoop/* slave2:/usr/local/hadoop2/etc/hadoop

4.8.3 Format HDFS

Use ll to list the contents of the current directory, /usr/local/hadoop2.

./bin/hdfs namenode -format

Use ll to list the contents of /usr/local/hadoop2 again.


You can see that a new hdfs folder has appeared.

4.8.4 Start the Hadoop cluster and the resource management platform

./sbin/start-dfs.sh


Use ll and you will see that a new logs folder has appeared.

./sbin/start-yarn.sh


Enter 192.168.125.129:50070 (the master's address) in the browser address bar to open the cluster monitoring UI.

From there you can check the status of each node.

./sbin/stop-yarn.sh
./sbin/stop-dfs.sh

These commands stop the resource management platform and the Hadoop cluster.
You can also start the Hadoop cluster and its related services in one step with ./sbin/start-all.sh.

After starting, run jps to check the running daemons.

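With the configuration above, the daemons jps should report on each node are roughly the following (process IDs omitted; exact output may vary):

master: NameNode, ResourceManager, Jps
slave1: DataNode, NodeManager, SecondaryNameNode, Jps
slave2: DataNode, NodeManager, Jps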

Enter 192.168.125.129:8088 in the browser address bar to open the resource management UI.

4.8.5 Distributed deployment test: running the WordCount example

cat README.txt >> a.txt
cat a.txt >> b.txt
cat b.txt >> word.txt
Use ll to list the current directory.


rm -rf a.txt b.txt
Use ll to list the current directory again; a.txt and b.txt have been deleted.
./bin/hadoop fs -mkdir -p input
./bin/hadoop fs -copyFromLocal word.txt input
word.txt can now be found under /user/hadoop/input in the HDFS file browser, where its details can be viewed.
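The upload can also be verified from the shell, which shows the same listing as the web UI:

./bin/hadoop fs -ls input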
./bin/hadoop jar share/hadoop/mapreduce/sources/hadoop-mapreduce-examples-2.8.4-sources.jar org.apache.hadoop.examples.WordCount input output
While the job runs, the resource manager UI lists it under submitted applications and then under running applications.

Clicking the application opens the job's details.


The output can be found in the /user/hadoop/output folder.
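Or view it directly from the shell:

./bin/hadoop fs -cat output/*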

On the local Windows machine, add the following at the bottom of C:/Windows/System32/drivers/etc/hosts:

192.168.125.129     master
192.168.125.136     slave1
192.168.125.133     slave2

Then click Download to download the output file.

Opening it shows the data:

(BIS),  27
(ECCN)  27
(TSU)   27
(see    27
5D002.C.1,  27
740.13) 27
<http://www.wassenaar.org/> 27
Administration  27
Apache  27
BEFORE  27
BIS 27
Bureau  27
Commerce,   27
Commodity   27
Control 27
Core    27
Department  27
ENC 27
Exception   27
Export  54
For 27
Foundation  27
Government  27
Hadoop  27
Hadoop, 27
Industry    27
Jetty   27
License 27
Number  27
Regulations,    27
SSL 27
Section 27
Security    27
See 27
Software    54
Technology  27
The 108
This    27
U.S.    27
Unrestricted    27
about   27
algorithms. 27
and 162
and/or  27
another 27
any 27
as  27
asymmetric  27
at: 54
both    27
by  27
check   27
classified  27
code    27
code.   27
concerning  27
country 27
country's   27
country,    27
cryptographic   81
currently   27
details 27
distribution    54
eligible    27
encryption  81
exception   27
export  27
following   27
for 81
form    27
from    27
functions   27
has 27
have    27
http://hadoop.apache.org/core/  27
http://wiki.apache.org/hadoop/  27
if  27
import, 54
in  27
included    27
includes    54
information 54
information.    27
is  27
it  27
latest  27
laws,   27
libraries   27
makes   27
manner  27
may 27
more    54
mortbay.org.    27
object  27
of  135
on  54
or  54
our 54
performing  27
permitted.  27
please  54
policies    27
possession, 54
project 27
provides    27
re-export   54
regulations 27
reside  27
restrictions    27
security    27
see 27
software    54
software,   54
software.   54
software:   27
source  27
the 216
this    81
to  54
under   27
use,    54
uses    27
using   54
visit   27
website 27
which   54
wiki,   27
with    27
written 27
you 27
your    27

hadoop fs -rm -r -f output

You can see that the output folder has been deleted from the file system.


./sbin/stop-all.sh
This stops all Hadoop cluster services.

4.9 Addendum

It turned out that files in the /user/hadoop/input folder could not be deleted through the HDFS web file manager; the error message was Permission denied: user=dr.who, access=WRITE, inode="/user/hadoop/input":hadoop:supergroup:drwxr-xr-x


Running hadoop fs -ls shows that the folder's owner hadoop has rwx permissions, its group supergroup has r-x, and all other users have r-x. The web UI acts as the anonymous user dr.who, who falls under "other users" and therefore has no write permission.

./bin/hadoop fs -chmod -R 777 /user/hadoop/input
This changes the permissions.


Now the files in the /user/hadoop/input folder can be deleted through the HDFS web UI.
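Alternatively (reasonable only in a test environment such as this one), HDFS permission checking can be disabled altogether by adding the standard Hadoop 2.x property dfs.permissions.enabled to hdfs-site.xml and restarting the cluster:

<property>
    <name>dfs.permissions.enabled</name>
    <!-- disable HDFS permission checks; test environments only -->
    <value>false</value>
</property>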
