Setting up Hadoop on Ubuntu

Author: Charles__Jiang | Published 2017-03-29 16:59

    Environment

    Servers (virtual machines):

    • vm-master 10.211.55.23
    • vm-slave1 10.211.55.25
    • vm-slave2 10.211.55.24

    Software:

    • Hadoop 2.7
    • JDK 1.8
    • Ubuntu 14.04

    Step 1: Create a user and grant privileges

    As root, create a hadoop user and set its password to 111111:

    adduser hadoop
    Enter new password: 111111
    Retype new password: 111111
    Press Enter through the remaining prompts...
    

    As root, grant the hadoop user sudo privileges:

    vim /etc/sudoers
    Below the line "root ALL=(ALL:ALL) ALL", add:
    hadoop  ALL=(ALL:ALL) ALL
    
    The result should look like:
    # User privilege specification
    root    ALL=(ALL:ALL) ALL
    hadoop  ALL=(ALL:ALL) ALL
    
    Note: /etc/sudoers is read-only, so force-save and quit with :wq!
    

    Step 2: Edit the hosts file

    vim /etc/hosts
    10.211.55.23  vm-master
    10.211.55.25  vm-slave1
    10.211.55.24  vm-slave2
    
    Note: use ifconfig to look up each machine's IP address.
    
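    Before moving on, it can be worth confirming that each hostname resolves on every node; a minimal check, using the hostnames defined above, is:
    # Each hostname should resolve and answer one ping
    for h in vm-master vm-slave1 vm-slave2; do
        ping -c 1 "$h" > /dev/null && echo "$h OK" || echo "$h unreachable"
    done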

    Step 3: Install SSH and set up passwordless login

    If the SSH server is not installed yet, install it with: apt-get install ssh

    As the hadoop user, generate a public/private key pair:

    su hadoop
    
    cd ~
    
    ssh-keygen -t rsa -P ""
    
    Press Enter through all prompts...
    

    Afterwards, id_rsa (the private key) and id_rsa.pub (the public key) are generated in the /home/hadoop/.ssh directory.

    Append the public key to authorized_keys (this file stores the public keys allowed to log in as the current user over SSH):

    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    
    Restrict the file's permissions:
    chmod 600 ~/.ssh/authorized_keys
    
    Edit the SSH daemon configuration
    sudo vim /etc/ssh/sshd_config and uncomment the following lines:
    
    RSAAuthentication yes
    PubkeyAuthentication yes
    AuthorizedKeysFile  .ssh/authorized_keys
    
    Restart the SSH service:
    sudo service ssh restart
    
    As the hadoop user, test passwordless login to localhost:
    ssh hadoop@localhost (type yes when asked to confirm the host key)
    
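    Optionally, a non-interactive check that key-based login really works (BatchMode makes ssh fail instead of falling back to a password prompt):
    # Should print "passwordless login OK" without asking for a password
    ssh -o BatchMode=yes hadoop@localhost true && echo "passwordless login OK"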

    Step 4: Install the JDK

    Update the package index:
    sudo apt-get update
    
    Install the PPA helper:
    sudo apt-get install software-properties-common
    
    Add the repository:
    sudo add-apt-repository ppa:webupd8team/java
    
    Update again:
    sudo apt-get update
    
    Install the JDK:
    sudo apt-get install oracle-java8-installer
    
    Check that the installation succeeded:
    java -version
    
    java version "1.8.0_121"
    Java(TM) SE Runtime Environment (build 1.8.0_121-b13)
    Java HotSpot(TM) 64-Bit Server VM (build 25.121-b13, mixed mode)
    
    Note: if this step is too slow, the JDK can also be downloaded from http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html and installed manually.
    
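    The JDK's install path is needed later for JAVA_HOME (in hadoop-env.sh and the hadoop user's profile). With this installer it is normally /usr/lib/jvm/java-8-oracle, as used later in this guide; if in doubt, it can be confirmed with:
    # Resolves to something like /usr/lib/jvm/java-8-oracle/jre/bin/java
    readlink -f "$(which java)"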

    Step 5: Set up vm-slave1 and vm-slave2

    Repeat the steps above on both machines.

    Step 6: Allow vm-master to log in to vm-slave1 and vm-slave2 without a password

    On vm-master:

    su hadoop
    
    ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@vm-slave1
    
    ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@vm-slave2
    
    Verify the setup:
    On vm-master, as the hadoop user, run
    ssh hadoop@vm-slave1 and check that no password is required.
    
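    A quick non-interactive check of both slaves from vm-master (run as the hadoop user):
    # Each line should print the slave's hostname without a password prompt
    for h in vm-slave1 vm-slave2; do
        ssh -o BatchMode=yes "hadoop@$h" hostname
    done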

    Step 7: Download and install Hadoop

    wget http://statics.charlesjiang.com/hadoop-2.7.3.tar.gz
    
    tar zxvf hadoop-2.7.3.tar.gz
    
    sudo mv hadoop-2.7.3 /usr/local/hadoop
    
    Change the owner of the hadoop directory to the hadoop user:
    sudo chown -R hadoop:hadoop /usr/local/hadoop
    
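    A quick check that the unpacked distribution runs:
    # Should report Hadoop 2.7.3 plus build details
    /usr/local/hadoop/bin/hadoop version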

    Step 8: Configure Hadoop

    The following configuration files are involved:

    /usr/local/hadoop/etc/hadoop/slaves
    /usr/local/hadoop/etc/hadoop/core-site.xml
    /usr/local/hadoop/etc/hadoop/hdfs-site.xml
    /usr/local/hadoop/etc/hadoop/mapred-site.xml
    /usr/local/hadoop/etc/hadoop/yarn-site.xml
    /usr/local/hadoop/etc/hadoop/hadoop-env.sh
    
    • Edit core-site.xml
    <configuration>
            <property>
                    <name>fs.default.name</name>
                    <value>hdfs://vm-master:9000</value>
            </property>
    </configuration>
    Note: this value must not be set to localhost.
    
    • Edit mapred-site.xml
    cp mapred-site.xml.template  ./mapred-site.xml
    
    vim mapred-site.xml
    
    <configuration>
            <property>
                    <name>fs.default.name</name>
                    <value>hdfs://vm-master:9000</value>
            </property>
            <property>
                    <name>mapred.job.tracker</name>
                    <value>hdfs://vm-master:9001</value>
            </property>
            <property>
                    <name>mapreduce.framework.name</name>
                    <value>yarn</value>
            </property>
    </configuration>
    
    • Edit hdfs-site.xml
    <configuration>
            <property>
                    <name>dfs.name.dir</name>
                    <value>/usr/local/hadoop/namenode</value>
            </property>
            <property>
                    <name>dfs.data.dir</name>
                    <value>/usr/local/hadoop/datanode</value>
            </property>
            <property>
                    <name>dfs.replication</name>
                    <value>1</value>
            </property>
    </configuration>
    
    
    • Edit yarn-site.xml
    <configuration>
            <property>
                    <name>yarn.resourcemanager.address</name>
                    <value>vm-master:8032</value>
            </property>
            <property>
                    <name>yarn.resourcemanager.scheduler.address</name>
                    <value>vm-master:8030</value>
            </property>
            <property>
                    <name>yarn.resourcemanager.webapp.address</name>
                    <value>vm-master:8088</value>
            </property>
            <property>
                    <name>yarn.resourcemanager.resource-tracker.address</name>
                    <value>vm-master:8031</value>
            </property>
            <property>
                    <name>yarn.resourcemanager.admin.address</name>
                    <value>vm-master:8033</value>
            </property>
            <property>
                    <name>yarn.nodemanager.aux-services</name>
                    <value>mapreduce_shuffle</value>
            </property>
            <property>
                    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
                    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
            </property>
    </configuration>
    
    
    • Edit slaves
    vm-master
    vm-slave1
    vm-slave2
    
    • Edit hadoop-env.sh
    export JAVA_HOME=/usr/lib/jvm/java-8-oracle
    
    Note: use the absolute path of the JDK here.
    
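    Note: hdfs-site.xml above points dfs.name.dir and dfs.data.dir at /usr/local/hadoop/namenode and /usr/local/hadoop/datanode, which are not created anywhere in this walkthrough. Hadoop will normally create them at format/startup because the hadoop user owns /usr/local/hadoop, but as an optional precaution they can be created up front on every node:
    # Optional extra step (not in the original article): pre-create the storage directories as the hadoop user
    mkdir -p /usr/local/hadoop/namenode /usr/local/hadoop/datanode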

    Step 9: Configure the hadoop user's environment variables

    su hadoop
    vim /home/hadoop/.bash_profile
    
    The configuration is as follows:
    export HADOOP_HOME=/usr/local/hadoop
    export HADOOP_MAPRED_HOME=${HADOOP_HOME}
    export HADOOP_COMMON_HOME=${HADOOP_HOME}
    export HADOOP_HDFS_HOME=${HADOOP_HOME}
    export YARN_HOME=${HADOOP_HOME}
    export HADOOP_YARN_HOME=${HADOOP_HOME}
    export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
    export HDFS_CONF_DIR=${HADOOP_HOME}/etc/hadoop
    export YARN_CONF_DIR=${HADOOP_HOME}/etc/hadoop
    export SCALA_HOME=/usr/local/scala
    export SPARK_HOME=/usr/local/spark
    
    JAVA_HOME=/usr/lib/jvm/java-8-oracle
    JRE_HOME=/usr/lib/jvm/java-8-oracle/jre
    CLASSPATH=.:$JAVA_HOME/lib/tools.jar
    
    PATH=$JAVA_HOME/bin:$SCALA_HOME/bin:$SPARK_HOME/bin:$PATH
    
    export JAVA_HOME CLASSPATH PATH USER LOGNAME MAIL HOSTNAME
    
    Apply the same configuration on vm-slave1 and vm-slave2.
    
    Load the configuration:
    source /home/hadoop/.bash_profile
    
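    Note that this profile does not add $HADOOP_HOME/bin or $HADOOP_HOME/sbin to PATH, which is why the later steps run commands from /usr/local/hadoop using ./bin/... and ./sbin/... prefixes. A quick check that the variables were loaded:
    # Both should print paths under /usr/local/hadoop
    echo "$HADOOP_HOME"
    echo "$HADOOP_CONF_DIR"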

    Step 10: Copy Hadoop from vm-master to vm-slave1 and vm-slave2

    scp -r /usr/local/hadoop/ hadoop@vm-slave1:/home/hadoop
    
    scp -r /usr/local/hadoop/ hadoop@vm-slave2:/home/hadoop
    
    On vm-slave1 and vm-slave2, move the copied hadoop directory to /usr/local/hadoop:
    
    sudo mv hadoop /usr/local/hadoop
    
    After moving it, change the owner of /usr/local/hadoop to the hadoop user on vm-slave1 and vm-slave2:
    
    sudo chown -R hadoop:hadoop  /usr/local/hadoop/
    

    Step 11: Format HDFS

    On vm-master:

    cd /usr/local/hadoop
    
    ./bin/hdfs namenode -format
    
    
    Output similar to the following indicates the format succeeded:
    
    17/03/29 15:14:36 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
    17/03/29 15:14:36 INFO util.ExitUtil: Exiting with status 0
    17/03/29 15:14:36 INFO namenode.NameNode: SHUTDOWN_MSG: 
    /************************************************************
    SHUTDOWN_MSG: Shutting down NameNode at vm-master.localdomain/127.0.1.1
    ************************************************************/
    

    Step 12: Start Hadoop (HDFS)

    On vm-master:

    sbin/start-dfs.sh
    After it starts, run jps.
    
    Output similar to the following indicates success:
    2403 DataNode
    3188 Jps
    3079 SecondaryNameNode
    2269 NameNode
    
    Note: type yes if asked to confirm host keys.
    

    Step 13: Start YARN

    On vm-master:

    sbin/start-yarn.sh 
    After it starts, run jps.
    
    Output similar to the following indicates success:
    3667 Jps
    2403 DataNode
    3237 ResourceManager
    3079 SecondaryNameNode
    2269 NameNode
    3391 NodeManager
    
    Then run jps on vm-slave1 or vm-slave2.
    It should show something like:
    
    2777 Jps
    2505 DataNode
    2654 NodeManager
    

    Step 14: Verify the installation

    1. Check the cluster status:
    bin/hdfs dfsadmin -report
    
    Output like the following:
    
    Configured Capacity: 198576648192 (184.94 GB)
    Present Capacity: 180282531840 (167.90 GB)
    DFS Remaining: 180282449920 (167.90 GB)
    DFS Used: 81920 (80 KB)
    DFS Used%: 0.00%
    Under replicated blocks: 0
    Blocks with corrupt replicas: 0
    Missing blocks: 0
    Missing blocks (with replication factor 1): 0
    
    -------------------------------------------------
    Live datanodes (3):
    
    Name: 10.211.55.24:50010 (vm-slave2)
    Hostname: vm-master
    Decommission Status : Normal
    Configured Capacity: 66192216064 (61.65 GB)
    DFS Used: 24576 (24 KB)
    Non DFS Used: 6213672960 (5.79 GB)
    DFS Remaining: 59978518528 (55.86 GB)
    DFS Used%: 0.00%
    DFS Remaining%: 90.61%
    Configured Cache Capacity: 0 (0 B)
    Cache Used: 0 (0 B)
    Cache Remaining: 0 (0 B)
    Cache Used%: 100.00%
    Cache Remaining%: 0.00%
    Xceivers: 1
    Last contact: Wed Mar 29 16:38:34 CST 2017
    
    
    Name: 10.211.55.25:50010 (vm-slave1)
    Hostname: vm-master
    Decommission Status : Normal
    Configured Capacity: 66192216064 (61.65 GB)
    DFS Used: 24576 (24 KB)
    Non DFS Used: 6213672960 (5.79 GB)
    DFS Remaining: 59978518528 (55.86 GB)
    DFS Used%: 0.00%
    DFS Remaining%: 90.61%
    Configured Cache Capacity: 0 (0 B)
    Cache Used: 0 (0 B)
    Cache Remaining: 0 (0 B)
    Cache Used%: 100.00%
    Cache Remaining%: 0.00%
    Xceivers: 1
    Last contact: Wed Mar 29 16:38:34 CST 2017
    
    
    Name: 10.211.55.23:50010 (vm-master)
    Hostname: vm-master
    Decommission Status : Normal
    Configured Capacity: 66192216064 (61.65 GB)
    DFS Used: 32768 (32 KB)
    Non DFS Used: 5866770432 (5.46 GB)
    DFS Remaining: 60325412864 (56.18 GB)
    DFS Used%: 0.00%
    DFS Remaining%: 91.14%
    Configured Cache Capacity: 0 (0 B)
    Cache Used: 0 (0 B)
    Cache Remaining: 0 (0 B)
    Cache Used%: 100.00%
    Cache Remaining%: 0.00%
    Xceivers: 1
    Last contact: Wed Mar 29 16:38:34 CST 2017
    
    2. View the HDFS web UI.
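    In Hadoop 2.x the NameNode web UI listens on port 50070 by default, and the YARN ResourceManager UI was configured above as vm-master:8088; a quick reachability check from any node:
    # Both should print a message if the web UIs are up
    wget -q --spider http://vm-master:50070 && echo "NameNode UI reachable"
    wget -q --spider http://vm-master:8088  && echo "YARN UI reachable"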

    Common issues

    1. The system displays garbled text (mojibake)

    Fix:

    sudo vim /etc/environment
    Add:
    LANG="en_US.UTF-8"
    LANGUAGE="en_US:en"
    
    sudo vim /var/lib/locales/supported.d/local
    Add:
    en_US.UTF-8 UTF-8
    
    sudo vim /etc/default/locale
    Change to:
    LANG="en_US.UTF-8"
    LANGUAGE="en_US:en"
    
    Reboot:
    sudo reboot
    
    2. Network interface broken after cloning a virtual machine
       Fix:
    vim /etc/udev/rules.d/70-persistent-net.rules
    
    Delete the line containing eth0 and rename eth1 to eth0.
    
    Reboot and the interface will come back.

    3. JAVA_HOME environment variable not found

    vm-slave2: Error: JAVA_HOME is not set and could not be found.
    vm-slave1: Error: JAVA_HOME is not set and could not be found.
    

    Fix:

    On every server, set export JAVA_HOME= in hadoop-env.sh to the absolute path of the JDK.
    
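    A sketch of applying the fix to all nodes from vm-master, assuming the JDK path /usr/lib/jvm/java-8-oracle used earlier and the passwordless SSH set up in Step 6:
    # Rewrite the JAVA_HOME line in hadoop-env.sh on every node
    for h in vm-master vm-slave1 vm-slave2; do
        ssh "hadoop@$h" \
            "sed -i 's|^export JAVA_HOME=.*|export JAVA_HOME=/usr/lib/jvm/java-8-oracle|' /usr/local/hadoop/etc/hadoop/hadoop-env.sh"
    done
    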
    4. Cannot open http://vm-master:8088/
    Check the hosts file:
    
    127.0.0.1       localhost
    #127.0.1.1      vm-master.localdomain   vm-master   # comment out this line
    
    # The following lines are desirable for IPv6 capable hosts
    ::1     localhost ip6-localhost ip6-loopback
    ff02::1 ip6-allnodes
    ff02::2 ip6-allrouters
    
    10.211.55.23  vm-master
    10.211.55.25  vm-slave1
    10.211.55.24  vm-slave2
    
    Then restart the services:
    Stop YARN:   sbin/stop-yarn.sh
    Stop HDFS:   sbin/stop-dfs.sh
    Start HDFS:  sbin/start-dfs.sh
    Start YARN:  sbin/start-yarn.sh
    

    Blog: http://www.charlesjiang.com/archives/45.html
