Hadoop 2.x Pseudo-Distributed Mode


Author: Miracle001 | Published 2019-03-04 17:17

    Introduction

    Big data:
      Structured data: constrained by a schema
      Semi-structured data
      Unstructured data: no metadata
        Log data is unstructured data
      Search engines: a search component and an indexing component
        Crawlers (spider programs) fetch data that is unstructured or semi-structured
    
      Tokenizer (word segmentation)
    
      Storage
      Analysis and processing
      Google's three papers:
        GFS, 2003: The Google File System
        MapReduce, 2004: MapReduce: Simplified Data Processing on Large Clusters
        BigTable, 2006: Bigtable: A Distributed Storage System for Structured Data
    
      Open-source counterparts:
      HDFS
      MapReduce
      HBase
      HDFS + MapReduce = Hadoop
    
    Nutch crawls data --> Lucene: the larger the data, the slower the processing; while looking for a solution, Google's papers were published
    
    MapReduce is a batch-processing framework, so its speed and performance are poor
    
    NAS, SAN: shared storage
    With only a single storage system the I/O pressure is too high, so it does not scale; this is the traditional centralized solution
    
    Distributed storage
    With a central node and a metadata store: GFS/HDFS
    Without a central node
    
    
    NN: NameNode
    SNN: SecondaryNameNode; a second node that, if the NN goes down, avoids the time-consuming full re-read of the data files
    Metadata persistence: transaction log (edits) --> merged into an on-disk image (fsimage), so the metadata is not lost;
    Since Hadoop 2.0, high availability is built on ZooKeeper, with the metadata kept on shared storage such as NFS.
    
    
    DN: DataNode
    Data replicas keep the data intact;
    heartbeat
    Block lists:
      data-centric: which nodes a given block is stored on;
      node-centric: which blocks a given node holds;
    JobTracker, TaskTracker
    The computation goes to wherever the data is
    Running the NameNode and JobTracker together easily creates a system bottleneck
    The DataNode and TaskTracker run together
    
    Functional programming
      Passing one function as an argument to another function
      Lisp and ML are functional languages with higher-order functions:
    map, fold
      map:
        map(f())
        map takes a function as an argument and applies it to every element of a list, producing a list of results;
      fold:
        takes two arguments: a function and an initial value
          fold(g(), init)
    
    mapreduce:
      mapper, reducer
      shuffle and sort: collating, transferring, and sorting the intermediate data
      k-v data
      Records with the same key must go to the same reducer
      A job may run map and reduce phases several times (chained jobs)
    mapper-->combiner-->partitioner-->reducer
    For the mapper and reducer, the input and output keys differ
    For the combiner, the input and output keys are the same
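The map / shuffle-and-sort / reduce flow can be mimicked with an ordinary Unix pipeline. This is only an analogy, not Hadoop itself: tr plays the mapper (one word per line), sort plays shuffle-and-sort (grouping identical keys), and uniq -c plays the reducer (aggregating each group into a count).

```shell
# Word count as a Unix pipeline -- a rough MapReduce analogy:
#   "mapper":  tr emits one word per line
#   "shuffle": sort brings identical words together
#   "reducer": uniq -c collapses each run into a count
printf 'a b a\nc b\n' | tr -s ' ' '\n' | sort | uniq -c
```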
    
    MRv1 (Hadoop 1) --> MRv2 (Hadoop 2)
    MRv1: cluster resource management and data processing in one framework
    MRv2:
      YARN: cluster resource management
      MRv2: data processing
        MR: batch processing
        Tez: execution engine
    
        RM: ResourceManager
        NM: NodeManager
        AM: ApplicationMaster
        container: where the MR tasks actually run
        See figure 1 below
    
    Hadoop ecosystem: see figure 2 below
    Sqoop
      extracts data from relational databases and imports it into Hadoop;
      extracts data from Hadoop, structures it, and exports it to a relational database;
    Flume
      collects logs and stores them in Hadoop
    Hive
    Pig
    HBase: column-oriented storage
    Data serialization: converting non-stream data into a stream form, in a way that can also be reversed.
    Storm: data statistics and analysis (stream processing)
    
    Hadoop distributions:
      Cloudera: CDH
      Hortonworks: HDP
      Commercial editions:
      Intel: IDH
      MapR
    
    Standalone mode: for testing whether a program can run on Hadoop at all
    Pseudo-distributed mode: all daemons run on a single machine
    Fully distributed mode: a real cluster
    
    Hadoop is written in Java
    
    
    

    Hadoop Pseudo-Distributed Setup

    CentOS 7 1804
    NAT interface: 192.168.25.14
    Host-only interface: 192.168.50.14
    Disable the firewall
    Disable SELinux
    yum -y install wget vim lrzsz net-tools ntpdate
    yum -y install epel-release-latest-7.noarch.rpm
    cat /etc/hosts
    192.168.25.11 node1.fgq.com node1
    192.168.25.12 node2.fgq.com node2
    192.168.25.13 node3.fgq.com node3
    192.168.25.14 node4.fgq.com node4
    192.168.25.15 node5.fgq.com node5
    crontab -e
    */5 * * * * ntpdate time3.aliyun.com && hwclock -w
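The firewall and SELinux steps above can be carried out like this; a sketch for CentOS 7 (requires root), adjust to your environment:

```shell
# Stop and disable firewalld
systemctl stop firewalld
systemctl disable firewalld

# Put SELinux into permissive mode for the current boot...
setenforce 0
# ...and keep it disabled across reboots
sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
```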
    
    [root@node4 ~]# mkdir -p /fgq/base-env/
    [root@node4 ~]# cd /fgq/base-env/
    Download the JDK package jdk-8u152-linux-x64.tar.gz
    Download the Hadoop package hadoop-2.9.2.tar.gz
    Upload both into this directory and unpack them
    [root@node4 base-env]# tar zxf jdk-8u152-linux-x64.tar.gz 
    [root@node4 base-env]# tar zxf hadoop-2.9.2.tar.gz
    [root@node4 base-env]# ln -s jdk1.8.0_152 jdk
    [root@node4 base-env]# ln -s hadoop-2.9.2 hadoop
    
    [root@node4 ~]# vim /etc/profile
    Append the following at the bottom:
    export JAVA_HOME=/fgq/base-env/jdk
    export JRE_HOME=$JAVA_HOME/jre
    export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
    export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH
    [root@node4 ~]# source /etc/profile
    [root@node4 ~]# java -version
    java version "1.8.0_152"
    Java(TM) SE Runtime Environment (build 1.8.0_152-b16)
    Java HotSpot(TM) 64-Bit Server VM (build 25.152-b16, mixed mode)
    
    [root@node4 ~]# vim /etc/profile.d/hadoop.sh
    export HADOOP_HOME=/fgq/base-env/hadoop
    export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
    export HADOOP_YARN_HOME=$HADOOP_HOME
    export HADOOP_COMMON_HOME=$HADOOP_HOME
    export HADOOP_MAPRED_HOME=$HADOOP_HOME
    export HADOOP_HDFS_HOME=$HADOOP_HOME
    [root@node4 ~]# source /etc/profile.d/hadoop.sh
    
    [root@node4 ~]# cd /fgq/base-env/hadoop
    [root@node4 hadoop]# groupadd hadoop
    [root@node4 hadoop]# useradd -g hadoop yarn
    [root@node4 hadoop]# useradd -g hadoop hdfs
    [root@node4 hadoop]# useradd -g hadoop mapred
    [root@node4 hadoop]# mkdir -p /fgq/data/hadoop/hdfs/{nn,snn,dn}
    [root@node4 hadoop]# chown -R hdfs:hadoop /fgq/data/hadoop/hdfs
    [root@node4 hadoop]# ll /fgq/data/hadoop/hdfs
    
    [root@node4 hadoop]# mkdir logs
    [root@node4 hadoop]# chmod g+w logs  # make sure the hadoop group can write to logs
    [root@node4 hadoop]# chown -R yarn:hadoop logs
    [root@node4 hadoop]# chown -R yarn:hadoop ./*
    [root@node4 hadoop]# ll
    
    [root@node4 hadoop]# cd etc/hadoop/
    [root@node4 hadoop]# vim core-site.xml
    <configuration>
            <property>
                    <name>fs.defaultFS</name>
                    <value>hdfs://node4:8020</value>
                    <final>true</final>
            </property>
    </configuration>
    [root@node4 hadoop]# vim hdfs-site.xml
    <configuration>
       <property>
                 <name>dfs.replication</name>
                 <value>1</value>
       </property>
       <property>
                 <name>dfs.namenode.name.dir</name>
                 <value>file:///fgq/data/hadoop/hdfs/nn</value>
       </property>
       <property>
                 <name>dfs.datanode.data.dir</name>
                 <value>file:///fgq/data/hadoop/hdfs/dn</value>
       </property>
       <property>
                 <name>fs.checkpoint.dir</name>
                 <value>file:///fgq/data/hadoop/hdfs/snn</value>
       </property>
       <property>
                 <name>fs.checkpoint.edits.dir</name>
                 <value>file:///fgq/data/hadoop/hdfs/snn</value>
       </property>
    </configuration>
    
    [root@node4 hadoop]# cp mapred-site.xml.template mapred-site.xml
    [root@node4 hadoop]# vim mapred-site.xml
    <configuration>
       <property>
           <name>mapreduce.framework.name</name>
           <value>yarn</value>
       </property>
    </configuration>
    [root@node4 hadoop]# vim yarn-site.xml
    <configuration>
       <property>
           <name>yarn.resourcemanager.address</name>
           <value>node4:8032</value>
       </property>
       <property>
           <name>yarn.resourcemanager.scheduler.address</name>
           <value>node4:8030</value>
       </property>
       <property>
           <name>yarn.resourcemanager.resource-tracker.address</name>
           <value>node4:8031</value>
       </property>
       <property>
           <name>yarn.resourcemanager.admin.address</name>
           <value>node4:8033</value>
       </property>
       <property>
           <name>yarn.resourcemanager.webapp.address</name>
           <value>node4:8088</value>
       </property>
       <property>
           <name>yarn.nodemanager.aux-services</name>
           <value>mapreduce_shuffle</value>
       </property>
       <property>
           <name>yarn.nodemanager.auxservices.mapreduce_shuffle.class</name>
           <value>org.apache.hadoop.mapred.ShuffleHandler</value>
       </property>
       <property>
           <name>yarn.resourcemanager.scheduler.class</name>
           <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
       </property>
    </configuration>
    [root@node4 hadoop]# vim slaves
    node4
    [root@node4 hadoop]# su - hdfs
    
    ## Format the NameNode
    [hdfs@node4 ~]$ hdfs namenode -format
    19/03/02 10:45:03 INFO namenode.NameNode: STARTUP_MSG: 
    /************************************************************
    STARTUP_MSG: Starting NameNode
    STARTUP_MSG:   host = node4.fgq.com/192.168.25.14
    STARTUP_MSG:   args = [-format]
    STARTUP_MSG:   version = 2.9.2
    ...  ...
    19/03/02 10:45:05 INFO common.Storage: Storage directory /fgq/data/hadoop/hdfs/nn has been successfully formatted.
    19/03/02 10:45:05 INFO namenode.FSImageFormatProtobuf: Saving image file /fgq/data/hadoop/hdfs/nn/current/fsimage.ckpt_0000000000000000000 using no compression
    19/03/02 10:45:05 INFO namenode.FSImageFormatProtobuf: Image file /fgq/data/hadoop/hdfs/nn/current/fsimage.ckpt_0000000000000000000 of size 323 bytes saved in 0 seconds .
    19/03/02 10:45:05 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
    19/03/02 10:45:05 INFO namenode.NameNode: SHUTDOWN_MSG: 
    /************************************************************
    SHUTDOWN_MSG: Shutting down NameNode at node4.fgq.com/192.168.25.14
    ************************************************************/
    The word "successfully" in the output indicates it worked
    [hdfs@node4 ~]$ ls /fgq/data/hadoop/hdfs/nn/current/
    fsimage_0000000000000000000  fsimage_0000000000000000000.md5  seen_txid  VERSION
    
    ## Start the NameNode
    [hdfs@node4 ~]$ hadoop-daemon.sh start namenode
    starting namenode, logging to /fgq/base-env/hadoop-2.9.2/logs/hadoop-hdfs-namenode-node4.fgq.com.out
    [hdfs@node4 ~]$ less /fgq/base-env/hadoop-2.9.2/logs/hadoop-hdfs-namenode-node4.fgq.com.log 
    [hdfs@node4 ~]$ jps  # jps is Java's ps command for listing JVM processes
    1769 NameNode
    1851 Jps
    [hdfs@node4 ~]$ jps -h
    illegal argument: -h
    usage: jps [-help]
           jps [-q] [-mlvV] [<hostid>]
    
    Definitions:
        <hostid>:      <hostname>[:<port>]
    [hdfs@node4 ~]$ jps -v
    1879 Jps -Denv.class.path=.:/fgq/base-env/jdk/lib/dt.jar:/fgq/base-env/jdk/lib/tools.jar -Dapplication.home=/fgq/base-env/jdk1.8.0_152 -Xms8m
    1769 NameNode -Dproc_namenode -Xmx1000m -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/fgq/base-env/hadoop-2.9.2/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/fgq/base-env/hadoop-2.9.2 -Dhadoop.id.str=hdfs -Dhadoop.root.logger=INFO,console -Djava.library.path=/fgq/base-env/hadoop-2.9.2/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Djava.net.preferIPv4Stack=true -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/fgq/base-env/hadoop-2.9.2/logs -Dhadoop.log.file=hadoop-hdfs-namenode-node4.fgq.com.log -Dhadoop.home.dir=/fgq/base-env/hadoop-2.9.2 -Dhadoop.id.str=hdfs -Dhadoop.root.logger=INFO,RFA -Djava.library.path=/fgq/base-env/hadoop-2.9.2/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Dhadoop.security.logger=INFO,RFAS -Dhdfs.audit.logger=INFO,NullAppender -Dhadoop.security.logger=INFO,RFAS -Dhdfs.audit.logger=INFO,NullAppender -Dhadoop.security.logger=INFO,RFAS -Dhdfs.audit.logger=INFO,NullAppender -Dhadoop.security.logger=INFO,RFAS
    
    ## Start the SecondaryNameNode
    [hdfs@node4 ~]$ hadoop-daemon.sh start secondarynamenode
    starting secondarynamenode, logging to /fgq/base-env/hadoop-2.9.2/logs/hadoop-hdfs-secondarynamenode-node4.fgq.com.out
    [hdfs@node4 ~]$ jps
    1990 Jps
    1769 NameNode
    1945 SecondaryNameNode
    
    ## Start the DataNode
    [hdfs@node4 ~]$ hadoop-daemon.sh start datanode
    starting datanode, logging to /fgq/base-env/hadoop-2.9.2/logs/hadoop-hdfs-datanode-node4.fgq.com.out
    A NameNode does not normally double as a DataNode, but this is a pseudo-distributed setup
    [hdfs@node4 ~]$ jps
    1769 NameNode
    1945 SecondaryNameNode
    2073 DataNode
    2155 Jps
    
    [hdfs@node4 ~]$ hdfs dfs -ls /  # the root path is empty, so create a directory
    [hdfs@node4 ~]$ hdfs dfs -mkdir /test
    [hdfs@node4 ~]$ hdfs dfs -ls /
    Found 1 items
    drwxr-xr-x   - hdfs supergroup          0 2019-03-02 11:29 /test
    Note the owner and group
    Note: if other users need write access to HDFS, add the following property to hdfs-site.xml (dfs.permissions is the legacy name; in Hadoop 2.x it is also known as dfs.permissions.enabled):
       <property>
             <name>dfs.permissions</name>
             <value>false</value>
       </property>
    
    ## Upload a file
    [hdfs@node4 ~]$ hdfs dfs -put /etc/fstab /test/fstab
    [hdfs@node4 ~]$ hdfs dfs -lsr /
    lsr: DEPRECATED: Please use 'ls -R' instead.
    drwxr-xr-x   - hdfs supergroup          0 2019-03-02 11:37 /test
    -rw-r--r--   1 hdfs supergroup        501 2019-03-02 11:37 /test/fstab
    The file /test/fstab lives on the remote HDFS
    Its location in the local filesystem:
    [root@node4 ~]# vim /fgq/data/hadoop/hdfs/dn/current/BP-1435152656-192.168.25.14-1551494705143/current/finalized/subdir0/subdir0/blk_1073741825
    When a large file is split into blocks, the block files are still visible under local filesystem paths, though they may land in different directories
    Viewing it through the dfs interface:
    [hdfs@node4 ~]$ hdfs dfs -cat /test/fstab
    
    #
    # /etc/fstab
    # Created by anaconda on Thu Feb 28 17:13:02 2019
    #
    # Accessible filesystems, by reference, are maintained under '/dev/disk'
    # See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
    #
    UUID=aebd58c2-fdc1-44ad-b33a-9b6efdf9488c /                       xfs     defaults        0 0
    UUID=2fe4fb6b-aab1-42f7-b024-45be2e7065f5 /boot                   xfs     defaults        0 0
    UUID=79f97775-e6bb-494d-827a-1f5aa3423c6d swap                    swap    defaults        0 0
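Instead of hunting for block files by hand, hdfs fsck can report the blocks behind a file and the DataNodes holding them; this assumes the cluster above is still running:

```shell
# Show the block IDs, sizes, and hosting DataNodes for /test/fstab
hdfs fsck /test/fstab -files -blocks -locations
```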
    
    [hdfs@node4 ~]$ exit
    logout
    
    ## Switch to the yarn user and start the YARN services
    [root@node4 hadoop]# su - yarn
    [yarn@node4 ~]$ yarn-daemon.sh start resourcemanager
    starting resourcemanager, logging to /fgq/base-env/hadoop/logs/yarn-yarn-resourcemanager-node4.fgq.com.out
    [yarn@node4 ~]$ jps
    3376 Jps
    3141 ResourceManager
    [yarn@node4 ~]$ yarn-daemon.sh start nodemanager
    starting nodemanager, logging to /fgq/base-env/hadoop/logs/yarn-yarn-nodemanager-node4.fgq.com.out
    [yarn@node4 ~]$ jps
    3141 ResourceManager
    3525 Jps
    3417 NodeManager
    

    Browsing the Web UIs

    HDFS and the YARN ResourceManager each provide a web interface
    through which you can inspect the state of the HDFS and YARN clusters
    HDFS NameNode: http://192.168.25.14:50070 (figure 1 below)
    YARN ResourceManager: http://192.168.25.14:8088 (figure 2 below)
    Note: if the yarn.resourcemanager.webapp.address property in yarn-site.xml is set to "localhost:8088", the web UI listens only on port 8088 of the 127.0.0.1 address.
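A quick command-line check that both web UIs are reachable; this assumes the daemons are up and listening on the addresses above:

```shell
# Print the HTTP status code of each web UI (a 200 means the UI responded)
curl -s -o /dev/null -w '%{http_code}\n' http://192.168.25.14:50070/
curl -s -o /dev/null -w '%{http_code}\n' http://192.168.25.14:8088/
```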
    

    Running Programs on Hadoop

    [root@node4 ~]# cd /fgq/base-env/hadoop/share/hadoop/mapreduce/
    [root@node4 mapreduce]# ls
    Hadoop YARN ships with many example programs; among them, hadoop-mapreduce-examples-2.9.2.jar can be used as a MapReduce test program
    
    Note: switch to the hdfs user first
    [root@node4 ~]# su - hdfs
    [hdfs@node4 ~]$ yarn jar /fgq/base-env/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar
    A program name must be passed as the first argument; run without one, it prints the usage below:
    An example program must be given as the first argument.
    Valid program names are:
      aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
      aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
      bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
      dbcount: An example job that count the pageview counts from a database.
      distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
      grep: A map/reduce program that counts the matches of a regex in the input.
      join: A job that effects a join over sorted, equally partitioned datasets
      multifilewc: A job that counts words from several files.
      pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
      pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.
      randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
      randomwriter: A map/reduce program that writes 10GB of random data per node.
      secondarysort: An example defining a secondary sort to the reduce.
      sort: A map/reduce program that sorts the data written by the random writer.
      sudoku: A sudoku solver.
      teragen: Generate data for the terasort
      terasort: Run the terasort
      teravalidate: Checking results of terasort
      wordcount: A map/reduce program that counts the words in the input files.
      wordmean: A map/reduce program that counts the average length of the words in the input files.
      wordmedian: A map/reduce program that counts the median length of the words in the input files.
      wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.
    
    [hdfs@node4 ~]$ yarn jar /fgq/base-env/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar wordcount /test/fstab /test/fstab_out
    19/03/02 15:49:46 INFO client.RMProxy: Connecting to ResourceManager at node4/192.168.25.14:8032
    19/03/02 15:49:47 INFO input.FileInputFormat: Total input files to process : 1
    19/03/02 15:49:48 INFO mapreduce.JobSubmitter: number of splits:1
    19/03/02 15:49:48 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
    19/03/02 15:49:49 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1551512925744_0001
    19/03/02 15:49:49 INFO impl.YarnClientImpl: Submitted application application_1551512925744_0001
    19/03/02 15:49:49 INFO mapreduce.Job: The url to track the job: http://node4:8088/proxy/application_1551512925744_0001/
    19/03/02 15:49:49 INFO mapreduce.Job: Running job: job_1551512925744_0001
    19/03/02 15:49:57 INFO mapreduce.Job: Job job_1551512925744_0001 running in uber mode : false
    19/03/02 15:49:57 INFO mapreduce.Job:  map 0% reduce 0%
    19/03/02 15:50:02 INFO mapreduce.Job:  map 100% reduce 0%
    19/03/02 15:50:06 INFO mapreduce.Job:  map 100% reduce 100%
    19/03/02 15:50:07 INFO mapreduce.Job: Job job_1551512925744_0001 completed successfully
    19/03/02 15:50:07 INFO mapreduce.Job: Counters: 49
        File System Counters
            FILE: Number of bytes read=591
            FILE: Number of bytes written=397951
            FILE: Number of read operations=0
            FILE: Number of large read operations=0
            FILE: Number of write operations=0
            HDFS: Number of bytes read=594
            HDFS: Number of bytes written=433
            HDFS: Number of read operations=6
            HDFS: Number of large read operations=0
            HDFS: Number of write operations=2
        Job Counters 
            Launched map tasks=1
            Launched reduce tasks=1
            Data-local map tasks=1
            Total time spent by all maps in occupied slots (ms)=2693
            Total time spent by all reduces in occupied slots (ms)=2021
            Total time spent by all map tasks (ms)=2693
            Total time spent by all reduce tasks (ms)=2021
            Total vcore-milliseconds taken by all map tasks=2693
            Total vcore-milliseconds taken by all reduce tasks=2021
            Total megabyte-milliseconds taken by all map tasks=2757632
            Total megabyte-milliseconds taken by all reduce tasks=2069504
        Map-Reduce Framework
            Map input records=11
            Map output records=54
            Map output bytes=625
            Map output materialized bytes=591
            Input split bytes=93
            Combine input records=54
            Combine output records=38
            Reduce input groups=38
            Reduce shuffle bytes=591
            Reduce input records=38
            Reduce output records=38
            Spilled Records=76
            Shuffled Maps =1
            Failed Shuffles=0
            Merged Map outputs=1
            GC time elapsed (ms)=193
            CPU time spent (ms)=1250
            Physical memory (bytes) snapshot=462479360
            Virtual memory (bytes) snapshot=4232617984
            Total committed heap usage (bytes)=292028416
        Shuffle Errors
            BAD_ID=0
            CONNECTION=0
            IO_ERROR=0
            WRONG_LENGTH=0
            WRONG_MAP=0
            WRONG_REDUCE=0
        File Input Format Counters 
            Bytes Read=501
        File Output Format Counters 
            Bytes Written=433
    
    [hdfs@node4 ~]$ hdfs dfs -lsr /test
    lsr: DEPRECATED: Please use 'ls -R' instead.
    -rw-r--r--   1 hdfs supergroup        501 2019-03-02 11:37 /test/fstab
    drwxr-xr-x   - hdfs supergroup          0 2019-03-02 15:50 /test/fstab_out
    -rw-r--r--   1 hdfs supergroup          0 2019-03-02 15:50 /test/fstab_out/_SUCCESS
    -rw-r--r--   1 hdfs supergroup        433 2019-03-02 15:50 /test/fstab_out/part-r-00000
    The job creates an output directory /test/fstab_out containing two files, _SUCCESS and part-r-00000, which indicates success
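The result can also be copied back to the local filesystem for inspection; paths here follow the example above, assuming the daemons are still running:

```shell
# Pull the whole output directory out of HDFS, then read it locally
hdfs dfs -get /test/fstab_out /tmp/fstab_out
cat /tmp/fstab_out/part-r-00000
```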
    
    ## View the word counts
    [hdfs@node4 ~]$ hdfs dfs -cat /test/fstab_out/part-r-00000
    #   7
    '/dev/disk' 1
    /   1
    /boot   1
    /etc/fstab  1
    0   6
    17:13:02    1
    2019    1
    28  1
    Accessible  1
    Created 1
    Feb 1
    See 1
    Thu 1
    UUID=2fe4fb6b-aab1-42f7-b024-45be2e7065f5   1
    UUID=79f97775-e6bb-494d-827a-1f5aa3423c6d   1
    UUID=aebd58c2-fdc1-44ad-b33a-9b6efdf9488c   1
    anaconda    1
    and/or  1
    are 1
    blkid(8)    1
    by  2
    defaults    3
    filesystems,    1
    findfs(8),  1
    for 1
    fstab(5),   1
    info    1
    maintained  1
    man 1
    more    1
    mount(8)    1
    on  1
    pages   1
    reference,  1
    swap    2
    under   1
    xfs 2
    
    Errors
    Running the jar program failed:
    [hdfs@node4 ~]$ yarn jar /fgq/base-env/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar wordcount /test/fstab /test/fstab_out
    19/03/02 15:16:11 INFO client.RMProxy: Connecting to ResourceManager at node4/192.168.25.14:8032
    19/03/02 15:16:13 INFO ipc.Client: Retrying connect to server: node4/192.168.25.14:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
    Error: the client keeps retrying the connection because the YARN ResourceManager is not running
    
    ##### Fix
    [root@node4 ~]# su - yarn
    [yarn@node4 ~]$ jps
    5440 Jps
    3417 NodeManager
    [yarn@node4 ~]$ yarn-daemon.sh start resourcemanager
    starting resourcemanager, logging to /fgq/base-env/hadoop/logs/yarn-yarn-resourcemanager-node4.fgq.com.out
    [yarn@node4 ~]$ jps
    5478 ResourceManager
    3417 NodeManager
    5711 Jps
    
    
    ## Starting the MapReduce job again, it still fails
    [hdfs@node4 ~]$ yarn jar /fgq/base-env/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar wordcount /test/fstab /test/fstab_out
    19/03/02 15:29:10 INFO client.RMProxy: Connecting to ResourceManager at node4/192.168.25.14:8032
    19/03/02 15:29:11 INFO input.FileInputFormat: Total input files to process : 1
    19/03/02 15:29:11 INFO mapreduce.JobSubmitter: number of splits:1
    19/03/02 15:29:11 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
    19/03/02 15:29:12 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1551511594876_0001
    19/03/02 15:29:12 INFO impl.YarnClientImpl: Submitted application application_1551511594876_0001
    19/03/02 15:29:12 INFO mapreduce.Job: The url to track the job: http://node4:8088/proxy/application_1551511594876_0001/
    19/03/02 15:29:12 INFO mapreduce.Job: Running job: job_1551511594876_0001
    19/03/02 15:29:17 INFO mapreduce.Job: Job job_1551511594876_0001 running in uber mode : false
    19/03/02 15:29:17 INFO mapreduce.Job:  map 0% reduce 0%
    19/03/02 15:29:17 INFO mapreduce.Job: Job job_1551511594876_0001 failed with state FAILED due to: Application application_1551511594876_0001 failed 2 times due to AM Container for appattempt_1551511594876_0001_000002 exited with  exitCode: 1
    Failing this attempt.Diagnostics: [2019-03-02 15:29:17.428]Exception from container-launch.
    Container id: container_1551511594876_0001_02_000001
    Exit code: 1
    
    [2019-03-02 15:29:17.429]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
    Last 4096 bytes of prelaunch.err :
    Last 4096 bytes of stderr :
    Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00000000ed300000, 35651584, 0) failed; error='Cannot allocate memory' (errno=12)
    
    
    [2019-03-02 15:29:17.429]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
    Last 4096 bytes of prelaunch.err :
    Last 4096 bytes of stderr :
    Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00000000ed300000, 35651584, 0) failed; error='Cannot allocate memory' (errno=12)
    
    
    For more detailed output, check the application tracking page: http://node4:8088/cluster/app/application_1551511594876_0001 Then click on links to logs of each attempt.
    . Failing the application.
    19/03/02 15:29:17 INFO mapreduce.Job: Counters: 0
    
    The error indicates insufficient memory. Shut the VM down, increase its RAM to 3 GB, then boot it and restart the services.
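If the VM's RAM cannot be increased, an alternative is to shrink YARN's memory demands in yarn-site.xml. The property names are standard YARN settings, but the values below are illustrative guesses for a small VM, not taken from this setup:

```xml
<!-- Illustrative values for a low-memory VM; tune for your machine -->
<property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>2048</value>
</property>
<property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>256</value>
</property>
```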
    
    Startup
    su - hdfs
    hdfs namenode -format  # needed only on the first run
    hadoop-daemon.sh start namenode
    hadoop-daemon.sh start secondarynamenode
    hadoop-daemon.sh start datanode
    jps
    1769 NameNode
    1945 SecondaryNameNode
    2073 DataNode
    2155 Jps
    hdfs dfs -mkdir /test  # needed only on the first run
    hdfs dfs -put /etc/fstab /test/fstab  # needed only on the first run
    hdfs dfs -lsr /  # needed only on the first run
    
    
    su - yarn
    yarn-daemon.sh start resourcemanager
    yarn-daemon.sh start nodemanager
    jps
    3141 ResourceManager
    3525 Jps
    3417 NodeManager
    
    su - hdfs
    yarn jar /fgq/base-env/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar wordcount /test/fstab /test/fstab_out
    
    hdfs dfs -lsr /test
    hdfs dfs -cat /test/fstab_out/part-r-00000
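For completeness, the matching shutdown sequence; the daemon scripts accept stop as well as start:

```shell
# As the hdfs user: stop the HDFS daemons
hadoop-daemon.sh stop datanode
hadoop-daemon.sh stop secondarynamenode
hadoop-daemon.sh stop namenode

# As the yarn user: stop the YARN daemons
yarn-daemon.sh stop nodemanager
yarn-daemon.sh stop resourcemanager
```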
    

    Source: https://www.haomeiwen.com/subject/lpekuqtx.html