hadoop - Configuring fully distributed mode and running the WordCount program

Author: 静水流深ylyang | Published 2019-05-08 11:36
      1. Switching between Hadoop's three modes conveniently, and making the shell prompt show the full path
        (1) Switching between Hadoop's three modes conveniently
        First run the following commands:
        [yylin@big etc]$ cp -r hadoop local
        [yylin@big etc]$ cp -r hadoop pseudo
        [yylin@big etc]$ cp -r hadoop full
        [yylin@big etc]$ rm -rf hadoop
        [yylin@big etc]$ ln -s pseudo hadoop
      

    These commands make three copies of the original hadoop configuration directory, named local, pseudo and full, for local mode, pseudo-distributed mode and fully distributed mode respectively; the original hadoop directory is then deleted and replaced by a symbolic link named hadoop that points to the pseudo directory.
    The benefit is that switching between the three modes only requires re-pointing the hadoop symlink at a different directory, as sketched below.
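
    For example, to switch to fully distributed mode later on, you would only re-point the symlink (a minimal sketch, assuming the local/pseudo/full directories created above, run from ${HADOOP_HOME}/etc):

      [yylin@big etc]$ ln -sfn full hadoop    # -f replaces the old link, -n treats the existing symlink itself as the target
      [yylin@big etc]$ readlink hadoop        # should print: full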

    (2) Make the shell prompt show the full path
    (a) Edit the profile file under /etc and add the PS1 environment variable:

      export PS1='[\u@\h `pwd`]\$'
    

    (b) To make it take effect, run:

      source /etc/profile
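
    After sourcing the file, the prompt shows the full current directory rather than just its last component, for example (a sketch; the exact path is simply whichever directory you are in):

      [yylin@big /home/yylin]$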
    
      2. Configure Hadoop pseudo-distributed mode

    (1) Go to the ${HADOOP_HOME}/etc/hadoop directory. It contains many files; the four that mainly need to be configured are core-site.xml, hdfs-site.xml, mapred-site.xml and yarn-site.xml.



    In that directory there is no mapred-site.xml, only mapred-site.xml.template; copy it and rename the copy to mapred-site.xml, as shown below.
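
    A sketch of that copy step (run inside ${HADOOP_HOME}/etc/hadoop):

      [yylin@big hadoop]$ cp mapred-site.xml.template mapred-site.xml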
    (2) Edit core-site.xml:

      <?xml version="1.0"?>
      <configuration>
        <property>
          <name>fs.defaultFS</name>
          <value>hdfs://s135/</value>
        </property>
      </configuration>
    

    (3) Edit hdfs-site.xml; the replication factor is set to 3:

      <?xml version="1.0"?>
      <configuration>
        <property>
          <name>dfs.replication</name>
          <value>3</value>
        </property>
        <property>
          <!-- Set the hadoop.tmp.dir base directory (this property is conventionally placed in core-site.xml, but the HDFS daemons also pick it up from here) -->
          <name>hadoop.tmp.dir</name>
          <value>/home/yylin/hadoop</value>
        </property>
      </configuration>
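
    To double-check that these values are being picked up, you can query them from the command line (a sketch; hdfs getconf reads core-site.xml and hdfs-site.xml):

      [yylin@big hadoop]$ hdfs getconf -confKey fs.defaultFS     # expect hdfs://s135/
      [yylin@big hadoop]$ hdfs getconf -confKey dfs.replication  # expect 3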
    

    (4) Edit mapred-site.xml:
    Note: cp mapred-site.xml.template mapred-site.xml

      <?xml version="1.0"?>
      <configuration>
        <property>
          <name>mapreduce.framework.name</name>
          <value>yarn</value>
        </property>
        <property>
          <name>yarn.app.mapreduce.am.env</name>
          <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
        </property>
        <property>
          <name>mapreduce.map.env</name>
          <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
        </property>
        <property>
          <name>mapreduce.reduce.env</name>
          <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
        </property>
      </configuration>
    

    (5) Edit yarn-site.xml:

      <?xml version="1.0"?>
      <configuration>
        <property>
          <name>yarn.resourcemanager.hostname</name>
          <value>s135</value>
        </property>
        <property>
          <name>yarn.nodemanager.aux-services</name>
          <value>mapreduce_shuffle</value>
        </property>
      </configuration>
    
      3. Configure SSH

    (1) Check whether the SSH packages are installed (openssh-server + openssh-clients + openssh):

      [yylin@big hadoop]$ yum list installed | grep ssh
    

    (2) Check whether the sshd process is running:

      [yylin@big hadoop]$ ps -Af | grep sshd
    

    (3) Generate a public/private key pair on the client side:

      [yylin@big ~]$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
    

    (4) This creates the ~/.ssh directory, which contains id_rsa (the private key) and id_rsa.pub (the public key).


    (5) Append the public key to the ~/.ssh/authorized_keys file (the file name and location are fixed):

      [yylin@big ~]$ cd ~/.ssh
      [yylin@big .ssh]$ cat id_rsa.pub >> authorized_keys
    

    (6) Change the permissions of authorized_keys to 644:

      [yylin@big .ssh]$ chmod 644 authorized_keys
    

    (7) Test:

      [yylin@big .ssh]$ ssh localhost
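
    If the key setup worked, the login completes without a password prompt (type exit to leave the test session). On a fully distributed cluster you would also confirm passwordless login to the hostname used in the configuration files, for example (a sketch; s135 is assumed to resolve to a cluster node):

      [yylin@big .ssh]$ ssh s135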
    

    (8) Format the name node
    Run the command:

    hdfs namenode -format

    At this point, the fully distributed mode configuration is complete.

    (9) Start the Hadoop services
    Run the command:

    start-dfs.sh
    

    This starts the namenodes, datanodes and secondary namenodes.

    start-yarn.sh
    

    This starts the resourcemanager and nodemanagers.
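
    To confirm that all daemons are up, run jps; on this single-node setup it should list NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager (each preceded by a process ID):

      [yylin@big ~]$ jps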
    (10) Access the namenode web UI in a browser
    http://s135:9870
    You can see the Hadoop (HDFS) status information there.

      4. Run the WordCount example program

    (1) About the WordCount program
    WordCount is the "hello world" of Hadoop: it counts the number of occurrences of each word in English text. Hadoop ships with this example program; the code is as follows:

    package org.apache.hadoop.examples;
    
    import java.io.IOException;
    import java.util.StringTokenizer;
    
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.util.GenericOptionsParser;
    
    public class WordCount {
    
      public static class TokenizerMapper 
           extends Mapper<Object, Text, Text, IntWritable>{
        
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();
          
        public void map(Object key, Text value, Context context
                        ) throws IOException, InterruptedException {
          StringTokenizer itr = new StringTokenizer(value.toString());
          while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);
          }
        }
      }
      
      public static class IntSumReducer 
           extends Reducer<Text,IntWritable,Text,IntWritable> {
        private IntWritable result = new IntWritable();
    
        public void reduce(Text key, Iterable<IntWritable> values, 
                           Context context
                           ) throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable val : values) {
            sum += val.get();
          }
          result.set(sum);
          context.write(key, result);
        }
      }
    
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length < 2) {
          System.err.println("Usage: wordcount <in> [<in>...] <out>");
          System.exit(2);
        }
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        for (int i = 0; i < otherArgs.length - 1; ++i) {
          FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
        }
        FileOutputFormat.setOutputPath(job,
          new Path(otherArgs[otherArgs.length - 1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }
    

    Copy this code into your project.
    Next, build the project with Maven and export a jar from IDEA.
    The pom.xml for the IDEA project:

    <?xml version="1.0" encoding="UTF-8"?>
    <project xmlns="http://maven.apache.org/POM/4.0.0"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
        <modelVersion>4.0.0</modelVersion>
    
        <groupId>nwpu</groupId>
        <artifactId>WordCount</artifactId>
        <version>1.0-SNAPSHOT</version>
    
        <dependencies>
            <dependency>
                <groupId>org.apache.hadoop</groupId>
                <artifactId>hadoop-hdfs</artifactId>
                <version>3.2.0</version>
            </dependency>
            <dependency>
                <groupId>org.apache.hadoop</groupId>
                <artifactId>hadoop-client</artifactId>
                <version>3.2.0</version>
            </dependency>
            <dependency>
                <groupId>org.apache.hadoop</groupId>
                <artifactId>hadoop-common</artifactId>
                <version>3.2.0</version>
            </dependency>
            <dependency>
                <groupId>junit</groupId>
                <artifactId>junit</artifactId>
                <version>4.11</version>
            </dependency>
        </dependencies>
        <build>
            <plugins>
                <plugin>
                    <artifactId>maven-assembly-plugin</artifactId>
                    <configuration>
                        <appendAssemblyId>false</appendAssemblyId>
                        <descriptorRefs>
                            <descriptorRef>jar-with-dependencies</descriptorRef>
                        </descriptorRefs>
                        <archive>
                            <manifest>
                            <!-- Note: this must be the fully qualified name of the class that contains main() -->
                            <mainClass>org.apache.hadoop.examples.WordCount</mainClass>
                            </manifest>
                        </archive>
                    </configuration>
                    <executions>
                        <execution>
                            <id>make-assembly</id>
                            <phase>package</phase>
                            <goals>
                                <goal>single</goal>
                            </goals>
                        </execution>
                    </executions>
                </plugin>
            </plugins>
        </build>
    </project>
    

    Then export the jar.



    In IDEA's right-hand Maven panel, click maven -> WordCount -> Lifecycle -> (clean, compile, package, install) to build the jar; it is produced under the project's target directory.
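
    The same build can be run from the command line instead of the IDEA panel (a sketch, run in the project root next to pom.xml):

      mvn clean package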



    Copy the jar to the server, and put the input (a local input directory containing hello.txt) into HDFS:
    hdfs dfs -put input /user/yylin/
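
    If the HDFS home directory or the local input directory does not exist yet, a sketch of the preparation steps before the put (hello.txt can contain any English text):

      [yylin@big ~]$ mkdir input && echo "hello hadoop hello world" > input/hello.txt
      [yylin@big ~]$ hdfs dfs -mkdir -p /user/yylin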
    

    Run the command:

    hadoop jar WordCount-1.0-SNAPSHOT.jar /user/yylin/input /user/yylin/output
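
    When the job finishes, the word counts can be read back from the output directory (a sketch; part-r-00000 is the default name of the single reducer's output file):

      hdfs dfs -cat /user/yylin/output/part-r-00000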
    


    The job ran successfully.
