Several problems encountered when running the MapReduce wordcount example from IDEA

Author: 白面葫芦娃92 | Published 2019-02-23 17:11

    1. Following the example on the official site, I adapted it into Scala code.

    import java.lang.{Iterable => JIterable}
    import java.util.StringTokenizer

    import scala.collection.JavaConverters._

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.io.{IntWritable, Text}
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
    import org.apache.hadoop.mapreduce.{Job, Mapper, Reducer}

    object WordCount {
      def main(args: Array[String]): Unit = {
        val conf = new Configuration()
        val job = Job.getInstance(conf, "word count")
        job.setJarByClass(WordCount.getClass)
        job.setMapperClass(classOf[TokenizerMapper])
        job.setCombinerClass(classOf[IntSumReducer])
        job.setReducerClass(classOf[IntSumReducer])
        job.setOutputKeyClass(classOf[Text])
        job.setOutputValueClass(classOf[IntWritable])
        FileInputFormat.addInputPath(job, new Path("/data/ruozeinput.txt"))
        FileOutputFormat.setOutputPath(job, new Path("/out/WCoutput"))
        System.exit(if (job.waitForCompletion(true)) 0 else 1)
      }

      class TokenizerMapper extends Mapper[Object, Text, Text, IntWritable] {
        private val one = new IntWritable(1)
        private val word = new Text()

        // `override` is required (Mapper.map has a default implementation), and the
        // context parameter must be the Mapper's own inner Context type
        override def map(key: Object, value: Text,
                         context: Mapper[Object, Text, Text, IntWritable]#Context): Unit = {
          val itr = new StringTokenizer(value.toString)
          while (itr.hasMoreTokens) {
            word.set(itr.nextToken())
            context.write(word, one)
          }
        }
      }

      class IntSumReducer extends Reducer[Text, IntWritable, Text, IntWritable] {
        private val result = new IntWritable()

        // values must be java.lang.Iterable (not scala.Iterable) for the override to
        // match the Java signature; .asScala makes it usable in a for-comprehension
        override def reduce(key: Text, values: JIterable[IntWritable],
                            context: Reducer[Text, IntWritable, Text, IntWritable]#Context): Unit = {
          var sum = 0
          for (valu <- values.asScala) {
            sum += valu.get()
          }
          result.set(sum)
          context.write(key, result)
        }
      }
    }
    

    2. Confirmed that HDFS and YARN were running on the VM and that core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml were all present in IDEA's Resources folder. Running the code failed: the client could not reach the ResourceManager.
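These prerequisites are easy to sanity-check with a small script before submitting; a minimal sketch (the `src/main/resources` path is an assumption, adjust it to your project layout):

```shell
# Verify the four Hadoop client config files are present before submitting.
# RES_DIR is a hypothetical path - point it at your module's resources folder.
RES_DIR="${RES_DIR:-src/main/resources}"
missing=0
for f in core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml; do
  if [ ! -f "$RES_DIR/$f" ]; then
    echo "missing: $RES_DIR/$f"
    missing=1
  fi
done
if [ "$missing" -eq 0 ]; then
  echo "all client configs present"
fi
```

Checking the daemons themselves is a matter of running `jps` on the VM and looking for the NameNode, DataNode, ResourceManager, and NodeManager processes.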

    19/01/20 17:57:33 INFO RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
    19/01/20 17:57:36 INFO Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
    19/01/20 17:57:38 INFO Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
    19/01/20 17:57:40 INFO Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
    .......
    

    3. Tried modifying the yarn-site.xml configuration.
    The original configuration:

    <configuration>
        <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
        </property>
    </configuration>
    

    Changed to:

    <configuration>
        <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
        </property>
        <property>
            <name>yarn.resourcemanager.hostname</name>
            <value>hadoop001</value>
        </property>
        <property>
            <name>yarn.resourcemanager.address</name>
            <value>hadoop001:8032</value>
        </property>
        <property>
            <name>yarn.resourcemanager.scheduler.address</name>
            <value>hadoop001:8030</value>
        </property>
        <property>
            <name>yarn.resourcemanager.resource-tracker.address</name>
            <value>hadoop001:8031</value>
        </property>
    </configuration>
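After an edit like this it is easy to lose track of which value the client will actually read. A quick, naive way to query a `*-site.xml` from the command line (sed-based; it assumes a well-formed file with one `<name>`/`<value>` pair per property and no match inside comments):

```shell
# Print the <value> that follows a given <name> in a Hadoop *-site.xml.
# Naive sed extraction - a sketch, not an XML parser.
get_prop() {
  tr -d '\n' < "$1" \
    | sed -n "s/.*<name>$2<\/name>[^<]*<value>\([^<]*\)<\/value>.*/\1/p"
}

# e.g.: get_prop src/main/resources/yarn-site.xml yarn.resourcemanager.address
```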
    

    4. Ran the code again and still could not connect. Running `jps` at this point showed that YARN had died. After restarting YARN, the first `jps` showed the resourcemanager process, but checking again a moment later showed it had died once more.
    Checked the process log under $HADOOP_HOME/logs:

    /************************************************************
    STARTUP_MSG: Starting ResourceManager
    STARTUP_MSG:   host = hadoop001/192.168.137.141
    STARTUP_MSG:   args = []
    STARTUP_MSG:   version = 2.6.0-cdh5.7.0
    STARTUP_MSG:   classpath = ......skipping...
    2019-01-20 17:08:22,641 WARN org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Update thread interrupted. Exiting.
    2019-01-20 17:08:22,658 ERROR org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted
    2019-01-20 17:08:22,649 INFO org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.ContainerAllocationExpirer thread interrupted
    2019-01-20 17:08:22,650 INFO org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: AMLivelinessMonitor thread interrupted
    2019-01-20 17:08:22,650 INFO org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: AMLivelinessMonitor thread interrupted
    2019-01-20 17:08:22,674 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioned to standby state
    2019-01-20 17:08:22,676 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting ResourceManager
    org.apache.hadoop.yarn.webapp.WebAppException: Error starting http server
            at org.apache.hadoop.yarn.webapp.WebApps$Builder.start(WebApps.java:278)
            at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startWepApp(ResourceManager.java:990)
            at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1090)
            at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
            at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1222)
    Caused by: java.net.BindException: Port in use: 0.0.0.0:8088
            at org.apache.hadoop.http.HttpServer2.openListeners(HttpServer2.java:951)
            at org.apache.hadoop.http.HttpServer2.start(HttpServer2.java:887)
            at org.apache.hadoop.yarn.webapp.WebApps$Builder.start(WebApps.java:273)
            ... 4 more
    Caused by: java.net.BindException: Address already in use
            at sun.nio.ch.Net.bind0(Native Method)
            at sun.nio.ch.Net.bind(Net.java:437)
            at sun.nio.ch.Net.bind(Net.java:429)
            at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
            at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
            at org.mortbay.jetty.nio.SelectChannelConnector.open(SelectChannelConnector.java:216)
            at org.apache.hadoop.http.HttpServer2.openListeners(HttpServer2.java:946)
            ... 6 more
    2019-01-20 17:08:22,696 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: SHUTDOWN_MSG: 
    /************************************************************
    SHUTDOWN_MSG: Shutting down ResourceManager at hadoop001/192.168.137.141
    ************************************************************/
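The FATAL line above pins the crash on `Port in use: 0.0.0.0:8088`, the ResourceManager web UI port. Before restarting, it helps to confirm the port is actually free; on the server, `netstat -lntp | grep 8088` or `lsof -i :8088` will name the owning process. A bash-only probe (uses bash's `/dev/tcp` redirection, so it needs bash, not plain sh):

```shell
# Probe a local TCP port: a successful connect means something is listening.
# The fd opens in a subshell and is closed automatically when it exits.
port_in_use() {
  (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

if port_in_use 8088; then
  echo "port 8088 is still held - find the owner with: lsof -i :8088"
else
  echo "port 8088 is free - safe to restart the ResourceManager"
fi
```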
    

    Following https://blog.csdn.net/caiandyong/article/details/50913268, I discovered that the yarn-site.xml inside the VM still carried the old configuration and had never been updated (the BindException above also fits: port 8088 was still occupied, typically by a ResourceManager instance that had not fully exited). After making the VM's yarn-site.xml match the one in IDEA's Resources folder and restarting YARN, the resourcemanager no longer shut itself down.
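The mismatch was easy to miss because the same file lives in two places. A sketch of a drift check, comparing the local client copy against the copy the cluster daemons read (the host, user, and remote path below are assumptions for illustration):

```shell
# Compare two config files; prints a unified diff when they differ.
same_conf() {
  diff -u "$1" "$2"
}

# Typical use: fetch the cluster copy first, then compare:
#   scp hadoop@hadoop001:/home/hadoop/app/hadoop/etc/hadoop/yarn-site.xml /tmp/yarn-site.remote.xml
#   same_conf /tmp/yarn-site.remote.xml src/main/resources/yarn-site.xml \
#     && echo "configs match" || echo "configs differ - sync them and restart yarn"
```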
    5. Ran the code again: the ResourceManager connection succeeded, but the job failed.

    19/01/20 18:11:17 INFO RMProxy: Connecting to ResourceManager at hadoop001/192.168.137.141:8032
    19/01/20 18:11:22 WARN JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
    19/01/20 18:11:22 WARN JobResourceUploader: No job jar file set.  User classes may not be found. See Job or Job#setJar(String).
    19/01/20 18:11:22 INFO FileInputFormat: Total input paths to process : 1
    19/01/20 18:11:23 INFO JobSubmitter: number of splits:1
    19/01/20 18:11:24 INFO JobSubmitter: Submitting tokens for job: job_1547976418140_0004
    19/01/20 18:11:24 INFO YARNRunner: Job jar is not present. Not adding any jar to the list of resources.
    19/01/20 18:11:24 INFO YarnClientImpl: Submitted application application_1547976418140_0004
    19/01/20 18:11:24 INFO Job: The url to track the job: http://hadoop001:8088/proxy/application_1547976418140_0004/
    19/01/20 18:11:24 INFO Job: Running job: job_1547976418140_0004
    19/01/20 18:11:29 INFO Job: Job job_1547976418140_0004 running in uber mode : false
    19/01/20 18:11:29 INFO Job:  map 0% reduce 0%
    19/01/20 18:11:29 INFO Job: Job job_1547976418140_0004 failed with state FAILED due to: Application application_1547976418140_0004 failed 2 times due to AM Container for appattempt_1547976418140_0004_000002 exited with  exitCode: 1
    For more detailed output, check application tracking page:http://hadoop001:8088/proxy/application_1547976418140_0004/Then, click on links to logs of each attempt.
    Diagnostics: Exception from container-launch.
    Container id: container_1547976418140_0004_02_000001
    Exit code: 1
    Exception message: /bin/bash: line 0: fg: no job control
    
    Stack trace: ExitCodeException exitCode=1: /bin/bash: line 0: fg: no job control
    
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:561)
        at org.apache.hadoop.util.Shell.run(Shell.java:478)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:738)
        at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
    
    
    Container exited with a non-zero exit code 1
    Failing this attempt. Failing the application.
    19/01/20 18:11:29 INFO Job: Counters: 0
    

    The `/bin/bash: line 0: fg: no job control` error is the classic symptom of submitting a job from a Windows client to a Linux cluster, so I added conf.set("mapreduce.app-submission.cross-platform", "true") and ran again:

    19/01/20 23:45:30 INFO RMProxy: Connecting to ResourceManager at hadoop001/192.168.137.141:8032
    19/01/20 23:45:31 WARN JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
    19/01/20 23:45:31 WARN JobResourceUploader: No job jar file set.  User classes may not be found. See Job or Job#setJar(String).
    19/01/20 23:45:31 INFO FileInputFormat: Total input paths to process : 1
    19/01/20 23:45:31 INFO JobSubmitter: number of splits:1
    19/01/20 23:45:31 INFO JobSubmitter: Submitting tokens for job: job_1547989123887_0006
    19/01/20 23:45:32 INFO YARNRunner: Job jar is not present. Not adding any jar to the list of resources.
    19/01/20 23:45:32 INFO YarnClientImpl: Submitted application application_1547989123887_0006
    19/01/20 23:45:32 INFO Job: The url to track the job: http://hadoop001:8088/proxy/application_1547989123887_0006/
    19/01/20 23:45:32 INFO Job: Running job: job_1547989123887_0006
    19/01/20 23:45:56 INFO Job: Job job_1547989123887_0006 running in uber mode : false
    19/01/20 23:45:56 INFO Job:  map 0% reduce 0%
    19/01/20 23:46:10 INFO Job: Task Id : attempt_1547989123887_0006_m_000000_0, Status : FAILED
    Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.ruozedata.MapReduce.WordCountjava$TokenizerMapper not found
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2199)
        at org.apache.hadoop.mapreduce.task.JobContextImpl.getMapperClass(JobContextImpl.java:196)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:745)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
    Caused by: java.lang.ClassNotFoundException: Class com.ruozedata.MapReduce.WordCountjava$TokenizerMapper not found
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2105)
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2197)
        ... 8 more
    
    19/01/20 23:46:22 INFO Job: Task Id : attempt_1547989123887_0006_m_000000_1, Status : FAILED
    Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.ruozedata.MapReduce.WordCountjava$TokenizerMapper not found
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2199)
        at org.apache.hadoop.mapreduce.task.JobContextImpl.getMapperClass(JobContextImpl.java:196)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:745)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
    Caused by: java.lang.ClassNotFoundException: Class com.ruozedata.MapReduce.WordCountjava$TokenizerMapper not found
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2105)
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2197)
        ... 8 more
    
    
    The error says the class WordCountjava$TokenizerMapper cannot be found, so I packaged the class into a jar and added it to the dependencies:
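Before repackaging, it is worth confirming that the class named in the error actually exists in the build output; a sketch (the `target/classes` path below assumes a typical Maven layout):

```shell
# Look for a compiled class file by its simple name under a build directory.
has_class() {
  find "$1" -name "$2.class" | grep -q .
}

# e.g. (Maven-style layout assumed):
#   has_class target/classes 'WordCountjava$TokenizerMapper' \
#     && echo "class present - package it into the job jar" \
#     || echo "class missing - check the build"
```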
    19/01/20 21:05:04 INFO RMProxy: Connecting to ResourceManager at hadoop001/192.168.137.141:8032
    19/01/20 21:05:05 WARN JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
    19/01/20 21:05:07 INFO FileInputFormat: Total input paths to process : 1
    19/01/20 21:05:07 INFO JobSubmitter: number of splits:1
    19/01/20 21:05:08 INFO JobSubmitter: Submitting tokens for job: job_1547989123887_0001
    19/01/20 21:05:09 INFO YarnClientImpl: Submitted application application_1547989123887_0001
    19/01/20 21:05:09 INFO Job: The url to track the job: http://hadoop001:8088/proxy/application_1547989123887_0001/
    19/01/20 21:05:09 INFO Job: Running job: job_1547989123887_0001
    19/01/20 21:05:40 INFO Job: Job job_1547989123887_0001 running in uber mode : false
    19/01/20 21:05:40 INFO Job:  map 0% reduce 0%
    19/01/20 21:05:57 INFO Job: Task Id : attempt_1547989123887_0001_m_000000_0, Status : FAILED
    Error: java.lang.ClassNotFoundException: scala.Function1
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:348)
        at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:2138)
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2103)
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2197)
        at org.apache.hadoop.mapreduce.task.JobContextImpl.getCombinerClass(JobContextImpl.java:208)
        at org.apache.hadoop.mapred.Task$CombinerRunner.create(Task.java:1585)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:1033)
        at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:402)
        at org.apache.hadoop.mapred.MapTask.access$100(MapTask.java:81)
        at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:698)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:770)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
    
    19/01/20 21:06:08 INFO Job: Task Id : attempt_1547989123887_0001_m_000000_1, Status : FAILED
    Error: java.lang.ClassNotFoundException: scala.Function1
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:348)
        at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:2138)
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2103)
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2197)
        at org.apache.hadoop.mapreduce.task.JobContextImpl.getCombinerClass(JobContextImpl.java:208)
        at org.apache.hadoop.mapred.Task$CombinerRunner.create(Task.java:1585)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:1033)
        at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:402)
        at org.apache.hadoop.mapred.MapTask.access$100(MapTask.java:81)
        at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:698)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:770)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
    
    19/01/20 21:06:26 INFO Job: Task Id : attempt_1547989123887_0001_m_000000_2, Status : FAILED
    Error: java.lang.ClassNotFoundException: scala.Function1
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:348)
        at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:2138)
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2103)
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2197)
        at org.apache.hadoop.mapreduce.task.JobContextImpl.getCombinerClass(JobContextImpl.java:208)
        at org.apache.hadoop.mapred.Task$CombinerRunner.create(Task.java:1585)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:1033)
        at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:402)
        at org.apache.hadoop.mapred.MapTask.access$100(MapTask.java:81)
        at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:698)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:770)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
    
    19/01/20 21:07:03 INFO Job:  map 100% reduce 100%
    19/01/20 21:07:04 INFO Job: Job job_1547989123887_0001 failed with state FAILED due to: Task failed task_1547989123887_0001_m_000000
    Job failed as tasks failed. failedMaps:1 failedReduces:0
    
    19/01/20 21:07:04 INFO Job: Counters: 12
        Job Counters 
            Failed map tasks=4
            Launched map tasks=4
            Other local map tasks=3
            Data-local map tasks=1
            Total time spent by all maps in occupied slots (ms)=73744
            Total time spent by all reduces in occupied slots (ms)=0
            Total time spent by all map tasks (ms)=73744
            Total vcore-seconds taken by all map tasks=73744
            Total megabyte-seconds taken by all map tasks=75513856
        Map-Reduce Framework
            CPU time spent (ms)=0
            Physical memory (bytes) snapshot=0
            Virtual memory (bytes) snapshot=0
    
    Process finished with exit code 1
    

    Judging from this error, the task JVMs cannot load the Scala runtime: `scala.Function1` lives in scala-library, which evidently never reached the task classpath (at the time I did not know how to fix this =_=!). Next I ran the official Java example instead:
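One quick way to confirm whether the Scala runtime was bundled with the job is to look for a scala-library jar among the shipped dependencies (the `lib/` layout below is an assumption; with a fat/shaded jar you would instead list the jar's own entries):

```shell
# Check whether a scala-library jar sits in the job's lib directory.
has_scala_runtime() {
  ls "$1" 2>/dev/null | grep -q '^scala-library.*\.jar$'
}

# e.g.: has_scala_runtime lib \
#   && echo "scala runtime bundled" \
#   || echo "scala runtime missing - shade it into the job jar or pass it via -libjars"
```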

    import java.io.IOException;
    import java.util.StringTokenizer;
    
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    
    public class WordCountjava {
    
        public static class TokenizerMapper
                extends Mapper<Object, Text, Text, IntWritable>{
    
            private final static IntWritable one = new IntWritable(1);
            private Text word = new Text();
    
            public void map(Object key, Text value, Context context
            ) throws IOException, InterruptedException {
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    context.write(word, one);
                }
            }
        }
    
        public static class IntSumReducer
                extends Reducer<Text,IntWritable,Text,IntWritable> {
            private IntWritable result = new IntWritable();
    
            public void reduce(Text key, Iterable<IntWritable> values,
                               Context context
            ) throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable val : values) {
                    sum += val.get();
                }
                result.set(sum);
                context.write(key, result);
            }
        }
    
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("mapreduce.app-submission.cross-platform", "true");
            Job job = Job.getInstance(conf, "word count java");
            job.setJarByClass(WordCountjava.class);
    //        job.setJar("hdfs://hadoop001:9000/lib/hadoop-mapreduce.tar.gz");
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class);
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
    //        FileInputFormat.addInputPath(job, new Path(args[0]));
    //        FileOutputFormat.setOutputPath(job, new Path(args[1]));
    
            FileInputFormat.addInputPath(job, new Path("hdfs://hadoop001:9000/data/input"));
            FileOutputFormat.setOutputPath(job, new Path("hdfs://hadoop001:9000/out/WCoutput"));
    
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }
    

    Running this code succeeded:

    19/01/25 21:07:25 INFO RMProxy: Connecting to ResourceManager at hadoop001/192.168.137.141:8032
    19/01/25 21:07:26 WARN JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
    19/01/25 21:07:27 INFO FileInputFormat: Total input paths to process : 1
    19/01/25 21:07:27 INFO JobSubmitter: number of splits:1
    19/01/25 21:07:27 INFO JobSubmitter: Submitting tokens for job: job_1548417790455_0008
    19/01/25 21:07:27 INFO YarnClientImpl: Submitted application application_1548417790455_0008
    19/01/25 21:07:27 INFO Job: The url to track the job: http://hadoop001:8088/proxy/application_1548417790455_0008/
    19/01/25 21:07:27 INFO Job: Running job: job_1548417790455_0008
    19/01/25 21:07:49 INFO Job: Job job_1548417790455_0008 running in uber mode : false
    19/01/25 21:07:49 INFO Job:  map 0% reduce 0%
    19/01/25 21:08:05 INFO Job:  map 100% reduce 0%
    19/01/25 21:08:21 INFO Job:  map 100% reduce 100%
    19/01/25 21:08:22 INFO Job: Job job_1548417790455_0008 completed successfully
    19/01/25 21:08:22 INFO Job: Counters: 49
        File System Counters
            FILE: Number of bytes read=44
            FILE: Number of bytes written=222537
            FILE: Number of read operations=0
            FILE: Number of large read operations=0
            FILE: Number of write operations=0
            HDFS: Number of bytes read=156
            HDFS: Number of bytes written=26
            HDFS: Number of read operations=6
            HDFS: Number of large read operations=0
            HDFS: Number of write operations=2
        Job Counters 
            Launched map tasks=1
            Launched reduce tasks=1
            Data-local map tasks=1
            Total time spent by all maps in occupied slots (ms)=13929
            Total time spent by all reduces in occupied slots (ms)=13266
            Total time spent by all map tasks (ms)=13929
            Total time spent by all reduce tasks (ms)=13266
            Total vcore-seconds taken by all map tasks=13929
            Total vcore-seconds taken by all reduce tasks=13266
            Total megabyte-seconds taken by all map tasks=14263296
            Total megabyte-seconds taken by all reduce tasks=13584384
        Map-Reduce Framework
            Map input records=3
            Map output records=7
            Map output bytes=72
            Map output materialized bytes=44
            Input split bytes=112
            Combine input records=7
            Combine output records=3
            Reduce input groups=3
            Reduce shuffle bytes=44
            Reduce input records=3
            Reduce output records=3
            Spilled Records=6
            Shuffled Maps =1
            Failed Shuffles=0
            Merged Map outputs=1
            GC time elapsed (ms)=255
            CPU time spent (ms)=4470
            Physical memory (bytes) snapshot=403714048
            Virtual memory (bytes) snapshot=5511376896
            Total committed heap usage (bytes)=282066944
        Shuffle Errors
            BAD_ID=0
            CONNECTION=0
            IO_ERROR=0
            WRONG_LENGTH=0
            WRONG_MAP=0
            WRONG_REDUCE=0
        File Input Format Counters 
            Bytes Read=44
        File Output Format Counters 
            Bytes Written=26
    

    Checking the output files under the target directory:

    [hadoop@hadoop001 lib]$ hadoop fs -ls /out/WCoutput               
    Found 2 items
    -rw-r--r--   1 zh supergroup          0 2019-01-25 21:08 /out/WCoutput/_SUCCESS
    -rw-r--r--   1 zh supergroup         26 2019-01-25 21:08 /out/WCoutput/part-r-00000
    [hadoop@hadoop001 lib]$ hadoop fs -text /out/WCoutput/part-r-00000
    hello   4
    welcome 1
    world   2
    

    As for why the Scala code would not run, answers from the experts are welcome, thanks! (Given the stack traces above, bundling scala-library into the job jar, e.g. as a shaded fat jar, or shipping it with -libjars, should resolve the scala.Function1 ClassNotFoundException.)


          Original article: https://www.haomeiwen.com/subject/ahaxjqtx.html