import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

SparkConf conf = new SparkConf().setAppName("word count");
JavaSparkContext sc = new JavaSparkContext(conf);
// Word count as written in the official Spark examples
JavaRDD<String> textFile = sc.textFile("hdfs://...");
JavaPairRDD<String, Integer> counts = textFile
        .flatMap(s -> Arrays.asList(s.split(" ")).iterator())
        .mapToPair(word -> new Tuple2<>(word, 1))
        .reduceByKey((a, b) -> a + b);
counts.saveAsTextFile("hdfs://...");
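For quick testing it can be convenient to run the same pipeline locally and print the results instead of writing back to HDFS. The following is only a sketch of that idea; the class name WordCountLocal, the local[*] master, and reading the input path from args[0] are assumptions, not part of the original post.

import java.util.Arrays;
import java.util.List;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class WordCountLocal {
    public static void main(String[] args) {
        // Local master for testing only (assumption); on a cluster, set the master via spark-submit instead
        SparkConf conf = new SparkConf().setAppName("word count").setMaster("local[*]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            JavaPairRDD<String, Integer> counts = sc.textFile(args[0]) // input path passed on the command line (assumption)
                    .flatMap(s -> Arrays.asList(s.split(" ")).iterator())
                    .mapToPair(word -> new Tuple2<>(word, 1))
                    .reduceByKey((a, b) -> a + b);
            // Collect the counts to the driver and print them instead of calling saveAsTextFile
            List<Tuple2<String, Integer>> result = counts.collect();
            result.forEach(t -> System.out.println(t._1() + ": " + t._2()));
        }
    }
}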
While running this job you may hit the error: java.lang.ArrayIndexOutOfBoundsException: 10582
Fix: add the following dependency to the pom.xml
<dependency>
    <groupId>com.thoughtworks.paranamer</groupId>
    <artifactId>paranamer</artifactId>
    <version>2.8</version>
</dependency>
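This exception is commonly reported when an older paranamer version is pulled in transitively (for example via Avro) and cannot read bytecode produced by newer JDKs; declaring paranamer 2.8 directly forces the newer version. If the error persists, you can check which paranamer version actually ends up on the classpath with mvn dependency:tree.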