139. Advanced Spark Core Programming: mapPartitionsWithIndex

Author: ZFH__ZJ | Published 2019-01-21 22:02

mapPartitionsWithIndex works like mapPartitions, except that the function also receives the index of the partition it is processing. The example below tags every student name with its partition index (used here as a class number).

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.api.java.function.Function2;
    import org.apache.spark.api.java.function.VoidFunction;

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.Iterator;
    import java.util.List;

    public class MapPartitionsWithIndex {
        public static void main(String[] args) {
    
            SparkConf conf = new SparkConf().setAppName("MapPartitionsWithIndexJava").setMaster("local");
    
            JavaSparkContext sparkContext = new JavaSparkContext(conf);
            // Prepare some sample data
            List<String> studentNames = Arrays.asList("张三", "李四", "王二", "麻子");

            // Parallelize into an RDD with 2 partitions
            JavaRDD<String> studentNamesRDD = sparkContext.parallelize(studentNames, 2);
    
            // The function receives the partition index and an iterator over that partition's elements
            JavaRDD<String> studentWithClassRDD = studentNamesRDD.mapPartitionsWithIndex(new Function2<Integer, Iterator<String>, Iterator<String>>() {
                @Override
                public Iterator<String> call(Integer index, Iterator<String> students) throws Exception {
                    List<String> studentWithClassList = new ArrayList<String>();

                    while (students.hasNext()) {
                        String studentName = students.next();
                        // Tag each student with the (1-based) partition index, used here as a class number
                        String studentWithClass = studentName + "_" + (index + 1);
                        studentWithClassList.add(studentWithClass);
                    }

                    return studentWithClassList.iterator();
                }
            }, true); // second argument: preservesPartitioning
    
            // Print each transformed element; with master("local") this runs in-process,
            // so the output appears on the driver's console
            studentWithClassRDD.foreach(new VoidFunction<String>() {
                @Override
                public void call(String s) throws Exception {
                    System.out.println("s = " + s);
                }
            });

            sparkContext.close();
        }
    }
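
For reference, here is a minimal sketch of the same transformation written with Java 8 lambdas; it assumes the same SparkConf / JavaSparkContext setup and the studentNamesRDD defined above.

    JavaRDD<String> withIndex = studentNamesRDD.mapPartitionsWithIndex((index, students) -> {
        List<String> result = new ArrayList<>();
        while (students.hasNext()) {
            // Same logic as above: append the 1-based partition index to each name
            result.add(students.next() + "_" + (index + 1));
        }
        return result.iterator();
    }, true);

    withIndex.foreach(s -> System.out.println("s = " + s));

With 4 names split across 2 partitions, the output should look roughly like "张三_1", "李四_1", "王二_2", "麻子_2"; the exact ordering may vary with task scheduling.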
    
    
