join

Author: yayooo | Published 2019-07-31 21:34

    `join` triggers a shuffle.

    Purpose: called on RDDs of type (K, V) and (K, W); returns an RDD of (K, (V, W)), pairing up all elements that share the same key.

    Source:

      def join[W](other: RDD[(K, W)]): RDD[(K, (V, W))] = self.withScope {
        join(other, defaultPartitioner(self, other))
      }
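The one-argument `join` above delegates to an overload that takes a `Partitioner`, with `defaultPartitioner` typically falling back to a `HashPartitioner`. As a rough sketch (assuming, as in Spark's `HashPartitioner`, that a key is placed by `key.hashCode` modulo the partition count, forced non-negative), matching keys from both RDDs land in the same partition after the shuffle, which is what lets `join` pair their values:

```scala
object HashPartitionSketch {
  // Sketch of hash-partition key placement: partition index is
  // key.hashCode % numPartitions, shifted into the non-negative range.
  // Equal keys from either RDD always map to the same partition.
  def partitionFor(key: Any, numPartitions: Int): Int = {
    val rawMod = key.hashCode % numPartitions
    rawMod + (if (rawMod < 0) numPartitions else 0)
  }

  def main(args: Array[String]): Unit = {
    // Key 3 from both RDDs maps to the same partition, so the shuffle
    // brings its V and W values together on one node.
    println(partitionFor(3, 2))
    println(partitionFor(3, 2) == partitionFor(3, 2))
  }
}
```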
    

    Example:

    package com.atguigu
    
    import org.apache.spark.rdd.RDD
    import org.apache.spark.{HashPartitioner, Partitioner, SparkConf, SparkContext}
    
    object Trans {
      def main(args: Array[String]): Unit = {
    
        val conf: SparkConf = new SparkConf().setMaster("local[*]").setAppName("Spark01_Partition")
        // build the SparkContext
        val sc = new SparkContext(conf)
    
        val rdd: RDD[(Int, String)] = sc.makeRDD(Array((3,"aa"),(6,"bb"),(1,"cc"),(4,"dd")))
        val rdd1: RDD[(Int, String)] = sc.makeRDD(Array((3,"aa"),(6,"bb"),(1,"cc")))
        val rdd2: RDD[(Int, (String, String))] = rdd.join(rdd1)  // key 4 has no match in rdd1, so it is dropped
        rdd2.foreach(println)  // print order may vary across partitions
    
        sc.stop()
      }
    }
    
    

    Output (order may vary):

    (1,(cc,cc))
    (3,(aa,aa))
    (6,(bb,bb))

    Conclusion: elements are joined only when their key appears in both RDDs; unmatched keys (here, key 4) are dropped.
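The same inner-join semantics can be reproduced with plain Scala collections (no Spark required); this is only a semantic sketch, not how Spark implements `join`:

```scala
object JoinSemantics {
  // Inner join of two key-value sequences: emit (k, (v, w)) for every
  // pair of values sharing a key; keys present on only one side are dropped.
  def join[K, V, W](left: Seq[(K, V)], right: Seq[(K, W)]): Seq[(K, (V, W))] = {
    val rightByKey = right.groupBy(_._1)
    for {
      (k, v) <- left
      (_, w) <- rightByKey.getOrElse(k, Seq.empty)
    } yield (k, (v, w))
  }

  def main(args: Array[String]): Unit = {
    val left  = Seq((3, "aa"), (6, "bb"), (1, "cc"), (4, "dd"))
    val right = Seq((3, "aa"), (6, "bb"), (1, "cc"))
    // Key 4 appears only on the left, so ("dd") never reaches the result.
    println(join(left, right).sortBy(_._1))
    // List((1,(cc,cc)), (3,(aa,aa)), (6,(bb,bb)))
  }
}
```

If unmatched keys must be kept, Spark's `leftOuterJoin` returns `(K, (V, Option[W]))` with `None` for keys missing on the right.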


Original link: https://www.haomeiwen.com/subject/ilfzrctx.html