Spark-0.5.2 Source Code Analysis: collection shuffle

Author: 编程回忆录 | Source: published 2018-10-08 22:39

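    A collection shuffle rearranges the elements of a list into a random order and returns the result as a new list. In the Spark 0.5.2 source code it is implemented as follows: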

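      /**
       * Shuffle the elements of a collection into a random order, returning the
       * result in a new collection. Unlike scala.util.Random.shuffle, this method
       * uses a local random number generator, avoiding inter-thread contention.
       *
       * @param seq the collection whose elements are shuffled
       * @tparam T the element type; a ClassManifest is required to build the array
       * @return a sequence holding the elements of seq in random order
       */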
      def randomize[T: ClassManifest](seq: TraversableOnce[T]): Seq[T] = {
        randomizeInPlace(seq.toArray)
      }
    
      /**
       * Shuffle the elements of an array into a random order, modifying the
       * original array. Returns the original array.
       */
      def randomizeInPlace[T](arr: Array[T], rand: Random = new Random): Array[T] = {
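        // Walk from the last index down to 1, swapping arr(i) with a randomly chosen earlier slot.
        for (i <- (arr.length - 1) to 1 by -1) {
          // nextInt(i) returns a value in [0, i), so j is always smaller than i.
          val j = rand.nextInt(i)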
          val tmp = arr(j)
          arr(j) = arr(i)
          arr(i) = tmp
        }
        arr
      }
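
A note on the loop above, offered as a sketch rather than a statement about Spark's intent: rand.nextInt(i) returns a value in [0, i), so j never equals i and no element can end up in its original position. This variant is usually called Sattolo's algorithm and produces only cyclic permutations. The textbook Fisher-Yates shuffle draws j from [0, i] with nextInt(i + 1). The object name ShuffleSketch and all method names below are hypothetical, and scala.util.Random is assumed as the Random type.

```scala
import scala.util.Random

// Hypothetical names throughout; only the loop shape comes from the quoted code.
object ShuffleSketch {

  // The loop as quoted: j is drawn from [0, i), so j != i on every step.
  def quotedInPlace[T](arr: Array[T], rand: Random = new Random): Array[T] = {
    for (i <- (arr.length - 1) to 1 by -1) {
      val j = rand.nextInt(i)
      val tmp = arr(j)
      arr(j) = arr(i)
      arr(i) = tmp
    }
    arr
  }

  // Fisher-Yates: j is drawn from [0, i], so an element may stay where it is.
  def fisherYatesInPlace[T](arr: Array[T], rand: Random = new Random): Array[T] = {
    for (i <- (arr.length - 1) to 1 by -1) {
      val j = rand.nextInt(i + 1)
      val tmp = arr(j)
      arr(j) = arr(i)
      arr(i) = tmp
    }
    arr
  }

  // True when at least one element is still at its starting index
  // (assumes the input was Array(0, 1, ..., n - 1)).
  def hasFixedPoint(arr: Array[Int]): Boolean =
    arr.zipWithIndex.exists { case (v, idx) => v == idx }
}
```

With Array(0, 1, 2) as input, the quoted loop can only return Array(1, 2, 0) or Array(2, 0, 1), while the Fisher-Yates version can return all six orderings.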
    

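    Worth noting here: randomizeInPlace accepts a Random argument (by default a fresh new Random per call), so every caller works with its own generator rather than the shared one behind scala.util.Random.shuffle, which avoids contention between threads.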
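
To illustrate the point about the Random parameter, here is a minimal sketch in which each thread builds its own generator and passes it in, so no generator state is shared between threads. LocalRandomSketch, shuffleInPlace and shuffleInThreads are hypothetical names, scala.util.Random is assumed, and the loop body is copied from the quoted method.

```scala
import scala.util.Random

object LocalRandomSketch {

  // Same signature and loop as the quoted method (hypothetical name).
  def shuffleInPlace[T](arr: Array[T], rand: Random = new Random): Array[T] = {
    for (i <- (arr.length - 1) to 1 by -1) {
      val j = rand.nextInt(i)
      val tmp = arr(j)
      arr(j) = arr(i)
      arr(i) = tmp
    }
    arr
  }

  // Each thread owns one Random, seeded with its index, and shuffles its own array.
  def shuffleInThreads(threadCount: Int): Array[Array[Int]] = {
    val results = new Array[Array[Int]](threadCount)
    val threads = (0 until threadCount).map { id =>
      new Thread(new Runnable {
        def run(): Unit = {
          val local = new Random(id.toLong) // never shared with other threads
          results(id) = shuffleInPlace((1 to 10).toArray, local)
        }
      })
    }
    threads.foreach(_.start())
    threads.foreach(_.join()) // join makes each thread's write visible here
    results
  }
}
```

Because each generator is seeded explicitly, two runs of shuffleInThreads(4) return the same orderings, which is convenient in tests. With the default new Random, each call gets a fresh, differently seeded generator instead.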

    Article link: https://www.haomeiwen.com/subject/ibvjaftx.html