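Shuffling a collection means rearranging its elements into a random order and returning the result as a new list. In the Spark 0.5.2 source code, it is implemented as follows: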
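import java.util.Random // assumed: the listing does not show which Random it imports

/**
 * Shuffle the elements of a collection into a random order, returning the
 * result in a new collection. Unlike scala.util.Random.shuffle, this method
 * uses a local random number generator, avoiding inter-thread contention.
 *
 * @param seq the elements to shuffle
 * @tparam T the element type
 * @return a new sequence containing the elements of seq in random order
 */
// ClassManifest is the Scala 2.9-era context bound needed by toArray (ClassTag in Scala 2.10+)
def randomize[T: ClassManifest](seq: TraversableOnce[T]): Seq[T] = {
  // toArray copies the elements, so the input collection itself is never modified
  randomizeInPlace(seq.toArray)
}

/**
 * Shuffle the elements of an array into a random order, modifying the
 * original array. Returns the original array.
 */
def randomizeInPlace[T](arr: Array[T], rand: Random = new Random): Array[T] = {
  // Fisher-Yates: walk from the last index down to 1, swapping arr(i)
  // with a uniformly chosen arr(j), where 0 <= j <= i
  for (i <- (arr.length - 1) to 1 by -1) {
    // nextInt(i + 1) draws j from [0, i]; nextInt(i) would exclude j == i
    // and yield only cyclic permutations (Sattolo's algorithm)
    val j = rand.nextInt(i + 1)
    val tmp = arr(j)
    arr(j) = arr(i)
    arr(i) = tmp
  }
  arr
}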
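The upper bound passed to nextInt decides which algorithm the loop implements. Drawing j from [0, i] with nextInt(i + 1) is the Fisher-Yates shuffle, where every permutation is equally likely; drawing from [0, i - 1] with nextInt(i) is Sattolo's algorithm, which only ever produces cyclic permutations. The sketch below counts the permutations of a three-element array under both bounds. The object `ShuffleBiasDemo` and the methods `fisherYates`, `sattolo` and `histogram` are hypothetical names for illustration, and java.util.Random is assumed as the generator type.

```scala
import java.util.Random

object ShuffleBiasDemo {
  // Fisher-Yates: j is drawn from [0, i], so every permutation is reachable.
  def fisherYates[T](arr: Array[T], rand: Random): Array[T] = {
    for (i <- (arr.length - 1) to 1 by -1) {
      val j = rand.nextInt(i + 1)
      val tmp = arr(j); arr(j) = arr(i); arr(i) = tmp
    }
    arr
  }

  // Sattolo: j is drawn from [0, i - 1], so arr(i) is always swapped away
  // and only cyclic permutations come out.
  def sattolo[T](arr: Array[T], rand: Random): Array[T] = {
    for (i <- (arr.length - 1) to 1 by -1) {
      val j = rand.nextInt(i)
      val tmp = arr(j); arr(j) = arr(i); arr(i) = tmp
    }
    arr
  }

  // Shuffle (0, 1, 2) `trials` times and count how often each ordering appears.
  def histogram(shuffle: (Array[Int], Random) => Array[Int], trials: Int): Map[String, Int] = {
    val rand = new Random(42L) // fixed seed so runs are repeatable
    (1 to trials)
      .map(_ => shuffle(Array(0, 1, 2), rand).mkString)
      .groupBy(identity)
      .map { case (perm, hits) => perm -> hits.size }
  }

  def main(args: Array[String]): Unit = {
    println("nextInt(i + 1): " + histogram(fisherYates[Int], 60000).toSeq.sorted)
    println("nextInt(i):     " + histogram(sattolo[Int], 60000).toSeq.sorted)
  }
}
```

With nextInt(i + 1) all six orderings should show up at roughly 10,000 hits each, whereas nextInt(i) only ever produces "120" and "201", the two 3-cycles.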
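The detail worth noting is that randomizeInPlace takes its Random generator as a parameter (defaulting to a fresh `new Random` on each call). Each call therefore works with its own generator instead of the single shared generator behind scala.util.Random.shuffle, so threads shuffling concurrently do not contend for the same generator state.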
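Because the generator is a parameter, the caller decides where randomness comes from. The sketch below is not from the Spark source: it copies randomizeInPlace into a hypothetical object `PerThreadShuffle` (using the nextInt(i + 1) bound and assuming java.util.Random), then passes a seeded generator for a reproducible order and, in worker threads, the JDK's `java.util.concurrent.ThreadLocalRandom`, which extends java.util.Random and keeps one generator per thread.

```scala
import java.util.Random
import java.util.concurrent.ThreadLocalRandom

object PerThreadShuffle {
  // Same signature as randomizeInPlace; j is drawn from [0, i] (Fisher-Yates).
  def randomizeInPlace[T](arr: Array[T], rand: Random = new Random): Array[T] = {
    for (i <- (arr.length - 1) to 1 by -1) {
      val j = rand.nextInt(i + 1)
      val tmp = arr(j); arr(j) = arr(i); arr(i) = tmp
    }
    arr
  }

  def main(args: Array[String]): Unit = {
    // A caller-supplied seed makes the shuffle reproducible.
    val a = randomizeInPlace((1 to 10).toArray, new Random(7L))
    val b = randomizeInPlace((1 to 10).toArray, new Random(7L))
    println(a.sameElements(b)) // prints true: same seed, same order

    // Each worker fetches its own ThreadLocalRandom inside its thread,
    // so no generator instance is shared between threads.
    val threads = (1 to 4).map { id =>
      new Thread(() => {
        val shuffled = randomizeInPlace((1 to 10).toArray, ThreadLocalRandom.current())
        println(s"thread $id: ${shuffled.mkString(",")}")
      })
    }
    threads.foreach(_.start())
    threads.foreach(_.join())
  }
}
```

ThreadLocalRandom.current() is called inside each Runnable on purpose: the instance it returns belongs to the calling thread and should not be handed to other threads.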