1. a list of partitions
2. a function for computing each split
3. a list of dependencies on other RDDs
4. optionally, a partitioner for key-value RDDs
5. optionally, a list of preferred locations to compute each split on
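
The five properties above can be sketched as a dependency-free toy model. This is only an illustration: `MiniRDD`, `Split`, `SeqRDD`, and `MappedRDD` are hypothetical names invented here, not Spark classes, and the partitioner is simplified to a plain key-to-index function.

```scala
// Hypothetical toy model of the five properties; none of these types are Spark's.
final case class Split(index: Int)

trait MiniRDD[T] {
  // 1. a list of partitions
  def partitions: Vector[Split]
  // 2. a function for computing each split
  def compute(split: Split): Iterator[T]
  // 3. a list of dependencies on other RDDs (none by default)
  def dependencies: Seq[MiniRDD[_]] = Nil
  // 4. optionally, a partitioner (simplified here to key => partition index)
  def partitioner: Option[Any => Int] = None
  // 5. optionally, preferred locations (host names) for each split
  def preferredLocations(split: Split): Seq[String] = Nil

  // Evaluate every split in order and concatenate the results.
  def collect(): Vector[T] = partitions.flatMap(p => compute(p))
}

// A source dataset: slices an in-memory vector into numSplits contiguous ranges.
final class SeqRDD[T](data: Vector[T], numSplits: Int) extends MiniRDD[T] {
  require(numSplits > 0, "numSplits must be positive")
  val partitions: Vector[Split] = Vector.tabulate(numSplits)(Split(_))
  def compute(split: Split): Iterator[T] = {
    val start = split.index * data.length / numSplits
    val end = (split.index + 1) * data.length / numSplits
    data.slice(start, end).iterator
  }
}

// A derived dataset: same partitions as its parent, computed by mapping the parent's split.
final class MappedRDD[T, U](parent: MiniRDD[T], f: T => U) extends MiniRDD[U] {
  def partitions: Vector[Split] = parent.partitions
  def compute(split: Split): Iterator[U] = parent.compute(split).map(f)
  override def dependencies: Seq[MiniRDD[_]] = Seq(parent)
}
```

In this sketch a derived dataset stores no data of its own. It records its parent and a function, so any split can be recomputed from the parent on demand.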
/**
 * :: DeveloperApi ::
 * Implemented by subclasses to compute a given partition.
 */
def compute(split: Partition, context: TaskContext): Iterator[T]
/**
 * Implemented by subclasses to return the set of partitions in this RDD. This method will only
 * be called once, so it is safe to implement a time-consuming computation in it.
 *
 * The partitions in this array must satisfy the following property:
 *   `rdd.partitions.zipWithIndex.forall { case (partition, index) => partition.index == index }`
 */
protected def getPartitions: Array[Partition]
/**
 * Implemented by subclasses to return how this RDD depends on parent RDDs. This method will only
 * be called once, so it is safe to implement a time-consuming computation in it.
 */
protected def getDependencies: Seq[Dependency[_]] = deps
/** Optionally overridden by subclasses to specify placement preferences. */
protected def getPreferredLocations(split: Partition): Seq[String] = Nil
/** Optionally overridden by subclasses to specify how they are partitioned. */
@transient val partitioner: Option[Partitioner] = None
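
As a minimal sketch of how a subclass might fill in these hooks, the example below overrides only `compute` and `getPartitions`. It leaves `getDependencies`, `getPreferredLocations`, and `partitioner` at their defaults. It assumes Spark is on the classpath. `RangeSlice` and `IntRangeRDD` are hypothetical names introduced for illustration and are not part of Spark.

```scala
import org.apache.spark.{Partition, SparkContext, TaskContext}
import org.apache.spark.rdd.RDD

// Hypothetical partition type: a half-open range [start, end) tagged with its index.
final class RangeSlice(override val index: Int, val start: Int, val end: Int)
  extends Partition

// Hypothetical RDD yielding the integers 0 until n, split into numSlices partitions.
// Passing Nil as the dependency list marks it as a source RDD with no parents.
class IntRangeRDD(sc: SparkContext, n: Int, numSlices: Int) extends RDD[Int](sc, Nil) {
  require(n >= 0 && numSlices > 0, "n must be non-negative and numSlices positive")

  // Partition i sits at array position i, which satisfies the index property
  // required of getPartitions.
  override protected def getPartitions: Array[Partition] =
    Array.tabulate[Partition](numSlices) { i =>
      val start = (i.toLong * n / numSlices).toInt
      val end = ((i + 1).toLong * n / numSlices).toInt
      new RangeSlice(i, start, end)
    }

  // Produce the elements of one partition lazily.
  override def compute(split: Partition, context: TaskContext): Iterator[Int] = {
    val slice = split.asInstanceOf[RangeSlice]
    (slice.start until slice.end).iterator
  }
}
```

With a local `SparkContext`, `new IntRangeRDD(sc, 10, 3).collect()` should return the integers 0 through 9 in order, because partitions are evaluated and concatenated by index.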