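The sample operator randomly draws a given fraction of an RDD's data. A fraction of 0.1 or 0.9, for example, takes roughly 10% or 90% of the elements. The first parameter, withReplacement, controls whether the same element can be picked more than once. The fraction is an expected proportion, not an exact count. It is recommended not to set the third parameter, seed. Without it, each run produces a different random sample.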
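To make fraction and seed concrete, here is a simplified plain-Java illustration of sampling without replacement. It assumes per-element Bernoulli sampling, where each element is kept independently with probability fraction. That is how Spark's sample(false, ...) is usually described, but this is not Spark's code. The class BernoulliSampleSketch and its sample method are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, single-machine sketch of sampling WITHOUT replacement:
// every element is kept independently with probability `fraction`.
// This illustrates the idea only; it is not Spark's implementation.
public class BernoulliSampleSketch {

    static <T> List<T> sample(List<T> data, double fraction, long seed) {
        java.util.Random rng = new java.util.Random(seed); // same seed -> same sequence of draws
        List<T> kept = new ArrayList<>();
        for (T item : data) {
            if (rng.nextDouble() < fraction) {  // nextDouble() is in [0, 1)
                kept.add(item);                 // keep with probability `fraction`
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 1; i <= 10; i++) {
            data.add(i);
        }
        // Same seed: the two printed lists are identical on every run
        System.out.println(sample(data, 0.1, 42L));
        System.out.println(sample(data, 0.1, 42L));
        // Different seeds: contents, and even sizes, can differ
        for (long seed = 0; seed < 5; seed++) {
            System.out.println("seed " + seed + " -> " + sample(data, 0.1, seed));
        }
    }
}
```

Because each element is an independent coin flip, the size of the result is itself random. This is why fraction is an expected proportion rather than a guaranteed count, and why a fixed seed is the only way to get the same sample twice.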
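import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.VoidFunction;

public class Sample {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("SampleJava")
                .setMaster("local");
        JavaSparkContext sc = new JavaSparkContext(conf);

        List<String> staffList = Arrays.asList("张三", "李四", "王二", "麻子",
                "赵六", "王五", "李大个", "王大妞", "小明", "小倩");
        JavaRDD<String> staffRDD = sc.parallelize(staffList);

        // Sample without replacement with fraction 0.1: about 1 of the 10 names
        // on average, but a given run may return 0, 1, 2 or more names.
        // No seed is passed, so the result differs from run to run.
        JavaRDD<String> sample = staffRDD.sample(false, 0.1);

        // With master "local" the foreach runs in this JVM, so output goes to the console
        sample.foreach(new VoidFunction<String>() {
            @Override
            public void call(String s) throws Exception {
                System.out.println("s = " + s);
            }
        });

        sc.close();
    }
}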
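With only ten names and a fraction of 0.1, the program above often prints no name at all, or more than one. Assuming the per-element Bernoulli model, the number of kept elements follows a binomial distribution: the probability of keeping exactly k of n elements is C(n, k) * p^k * (1 - p)^(n - k). The sketch below computes these odds. SampleSizeOdds is a hypothetical helper, not part of Spark.

```java
// Hypothetical helper: probability that Bernoulli sampling of n elements,
// each kept with probability p, returns exactly k elements.
public class SampleSizeOdds {

    static double binomial(int n, int k, double p) {
        double coeff = 1.0;
        for (int i = 1; i <= k; i++) {
            coeff = coeff * (n - k + i) / i;   // builds C(n, k) incrementally
        }
        return coeff * Math.pow(p, k) * Math.pow(1 - p, n - k);
    }

    public static void main(String[] args) {
        int n = 10;        // ten names in staffList
        double p = 0.1;    // fraction passed to sample(false, 0.1)
        for (int k = 0; k <= 3; k++) {
            System.out.printf("P(%d names) = %.4f%n", k, binomial(n, k, p));
        }
    }
}
```

For n = 10 and p = 0.1 this prints about 0.3487 for zero names, 0.3874 for one, 0.1937 for two and 0.0574 for three, so roughly a third of runs print nothing. If an exact number of elements is needed, takeSample(false, 1) on the RDD returns that many elements to the driver as a List instead.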