Spark Core
1. Starting Point: SparkContext
import org.apache.spark.{SparkConf, SparkContext}
// appName appears in the cluster UI; master is a cluster URL or "local[*]"
val conf = new SparkConf().setAppName(appName).setMaster(master)
val sc = new SparkContext(conf)
2. Logic data: RDD
2.1. Creation
2.1.1. Parallelized Collections
val data = Array(1, 2, 3, 4, 5)
val distData = sc.parallelize(data)
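A minimal follow-up sketch, assuming the sc and distData above: once created, the distributed dataset can be operated on in parallel.
// Sum the elements across the cluster; reduce is an action, so it runs immediately
val sum = distData.reduce((a, b) => a + b) // 15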
2.1.2. External Datasets
val distFile = sc.textFile("data.txt")
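Once loaded, the file behaves like any other RDD; a minimal sketch, assuming data.txt is readable from every worker:
// Add up the lengths of all lines; map is lazy, reduce triggers the job
val totalLength = distFile.map(line => line.length).reduce((a, b) => a + b)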
Spark SQL, DataFrames and Datasets
1. Starting Point: SparkSession
import org.apache.spark.sql.SparkSession
val spark = SparkSession
.builder()
.appName("Spark SQL basic example")
.config("spark.some.config.option", "some-value")
.getOrCreate()
// For implicit conversions like converting RDDs to DataFrames
import spark.implicits._
2. Logic data: structured RDDs, i.e. Datasets and DataFrames
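With the implicits imported above, both can be created directly; a minimal sketch (the Person case class and the people.json path are illustrative assumptions):
// Case classes should be defined at top level so Spark can derive an encoder
case class Person(name: String, age: Long)
// A strongly typed Dataset from a local collection
val ds = Seq(Person("Andy", 32), Person("Justin", 19)).toDS()
ds.show()
// An untyped DataFrame from a JSON file
val df = spark.read.json("examples/src/main/resources/people.json")
df.printSchema()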
Spark Streaming
1. Starting Point: StreamingContext
import org.apache.spark._
import org.apache.spark.streaming._
val conf = new SparkConf().setAppName(appName).setMaster(master)
// The batch interval (1 second here) controls how often input is sliced into a new RDD
val ssc = new StreamingContext(conf, Seconds(1))
2. Logic data: RDDs sliced by time interval (stateless; overlapping sliding windows; non-overlapping sliding windows; stateful)
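A minimal sketch of these variants, assuming text arriving on localhost:9999: each 1-second batch becomes one RDD, and the windowed operators cover the sliding cases.
// Stateless: transformations apply independently to each batch's RDD
val lines = ssc.socketTextStream("localhost", 9999)
val pairs = lines.flatMap(_.split(" ")).map(word => (word, 1))
// Overlapping sliding window: 30s of data, recomputed every 10s
val overlapping = pairs.reduceByKeyAndWindow((a: Int, b: Int) => a + b, Seconds(30), Seconds(10))
// Non-overlapping (tumbling) window: window length equals the slide interval
val tumbling = pairs.reduceByKeyAndWindow((a: Int, b: Int) => a + b, Seconds(30), Seconds(30))
// Stateful processing would use updateStateByKey/mapWithState plus a checkpoint directory
overlapping.print()
ssc.start()
ssc.awaitTermination()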
Structured Streaming
1. Starting Point: SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.SparkSession
val spark = SparkSession
  .builder
  .appName("StructuredNetworkWordCount")
  .getOrCreate()
import spark.implicits._
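2. Logic data: streaming Datasets and DataFrames, treated as an unbounded table that grows as data arrives
A minimal word-count sketch over that unbounded table, assuming text sent to localhost:9999 (e.g. via nc -lk 9999):
// Each new line from the socket becomes a new row of the unbounded DataFrame
val lines = spark.readStream
  .format("socket")
  .option("host", "localhost")
  .option("port", 9999)
  .load()
// Split lines into words and keep a running count per word
val words = lines.as[String].flatMap(_.split(" "))
val wordCounts = words.groupBy("value").count()
// "complete" mode re-emits the whole updated result table on every trigger
val query = wordCounts.writeStream
  .outputMode("complete")
  .format("console")
  .start()
query.awaitTermination()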