Spark

Author: TenSleep_b32f | Published 2018-07-12 14:08

    Spark Core


    1. Starting Point: SparkContext

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf().setAppName(appName).setMaster(master)
    val sc = new SparkContext(conf)

    2. Logical data abstraction: RDD

    2.1. Creation

    2.1.1. Parallelized Collections

    val data = Array(1, 2, 3, 4, 5)

    val distData = sc.parallelize(data)

    2.1.2. External Datasets

    val distFile = sc.textFile("data.txt")
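Once created, an RDD supports lazy transformations and eager actions. A minimal sketch, assuming a live SparkContext `sc` as in `spark-shell` (the file name `data.txt` and the sample numbers are illustrative):

```scala
// Transformations (map, flatMap, reduceByKey) are lazy; actions (reduce, collect) run the job.
val data = Array(1, 2, 3, 4, 5)
val distData = sc.parallelize(data)

val doubled = distData.map(_ * 2)   // transformation: nothing computed yet
val total = doubled.reduce(_ + _)   // action: 2 + 4 + 6 + 8 + 10 = 30

// The same pattern on an external file, e.g. a word count:
val distFile = sc.textFile("data.txt")
val counts = distFile.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
counts.collect().foreach(println)   // action: materialize the counts on the driver
```

Laziness lets Spark fuse the transformation chain into one job when the action fires, rather than materializing each intermediate RDD.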


    Spark SQL, DataFrames and Datasets


    1. Starting Point: SparkSession

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession
      .builder()
      .appName("Spark SQL basic example")
      .config("spark.some.config.option", "some-value")
      .getOrCreate()

    // For implicit conversions like converting RDDs to DataFrames

    import spark.implicits._

    2. Logical data abstractions: Datasets and DataFrames (structured data on top of RDDs)
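A minimal sketch of the two abstractions, using the `spark` session built above; the sample rows and the `Person` case class are illustrative:

```scala
import spark.implicits._

// Hypothetical record type for the typed view.
case class Person(name: String, age: Long)

// DataFrame: untyped rows with named columns (an alias for Dataset[Row]).
val df = Seq(("Alice", 30L), ("Bob", 25L)).toDF("name", "age")
df.filter($"age" > 26).show()             // column-expression style

// Dataset: the same data with a compile-time element type.
val ds = df.as[Person]
ds.filter(_.age > 26).map(_.name).show()  // plain Scala lambdas on typed objects
```

Both run through the same optimizer; the Dataset view adds compile-time type checking at the cost of opaque lambdas the optimizer cannot inspect.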


    Spark Streaming


    1. Starting Point: StreamingContext

    import org.apache.spark._
    import org.apache.spark.streaming._

    val conf = new SparkConf().setAppName(appName).setMaster(master)
    val ssc = new StreamingContext(conf, Seconds(1))

    2. Logical data abstraction: a DStream, i.e. one RDD per time slice; operations may be stateless, stateful, or windowed (with overlapping or non-overlapping sliding windows)
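The four flavors listed above can be sketched in one pipeline. A minimal example, assuming a text socket source on localhost:9999 (illustrative; feed it with `nc -lk 9999`):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("DStreamSketch").setMaster("local[2]")
val ssc = new StreamingContext(conf, Seconds(1))   // 1-second batches = the time slices
ssc.checkpoint("/tmp/ckpt")                        // required by the stateful operation below

val pairs = ssc.socketTextStream("localhost", 9999)
  .flatMap(_.split(" ")).map((_, 1))

// Stateless: each batch is processed on its own.
val perBatch = pairs.reduceByKey(_ + _)

// Overlapping sliding window: 30s of data, recomputed every 10s.
val sliding = pairs.reduceByKeyAndWindow((a: Int, b: Int) => a + b, Seconds(30), Seconds(10))

// Non-overlapping (tumbling) window: slide length equals window length.
val tumbling = pairs.reduceByKeyAndWindow((a: Int, b: Int) => a + b, Seconds(30), Seconds(30))

// Stateful: a running count across all batches seen so far.
val running = pairs.updateStateByKey[Int](
  (vs: Seq[Int], st: Option[Int]) => Some(st.getOrElse(0) + vs.sum))

running.print()
ssc.start()
ssc.awaitTermination()
```

Window and slide durations must be multiples of the batch interval, since a window is assembled from whole time-slice RDDs.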



    Structured Streaming


    1. Starting Point: SparkSession

    import org.apache.spark.sql.functions._
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession
      .builder
      .appName("StructuredNetworkWordCount")
      .getOrCreate()

    import spark.implicits._

    2. Logical data abstraction: an unbounded table to which each time slice of input appends new rows
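A minimal word-count sketch over that unbounded table, using the `spark` session built above (the socket source on localhost:9999 is illustrative):

```scala
import spark.implicits._

// Each micro-batch of lines appends rows to the unbounded input table.
val lines = spark.readStream
  .format("socket")
  .option("host", "localhost")
  .option("port", 9999)
  .load()

// The same DataFrame query as in batch; it is re-evaluated incrementally as the table grows.
val wordCounts = lines.as[String]
  .flatMap(_.split(" "))
  .groupBy("value")
  .count()

val query = wordCounts.writeStream
  .outputMode("complete")   // emit the full updated result table on each trigger
  .format("console")
  .start()

query.awaitTermination()
```

This is the key contrast with DStreams: the programmer writes an ordinary table query, and the engine, not the user, manages the incremental state.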
