Spark

Author: TenSleep_b32f | Published 2018-07-12 14:08

Spark Core


1. Starting Point: SparkContext

val conf = new SparkConf().setAppName(appName).setMaster(master)

val sc = new SparkContext(conf)

2. Logical data: RDD

2.1. Creation

2.1.1.Parallelized Collections

val data = Array(1, 2, 3, 4, 5)

val distData = sc.parallelize(data)

2.1.2.External Datasets

val distFile = sc.textFile("data.txt")
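Once created, an RDD is processed through lazy transformations and eager actions. A minimal sketch, continuing from the `sc` and `data.txt` above (the word-count logic itself is illustrative, not from the original):

```scala
// Word count over the lines of data.txt, using the sc created in section 1.
val counts = sc.textFile("data.txt")
  .flatMap(_.split(" "))        // transformation: lines -> words
  .map(word => (word, 1))       // transformation: word -> (word, 1)
  .reduceByKey(_ + _)           // transformation: sum counts per word

counts.collect().foreach(println) // action: triggers the actual computation
```

Nothing runs until the `collect()` action; the three transformations only build the RDD lineage.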


Spark SQL, DataFrames and Datasets


1. Starting Point: SparkSession

import org.apache.spark.sql.SparkSession

val spark = SparkSession

  .builder()

  .appName("Spark SQL basic example")

  .config("spark.some.config.option", "some-value")

  .getOrCreate()

// For implicit conversions like converting RDDs to DataFrames

import spark.implicits._

2. Logical data: the structured counterparts of RDDs — Datasets and DataFrames
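A minimal sketch of building a DataFrame and querying it with SQL, assuming the `spark` session and `spark.implicits._` import from section 1 (the `Person` case class and sample rows are illustrative):

```scala
// A typed schema; toDF() comes from the implicits import.
case class Person(name: String, age: Long)

val df = Seq(Person("Alice", 29), Person("Bob", 31)).toDF()

// Register the DataFrame as a temp view so it can be queried with SQL.
df.createOrReplaceTempView("people")
spark.sql("SELECT name FROM people WHERE age > 30").show()
```

The same query could also be written with the DataFrame API, e.g. `df.filter($"age" > 30).select("name")`.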


Spark Streaming


1. Starting Point: StreamingContext

import org.apache.spark._

import org.apache.spark.streaming._

val conf = new SparkConf().setAppName(appName).setMaster(master)

val ssc = new StreamingContext(conf, Seconds(1))

2. Logical data: DStreams — RDDs cut into time slices (stateless; with overlapping sliding windows; with non-overlapping sliding windows; stateful)
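A minimal stateless sketch over the 1-second batches set up above: each batch interval produces one RDD, and the same word-count transformations are applied to every slice (the socket host and port are illustrative):

```scala
// One RDD of lines arrives per 1-second batch (the interval given to ssc above).
val lines = ssc.socketTextStream("localhost", 9999)

val counts = lines
  .flatMap(_.split(" "))
  .map(word => (word, 1))
  .reduceByKey(_ + _)   // stateless: counts within each time slice only

counts.print()

ssc.start()             // start receiving and processing
ssc.awaitTermination()  // block until the stream is stopped
```

For counts across slices, the windowed variants (`reduceByKeyAndWindow`) or the stateful ones (`updateStateByKey`) mentioned above would replace the plain `reduceByKey`.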



Structured Streaming


1. Starting Point: SparkSession

import org.apache.spark.sql.functions._

import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder
  .appName("StructuredNetworkWordCount")
  .getOrCreate()

import spark.implicits._

2. Logical data: an unbounded table that keeps appending the time-sliced RDDs — each new micro-batch arrives as new rows
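A minimal sketch of that "unbounded table" model, assuming the `spark` session and implicits import above (the socket host and port are illustrative): each micro-batch appends rows to `lines`, and the aggregation is re-expressed over the whole table.

```scala
// An unbounded DataFrame: new socket lines become new rows as they arrive.
val lines = spark.readStream
  .format("socket")
  .option("host", "localhost")
  .option("port", 9999)
  .load()

// Query the unbounded table exactly like a static one.
val words = lines.as[String].flatMap(_.split(" "))
val wordCounts = words.groupBy("value").count()

// "complete" mode re-emits the full updated counts table after each micro-batch.
val query = wordCounts.writeStream
  .outputMode("complete")
  .format("console")
  .start()

query.awaitTermination()
```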

Source: https://www.haomeiwen.com/subject/mjhxpftx.html