Spark 2.0, high level concept

Author: abrocod | Published: 2016-09-26 09:15

Entry point and basic abstraction

For Spark base
main entry point: SparkContext
basic abstraction: RDD

For Spark SQL
main entry point: SparkSession
basic abstraction: DataFrame

For Spark Streaming
Main entry point: StreamingContext
basic abstraction: DStream

For Spark ML
Main entry point:

Core Classes

  • Spark base

  • pyspark.SparkContext
    Main entry point for Spark functionality.

  • pyspark.RDD
    A Resilient Distributed Dataset (RDD), the basic abstraction in Spark.

  • Spark Streaming

  • pyspark.streaming.StreamingContext
    Main entry point for Spark Streaming functionality.

  • pyspark.streaming.DStream
    A Discretized Stream (DStream), the basic abstraction in Spark Streaming.

  • Spark SQL and DataFrame

  • pyspark.sql.SparkSession
    Main entry point for DataFrame and SQL functionality in Spark 2.0. It replaces pyspark.sql.SQLContext, which is kept for backward compatibility.

  • pyspark.sql.DataFrame
    A distributed collection of data grouped into named columns.


Spark running mode

Locally

Cluster


Setup and run/submit job

Locally

Setup

Spark shell

./bin/spark-shell --master local[2]
OR
./bin/pyspark --master local[2]

Submit job

./bin/spark-submit examples/src/main/python/pi.py 10

OR 
./bin/spark-submit examples/src/main/r/dataframe.R

Spark standalone cluster


Spark YARN cluster


Miscellaneous notes

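:paste   (spark-shell command: enter paste mode for multi-line code, finish with Ctrl-D)
:help    (spark-shell command: list the available shell commands)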

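Spark context available as sc.
SQL context available as sqlContext.
(These are the Spark 1.x shell startup messages. The Spark 2.0 shell instead announces a SparkSession available as spark; the SQLContext is reachable as spark.sqlContext.)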

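Read CSV files into a DataFrame with the spark-csv package, then save the DataFrame as a Parquet file. Spark 2.0 also ships a built-in csv source (spark.read.csv).

// spark is the SparkSession predefined in the Spark 2.0 shell
val df = spark.read
      .format("com.databricks.spark.csv")
      .option("header", "true")        // first line holds the column names
      .option("inferSchema", "true")   // infer column types from the data
      .option("mode", "DROPMALFORMED") // skip rows that cannot be parsed
      .load("/home/myuser/data/log/*.csv")
// saveAsParquetFile was deprecated in Spark 1.4 and removed in 2.0; use the DataFrameWriter
df.write.parquet("/home/myuser/data.parquet")

// Read a Parquet file back and inspect its schema and first rows
val df_1 = spark.read.parquet("/Users/user_name/Work/tmp/sample.parquet")
df_1.dtypes
df_1.show()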
