Spark Development -- Local Mode


Author: 无剑_君 | Published 2019-12-02 16:40

    I. Local Mode

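      Local mode is the simplest way to run Spark. It runs on a single node and uses multiple threads. It also works out of the box (OOTB): the only requirement is that Java is available, for example by exporting JAVA_HOME in spark-env.sh. No further configuration is needed, which is why local mode is commonly used for development and learning.
      In local mode all Spark components, the driver and the executor, run inside a single JVM, and parallelism comes from threads. The master URL sets the number of threads:
      - local uses one worker thread.
      - local[N] uses N threads.
      - local[*] uses as many threads as the machine has logical CPU cores.
      When spark-shell is started without --master, it defaults to local[*].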
    Usage:

    ./spark-shell --master local[n] 
    

    where n is the number of worker threads.
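The mapping from master URL to thread count can be sketched in plain Scala. `LocalMaster` and `threadsFor` are hypothetical names, not Spark APIs. The sketch covers only the three forms described above. It ignores other forms Spark also accepts, such as local[N,F] with a task-failure limit.

```scala
// Hypothetical helper, not part of Spark's API: mirrors how the simple
// local master strings map to a number of worker threads.
object LocalMaster {
  // Matches local[N] (N >= 1) and local[*]; the whole string must match.
  private val LocalN = """local\[([1-9][0-9]*|\*)\]""".r

  def threadsFor(master: String, cores: Int = Runtime.getRuntime.availableProcessors): Int =
    master match {
      case "local"     => 1       // a single worker thread
      case LocalN("*") => cores   // one thread per logical core
      case LocalN(n)   => n.toInt // exactly N threads
      case other =>
        throw new IllegalArgumentException(s"not a simple local master URL: $other")
    }

  def main(args: Array[String]): Unit =
    Seq("local", "local[4]", "local[*]").foreach { m =>
      println(s"$m -> ${threadsFor(m)} thread(s)")
    }
}
```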

    II. Prerequisites

    1. Install Java 8

    Download: http://openjdk.java.net/
    https://adoptopenjdk.net/releases.html

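    Install from the Ubuntu repositories:
    root@master:~# apt install openjdk-8-jdk -y
    # Verify
    root@master:~# java -version
    openjdk version "1.8.0_222"
    OpenJDK Runtime Environment (build 1.8.0_222-8u222-b10-1ubuntu1~18.04.1-b10)
    OpenJDK 64-Bit Server VM (build 25.222-b10, mixed mode)
    # Locate the java binary (/usr/bin/java is a symlink; readlink -f /usr/bin/java shows the real JDK path under /usr/lib/jvm)
    root@master:~# whereis java
    java: /usr/bin/java /usr/share/java /usr/share/man/man1/java.1.gz
    # Show which java comes first on PATH
    root@master:~# which java
    /usr/bin/java
    
    # Alternative: download the AdoptOpenJDK 8 binary tarball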
    root@master:~# wget https://github.com/AdoptOpenJDK/openjdk8-binaries/releases/download/jdk8u232-b09/OpenJDK8U-jdk_x64_linux_hotspot_8u232b09.tar.gz
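With both the apt package and a tarball JDK on the machine, `scala` and `spark-shell` may not run on the JVM you expect. The sketch below reads the `java.version` system property of the JVM it runs in and reports the major version. `JavaCheck` and `majorVersion` are hypothetical names. The warning text assumes Spark 2.4.x, whose documentation lists Java 8.

```scala
// Hypothetical helper: extracts the major version from a java.version string.
object JavaCheck {
  // "1.8.0_222" uses the pre-Java 9 scheme (major version = second field);
  // "11.0.5" uses the Java 9+ scheme (major version = first field).
  def majorVersion(version: String): Int = {
    val parts = version.split("[._+-]")
    if (parts(0) == "1") parts(1).toInt else parts(0).toInt
  }

  def main(args: Array[String]): Unit = {
    val v = System.getProperty("java.version")
    val major = majorVersion(v)
    println(s"java.version = $v (major version $major)")
    if (major != 8) println("Warning: Spark 2.4.x is documented to run on Java 8")
  }
}
```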
    
    

    2. Scala (version 2.12.10 is installed below)

    1) Download

    Download page: https://www.scala-lang.org/download/

    # Download
    root@master:~# wget https://downloads.lightbend.com/scala/2.12.10/scala-2.12.10.tgz
    
    # Extract to /usr/local
    root@master:~# tar -zxvf scala-2.12.10.tgz -C /usr/local
    
    

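    Note: the author recommends Scala 2.12.10 for development. However, the prebuilt spark-2.4.4-bin-hadoop2.7 package used below bundles Scala 2.11.12, as the "Using Scala version" line in the spark-shell banner shows. Applications compiled against that package therefore need to target Scala 2.11.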

    2) Configure environment variables

    root@master:~# vi /etc/profile
    # Append at the end of the file:
    # Environment variables
    export SCALA_HOME=/usr/local/scala-2.12.10
    export PATH=$PATH:$SCALA_HOME/bin
    # Apply the changes immediately
    root@master:~# source /etc/profile
    
    # Verify
    root@master:~# scala -version
    Scala code runner version 2.12.10-- Copyright 2002-2019, LAMP/EPFL and Lightbend, Inc.
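Scala libraries are binary-compatible only within the same major.minor line (2.11, 2.12, ...), so the installed Scala and the Scala bundled with Spark should share that prefix if you plan to compile against Spark. A minimal sketch under the Scala 2.x versioning scheme follows. `ScalaCompat` is a hypothetical name. The Spark-side version "2.11.12" is taken from the spark-shell banner shown later in this article.

```scala
// Hypothetical helper for Scala 2.x version strings.
object ScalaCompat {
  /** "2.12.10" -> "2.12": the binary version that must match Spark's build. */
  def binaryVersion(full: String): String = full.split('.').take(2).mkString(".")

  def compatible(appScala: String, sparkScala: String): Boolean =
    binaryVersion(appScala) == binaryVersion(sparkScala)

  def main(args: Array[String]): Unit = {
    val here = scala.util.Properties.versionNumberString
    val spark = "2.11.12" // printed by spark-shell for spark-2.4.4-bin-hadoop2.7
    println(s"Scala $here vs Spark's Scala $spark: compatible = ${compatible(here, spark)}")
  }
}
```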
    
    

    III. Download and Install Spark

    1. Download and extract
      Download page: http://spark.apache.org/downloads.html
    # Download (older releases such as 2.4.4 are also kept at https://archive.apache.org/dist/spark/)
    root@master:~# wget https://www-us.apache.org/dist/spark/spark-2.4.4/spark-2.4.4-bin-hadoop2.7.tgz
    
    # Extract to /usr/local
    root@master:~# tar -zxvf spark-2.4.4-bin-hadoop2.7.tgz -C /usr/local
    
    
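    2. Configure environment variables
    # Append to /etc/profile
    export SCALA_HOME=/usr/local/scala-2.12.10
    # Optional: HADOOP_HOME is only needed if Hadoop is installed; Local mode does not require it
    export HADOOP_HOME=/usr/local/hadoop-2.9.2
    export SPARK_HOME=/usr/local/spark-2.4.4-bin-hadoop2.7
    export PATH=$PATH:$SCALA_HOME/bin:$SPARK_HOME/bin:$SPARK_HOME/sbin
    
    # Apply the changes immediately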
    root@master:~# source /etc/profile
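After `source /etc/profile`, it is worth confirming that SPARK_HOME points at the extracted directory before launching anything. The sketch below reads SPARK_HOME from the environment and checks for a few launch scripts this article uses. `SparkHomeCheck` and the `expected` list are hypothetical choices for illustration, not something Spark itself checks.

```scala
import java.io.File

// Hypothetical sanity check for the SPARK_HOME environment variable.
object SparkHomeCheck {
  // Launch scripts used in this article (an illustrative, not exhaustive, list).
  val expected = Seq("bin/spark-shell", "bin/spark-submit", "bin/run-example")

  /** Returns the expected scripts that are missing under sparkHome. */
  def missing(sparkHome: File): Seq[String] =
    expected.filterNot(rel => new File(sparkHome, rel).isFile)

  def main(args: Array[String]): Unit =
    sys.env.get("SPARK_HOME") match {
      case None => println("SPARK_HOME is not set; was /etc/profile sourced?")
      case Some(home) =>
        val gaps = missing(new File(home))
        if (gaps.isEmpty) println(s"SPARK_HOME=$home looks complete")
        else println(s"SPARK_HOME=$home is missing: ${gaps.mkString(", ")}")
    }
}
```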
    
    
    3. Configure Spark
    # Copy the configuration template
    root@master:~# cp /usr/local/spark-2.4.4-bin-hadoop2.7/conf/spark-env.sh.template /usr/local/spark-2.4.4-bin-hadoop2.7/conf/spark-env.sh
    # Set the IP address Spark binds to
    root@master:~# vi /usr/local/spark-2.4.4-bin-hadoop2.7/conf/spark-env.sh
    
    # Add the following line
    SPARK_LOCAL_IP=192.168.247.131
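The later logs show the warning "Your hostname, master resolves to a loopback address: 127.0.1.1". On Debian/Ubuntu, /etc/hosts commonly maps the machine's hostname to 127.0.1.1. In that case Spark picks another interface and suggests setting SPARK_LOCAL_IP, as done above. The minimal sketch below shows what the JVM resolves the local hostname to. `LoopbackCheck` is a hypothetical name; `InetAddress.isLoopbackAddress` is the standard JDK method.

```scala
import java.net.InetAddress

// Hypothetical helper: reports whether a host name or IP literal is a loopback address.
object LoopbackCheck {
  def resolvesToLoopback(host: String): Boolean =
    InetAddress.getByName(host).isLoopbackAddress // 127.0.0.0/8 counts as loopback

  def main(args: Array[String]): Unit = {
    val local = InetAddress.getLocalHost // may throw if the hostname does not resolve
    println(s"${local.getHostName} -> ${local.getHostAddress} (loopback = ${local.isLoopbackAddress})")
  }
}
```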
    
    

    IV. Test Run

    1. Example commands
      1) Single thread
    root@master:~# spark-shell --master local
    19/12/01 18:03:58 WARN Utils: Your hostname, master resolves to a loopback address: 127.0.1.1; using 192.168.247.131 instead (on interface ens33)
    19/12/01 18:03:58 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
    19/12/01 18:03:59 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
    Setting default log level to "WARN".
    To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
    Spark context Web UI available at http://192.168.247.131:4040
    Spark context available as 'sc' (master = local, app id = local-1575194657764).
    Spark session available as 'spark'.
    Welcome to
          ____              __
         / __/__  ___ _____/ /__
        _\ \/ _ \/ _ `/ __/  '_/
       /___/ .__/\_,_/_/ /_/\_\   version 2.4.4
          /_/
             
    Using Scala version 2.11.12 (OpenJDK 64-Bit Server VM, Java 1.8.0_222)
    Type in expressions to have them evaluated.
    Type :help for more information.
    
    scala> 
    
    # List JVM processes (in Local mode only SparkSubmit appears, plus jps itself)
    root@master:~# jps
    128720 SparkSubmit
    129448 Jps
    
    

    Web UI: http://192.168.247.131:4040/
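If the page does not load, first check whether anything is listening on the UI port. The sketch below just attempts a TCP connection with a timeout. `UiProbe` and `isListening` are hypothetical names. The probed range assumes the default port 4040: when 4040 is already taken, Spark retries on 4041, 4042, and so on.

```scala
import java.net.{InetSocketAddress, Socket}
import scala.util.Try

// Hypothetical probe: does a TCP connection to host:port succeed?
object UiProbe {
  def isListening(host: String, port: Int, timeoutMs: Int = 1000): Boolean = {
    val socket = new Socket()
    try Try(socket.connect(new InetSocketAddress(host, port), timeoutMs)).isSuccess
    finally socket.close()
  }

  def main(args: Array[String]): Unit = {
    val host = args.headOption.getOrElse("localhost")
    // 4040 is the default UI port; when it is busy Spark tries 4041, 4042, ...
    (4040 to 4042).foreach(p => println(s"$host:$p listening = ${isListening(host, p)}"))
  }
}
```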



    2) Multiple threads

    # Use 4 worker threads, so up to 4 tasks of the application can run concurrently
    root@master:~# spark-shell --master local[4] 
    19/12/01 18:06:31 WARN Utils: Your hostname, master resolves to a loopback address: 127.0.1.1; using 192.168.247.131 instead (on interface ens33)
    19/12/01 18:06:31 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
    19/12/01 18:06:32 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
    Setting default log level to "WARN".
    To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
    Spark context Web UI available at http://192.168.247.131:4040
    Spark context available as 'sc' (master = local[4], app id = local-1575194812191).
    Spark session available as 'spark'.
    Welcome to
          ____              __
         / __/__  ___ _____/ /__
        _\ \/ _ \/ _ `/ __/  '_/
       /___/ .__/\_,_/_/ /_/\_\   version 2.4.4
          /_/
             
    Using Scala version 2.11.12 (OpenJDK 64-Bit Server VM, Java 1.8.0_222)
    Type in expressions to have them evaluated.
    Type :help for more information.
    
    scala> 
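As an analogy for what local[4] means, the sketch runs 10 "partitions" as tasks on a fixed pool of 4 JVM threads and records how many run at the same time. It is plain java.util.concurrent code, not Spark's scheduler. `LocalThreadsAnalogy` and the per-partition work are hypothetical.

```scala
import java.util.concurrent.{Callable, Executors, TimeUnit}
import java.util.concurrent.atomic.AtomicInteger

// Analogy only, not Spark code: a fixed pool of N threads plays the role of local[N].
object LocalThreadsAnalogy {
  /** Runs one task per partition; returns (sum of task results, peak concurrent tasks). */
  def run(threads: Int, partitions: Int): (Long, Int) = {
    val pool = Executors.newFixedThreadPool(threads)
    val running = new AtomicInteger(0)
    val peak = new AtomicInteger(0)

    def recordPeak(now: Int): Unit = {
      var prev = peak.get()
      while (now > prev && !peak.compareAndSet(prev, now)) prev = peak.get()
    }

    try {
      val futures = (0 until partitions).map { p =>
        pool.submit(new Callable[Long] {
          def call(): Long = {
            recordPeak(running.incrementAndGet())
            try {
              Thread.sleep(20)                    // pretend to do some work
              (p * 100L until (p + 1) * 100L).sum // this partition's result
            } finally running.decrementAndGet()
          }
        })
      }
      (futures.map(_.get()).sum, peak.get())
    } finally {
      pool.shutdown()
      pool.awaitTermination(10, TimeUnit.SECONDS)
    }
  }

  def main(args: Array[String]): Unit = {
    val (total, peak) = run(threads = 4, partitions = 10)
    println(s"sum = $total, at most $peak tasks ran at the same time")
  }
}
```

The peak never exceeds the pool size. In the same way, with local[4] at most four tasks of a stage execute at once, however many partitions the stage has.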
    

    Web UI: http://192.168.247.131:4040/



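      Running in this mode is very simple: unpack the Spark distribution, adjust a few common settings, and it is ready to use. This is what sets it apart from the other modes. There is no need to start Spark's Master and Worker daemons, since those roles exist only in the Standalone cluster mode. Hadoop services are not needed either, unless you want to read from or write to HDFS.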
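    3) Run an example
      SparkPi estimates π with the Monte Carlo method. It generates a large number of random points and derives π from the fraction that lands inside a circle; the more points it uses, the closer the estimate.
      The argument 10 is the number of slices (partitions), so the job runs as 10 tasks, as the log line "Got job 0 ... with 10 output partitions" shows. It does not create 10 executors or threads; the number of threads is controlled by the --master option.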

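    # Compute pi with the bundled SparkPi example.
    # spark-submit options such as --master must come before the class name;
    # anything after the class name (here 10) is passed to SparkPi as an argument.
    root@master:~# run-example --master local[2] SparkPi 10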
    19/12/01 18:15:39 WARN Utils: Your hostname, master resolves to a loopback address: 127.0.1.1; using 192.168.247.131 instead (on interface ens33)
    19/12/01 18:15:39 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
    19/12/01 18:15:40 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
    19/12/01 18:15:41 INFO SparkContext: Running Spark version 2.4.4
    19/12/01 18:15:41 INFO SparkContext: Submitted application: Spark Pi
    19/12/01 18:15:42 INFO SecurityManager: Changing view acls to: root
    19/12/01 18:15:42 INFO SecurityManager: Changing modify acls to: root
    19/12/01 18:15:42 INFO SecurityManager: Changing view acls groups to: 
    19/12/01 18:15:42 INFO SecurityManager: Changing modify acls groups to: 
    19/12/01 18:15:42 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
    19/12/01 18:15:42 INFO Utils: Successfully started service 'sparkDriver' on port 35983.
    19/12/01 18:15:42 INFO SparkEnv: Registering MapOutputTracker
    19/12/01 18:15:43 INFO SparkEnv: Registering BlockManagerMaster
    19/12/01 18:15:43 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
    19/12/01 18:15:43 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
    19/12/01 18:15:43 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-be449821-e746-46b1-82a1-a85348a1d7c4
    19/12/01 18:15:43 INFO MemoryStore: MemoryStore started with capacity 413.9 MB
    19/12/01 18:15:43 INFO SparkEnv: Registering OutputCommitCoordinator
    19/12/01 18:15:43 INFO Utils: Successfully started service 'SparkUI' on port 4040.
    19/12/01 18:15:44 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.247.131:4040
    19/12/01 18:15:44 INFO SparkContext: Added JAR file:///usr/local/spark-2.4.4-bin-hadoop2.7/examples/jars/scopt_2.11-3.7.0.jar at spark://192.168.247.131:35983/jars/scopt_2.11-3.7.0.jar with timestamp 1575195344159
    19/12/01 18:15:44 INFO SparkContext: Added JAR file:///usr/local/spark-2.4.4-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.4.4.jar at spark://192.168.247.131:35983/jars/spark-examples_2.11-2.4.4.jar with timestamp 1575195344167
    19/12/01 18:15:44 INFO Executor: Starting executor ID driver on host localhost
    19/12/01 18:15:44 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 46305.
    19/12/01 18:15:44 INFO NettyBlockTransferService: Server created on 192.168.247.131:46305
    19/12/01 18:15:44 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
    19/12/01 18:15:44 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.247.131, 46305, None)
    19/12/01 18:15:44 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.247.131:46305 with 413.9 MB RAM, BlockManagerId(driver, 192.168.247.131, 46305, None)
    19/12/01 18:15:44 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.247.131, 46305, None)
    19/12/01 18:15:44 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.247.131, 46305, None)
    19/12/01 18:15:46 INFO SparkContext: Starting job: reduce at SparkPi.scala:38
    19/12/01 18:15:46 INFO DAGScheduler: Got job 0 (reduce at SparkPi.scala:38) with 10 output partitions
    19/12/01 18:15:46 INFO DAGScheduler: Final stage: ResultStage 0 (reduce at SparkPi.scala:38)
    19/12/01 18:15:46 INFO DAGScheduler: Parents of final stage: List()
    19/12/01 18:15:46 INFO DAGScheduler: Missing parents: List()
    19/12/01 18:15:46 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34), which has no missing parents
    19/12/01 18:15:46 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1936.0 B, free 413.9 MB)
    19/12/01 18:15:47 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1256.0 B, free 413.9 MB)
    19/12/01 18:15:47 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.247.131:46305 (size: 1256.0 B, free: 413.9 MB)
    19/12/01 18:15:47 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1161
    19/12/01 18:15:47 INFO DAGScheduler: Submitting 10 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4, 5, 6, 7, 8, 9))
    19/12/01 18:15:47 INFO TaskSchedulerImpl: Adding task set 0.0 with 10 tasks
    19/12/01 18:15:47 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, PROCESS_LOCAL, 7866 bytes)
    19/12/01 18:15:47 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
    19/12/01 18:15:47 INFO Executor: Fetching spark://192.168.247.131:35983/jars/spark-examples_2.11-2.4.4.jar with timestamp 1575195344167
    19/12/01 18:15:47 INFO TransportClientFactory: Successfully created connection to /192.168.247.131:35983 after 100 ms (0 ms spent in bootstraps)
    19/12/01 18:15:47 INFO Utils: Fetching spark://192.168.247.131:35983/jars/spark-examples_2.11-2.4.4.jar to /tmp/spark-98bd66ad-4e15-4c8c-bae0-6dfecb02a2b5/userFiles-55c5ee8c-34c0-4f1f-ac8a-3a2ab8fa186a/fetchFileTemp4550729342539992157.tmp
    19/12/01 18:15:48 INFO Executor: Adding file:/tmp/spark-98bd66ad-4e15-4c8c-bae0-6dfecb02a2b5/userFiles-55c5ee8c-34c0-4f1f-ac8a-3a2ab8fa186a/spark-examples_2.11-2.4.4.jar to class loader
    19/12/01 18:15:48 INFO Executor: Fetching spark://192.168.247.131:35983/jars/scopt_2.11-3.7.0.jar with timestamp 1575195344159
    19/12/01 18:15:48 INFO Utils: Fetching spark://192.168.247.131:35983/jars/scopt_2.11-3.7.0.jar to /tmp/spark-98bd66ad-4e15-4c8c-bae0-6dfecb02a2b5/userFiles-55c5ee8c-34c0-4f1f-ac8a-3a2ab8fa186a/fetchFileTemp8485759279339378034.tmp
    19/12/01 18:15:48 INFO Executor: Adding file:/tmp/spark-98bd66ad-4e15-4c8c-bae0-6dfecb02a2b5/userFiles-55c5ee8c-34c0-4f1f-ac8a-3a2ab8fa186a/scopt_2.11-3.7.0.jar to class loader
    19/12/01 18:15:48 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 867 bytes result sent to driver
    19/12/01 18:15:48 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, executor driver, partition 1, PROCESS_LOCAL, 7866 bytes)
    19/12/01 18:15:48 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
    19/12/01 18:15:48 INFO Executor: Finished task 1.0 in stage 0.0 (TID 1). 867 bytes result sent to driver
    19/12/01 18:15:48 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 1056 ms on localhost (executor driver) (1/10)
    19/12/01 18:15:48 INFO TaskSetManager: Starting task 2.0 in stage 0.0 (TID 2, localhost, executor driver, partition 2, PROCESS_LOCAL, 7866 bytes)
    19/12/01 18:15:48 INFO Executor: Running task 2.0 in stage 0.0 (TID 2)
    19/12/01 18:15:48 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 125 ms on localhost (executor driver) (2/10)
    19/12/01 18:15:48 INFO Executor: Finished task 2.0 in stage 0.0 (TID 2). 824 bytes result sent to driver
    19/12/01 18:15:48 INFO TaskSetManager: Starting task 3.0 in stage 0.0 (TID 3, localhost, executor driver, partition 3, PROCESS_LOCAL, 7866 bytes)
    19/12/01 18:15:48 INFO Executor: Running task 3.0 in stage 0.0 (TID 3)
    19/12/01 18:15:48 INFO TaskSetManager: Finished task 2.0 in stage 0.0 (TID 2) in 78 ms on localhost (executor driver) (3/10)
    19/12/01 18:15:48 INFO Executor: Finished task 3.0 in stage 0.0 (TID 3). 867 bytes result sent to driver
    19/12/01 18:15:48 INFO TaskSetManager: Starting task 4.0 in stage 0.0 (TID 4, localhost, executor driver, partition 4, PROCESS_LOCAL, 7866 bytes)
    19/12/01 18:15:48 INFO Executor: Running task 4.0 in stage 0.0 (TID 4)
    19/12/01 18:15:48 INFO TaskSetManager: Finished task 3.0 in stage 0.0 (TID 3) in 77 ms on localhost (executor driver) (4/10)
    19/12/01 18:15:48 INFO Executor: Finished task 4.0 in stage 0.0 (TID 4). 867 bytes result sent to driver
    19/12/01 18:15:48 INFO TaskSetManager: Starting task 5.0 in stage 0.0 (TID 5, localhost, executor driver, partition 5, PROCESS_LOCAL, 7866 bytes)
    19/12/01 18:15:48 INFO TaskSetManager: Finished task 4.0 in stage 0.0 (TID 4) in 98 ms on localhost (executor driver) (5/10)
    19/12/01 18:15:48 INFO Executor: Running task 5.0 in stage 0.0 (TID 5)
    19/12/01 18:15:48 INFO Executor: Finished task 5.0 in stage 0.0 (TID 5). 824 bytes result sent to driver
    19/12/01 18:15:48 INFO TaskSetManager: Starting task 6.0 in stage 0.0 (TID 6, localhost, executor driver, partition 6, PROCESS_LOCAL, 7866 bytes)
    19/12/01 18:15:48 INFO TaskSetManager: Finished task 5.0 in stage 0.0 (TID 5) in 150 ms on localhost (executor driver) (6/10)
    19/12/01 18:15:48 INFO Executor: Running task 6.0 in stage 0.0 (TID 6)
    19/12/01 18:15:48 INFO Executor: Finished task 6.0 in stage 0.0 (TID 6). 824 bytes result sent to driver
    19/12/01 18:15:48 INFO TaskSetManager: Starting task 7.0 in stage 0.0 (TID 7, localhost, executor driver, partition 7, PROCESS_LOCAL, 7866 bytes)
    19/12/01 18:15:48 INFO TaskSetManager: Finished task 6.0 in stage 0.0 (TID 6) in 58 ms on localhost (executor driver) (7/10)
    19/12/01 18:15:48 INFO Executor: Running task 7.0 in stage 0.0 (TID 7)
    19/12/01 18:15:48 INFO Executor: Finished task 7.0 in stage 0.0 (TID 7). 824 bytes result sent to driver
    19/12/01 18:15:48 INFO TaskSetManager: Starting task 8.0 in stage 0.0 (TID 8, localhost, executor driver, partition 8, PROCESS_LOCAL, 7866 bytes)
    19/12/01 18:15:48 INFO TaskSetManager: Finished task 7.0 in stage 0.0 (TID 7) in 55 ms on localhost (executor driver) (8/10)
    19/12/01 18:15:48 INFO Executor: Running task 8.0 in stage 0.0 (TID 8)
    19/12/01 18:15:48 INFO Executor: Finished task 8.0 in stage 0.0 (TID 8). 824 bytes result sent to driver
    19/12/01 18:15:48 INFO TaskSetManager: Starting task 9.0 in stage 0.0 (TID 9, localhost, executor driver, partition 9, PROCESS_LOCAL, 7866 bytes)
    19/12/01 18:15:48 INFO TaskSetManager: Finished task 8.0 in stage 0.0 (TID 8) in 42 ms on localhost (executor driver) (9/10)
    19/12/01 18:15:48 INFO Executor: Running task 9.0 in stage 0.0 (TID 9)
    19/12/01 18:15:49 INFO Executor: Finished task 9.0 in stage 0.0 (TID 9). 910 bytes result sent to driver
    19/12/01 18:15:49 INFO TaskSetManager: Finished task 9.0 in stage 0.0 (TID 9) in 64 ms on localhost (executor driver) (10/10)
    19/12/01 18:15:49 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
    19/12/01 18:15:49 INFO DAGScheduler: ResultStage 0 (reduce at SparkPi.scala:38) finished in 2.359 s
    19/12/01 18:15:49 INFO DAGScheduler: Job 0 finished: reduce at SparkPi.scala:38, took 2.735655 s
    Pi is roughly 3.1434631434631433
    19/12/01 18:15:49 INFO SparkUI: Stopped Spark web UI at http://192.168.247.131:4040
    19/12/01 18:15:49 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
    19/12/01 18:15:49 INFO MemoryStore: MemoryStore cleared
    19/12/01 18:15:49 INFO BlockManager: BlockManager stopped
    19/12/01 18:15:49 INFO BlockManagerMaster: BlockManagerMaster stopped
    19/12/01 18:15:49 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
    19/12/01 18:15:49 INFO SparkContext: Successfully stopped SparkContext
    19/12/01 18:15:49 INFO ShutdownHookManager: Shutdown hook called
    19/12/01 18:15:49 INFO ShutdownHookManager: Deleting directory /tmp/spark-deeed5e7-f7ef-4e86-9e50-fdee889328fe
    19/12/01 18:15:49 INFO ShutdownHookManager: Deleting directory /tmp/spark-98bd66ad-4e15-4c8c-bae0-6dfecb02a2b5
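To make the algorithm behind "Pi is roughly 3.14..." concrete, here is a plain-Scala sketch of the same Monte Carlo estimate. It is not the bundled example's source. It assumes, as SparkPi appears to do, 100000 samples per slice. `MonteCarloPi`, `countInside` and `estimate` are illustrative names. Points are drawn uniformly in the square [-1, 1] x [-1, 1], and the unit circle covers π/4 of that square. Four times the fraction of points inside therefore approximates π.

```scala
import scala.util.Random

// Plain-Scala sketch of the Monte Carlo estimate; no Spark involved.
object MonteCarloPi {
  val SamplesPerSlice = 100000 // assumption: mirrors SparkPi's 100000 samples per slice

  /** Counts random points in [-1, 1] x [-1, 1] that land inside the unit circle. */
  def countInside(samples: Int, rnd: Random): Long = {
    var inside = 0L
    var i = 0
    while (i < samples) {
      val x = rnd.nextDouble() * 2 - 1
      val y = rnd.nextDouble() * 2 - 1
      if (x * x + y * y <= 1) inside += 1
      i += 1
    }
    inside
  }

  /** Each slice stands in for one partition (one task); counts are summed like reduce(_ + _). */
  def estimate(slices: Int, seed: Long = 42L): Double = {
    val inside = (0 until slices).map(s => countInside(SamplesPerSlice, new Random(seed + s))).sum
    // circle area / square area = pi / 4, so pi is about 4 * inside / total
    4.0 * inside / (slices.toLong * SamplesPerSlice)
  }

  def main(args: Array[String]): Unit = {
    val slices = args.headOption.map(_.toInt).getOrElse(10)
    println(s"Pi is roughly ${estimate(slices)}")
  }
}
```

More slices mean more samples and a tighter estimate. In Spark the slices also become the tasks that the local[N] threads share.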
    
    


          Article link: https://www.haomeiwen.com/subject/sdttgctx.html