
Spark Standalone Mode

Author: ssttIsme | Published 2021-10-16 17:39

    Standalone mode is a cluster mode that uses only Spark's own nodes (Spark's built-in Master and Worker daemons), without an external resource manager; this is the so-called independent deployment mode.
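
    The layout used throughout this post, three machines with hadoop102 doubling as Master and Worker (this matches the jps output shown after startup below):

    hadoop102  Master + Worker
    hadoop103  Worker
    hadoop104  Worker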

    Download

    http://archive.apache.org/dist/spark/spark-3.0.0/
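
    For example, the tarball can be fetched straight into /opt/software (a hedged sketch; any mirror or a browser download works just as well):

    [server@hadoop102 ~]$ wget -P /opt/software \
    http://archive.apache.org/dist/spark/spark-3.0.0/spark-3.0.0-bin-hadoop3.2.tgz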

    Check the Java installation directory

    [server@hadoop102 ~]$ cd $JAVA_HOME
    [server@hadoop102 jdk1.8.0_65]$ pwd
    /opt/module/jdk1.8.0_65
    

    Extract and install

    [server@hadoop102 ~]$ cd /opt/software/
    [server@hadoop102 software]$ tar -zxvf spark-3.0.0-bin-hadoop3.2.tgz -C /opt/module/
    
    [server@hadoop102 software]$ cd /opt/module/
    [server@hadoop102 module]$ mv spark-3.0.0-bin-hadoop3.2/ spark-standalone
    
    [server@hadoop102 module]$ cd /opt/module/spark-standalone/conf/
    [server@hadoop102 conf]$ mv slaves.template slaves
    [server@hadoop102 conf]$ vim slaves
    

    Add the hostnames of the three machines

    hadoop102
    hadoop103
    hadoop104
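
    Note that sbin/start-all.sh (and the xsync script created below) reach these hosts over SSH, so passwordless SSH from hadoop102 to all three machines should already be set up. A quick check, each command should print the remote hostname without prompting for a password:

    [server@hadoop102 conf]$ ssh hadoop103 hostname
    [server@hadoop102 conf]$ ssh hadoop104 hostname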
    
    [server@hadoop102 conf]$ mv spark-env.sh.template spark-env.sh
    [server@hadoop102 conf]$ vim spark-env.sh
    

    Add the Java path and the master host

    export JAVA_HOME=/opt/module/jdk1.8.0_65
    SPARK_MASTER_HOST=hadoop102
    SPARK_MASTER_PORT=7077
    

    Create the distribution script (xsync)

    [server@hadoop102 ~]$ pwd
    /home/server
    [server@hadoop102 ~]$ mkdir bin
    [server@hadoop102 ~]$ cd bin
    [server@hadoop102 bin]$ pwd
    /home/server/bin
    [server@hadoop102 bin]$ vim xsync
    #!/bin/bash
    
    #1. Check the number of arguments
    if [ $# -lt 1 ]
    then
            echo "Not enough arguments!"
            exit 1
    fi
    
    #2. Loop over every machine in the cluster
    for host in hadoop102 hadoop103 hadoop104
    do
            echo ======= $host =======
            #3. Loop over all files/directories and send them one by one
            for file in "$@"
            do
                    #4. Check that the file exists
                    if [ -e "$file" ]
                            then
                                    #5. Get the absolute parent directory (resolving symlinks)
                                    pdir=$(cd -P "$(dirname "$file")"; pwd)
    
                                    #6. Get the file name
                                    fname=$(basename "$file")
                                    ssh "$host" "mkdir -p $pdir"
                                    rsync -av "$pdir/$fname" "$host:$pdir"
                            else
                                    echo "$file does not exist!"
                    fi
            done
    done
    
    [server@hadoop102 bin]$ chmod 777 xsync
    [server@hadoop102 bin]$ cd ..
    [server@hadoop102 ~]$ pwd
    /home/server
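
    Optionally, xsync can copy itself (together with the rest of ~/bin) to the other machines, so the same script is available everywhere; a minimal sketch, assuming ~/bin ends up on the PATH after re-login, as is typical for this kind of setup:

    [server@hadoop102 ~]$ bin/xsync /home/server/bin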
    

    Distribute the spark-standalone directory

    [server@hadoop102 conf]$ cd /opt/module/
    [server@hadoop102 module]$ xsync spark-standalone/
    

    Start the cluster

    [server@hadoop102 module]$ cd /opt/module/spark-standalone/
    [server@hadoop102 spark-standalone]$ sbin/start-all.sh 
    
    [server@hadoop102 spark-standalone]$ jps
    7570 Jps
    7387 Master
    7454 Worker
    
    [server@hadoop103 module]$ jps
    7276 Jps
    7183 Worker
    
    [server@hadoop104 ~]$ jps
    7202 Worker
    7305 Jps
    

    Master Web UI: http://hadoop102:8080/
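
    A quick headless check that the Master UI is serving (optional; assumes curl is available on hadoop102), which should print an HTTP 200 status line:

    [server@hadoop102 spark-standalone]$ curl -sI http://hadoop102:8080/ | head -n 1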


    Submit an application
    bin/spark-submit \
    --class org.apache.spark.examples.SparkPi \
    --master spark://hadoop102:7077 \
    ./examples/jars/spark-examples_2.12-3.0.0.jar \
    10
    

    --class specifies the main class of the application (the class in the Spark program that contains the main function)
    --master spark://hadoop102:7077 selects standalone deployment (the environment the Spark program runs in) and connects to the Spark cluster
    spark-examples_2.12-3.0.0.jar is the jar that contains the class to run
    10 is the application argument; for SparkPi it sets the number of tasks (partitions) for this run
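
    For instance, to split the estimation into 100 tasks instead of 10, only the trailing argument changes (a hedged variation of the same command):

    bin/spark-submit \
    --class org.apache.spark.examples.SparkPi \
    --master spark://hadoop102:7077 \
    ./examples/jars/spark-examples_2.12-3.0.0.jar \
    100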

    [server@hadoop102 spark-standalone]$ bin/spark-submit \
    > --class org.apache.spark.examples.SparkPi \
    > --master spark://hadoop102:7077 \
    > ./examples/jars/spark-examples_2.12-3.0.0.jar \
    > 10
    21/10/16 16:34:42 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
    21/10/16 16:34:43 INFO SparkContext: Running Spark version 3.0.0
    21/10/16 16:34:43 INFO ResourceUtils: ==============================================================
    21/10/16 16:34:43 INFO ResourceUtils: Resources for spark.driver:
    
    21/10/16 16:34:43 INFO ResourceUtils: ==============================================================
    21/10/16 16:34:43 INFO SparkContext: Submitted application: Spark Pi
    21/10/16 16:34:44 INFO SecurityManager: Changing view acls to: server
    21/10/16 16:34:44 INFO SecurityManager: Changing modify acls to: server
    21/10/16 16:34:44 INFO SecurityManager: Changing view acls groups to: 
    21/10/16 16:34:44 INFO SecurityManager: Changing modify acls groups to: 
    21/10/16 16:34:44 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(server); groups with view permissions: Set(); users  with modify permissions: Set(server); groups with modify permissions: Set()
    21/10/16 16:34:45 INFO Utils: Successfully started service 'sparkDriver' on port 38763.
    21/10/16 16:34:45 INFO SparkEnv: Registering MapOutputTracker
    21/10/16 16:34:46 INFO SparkEnv: Registering BlockManagerMaster
    21/10/16 16:34:46 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
    21/10/16 16:34:46 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
    21/10/16 16:34:46 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
    21/10/16 16:34:46 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-15596ad6-364b-47bd-9056-e5909a388573
    21/10/16 16:34:46 INFO MemoryStore: MemoryStore started with capacity 413.9 MiB
    21/10/16 16:34:46 INFO SparkEnv: Registering OutputCommitCoordinator
    21/10/16 16:34:47 INFO Utils: Successfully started service 'SparkUI' on port 4040.
    21/10/16 16:34:48 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://hadoop102:4040
    21/10/16 16:34:48 INFO SparkContext: Added JAR file:/opt/module/spark-standalone/./examples/jars/spark-examples_2.12-3.0.0.jar at spark://hadoop102:38763/jars/spark-examples_2.12-3.0.0.jar with timestamp 1634373288372
    21/10/16 16:34:49 INFO StandaloneAppClient$ClientEndpoint: Connecting to master spark://hadoop102:7077...
    21/10/16 16:34:50 INFO TransportClientFactory: Successfully created connection to hadoop102/192.168.100.102:7077 after 237 ms (0 ms spent in bootstraps)
    21/10/16 16:34:51 INFO StandaloneSchedulerBackend: Connected to Spark cluster with app ID app-20211016163450-0000
    21/10/16 16:34:51 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 34527.
    21/10/16 16:34:51 INFO NettyBlockTransferService: Server created on hadoop102:34527
    21/10/16 16:34:51 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
    21/10/16 16:34:51 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20211016163450-0000/0 on worker-20211016162729-192.168.100.104-37825 (192.168.100.104:37825) with 1 core(s)
    21/10/16 16:34:51 INFO StandaloneSchedulerBackend: Granted executor ID app-20211016163450-0000/0 on hostPort 192.168.100.104:37825 with 1 core(s), 1024.0 MiB RAM
    21/10/16 16:34:51 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20211016163450-0000/1 on worker-20211016162727-192.168.100.102-45892 (192.168.100.102:45892) with 1 core(s)
    21/10/16 16:34:51 INFO StandaloneSchedulerBackend: Granted executor ID app-20211016163450-0000/1 on hostPort 192.168.100.102:45892 with 1 core(s), 1024.0 MiB RAM
    21/10/16 16:34:51 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20211016163450-0000/2 on worker-20211016162728-192.168.100.103-41221 (192.168.100.103:41221) with 1 core(s)
    21/10/16 16:34:51 INFO StandaloneSchedulerBackend: Granted executor ID app-20211016163450-0000/2 on hostPort 192.168.100.103:41221 with 1 core(s), 1024.0 MiB RAM
    21/10/16 16:34:51 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, hadoop102, 34527, None)
    21/10/16 16:34:51 INFO BlockManagerMasterEndpoint: Registering block manager hadoop102:34527 with 413.9 MiB RAM, BlockManagerId(driver, hadoop102, 34527, None)
    21/10/16 16:34:51 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, hadoop102, 34527, None)
    21/10/16 16:34:51 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, hadoop102, 34527, None)
    21/10/16 16:34:52 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20211016163450-0000/0 is now RUNNING
    21/10/16 16:34:52 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20211016163450-0000/2 is now RUNNING
    21/10/16 16:34:53 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20211016163450-0000/1 is now RUNNING
    21/10/16 16:34:54 INFO StandaloneSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
    21/10/16 16:34:59 INFO SparkContext: Starting job: reduce at SparkPi.scala:38
    21/10/16 16:34:59 INFO DAGScheduler: Got job 0 (reduce at SparkPi.scala:38) with 10 output partitions
    21/10/16 16:34:59 INFO DAGScheduler: Final stage: ResultStage 0 (reduce at SparkPi.scala:38)
    21/10/16 16:34:59 INFO DAGScheduler: Parents of final stage: List()
    21/10/16 16:34:59 INFO DAGScheduler: Missing parents: List()
    21/10/16 16:34:59 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34), which has no missing parents
    21/10/16 16:35:00 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 3.1 KiB, free 413.9 MiB)
    21/10/16 16:35:01 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1816.0 B, free 413.9 MiB)
    21/10/16 16:35:01 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on hadoop102:34527 (size: 1816.0 B, free: 413.9 MiB)
    21/10/16 16:35:01 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1200
    21/10/16 16:35:01 INFO DAGScheduler: Submitting 10 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4, 5, 6, 7, 8, 9))
    21/10/16 16:35:01 INFO TaskSchedulerImpl: Adding task set 0.0 with 10 tasks
    21/10/16 16:35:02 INFO ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , memory -> name: memory, amount: 1024, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)
    21/10/16 16:35:05 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (192.168.100.103:54042) with ID 2
    21/10/16 16:35:06 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (192.168.100.104:43388) with ID 0
    21/10/16 16:35:06 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.100.103:33484 with 413.9 MiB RAM, BlockManagerId(2, 192.168.100.103, 33484, None)
    21/10/16 16:35:07 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.100.104:40508 with 413.9 MiB RAM, BlockManagerId(0, 192.168.100.104, 40508, None)
    21/10/16 16:35:07 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, 192.168.100.103, executor 2, partition 0, PROCESS_LOCAL, 7397 bytes)
    21/10/16 16:35:07 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, 192.168.100.104, executor 0, partition 1, PROCESS_LOCAL, 7397 bytes)
    21/10/16 16:35:09 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.100.103:33484 (size: 1816.0 B, free: 413.9 MiB)
    21/10/16 16:35:09 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.100.104:40508 (size: 1816.0 B, free: 413.9 MiB)
    21/10/16 16:35:13 INFO TaskSetManager: Starting task 2.0 in stage 0.0 (TID 2, 192.168.100.103, executor 2, partition 2, PROCESS_LOCAL, 7397 bytes)
    21/10/16 16:35:13 INFO TaskSetManager: Starting task 3.0 in stage 0.0 (TID 3, 192.168.100.104, executor 0, partition 3, PROCESS_LOCAL, 7397 bytes)
    21/10/16 16:35:13 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 6244 ms on 192.168.100.104 (executor 0) (1/10)
    21/10/16 16:35:13 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 6845 ms on 192.168.100.103 (executor 2) (2/10)
    21/10/16 16:35:13 INFO TaskSetManager: Starting task 4.0 in stage 0.0 (TID 4, 192.168.100.103, executor 2, partition 4, PROCESS_LOCAL, 7397 bytes)
    21/10/16 16:35:13 INFO TaskSetManager: Finished task 2.0 in stage 0.0 (TID 2) in 368 ms on 192.168.100.103 (executor 2) (3/10)
    21/10/16 16:35:14 INFO TaskSetManager: Starting task 5.0 in stage 0.0 (TID 5, 192.168.100.104, executor 0, partition 5, PROCESS_LOCAL, 7397 bytes)
    21/10/16 16:35:14 INFO TaskSetManager: Finished task 3.0 in stage 0.0 (TID 3) in 360 ms on 192.168.100.104 (executor 0) (4/10)
    21/10/16 16:35:14 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (192.168.100.102:48108) with ID 1
    21/10/16 16:35:14 INFO TaskSetManager: Starting task 6.0 in stage 0.0 (TID 6, 192.168.100.103, executor 2, partition 6, PROCESS_LOCAL, 7397 bytes)
    21/10/16 16:35:14 INFO TaskSetManager: Finished task 4.0 in stage 0.0 (TID 4) in 301 ms on 192.168.100.103 (executor 2) (5/10)
    21/10/16 16:35:14 INFO TaskSetManager: Starting task 7.0 in stage 0.0 (TID 7, 192.168.100.104, executor 0, partition 7, PROCESS_LOCAL, 7397 bytes)
    21/10/16 16:35:14 INFO TaskSetManager: Finished task 5.0 in stage 0.0 (TID 5) in 272 ms on 192.168.100.104 (executor 0) (6/10)
    21/10/16 16:35:14 INFO TaskSetManager: Starting task 8.0 in stage 0.0 (TID 8, 192.168.100.103, executor 2, partition 8, PROCESS_LOCAL, 7397 bytes)
    21/10/16 16:35:14 INFO TaskSetManager: Finished task 6.0 in stage 0.0 (TID 6) in 273 ms on 192.168.100.103 (executor 2) (7/10)
    21/10/16 16:35:14 INFO TaskSetManager: Starting task 9.0 in stage 0.0 (TID 9, 192.168.100.104, executor 0, partition 9, PROCESS_LOCAL, 7397 bytes)
    21/10/16 16:35:14 INFO TaskSetManager: Finished task 7.0 in stage 0.0 (TID 7) in 306 ms on 192.168.100.104 (executor 0) (8/10)
    21/10/16 16:35:14 INFO TaskSetManager: Finished task 8.0 in stage 0.0 (TID 8) in 285 ms on 192.168.100.103 (executor 2) (9/10)
    21/10/16 16:35:14 INFO TaskSetManager: Finished task 9.0 in stage 0.0 (TID 9) in 261 ms on 192.168.100.104 (executor 0) (10/10)
    21/10/16 16:35:14 INFO DAGScheduler: ResultStage 0 (reduce at SparkPi.scala:38) finished in 14.920 s
    21/10/16 16:35:14 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
    21/10/16 16:35:14 INFO DAGScheduler: Job 0 is finished. Cancelling potential speculative or zombie tasks for this job
    21/10/16 16:35:14 INFO TaskSchedulerImpl: Killing all running tasks in stage 0: Stage finished
    21/10/16 16:35:14 INFO DAGScheduler: Job 0 finished: reduce at SparkPi.scala:38, took 15.503296 s
    Pi is roughly 3.1398471398471397
    21/10/16 16:35:15 INFO SparkUI: Stopped Spark web UI at http://hadoop102:4040
    21/10/16 16:35:15 INFO StandaloneSchedulerBackend: Shutting down all executors
    21/10/16 16:35:15 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Asking each executor to shut down
    21/10/16 16:35:15 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
    21/10/16 16:35:15 INFO MemoryStore: MemoryStore cleared
    21/10/16 16:35:15 INFO BlockManager: BlockManager stopped
    21/10/16 16:35:15 INFO BlockManagerMaster: BlockManagerMaster stopped
    21/10/16 16:35:15 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
    21/10/16 16:35:15 INFO SparkContext: Successfully stopped SparkContext
    21/10/16 16:35:16 INFO ShutdownHookManager: Shutdown hook called
    21/10/16 16:35:16 INFO ShutdownHookManager: Deleting directory /tmp/spark-648f23eb-d06d-42a7-8680-57275eacd379
    21/10/16 16:35:16 INFO ShutdownHookManager: Deleting directory /tmp/spark-8b0a7260-c41a-4ad5-9789-5cbb31744f8c
    

    Stop the services

    [server@hadoop102 spark-standalone]$ sbin/stop-all.sh 
    hadoop104: stopping org.apache.spark.deploy.worker.Worker
    hadoop103: stopping org.apache.spark.deploy.worker.Worker
    hadoop102: stopping org.apache.spark.deploy.worker.Worker
    stopping org.apache.spark.deploy.master.Master
    

    spark-submit parameter reference

    bin/spark-submit \
    --class <main-class> \
    --master <master-url> \
    ... #other options
    <application-jar> \
    [application-arguments]
    
    Parameter                   Description / example values
    --class                     The class in the Spark program that contains the main function
    --master                    The run mode (environment) of the Spark program, e.g. local[*], spark://hadoop102:7077, yarn
    --executor-memory 1G        Sets the memory available to each executor to 1G
    --total-executor-cores 2    Sets the total number of CPU cores used by all executors to 2
    --executor-cores            Sets the number of CPU cores used by each executor
    application-jar             The packaged application jar including its dependencies. The URL must be visible from every node in the cluster, e.g. an hdfs:// path on shared storage; with a file://path, every node must hold an identical copy of the jar
    application-arguments       Arguments passed to the main() method
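
    As an illustration, the resource options from the table can be combined with the earlier SparkPi submission (the 1G / 2-core values are just the examples from the table):

    bin/spark-submit \
    --class org.apache.spark.examples.SparkPi \
    --master spark://hadoop102:7077 \
    --executor-memory 1G \
    --total-executor-cores 2 \
    ./examples/jars/spark-examples_2.12-3.0.0.jar \
    10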

    Configure the history server

    Start the Hadoop cluster and create a directory on HDFS

    [server@hadoop102 spark-standalone]$ cd ~
    [server@hadoop102 ~]$ cd bin
    [server@hadoop102 bin]$ myhadoop.sh start
    =================Starting the Hadoop cluster========================
    ------------------Starting hdfs-----------------------------
    Starting namenodes on [hadoop102]
    Starting datanodes
    Starting secondary namenodes [hadoop104]
    ------------------Starting yarn-----------------------------
    Starting resourcemanager
    Starting nodemanagers
    ------------------Starting historyserver--------------------
    [server@hadoop102 bin]$ hadoop fs -mkdir /spark
    
    [server@hadoop102 bin]$ cd /opt/module/spark-standalone/conf/
    [server@hadoop102 conf]$ mv spark-defaults.conf.template spark-defaults.conf
    [server@hadoop102 conf]$ vim spark-defaults.conf 
    

    Add the following:

    spark.eventLog.enabled           true
    spark.eventLog.dir               hdfs://hadoop102/spark
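
    Note: an hdfs:// URI with no port falls back to the NameNode's default RPC port (8020 on Hadoop 3). If your NameNode listens on a different port, spell it out both here and in the matching spark.history.fs.logDirectory below, for example (assuming 8020; adjust to your fs.defaultFS):

    spark.eventLog.dir               hdfs://hadoop102:8020/spark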
    
    [server@hadoop102 conf]$ vim spark-env.sh
    

    Add the following:

    export SPARK_HISTORY_OPTS="
    -Dspark.history.ui.port=18080
    -Dspark.history.fs.logDirectory=hdfs://hadoop102/spark
    -Dspark.history.retainedApplications=30"
    

    -Dspark.history.ui.port: the port of the history server web UI, 18080
    -Dspark.history.fs.logDirectory: the storage path for the history server's event logs
    -Dspark.history.retainedApplications: the number of applications whose data is retained; once the limit is exceeded, the oldest application information is evicted. This is the number of applications kept in memory, not the number listed on the page.

    Distribute the configuration files

    [server@hadoop102 conf]$ cd /opt/module/spark-standalone/
    [server@hadoop102 spark-standalone]$ xsync conf/
    

    Start the cluster and the history server

    [server@hadoop102 spark-standalone]$ pwd
    /opt/module/spark-standalone
    [server@hadoop102 spark-standalone]$ sbin/start-all.sh 
    [server@hadoop102 spark-standalone]$ sbin/start-history-server.sh 
    starting org.apache.spark.deploy.history.HistoryServer, logging to /opt/module/spark-standalone/logs/spark-server-org.apache.spark.deploy.history.HistoryServer-1-hadoop102.out
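
    jps on hadoop102 should now also list a HistoryServer process; the service can later be stopped with sbin/stop-history-server.sh.

    [server@hadoop102 spark-standalone]$ jps | grep HistoryServer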
    

    Run a job

    bin/spark-submit \
    --class org.apache.spark.examples.SparkPi \
    --master spark://hadoop102:7077 \
    ./examples/jars/spark-examples_2.12-3.0.0.jar \
    10
    
    [server@hadoop102 spark-standalone]$ bin/spark-submit \
    > --class org.apache.spark.examples.SparkPi \
    > --master spark://hadoop102:7077 \
    > ./examples/jars/spark-examples_2.12-3.0.0.jar \
    > 10
    

    History Server Web UI: http://hadoop102:18080/


    [server@hadoop102 spark-standalone]$ hadoop fs -ls /spark
    Found 1 items
    -rw-rw----   3 server supergroup     108036 2021-10-16 17:34 /spark/app-20211016173335-0000
    
