Installing a Spark Cluster


Author: 大胖圆儿小姐 | Published 2021-12-02 09:34

    I. Installation Overview

    This article continues the configuration of my virtual machines. A working Hadoop platform is a prerequisite for installing the Spark cluster; for reference see https://www.jianshu.com/p/1bbfbb3968b6, which also describes the virtual machines themselves. The JDK and Hadoop are the dependencies Spark builds on, so they are not covered again here. The chosen Spark version is 3.0.3, and Spark 3.0+ is built against Scala 2.12, so Scala 2.12 must be installed as well.

    II. Software Selection

    1. Choosing the Spark version; the Spark download page is at http://spark.apache.org/downloads.html
    2. Choosing the Scala version; the Scala download page is at https://www.scala-lang.org/download/2.12.2.html
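
    The archives can also be fetched directly from the command line. The following is only a sketch: the URLs assume the standard Apache archive and Lightbend download layouts, so verify them against the pages above before relying on them.

    # Sketch (not from the original article): download the two archives chosen above.
    # The URLs are assumptions based on the usual archive layouts; double-check them.
    wget https://archive.apache.org/dist/spark/spark-3.0.3/spark-3.0.3-bin-hadoop2.7.tgz
    wget https://downloads.lightbend.com/scala/2.12.2/scala-2.12.2.tgz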

    III. Installing the Scala Environment (perform the following steps on all three machines; a sketch for copying the files to hadoop02 and hadoop03 appears at the end of this section)

    1. Upload the Scala tarball to the hadoop user's home directory
    [hadoop@hadoop01 ~]$ ll
    total 633388
    drwxrwxrwx. 11 hadoop hadoop       173 Nov 13 09:08 hadoop-2.10.1
    -rw-r--r--.  1 hadoop hadoop 408587111 Nov 12 11:07 hadoop-2.10.1.tar.gz
    -rw-r--r--.  1 hadoop hadoop  19596088 Nov 30 17:14 scala-2.12.2.tgz
    [hadoop@hadoop01 ~]$ 
    
    2. (Not strictly necessary) If a later upload runs into this ownership problem, the same method can be used: when a file is not owned by the hadoop user, fix the ownership with chown, which has to be run as root.
    [hadoop@hadoop01 ~]$ exit
    exit
    You have new mail in /var/spool/mail/root
    [root@hadoop01 ~]# cd /home/hadoop/
    [root@hadoop01 hadoop]# ll
    total 633388
    drwxrwxrwx. 11 hadoop hadoop       173 Nov 13 09:08 hadoop-2.10.1
    -rw-r--r--.  1 hadoop hadoop 408587111 Nov 12 11:07 hadoop-2.10.1.tar.gz
    -rw-r--r--.  1 hadoop hadoop  19596088 Nov 30 17:14 scala-2.12.2.tgz
    [root@hadoop01 hadoop]# chown -R hadoop:hadoop scala-2.12.2.tgz
    [root@hadoop01 hadoop]# su hadoop
    
    3. Extract the Scala archive
    [hadoop@hadoop01 ~]$ tar -zxvf scala-2.12.2.tgz
    [hadoop@hadoop01 ~]$ cd scala-2.12.2
    [hadoop@hadoop01 scala-2.12.2]$ pwd
    /home/hadoop/scala-2.12.2 
    
    4. Add Scala to the environment variables
    [hadoop@hadoop01 scala-2.12.2]$ vim ~/.bashrc
    
    export HADOOP_HOME=/home/hadoop/hadoop-2.10.1
    export SCALA_HOME=/home/hadoop/scala-2.12.2
    export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$SCALA_HOME/bin
    
    5. Reload the environment variables
    [hadoop@hadoop01 scala-2.12.2]$ source ~/.bashrc
    
    6. Verify that Scala was installed successfully
    [hadoop@hadoop01 scala-2.12.2]$ scala -version
    Scala code runner version 2.12.2 -- Copyright 2002-2017, LAMP/EPFL and Lightbend, Inc.
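
    Since the steps above must be repeated on hadoop02 and hadoop03, a small loop can at least push the tarball to the other nodes and unpack it there. This is only a sketch and assumes passwordless ssh for the hadoop user is already configured (as the Hadoop installation required); the ~/.bashrc edits still have to be made on each node.

    # Sketch (not from the original article): copy and unpack the Scala tarball
    # on the other two nodes; assumes passwordless ssh for the hadoop user.
    for host in hadoop02 hadoop03; do
      scp ~/scala-2.12.2.tgz hadoop@"$host":~
      ssh hadoop@"$host" 'tar -zxf ~/scala-2.12.2.tgz -C ~'
    done
    # Afterwards, repeat steps 4-6 (the ~/.bashrc edits and the scala -version
    # check) on each node.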
    

    IV. Installing the Spark Environment

    1. Upload the Spark tarball to the hadoop user's home directory on the hadoop01 virtual machine
    [hadoop@hadoop01 ~]$ ll
    total 633388
    drwxrwxrwx. 11 hadoop hadoop       173 Nov 13 09:08 hadoop-2.10.1
    -rw-r--r--.  1 hadoop hadoop 408587111 Nov 12 11:07 hadoop-2.10.1.tar.gz
    drwxrwxr-x.  6 hadoop hadoop        50 Apr 13  2017 scala-2.12.2
    -rw-r--r--.  1 hadoop hadoop  19596088 Nov 30 17:14 scala-2.12.2.tgz
    -rw-r--r--.  1 hadoop hadoop 220400553 Nov 30 17:14 spark-3.0.3-bin-hadoop2.7.tgz
    You have new mail in /var/spool/mail/root
    
    2. Extract the Spark archive
    [hadoop@hadoop01 ~]$ tar -zxvf spark-3.0.3-bin-hadoop2.7.tgz
    [hadoop@hadoop01 ~]$ cd spark-3.0.3-bin-hadoop2.7
    [hadoop@hadoop01 spark-3.0.3-bin-hadoop2.7]$ pwd
    /home/hadoop/spark-3.0.3-bin-hadoop2.7 
    
    3. Configure the slaves file
    [hadoop@hadoop01 spark-3.0.3-bin-hadoop2.7]$ cd conf
    [hadoop@hadoop01 conf]$ mv slaves.template slaves
    [hadoop@hadoop01 conf]$ vim slaves 
    
    # Replace localhost with the following three node hostnames
    hadoop01
    hadoop02
    hadoop03
    
    4. Configure the spark-env.sh file
    [hadoop@hadoop01 conf]$ mv spark-env.sh.template spark-env.sh
    [hadoop@hadoop01 conf]$ vim spark-env.sh
    
    export MASTER=spark://172.16.100.26:7077
    export SPARK_MASTER_IP=172.16.100.26
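
    For reference, the spark-env.sh.template shipped with Spark 3.x lists SPARK_MASTER_HOST rather than the older SPARK_MASTER_IP name used above, so a version of this step written against the newer variable names might look like the sketch below. The JAVA_HOME line is optional but relates to the problem described in Section V.

    # Alternative sketch of spark-env.sh, not the article's exact file:
    # SPARK_MASTER_HOST is the variable name listed in the Spark 3.x template.
    export SPARK_MASTER_HOST=172.16.100.26
    export SPARK_MASTER_PORT=7077
    # Optional: defining JAVA_HOME here sidesteps the startup problem in Section V.
    export JAVA_HOME=/usr/local/java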
    
    5. Configure Spark's environment variables
    [hadoop@hadoop01 conf]$ vim ~/.bashrc
    
    export JAVA_HOME=/usr/local/java
    export HADOOP_HOME=/home/hadoop/hadoop-2.10.1
    export SCALA_HOME=/home/hadoop/scala-2.12.2
    export SPARK_HOME=/home/hadoop/spark-3.0.3-bin-hadoop2.7
    export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$SCALA_HOME/bin:$SPARK_HOME/bin
    
    6. Reload the environment variables
    [hadoop@hadoop01 conf]$ source ~/.bashrc 
    
    7. Copy the Spark directory from hadoop01 to the home directories of hadoop02 and hadoop03
    [hadoop@hadoop01 ~]$ scp -r spark-3.0.3-bin-hadoop2.7 hadoop@hadoop02:~
    [hadoop@hadoop01 ~]$ scp -r spark-3.0.3-bin-hadoop2.7 hadoop@hadoop03:~
    
    8. Update the environment variables on hadoop02 and hadoop03, following the commands from steps 5 and 6
    9. Start the Spark cluster
    [hadoop@hadoop01 sbin]$ ./stop-all.sh 
    hadoop02: stopping org.apache.spark.deploy.worker.Worker
    hadoop03: stopping org.apache.spark.deploy.worker.Worker
    hadoop01: no org.apache.spark.deploy.worker.Worker to stop
    stopping org.apache.spark.deploy.master.Master
    
    [hadoop@hadoop01 sbin]$ ./start-all.sh 
    starting org.apache.spark.deploy.master.Master, logging to /home/hadoop/spark-3.0.3-bin-hadoop2.7/logs/spark-hadoop-org.apache.spark.deploy.master.Master-1-hadoop01.out
    hadoop01: starting org.apache.spark.deploy.worker.Worker, logging to /home/hadoop/spark-3.0.3-bin-hadoop2.7/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-hadoop01.out
    hadoop03: starting org.apache.spark.deploy.worker.Worker, logging to /home/hadoop/spark-3.0.3-bin-hadoop2.7/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-hadoop03.out
    hadoop02: starting org.apache.spark.deploy.worker.Worker, logging to /home/hadoop/spark-3.0.3-bin-hadoop2.7/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-hadoop02.out
    
    10. Verify the startup result in the web UI at http://172.16.100.26:8080; a command-line check is sketched below
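
    Besides the web UI, the cluster can be checked from the command line. This is only a sketch: jps ships with the JDK, and the SparkPi example jar is bundled under examples/jars in the spark-3.0.3-bin-hadoop2.7 distribution.

    # Sketch (not from the original article): confirm the daemons are running on
    # every node; Master should appear on hadoop01 and a Worker on all three.
    # Use the full path $JAVA_HOME/bin/jps if jps is not on the remote PATH.
    for host in hadoop01 hadoop02 hadoop03; do
      echo "== $host =="
      ssh hadoop@"$host" jps
    done

    # Optional smoke test: submit the bundled SparkPi example to the standalone master.
    $SPARK_HOME/bin/spark-submit \
      --master spark://hadoop01:7077 \
      --class org.apache.spark.examples.SparkPi \
      $SPARK_HOME/examples/jars/spark-examples_2.12-3.0.3.jar 10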



    V. Problems I Ran Into

    1. When I installed the JDK I put its environment variables in /etc/profile, intending the JDK to be available globally, so in step 5 of Section IV, when configuring the current user's environment variables, I did not set JAVA_HOME. Startup then failed with the following error:
    [hadoop@hadoop01 sbin]$ ./start-all.sh 
    starting org.apache.spark.deploy.master.Master, logging to /home/hadoop/spark-3.0.3-bin-hadoop2.7/logs/spark-hadoop-org.apache.spark.deploy.master.Master-1-hadoop01.out
    hadoop03: starting org.apache.spark.deploy.worker.Worker, logging to /home/hadoop/spark-3.0.3-bin-hadoop2.7/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-hadoop03.out
    hadoop02: starting org.apache.spark.deploy.worker.Worker, logging to /home/hadoop/spark-3.0.3-bin-hadoop2.7/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-hadoop02.out
    hadoop01: starting org.apache.spark.deploy.worker.Worker, logging to /home/hadoop/spark-3.0.3-bin-hadoop2.7/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-hadoop01.out
    hadoop01: failed to launch: nice -n 0 /home/hadoop/spark-3.0.3-bin-hadoop2.7/bin/spark-class org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://hadoop01:7077
    hadoop01:   JAVA_HOME is not set
    hadoop01: full log in /home/hadoop/spark-3.0.3-bin-hadoop2.7/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-hadoop01.out
    

    Clearly the Java environment variable could not be read. Once I added it to the current user's .bashrc, the cluster started successfully. I still don't quite see why a variable set globally in /etc/profile would not take effect; for now I'll file it away as a small Spark quirk and dig into the answer when I get around to it!
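    A likely explanation, offered as a guess rather than a definitive answer: start-all.sh launches the Master and Workers over ssh in non-interactive shells, and such shells do not read /etc/profile (though on many distributions they do read ~/.bashrc), so variables defined only in /etc/profile never reach the daemons. Spark's own way to pin the JDK for the daemons is to set JAVA_HOME in conf/spark-env.sh, which every start script sources; a minimal sketch:

    # Sketch of an alternative fix (not what the article did): spark-env.sh is
    # sourced by all Spark start scripts, so JAVA_HOME defined here reaches the
    # daemons regardless of the login-shell configuration. The path matches the
    # one used earlier in this article.
    echo 'export JAVA_HOME=/usr/local/java' >> $SPARK_HOME/conf/spark-env.sh
    # Repeat on hadoop02 and hadoop03 (or scp the file), then restart the cluster
    # with sbin/stop-all.sh and sbin/start-all.sh.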
