Spark Cluster Installation

Author: 大胖圆儿小姐 | Published 2021-12-02 09:34

I. Installation Overview

This article continues the setup of my virtual machines. A working Hadoop installation is required before building the Spark cluster; for reference see https://www.jianshu.com/p/1bbfbb3968b6, which also describes the virtual machines themselves. The JDK and Hadoop are prerequisites for the Spark cluster, so they are not covered again here. The chosen Spark version is 3.0.3, and Spark 3.0+ is built against Scala 2.12, so Scala 2.12 also needs to be installed.
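Before starting, it can help to confirm the prerequisites on every node. A minimal check, assuming the JDK and Hadoop layout from the linked article:

[hadoop@hadoop01 ~]$ java -version      # should report the installed JDK
[hadoop@hadoop01 ~]$ hadoop version     # should report Hadoop 2.10.1
[hadoop@hadoop01 ~]$ jps                # the HDFS/YARN daemons should already be running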

II. Software Selection

  1. Spark version: see the Spark download page, http://spark.apache.org/downloads.html
  2. Scala version: see the Scala download page, https://www.scala-lang.org/download/2.12.2.html
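If you prefer downloading directly on the server rather than uploading from a local machine, something like the following works; the mirror URLs below are an assumption and may need to be swapped for a mirror near you:

[hadoop@hadoop01 ~]$ wget https://archive.apache.org/dist/spark/spark-3.0.3/spark-3.0.3-bin-hadoop2.7.tgz
[hadoop@hadoop01 ~]$ wget https://downloads.lightbend.com/scala/2.12.2/scala-2.12.2.tgz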

III. Installing Scala (run the following steps on all three nodes)

  1. Upload the Scala tarball to the home directory of the hadoop user
[hadoop@hadoop01 ~]$ ll
total 633388
drwxrwxrwx. 11 hadoop hadoop       173 Nov 13 09:08 hadoop-2.10.1
-rw-r--r--.  1 hadoop hadoop 408587111 Nov 12 11:07 hadoop-2.10.1.tar.gz
-rw-r--r--.  1 hadoop hadoop  19596088 Nov 30 17:14 scala-2.12.2.tgz
[hadoop@hadoop01 ~]$ 
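The upload itself can be done with scp from whatever machine holds the downloaded tarball; the local prompt below is only illustrative:

yourpc$ scp scala-2.12.2.tgz hadoop@hadoop01:~

The spark-3.0.3-bin-hadoop2.7.tgz tarball used in section IV can be uploaded the same way.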
  2. Optional step, useful if a file-ownership problem appears after uploading. When an uploaded file is not owned by the hadoop user, change the ownership with chown; this command has to be run as root.
[hadoop@hadoop01 ~]$ exit
exit
You have new mail in /var/spool/mail/root
[root@hadoop01 ~]# cd /home/hadoop/
[root@hadoop01 hadoop]# ll
total 633388
drwxrwxrwx. 11 hadoop hadoop       173 Nov 13 09:08 hadoop-2.10.1
-rw-r--r--.  1 hadoop hadoop 408587111 Nov 12 11:07 hadoop-2.10.1.tar.gz
-rw-r--r--.  1 hadoop hadoop  19596088 Nov 30 17:14 scala-2.12.2.tgz
[root@hadoop01 hadoop]# chown -R hadoop:hadoop scala-2.12.2.tgz 
[root@hadoop01 hadoop]# su hadoop
  3. Extract the Scala archive
[hadoop@hadoop01 ~]$ tar -zxvf scala-2.12.2.tgz
[hadoop@hadoop01 ~]$ cd scala-2.12.2
[hadoop@hadoop01 scala-2.12.2]$ pwd
/home/hadoop/scala-2.12.2 
  4. Add Scala to the environment variables
[hadoop@hadoop01 scala-2.12.2]$ vim ~/.bashrc

export HADOOP_HOME=/home/hadoop/hadoop-2.10.1
export SCALA_HOME=/home/hadoop/scala-2.12.2
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$SCALA_HOME/bin
  5. Apply the environment variables
[hadoop@hadoop01 scala-2.12.2]$ source ~/.bashrc
  6. Verify the Scala installation
[hadoop@hadoop01 scala-2.12.2]$ scala -version
Scala code runner version 2.12.2 -- Copyright 2002-2017, LAMP/EPFL and Lightbend, Inc.
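As an extra smoke test (not required), the scala runner can evaluate a one-liner; it should print the 2.12.2 version string:

[hadoop@hadoop01 scala-2.12.2]$ scala -e 'println(util.Properties.versionString)'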

IV. Installing Spark

  1. Upload the Spark tarball to the home directory of the hadoop user on hadoop01
[hadoop@hadoop01 ~]$ ll
total 633388
drwxrwxrwx. 11 hadoop hadoop       173 Nov 13 09:08 hadoop-2.10.1
-rw-r--r--.  1 hadoop hadoop 408587111 Nov 12 11:07 hadoop-2.10.1.tar.gz
drwxrwxr-x.  6 hadoop hadoop        50 Apr 13  2017 scala-2.12.2
-rw-r--r--.  1 hadoop hadoop  19596088 Nov 30 17:14 scala-2.12.2.tgz
-rw-r--r--.  1 hadoop hadoop 220400553 Nov 30 17:14 spark-3.0.3-bin-hadoop2.7.tgz
You have new mail in /var/spool/mail/root
  2. Extract the Spark archive
[hadoop@hadoop01 ~]$ tar -zxvf spark-3.0.3-bin-hadoop2.7.tgz
[hadoop@hadoop01 ~]$ cd spark-3.0.3-bin-hadoop2.7
[hadoop@hadoop01 spark-3.0.3-bin-hadoop2.7]$ pwd
/home/hadoop/spark-3.0.3-bin-hadoop2.7 
  3. Configure the slaves file
[hadoop@hadoop01 spark-3.0.3-bin-hadoop2.7]$ cd conf
[hadoop@hadoop01 conf]$ mv slaves.template slaves
[hadoop@hadoop01 conf]$ vim slaves 

# Replace localhost with the following three node names
hadoop01
hadoop02
hadoop03
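start-all.sh launches the workers listed in slaves over SSH, so the hadoop user on hadoop01 needs passwordless SSH to all three nodes (usually already set up for Hadoop). A quick check:

[hadoop@hadoop01 conf]$ for h in hadoop01 hadoop02 hadoop03; do ssh $h hostname; done
hadoop01
hadoop02
hadoop03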
  4. Configure spark-env.sh
[hadoop@hadoop01 conf]$ mv spark-env.sh.template spark-env.sh
[hadoop@hadoop01 conf]$ vim spark-env.sh

export MASTER=spark://172.16.100.26:7077
export SPARK_MASTER_IP=172.16.100.26
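A slightly fuller spark-env.sh is sketched below. SPARK_MASTER_HOST is the variable name documented since Spark 2.0 (SPARK_MASTER_IP is the older spelling), and setting JAVA_HOME here avoids the startup problem described in section V; the worker core and memory values are assumptions to adjust for your VMs:

export JAVA_HOME=/usr/local/java
export SPARK_MASTER_HOST=172.16.100.26
export SPARK_MASTER_PORT=7077
export SPARK_WORKER_CORES=2
export SPARK_WORKER_MEMORY=2g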
  5. Configure the Spark environment variables
[hadoop@hadoop01 conf]$ vim ~/.bashrc

export JAVA_HOME=/usr/local/java
export HADOOP_HOME=/home/hadoop/hadoop-2.10.1
export SCALA_HOME=/home/hadoop/scala-2.12.2
export SPARK_HOME=/home/hadoop/spark-3.0.3-bin-hadoop2.7
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$SCALA_HOME/bin:$SPARK_HOME/bin
  6. Apply the environment variables
[hadoop@hadoop01 conf]$ source ~/.bashrc 
  7. Copy the Spark directory from hadoop01 to the home directories on hadoop02 and hadoop03
[hadoop@hadoop01 ~]$ scp -r spark-3.0.3-bin-hadoop2.7 hadoop@hadoop02:~
[hadoop@hadoop01 ~]$ scp -r spark-3.0.3-bin-hadoop2.7 hadoop@hadoop03:~
  8. Update the environment variables on hadoop02 and hadoop03, following steps 5 and 6
  9. Start the Spark cluster (stopping any previously running instance first)
[hadoop@hadoop01 sbin]$ ./stop-all.sh 
hadoop02: stopping org.apache.spark.deploy.worker.Worker
hadoop03: stopping org.apache.spark.deploy.worker.Worker
hadoop01: no org.apache.spark.deploy.worker.Worker to stop
stopping org.apache.spark.deploy.master.Master

[hadoop@hadoop01 sbin]$ ./start-all.sh 
starting org.apache.spark.deploy.master.Master, logging to /home/hadoop/spark-3.0.3-bin-hadoop2.7/logs/spark-hadoop-org.apache.spark.deploy.master.Master-1-hadoop01.out
hadoop01: starting org.apache.spark.deploy.worker.Worker, logging to /home/hadoop/spark-3.0.3-bin-hadoop2.7/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-hadoop01.out
hadoop03: starting org.apache.spark.deploy.worker.Worker, logging to /home/hadoop/spark-3.0.3-bin-hadoop2.7/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-hadoop03.out
hadoop02: starting org.apache.spark.deploy.worker.Worker, logging to /home/hadoop/spark-3.0.3-bin-hadoop2.7/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-hadoop02.out
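Each node can be checked with jps: hadoop01 should show a Master and a Worker, and hadoop02/hadoop03 should each show a Worker (alongside the Hadoop daemons):

[hadoop@hadoop01 sbin]$ jps | grep -E 'Master|Worker'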
  10. Verify the startup in the web UI at http://172.16.100.26:8080
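To confirm the cluster actually accepts jobs, the bundled SparkPi example can be submitted to the standalone master; the jar name below matches the spark-3.0.3-bin-hadoop2.7 layout, but check $SPARK_HOME/examples/jars on your install:

[hadoop@hadoop01 ~]$ spark-submit \
    --master spark://hadoop01:7077 \
    --class org.apache.spark.examples.SparkPi \
    $SPARK_HOME/examples/jars/spark-examples_2.12-3.0.3.jar 100

The driver output should end with a line like "Pi is roughly 3.14...".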

V. Issues I Encountered

  1. When installing the JDK I had put its environment variables in /etc/profile, intending JAVA_HOME to be available globally, so when configuring the per-user environment in step 5 of section IV I did not set JAVA_HOME. Startup then failed with the following error:
[hadoop@hadoop01 sbin]$ ./start-all.sh 
starting org.apache.spark.deploy.master.Master, logging to /home/hadoop/spark-3.0.3-bin-hadoop2.7/logs/spark-hadoop-org.apache.spark.deploy.master.Master-1-hadoop01.out
hadoop03: starting org.apache.spark.deploy.worker.Worker, logging to /home/hadoop/spark-3.0.3-bin-hadoop2.7/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-hadoop03.out
hadoop02: starting org.apache.spark.deploy.worker.Worker, logging to /home/hadoop/spark-3.0.3-bin-hadoop2.7/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-hadoop02.out
hadoop01: starting org.apache.spark.deploy.worker.Worker, logging to /home/hadoop/spark-3.0.3-bin-hadoop2.7/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-hadoop01.out
hadoop01: failed to launch: nice -n 0 /home/hadoop/spark-3.0.3-bin-hadoop2.7/bin/spark-class org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://hadoop01:7077
hadoop01:   JAVA_HOME is not set
hadoop01: full log in /home/hadoop/spark-3.0.3-bin-hadoop2.7/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-hadoop01.out

Clearly the Java environment variable could not be read. After adding it to the hadoop user's .bashrc the cluster started successfully. I still don't understand why the value set in /etc/profile did not take effect; I'll treat it as a small leftover Spark question for now and look into it when I get the chance.
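My reading of this (an assumption on my part, not something confirmed above): start-all.sh launches the Master and the Workers through non-interactive SSH sessions, and those sessions do not source /etc/profile, so a JAVA_HOME exported only there is invisible to the launch scripts. Besides ~/.bashrc, the usual place to set it is conf/spark-env.sh on every node:

# $SPARK_HOME/conf/spark-env.sh on each node
export JAVA_HOME=/usr/local/java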
