Installing Spark 2.4.3 with Docker

Author: 阿亚2011 | Published 2019-06-06 11:36

    Preliminary notes

    Before installing HBase, Hadoop was installed, because HBase stores its data in HDFS.
    Spark is also related to Hadoop, but note that Spark only uses Hadoop's libraries; it does not depend on a running Hadoop installation. There is no need to install Hadoop itself, because Spark only requires a JDK.
    Spark has four cluster modes: standalone, Mesos, YARN, and Kubernetes (k8s).
    Given the modest data volume here, the simplest standalone mode is used.

    Download

    https://www.apache.org/dyn/closer.lua/spark/spark-2.4.3/spark-2.4.3-bin-hadoop2.7.tgz
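
    The link above goes to Apache's mirror selector. As a sketch, the tarball can also be fetched directly from the Apache archive and unpacked next to the Dockerfile (the exact mirror or URL may vary):

    wget https://archive.apache.org/dist/spark/spark-2.4.3/spark-2.4.3-bin-hadoop2.7.tgz
    tar -xzf spark-2.4.3-bin-hadoop2.7.tgz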

    Docker base image

    FROM ubuntu:16.04
    # Use a local apt mirror; sources.list must be present in the build context
    COPY sources.list /etc/apt/
    RUN apt update
    RUN apt install -y vim tzdata
    # Set the container timezone to Asia/Shanghai
    RUN rm /etc/localtime && ln -s /usr/share/zoneinfo/Asia/Shanghai /etc/localtime && echo "Asia/Shanghai" > /etc/timezone
    ENV TZ="Asia/Shanghai"

    WORKDIR /
    # The unpacked JDK must also be present in the build context
    COPY jdk1.8.0_171 /jdk1.8.0_171
    ENV JAVA_HOME=/jdk1.8.0_171
    ENV PATH=$PATH:/jdk1.8.0_171/bin
    RUN ln -s /jdk1.8.0_171/bin/java /usr/bin/java
    

    Install Spark

    # Copy the unpacked Spark distribution from the download step into /spark
    WORKDIR /spark
    COPY spark-2.4.3-bin-hadoop2.7 .
    ENV SPARK_HOME=/spark
    ENV PATH=$PATH:/spark/bin
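
    The two snippets above are assumed here to form a single Dockerfile (the base-image part followed by the Spark part), with sources.list and the unpacked jdk1.8.0_171 and spark-2.4.3-bin-hadoop2.7 directories sitting next to it in the build context. A minimal build sketch, using the sjfxspark:v1 tag that the start scripts below expect:

    docker build -t sjfxspark:v1 .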
    

    Configure Spark ports

    mkdir -p /home/mo/sjfx-spark-data
    cp -r spark-2.4.3-bin-hadoop2.7/conf /home/mo/sjfx-spark-data/config
    mv /home/mo/sjfx-spark-data/config/spark-env.sh.template /home/mo/sjfx-spark-data/config/spark-env.sh
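
    The start scripts below also bind-mount logs and work directories from the same host path. docker run creates missing host directories automatically (owned by root), but as a sketch they can be created up front alongside the config directory:

    mkdir -p /home/mo/sjfx-spark-data/logs /home/mo/sjfx-spark-data/work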
    

    Edit spark-env.sh and add the following:

    export SPARK_MASTER_PORT=5030
    export SPARK_MASTER_WEBUI_PORT=5040
    export SPARK_WORKER_PORT=5031
    export SPARK_WORKER_WEBUI_PORT=5041
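
    Because the containers below run with --net=host, these four ports are bound directly on the Docker host. A quick sketch (assuming iproute2's ss is available) to confirm nothing else is already listening on them:

    ss -ltn | grep -E ':(5030|5031|5040|5041)\b' || echo "ports are free"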
    

    Start the master

    #!/bin/sh
    docker stop sjfxspark-master
    docker rm sjfxspark-master
    docker run -d --name sjfxspark-master --net=host \
      -v /home/mo/sjfx-spark-data/config:/spark/conf  \
      -v /home/mo/sjfx-spark-data/logs:/spark/logs  \
      -v /home/mo/sjfx-spark-data/work:/spark/work  \
      sjfxspark:v1 sh -c "/spark/sbin/start-master.sh && tail -f /dev/null"
    

    Now check whether the master web UI is up: http://192.168.1.26:5040
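
    The same check can be done from the command line; a sketch, assuming 192.168.1.26 is the Docker host's address as in the script above:

    docker ps --filter name=sjfxspark-master
    curl -sI http://192.168.1.26:5040 | head -n 1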

    Start the slave

    #!/bin/sh
    docker stop sjfxspark-slave
    docker rm sjfxspark-slave
    docker run -d --name sjfxspark-slave --net=host \
      -v /home/mo/sjfx-spark-data/config:/spark/conf  \
      -v /home/mo/sjfx-spark-data/logs:/spark/logs  \
      -v /home/mo/sjfx-spark-data/work:/spark/work  \
      sjfxspark:v1 sh -c "/spark/sbin/start-slave.sh spark://192.168.1.26:5030 && tail -f /dev/null"
    

    Check the slave's web UI: http://192.168.1.26:5041/
    Check the master web UI again; the worker information now appears.
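
    Registration can also be confirmed from the worker log, which is written to the logs directory mounted from the host; a sketch, assuming Spark's default log file naming:

    tail -n 20 /home/mo/sjfx-spark-data/logs/*Worker*.out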


    Test

    ./spark-2.4.3-bin-hadoop2.7/bin/spark-submit \
      --class org.apache.spark.examples.SparkPi \
      --master spark://192.168.1.26:5030 \
      ./spark-2.4.3-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.4.3.jar 100
    

    The output appears in the terminal:
    2019-06-06 11:34:56 INFO DAGScheduler:54 - Job 0 finished: reduce at SparkPi.scala:38, took 3.886408 s
    Pi is roughly 3.1414487141448713
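
    As an additional interactive check (a sketch, assuming the same master URL), spark-shell can be pointed at the standalone master; the session then also shows up under "Running Applications" in the master web UI:

    ./spark-2.4.3-bin-hadoop2.7/bin/spark-shell --master spark://192.168.1.26:5030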
