Spark On YARN Cluster Installation and Deployment


Author: lycium | Published 2017-02-28 10:32

    Continuing from the previous article, http://www.jianshu.com/p/b27545f6d730, this post deploys Spark on the Hadoop cluster built there.

    1. Install Scala

    Download Scala from the official site; I used the latest release at the time, 2.12.1.
    Extract it and append the environment variables to ~/.bashrc:

    export SCALA_HOME=/home/spark/scala-2.12.1
    export PATH=$SCALA_HOME/bin:$PATH
    
    
    [root@master jre]# source ~/.bashrc
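
    To confirm that Scala is now on the PATH, check the version:

    [root@master jre]# scala -version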
    

    2. Install and Configure Spark

    Download a Spark build pre-compiled for your Hadoop version. Since Hadoop 2.7.3 is already installed, I downloaded spark-2.1.0-bin-hadoop2.7.
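
    For example, the archive can be fetched from the Apache archive (any mirror works) and unpacked under /home/spark:

    wget https://archive.apache.org/dist/spark/spark-2.1.0/spark-2.1.0-bin-hadoop2.7.tgz
    tar -zxf spark-2.1.0-bin-hadoop2.7.tgz -C /home/spark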

    Configure Spark

    Configure spark-env.sh

    cd /home/spark/spark-2.1.0-bin-hadoop2.7/conf    # enter the Spark conf directory
    cp spark-env.sh.template spark-env.sh    # copy from the template
    vi spark-env.sh    # add the configuration below
    
    export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.121-0.b13.el7_3.x86_64/jre
    export HADOOP_HOME=/usr/local/hadoop-2.7.3
    export HADOOP_CONF_DIR=/usr/local/hadoop-2.7.3/etc/hadoop    # lets Spark find the HDFS/YARN configuration
    SPARK_MASTER_HOST=master    # hostname the standalone Master binds to
    SPARK_LOCAL_DIRS=/home/spark/spark-2.1.0-bin-hadoop2.7    # scratch space for shuffle and spill files
    SPARK_DRIVER_MEMORY=1G    # default memory for the driver process
    

    Note: when setting the number of CPU cores and the memory size for the Worker process, stay within the machine's actual hardware; if the configured values exceed what the Worker node really has, the Worker process will fail to start.
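
    For example, on a Worker node with 4 cores and 8 GB of RAM (illustrative values, not from this setup), you might leave headroom for the OS and the Hadoop daemons:

    export SPARK_WORKER_CORES=2     # cores this Worker may offer to executors
    export SPARK_WORKER_MEMORY=4g   # total memory this Worker may offer to executors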

    Next, list the hosts that should run Worker processes in conf/slaves:

    [root@master conf]# vim slaves
    
    master
    slave
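
    sbin/start-all.sh (used below) logs in to each host listed here over SSH, so passwordless login from the master must already work; it was set up for Hadoop in the previous article, but if needed it can be configured along these lines:

    [root@master ~]# ssh-keygen -t rsa
    [root@master ~]# ssh-copy-id root@slave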
    

    Distribute the configured Spark directory to every slave; here there is only one:

    scp -r /home/spark/spark-2.1.0-bin-hadoop2.7 root@slave:/home/spark
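
    If Scala and the environment variables in ~/.bashrc are not yet present on the slave, copy them over as well:

    scp -r /home/spark/scala-2.12.1 root@slave:/home/spark
    scp ~/.bashrc root@slave:~/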
    

    Start Spark

    [root@master spark-2.1.0-bin-hadoop2.7]# sbin/start-all.sh
    

    Check that the Spark processes started successfully.
    On the master:

    [root@master spark-2.1.0-bin-hadoop2.7]# jps
    13312 ResourceManager
    3716 Master
    13158 SecondaryNameNode
    12857 NameNode
    8697 Jps
    13451 NodeManager
    12989 DataNode
    3807 Worker
    

    On the slave:

    [root@localhost spark-2.1.0-bin-hadoop2.7]# jps
    9300 NodeManager
    15604 Jps
    1480 Worker
    9179 DataNode
    

    Open Spark's web UI at http://192.168.1.240:8080


    3. Run an Example

    The example code below counts, respectively, how many lines of README.md contain the letter 'a' and how many contain the letter 'b':

    from pyspark import SparkContext
    
    logFile = "/user/test1/README.md"  # should be a file that exists on HDFS
    sc = SparkContext(appName="Simple App")  # let spark-submit supply the master URL
    logData = sc.textFile(logFile).cache()
    
    numAs = logData.filter(lambda s: 'a' in s).count()
    numBs = logData.filter(lambda s: 'b' in s).count()
    
    print("Lines with a: %i, lines with b: %i" % (numAs, numBs))
    
    sc.stop()
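
    The input file must exist at that HDFS path first; it can be uploaded from the local Spark directory with:

    [root@master spark-2.1.0-bin-hadoop2.7]# hdfs dfs -put README.md /user/test1/README.md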
    

    Run the example as follows:

    [root@master spark-2.1.0-bin-hadoop2.7]# /home/spark/spark-2.1.0-bin-hadoop2.7/bin/spark-submit --master spark://192.168.1.240:7077 --deploy-mode client /home/code/spark_test/test1.py
    Lines with a: 62, lines with b: 30
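
    Since the cluster is also running YARN, the same script can be submitted to the ResourceManager instead of the standalone Master; with HADOOP_CONF_DIR set in spark-env.sh as above, only the --master flag changes:

    [root@master spark-2.1.0-bin-hadoop2.7]# bin/spark-submit --master yarn --deploy-mode client /home/code/spark_test/test1.py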
    
    
