Spark Environment Deployment

Author: Jogging | Published 2016-06-02 10:21

    On the 148.169 server (CentOS 6.2), download the installation files and dependencies and set up the environment.
    References:
    http://spark.apache.org/docs/latest/index.html
    https://www.python.org/downloads/release/python-2710/

    1. Create a directory for the installation files

    mkdir /data/soft
    cd /data/soft
    
    2. Download the Spark source tarball

    wget -c http://d3kbcqa49mib13.cloudfront.net/spark-1.5.2.tgz
    
    3. Spark dependencies

    Spark runs on Java 7+, Python 2.6+, NumPy, and R 3.1+. For the Scala API, Spark 1.5.2 uses Scala 2.10; you will need to use a compatible Scala version (2.10.x).

    4. Install Python 2.7+

    wget -c https://www.python.org/ftp/python/2.7.10/Python-2.7.10.tgz
    tar -xzf Python-2.7.10.tgz
    
    yum groupinstall "Development tools"
    yum install zlib-devel
    yum install bzip2-devel
    yum install openssl-devel
    yum install ncurses-devel
    
    cd Python-2.7.10
    ./configure --prefix=/usr/local
    make && make altinstall
    

    Create a symlink so that the system default python points to Python 2.7.
    Normally, even after Python 2.7 has been installed successfully, the system default python still points to version 2.6.6.

    mv /usr/bin/python /usr/bin/python.bak
    ln -s /usr/local/bin/python2.7 /usr/bin/python
    

    After the system python symlink points to Python 2.7, yum breaks!
    Fix:

    vi /usr/bin/yum
    

    Change the shebang line #!/usr/bin/python shown in the editor to #!/usr/bin/python2.6 and save.
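
    A quick sanity check (not part of the original post) to confirm the interpreter switch and that yum still works:

    python -V                   # should report Python 2.7.10
    head -n 1 /usr/bin/yum      # should now read #!/usr/bin/python2.6
    yum --version | head -n 1   # yum should run without errors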

    5. Install NumPy & SciPy

    First install pip, the Python package manager

    wget -c https://bootstrap.pypa.io/get-pip.py --no-check-certificate
    python get-pip.py
    
    pip install numpy
    

    Installing SciPy requires the dependencies below

    yum install lapack lapack-devel blas blas-devel  
    pip install scipy
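
    A quick import check (a sketch, not in the original) to confirm both packages were built against the new Python 2.7:

    python -c "import numpy, scipy; print numpy.__version__, scipy.__version__"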
    
    6. Install Java 7+

    yum search openjdk-devel
    yum install java-1.7.0-openjdk-devel.x86_64
    /usr/sbin/alternatives --config java
    /usr/sbin/alternatives --config javac
    
    vim /etc/profile
    export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.19.x86_64
    export JRE_HOME=$JAVA_HOME/jre
    export PATH=$PATH:$JAVA_HOME/bin
    export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
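
    A minimal verification (not in the original), assuming the exports above have been added to /etc/profile:

    source /etc/profile
    java -version     # should report OpenJDK 1.7.0
    javac -version
    echo $JAVA_HOME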
    
    7. Install Scala 2.11.7

    wget -c http://downloads.typesafe.com/scala/2.11.7/scala-2.11.7.tgz
    tar -xzf scala-2.11.7.tgz
    cp -R /data/soft/scala-2.11.7 /usr/local/
    
    vim /etc/profile
    export SCALA_HOME=/usr/local/scala-2.11.7
    export PATH=$PATH:$SCALA_HOME/bin
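
    A quick check that Scala is on the PATH (a sketch, not in the original):

    source /etc/profile
    scala -version    # should report Scala 2.11.7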
    
    8. Install R 3.1+

    wget -c http://mirror.bjtu.edu.cn/cran/src/base/R-3/R-3.2.2.tar.gz
    tar -xzf R-3.2.2.tar.gz
    cd R-3.2.2
    
    yum install readline-devel
    yum install libXt-devel
    ./configure --prefix=/usr/local
    make && make install
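
    To confirm the build (a sketch; with --prefix=/usr/local the binaries land in /usr/local/bin):

    /usr/local/bin/R --version | head -n 1     # should report R version 3.2.2
    /usr/local/bin/Rscript -e 'R.version.string'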
    
    9. Install Spark

    wget -c http://d3kbcqa49mib13.cloudfront.net/spark-1.5.2-bin-hadoop2.3.tgz
    tar -xzf /data/soft/spark-1.5.2-bin-hadoop2.3.tgz -C /data/hadoop/
    
    10. Configure Spark

    cp /data/hadoop/spark-1.5.2-bin-hadoop2.3/conf/spark-env.sh.template /data/hadoop/spark-1.5.2-bin-hadoop2.3/conf/spark-env.sh
    vim spark-env.sh
    
    export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.79.x86_64
    export HADOOP_HOME=/data/hadoop/hadoop-2.3.0-cdh5.1.0
    export HADOOP_CONF_DIR=/data/hadoop/hadoop-2.3.0-cdh5.1.0/etc/hadoop
    export YARN_CONF_DIR=/data/hadoop/hadoop-2.3.0-cdh5.1.0/etc/hadoop
    export HIVE_HOME=/data/hadoop/hive-0.12.0-cdh5.1.0
    export SCALA_HOME=/usr/local/scala-2.11.7
    export SPARK_HOME=/data/hadoop/spark-1.5.2-bin-hadoop2.3
    export SPARK_LOCAL_DIRS=/data/hadoop/app/tmp
    export SPARK_PID_DIR=/data/hadoop/app/pids
    
    export SPARK_MASTER_IP=host169
    export SPARK_MASTER_PORT=7077
    
    export PYSPARK_PYTHON=/usr/local/bin/python2.7
    export PYSPARK_DRIVER_PYTHON=/usr/local/bin/python2.7
    
    export SPARK_LOCAL_IP=host177
    export SPARK_YARN_QUEUE=hadoop
    export SPARK_WORKER_CORES=10
    export SPARK_WORKER_INSTANCES=1
    export SPARK_WORKER_MEMORY=30G
    export SPARK_WORKER_WEBUI_PORT=8081
    export SPARK_EXECUTOR_CORES=1
    export SPARK_EXECUTOR_MEMORY=5G
    
    export SPARK_CLASSPATH=$SPARK_CLASSPATH:$HIVE_HOME/lib/mysql-connector-java-5.1.20-bin.jar
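
    The directories referenced by SPARK_LOCAL_DIRS and SPARK_PID_DIR must exist before the daemons start; a small sketch (not in the original), assuming the paths above:

    mkdir -p /data/hadoop/app/tmp /data/hadoop/app/pids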
    
    11. Configure /etc/hosts on every server

    vim /etc/hosts
    192.168.6.86    host86
    192.168.6.87    host87
    192.168.6.88    host88
    192.168.6.89    host89
    192.168.6.164   host164
    192.168.6.165   host165
    192.168.6.166   host166
    192.168.6.167   host167
    192.168.6.168   host168
    192.168.6.169   host169
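
    A quick check (not in the original) that name resolution works from the current node:

    getent hosts host169
    ping -c 1 host169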
    
    12. Stop the iptables service

    service iptables stop
    chkconfig iptables off
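
    To confirm the firewall is down and stays down across reboots (a sketch):

    service iptables status      # should report that the firewall is not running
    chkconfig --list iptables    # all runlevels should show off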
    
    13. Set up the cluster

    On each of the other servers, copy the files from 192.168.6.169

    rsync -avz --include "soft/" --exclude "/*" 192.168.6.169::data /data
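
    The command above pulls the soft/ directory from an rsync daemon module named data on 192.168.6.169. If no rsync daemon is configured there, an equivalent pull over SSH (an assumption, not from the original) is:

    rsync -avz 192.168.6.169:/data/soft/ /data/soft/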
    

    Repeat steps 4 through 11 above on the other server nodes

    cp /data/soft/spark-env.sh /data/hadoop/spark-1.5.2-bin-hadoop2.3/conf/
    
    14. Start the Spark cluster

    Use the host169 server as the master node.
    Start the master on host169:

    $SPARK_HOME/sbin/start-master.sh
    

    Once started, the Spark cluster web UI is reachable at http://host169:8080/. The spark://HOST:PORT shown on that page is the URL workers use to register with the master.
    host169 also serves as a worker:

    $SPARK_HOME/sbin/start-slave.sh spark://host169:7077
    

    Start a worker on each of the other server nodes and register it with the master

    $SPARK_HOME/sbin/start-slave.sh spark://host169:7077
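
    A quick way to confirm the daemons are running (a sketch; the log path is the standard default under $SPARK_HOME/logs):

    jps    # Master on host169, Worker on every node
    # On host169, worker registrations also appear in the master log:
    tail -n 20 $SPARK_HOME/logs/spark-*-org.apache.spark.deploy.master.Master-*.out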
    
    15. Start the Spark interactive shells

    Scala shell

    ./bin/spark-shell --master spark://IP:PORT
    

    Python shell

    ./bin/pyspark
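
    As given above, pyspark starts in local mode; to attach it to the standalone master from step 14 instead (a sketch, assuming the master URL spark://host169:7077):

    ./bin/pyspark --master spark://host169:7077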
    
    16. Test the Spark cluster

    • Run in local single-machine mode
    ./bin/run-example org.apache.spark.examples.SparkPi
    
    • Run in local parallel (multi-threaded) mode
    MASTER=local[2] ./bin/run-example org.apache.spark.examples.SparkPi
    
    • Run the example on the Spark cluster
    MASTER=spark://host169:7077 ./bin/run-example org.apache.spark.examples.SparkPi
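
    The same example can also be submitted explicitly with spark-submit; a sketch, assuming the examples jar shipped under $SPARK_HOME/lib in the prebuilt distribution:

    $SPARK_HOME/bin/spark-submit --class org.apache.spark.examples.SparkPi \
        --master spark://host169:7077 \
        $SPARK_HOME/lib/spark-examples-*.jar 100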
    
