Spark in Action (1): Setting Up a Spark 2.0 Environment

Author: padluo | Published 2018-02-28 13:19

    Software to Prepare

    CentOS-7-x86_64-Everything-1611.iso

    spark-2.0.1-bin-hadoop2.7.tgz

    hadoop-2.7.3.tar.gz

    scala-2.11.8.tgz

    jdk-8u91-linux-x64.tar.gz

    Create the Linux Virtual Machines (all nodes)

    Guest operating system: CentOS-7-x86_64.

    Network and hostname settings:

    General tab: check "Automatically connect to this network when it is available".

    The IPv4 tab is configured as follows:

    hostname Address Netmask Gateway
    sparkmaster 192.168.169.221 255.255.255.0
    sparknode1 192.168.169.222 255.255.255.0
    sparknode2 192.168.169.223 255.255.255.0

    Installation type: Minimal Install

    Create a User (all nodes)

    su root
    useradd spark
    passwd spark
    su spark
    cd ~
    pwd
    mkdir softwares
    

    Switch the Locale to English (all nodes)

    # Show the currently supported locales
    locale
    
    LANG=en_US.utf8
    export LC_ALL=en_US.utf8
    
    # System default: /etc/locale.conf should contain the line below
    cat /etc/locale.conf
    
    LANG=en_US.utf8
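
    On CentOS 7 the system default can also be written with localectl instead of editing /etc/locale.conf by hand; a minimal sketch (run as root):

    # Persist the default locale (writes /etc/locale.conf)
    localectl set-locale LANG=en_US.utf8
    # Confirm the setting
    localectl status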
    

    Set the Hostname (all nodes)

    vi /etc/hostname
    
    # 192.168.169.221
    sparkmaster
    # 192.168.169.222
    sparknode1
    # 192.168.169.223
    sparknode2
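
    Alternatively, CentOS 7 can set the hostname without editing the file directly; run hostnamectl as root on each node with that node's own name:

    # Example for the master; use sparknode1 / sparknode2 on the other nodes
    hostnamectl set-hostname sparkmaster
    hostnamectl status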
    

    Edit hosts (all nodes)

    su root
    vi /etc/hosts
    
    192.168.169.221 sparkmaster
    192.168.169.222 sparknode1
    192.168.169.223 sparknode2
    

    To reach the cluster by hostname from Windows as well, add the same entries to the Windows hosts file under C:\Windows\System32\drivers\etc.

    Configure a Static IP (all nodes)

    vi /etc/sysconfig/network-scripts/ifcfg-ens33
    
    # BOOTPROTO=dhcp
    BOOTPROTO=static
    IPADDR0=xxx
    GATEWAY0=xxx
    NETMASK=xxx
    DNS1=xxx
    
    systemctl restart network
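
    As an illustration, the file on sparkmaster might look like the sketch below. IPADDR0 and NETMASK follow the address plan above; GATEWAY0 and DNS1 are hypothetical placeholders that must be replaced with the values of your own network:

    # /etc/sysconfig/network-scripts/ifcfg-ens33 on sparkmaster (illustrative values)
    BOOTPROTO=static
    ONBOOT=yes                     # connect automatically, as chosen in the installer
    IPADDR0=192.168.169.221
    NETMASK=255.255.255.0
    GATEWAY0=192.168.169.1         # hypothetical gateway, replace with yours
    DNS1=192.168.169.1             # hypothetical DNS server, replace with yours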
    

    Disable the Firewall (all nodes)

    systemctl status firewalld.service
    
    systemctl stop firewalld.service
    systemctl disable firewalld.service
    

    Configure Passwordless SSH Login (all nodes)

    su spark
    cd ~
    
    • ssh-keygen -t rsa -P ''
    • Copy out the contents of the id_rsa.pub generated on each node
    • Append the public keys of all nodes to ~/.ssh/authorized_keys under the home directory of every node (one way to script this is sketched after this list)
    • On every node, the permissions of authorized_keys must be set to 600: chmod 600 authorized_keys
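
    A minimal sketch of one way to script this, run as the spark user on sparkmaster and assuming the spark user's password is known for the other nodes:

    # Run on every node first: generate a key pair without a passphrase
    ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa

    # On sparkmaster: collect the public keys of all three nodes
    cat ~/.ssh/id_rsa.pub > ~/.ssh/authorized_keys
    ssh spark@sparknode1 'cat ~/.ssh/id_rsa.pub' >> ~/.ssh/authorized_keys
    ssh spark@sparknode2 'cat ~/.ssh/id_rsa.pub' >> ~/.ssh/authorized_keys

    # Distribute the combined file and fix its permissions everywhere
    chmod 600 ~/.ssh/authorized_keys
    scp ~/.ssh/authorized_keys spark@sparknode1:~/.ssh/
    scp ~/.ssh/authorized_keys spark@sparknode2:~/.ssh/
    ssh spark@sparknode1 'chmod 600 ~/.ssh/authorized_keys'
    ssh spark@sparknode2 'chmod 600 ~/.ssh/authorized_keys'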

    Upload the Software (master node)

    Upload the packages prepared earlier (JDK, Hadoop, Spark, Scala) to sparkmaster:/home/spark/softwares, for example as shown below.
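
    One way to do this, assuming the four packages sit in the current directory of a workstation that can reach sparkmaster over SSH:

    scp jdk-8u91-linux-x64.tar.gz scala-2.11.8.tgz hadoop-2.7.3.tar.gz \
        spark-2.0.1-bin-hadoop2.7.tgz spark@sparkmaster:/home/spark/softwares/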

    Install the JDK (master node)

    tar -zxvf jdk-8u91-linux-x64.tar.gz
    vi ~/.bashrc
    
    export JAVA_HOME=/home/spark/softwares/jdk1.8.0_91
    export PATH=$PATH:$JAVA_HOME/bin
    
    source ~/.bashrc
    which java
    

    Install Scala (master node)

    tar -zxvf scala-2.11.8.tgz
    vi ~/.bashrc
    
    export SCALA_HOME=/home/spark/softwares/scala-2.11.8
    export PATH=$PATH:$JAVA_HOME/bin:$SCALA_HOME/bin
    
    source ~/.bashrc
    which scala
    

    Install Hadoop (master node)

    tar -zxvf hadoop-2.7.3.tar.gz
    

    The Hadoop configuration files live in /home/spark/softwares/hadoop-2.7.3/etc/hadoop. Each XML snippet below goes inside the <configuration> element of its file.

    core-site.xml

    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://sparkmaster:8082</value>
    </property>
    

    hdfs-site.xml

    <property>
        <name>dfs.name.dir</name>
        <value>file:/home/spark/softwares/hadoop-2.7.3/hdfs/name</value>
    </property>
    <property>
        <name>dfs.data.dir</name>
        <value>file:/home/spark/softwares/hadoop-2.7.3/hdfs/data</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>sparkmaster:9001</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
    

    masters

    sparkmaster
    

    slaves

    sparkmaster
    sparknode1
    sparknode2
    

    hadoop-env.sh

    export JAVA_HOME=${JAVA_HOME}
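
    Because hadoop-env.sh is executed by daemons started over SSH, ${JAVA_HOME} may be empty in that context; hard-coding the path from the JDK step above is a safer choice:

    # hadoop-env.sh: point directly at the JDK installed earlier
    export JAVA_HOME=/home/spark/softwares/jdk1.8.0_91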
    

    Environment Variables

    vi ~/.bashrc
    
    export HADOOP_HOME=/home/spark/softwares/hadoop-2.7.3
    export PATH=$PATH:$JAVA_HOME/bin:$SCALA_HOME/bin:$HADOOP_HOME/bin
    
    source ~/.bashrc
    

    Install Spark (master node)

    tar -zxvf spark-2.0.1-bin-hadoop2.7.tgz
    
    # /home/spark/softwares/spark-2.0.1-bin-hadoop2.7/conf
    
    vi slaves
    
    sparkmaster
    sparknode1
    sparknode2
    
    vi spark-env.sh
    
    export SPARK_HOME=$SPARK_HOME
    export HADOOP_HOME=$HADOOP_HOME
    export MASTER=spark://sparkmaster:7077
    export SCALA_HOME=$SCALA_HOME
    export SPARK_MASTER_IP=sparkmaster
    
    vi ~/.bashrc
    
    export SPARK_HOME=/home/spark/softwares/spark-2.0.1-bin-hadoop2.7
    export PATH=$PATH:$JAVA_HOME/bin:$SCALA_HOME/bin:$HADOOP_HOME/bin:$SPARK_HOME/bin
    
    source ~/.bashrc
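
    Note that spark-env.sh is sourced by the Spark daemons themselves, so exporting a variable to its own value (as above) only works if that variable is already defined in the daemon's environment. A concrete sketch using the paths from this guide:

    # spark-env.sh with explicit paths (values taken from the steps above)
    export SPARK_HOME=/home/spark/softwares/spark-2.0.1-bin-hadoop2.7
    export HADOOP_HOME=/home/spark/softwares/hadoop-2.7.3
    export SCALA_HOME=/home/spark/softwares/scala-2.11.8
    export MASTER=spark://sparkmaster:7077
    export SPARK_MASTER_IP=sparkmaster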
    

    Set Up a Local yum Repository (local file method) (master node)

    Mount the ISO image and copy its contents

    su root
    mkdir -p /mnt/CentOS /mnt/dvd
    mount /dev/cdrom /mnt/dvd
    df -h
    cp -av /mnt/dvd/* /mnt/CentOS
    umount /mnt/dvd
    

    Back up the existing yum configuration files

    cd /etc/yum.repos.d
    rename .repo .repo.bak *.repo
    

    Create a new yum configuration file

    vi /etc/yum.repos.d/local.repo
    
    [local]
    name=CentOS-$releasever - Local
    baseurl=file:///mnt/CentOS
    enabled=1
    gpgcheck=0
    
    # Verify
    yum list | grep mysql
    

    Set Up a Local yum Repository (HTTP method) (master node)

    Start the httpd Service

    # Check whether httpd is already installed
    rpm -qa|grep httpd
    # Install httpd if it is missing
    yum install -y httpd
    # Start the httpd service
    # service httpd start
    systemctl status httpd.service
    systemctl start httpd.service
    # Enable httpd at boot
    # chkconfig httpd on
    systemctl is-enabled httpd.service
    systemctl enable httpd.service
    

    Populate the yum Repository

    # Create the CentOS7 directory under /var/www/html/
    mkdir -p /var/www/html/CentOS7

    # Move the ISO contents into CentOS7
    # cp -av /mnt/CentOS/* /var/www/html/CentOS7/
    # rm -rf /mnt/CentOS/*
    mv /mnt/CentOS/* /var/www/html/CentOS7/
    

    With the ISO contents served over HTTP, the yum repository is ready. Verify by visiting it in a browser:

    http://sparkmaster/CentOS7/
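
    If no browser is handy, a quick check from the shell (assuming curl is installed) serves the same purpose:

    curl -I http://sparkmaster/CentOS7/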

    Use the yum Repository

    # Back up the existing repo files
    # mkdir -p /etc/yum.repos.d/repo.bak
    # cd /etc/yum.repos.d/
    # cp *.repo *.repo.bak repo.bak/
    # rm -rf *.repo *.repo.bak
    
    cd /etc/yum.repos.d/
    # Create the file CentOS-http.repo
    vi CentOS-http.repo
    
    [http]
    name=CentOS-$releasever - http
    baseurl=http://sparkmaster:80/CentOS7/
    enabled=1
    gpgcheck=1
    gpgkey=http://sparkmaster:80/CentOS7/RPM-GPG-KEY-CentOS-7
    
    # Disable the local repository built earlier by setting enabled=0 in local.repo

    # Refresh the yum cache
    yum clean all
    yum repolist
    

    Cluster yum Repository Configuration (HTTP method) (all nodes)

    # sparknode1/sparknode2
    cd /etc/yum.repos.d
    rename .repo .repo.bak *.repo
    
    # sparkmaster
    scp /etc/yum.repos.d/*.repo sparknode1:/etc/yum.repos.d/
    scp /etc/yum.repos.d/*.repo sparknode2:/etc/yum.repos.d/
    

    Remote Synchronization with rsync (all nodes)

    Use rsync to push the software installed under /home/spark/softwares on the master node (JDK, Hadoop, Spark, Scala) to the other nodes.

    rpm -qa | grep rsync
    yum list | grep rsync
    yum install -y rsync
    
    vi sync_tools.sh
    
    echo "-----begin to sync jobs to other workplat-----"
    SERVER_LIST='sparknode1 sparknode2'
    for SERVER in $SERVER_LIST
    do
        rsync -avz ./* $SERVER:/home/spark/softwares
    done
    echo "-----sync jobs is done-----"
    
    cd ~/softwares
    chmod 700 sync_tools.sh
    ./sync_tools.sh
    

    Synchronize the Environment Variables (all nodes)

    # sparknode1/sparknode2
    mv ~/.bashrc ~/.bashrc.bak
    
    # sparkmaster
    su spark
    scp ~/.bashrc sparknode1:~/.bashrc
    scp ~/.bashrc sparknode2:~/.bashrc
    
    # sparknode1/sparknode2
    source ~/.bashrc
    

    Start Spark and Verify

    cd $SPARK_HOME
    cd sbin
    ./stop-all.sh
    ./start-all.sh
    jps
    

    Verify:

    http://sparkmaster:8080/
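
    Beyond the web UI, a quick smoke test is to submit the bundled SparkPi example to the standalone master; the --master URL matches the MASTER value configured earlier:

    cd $SPARK_HOME
    ./bin/run-example --master spark://sparkmaster:7077 SparkPi 10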

    Start HDFS and Verify

    cd $HADOOP_HOME
    # Format the NameNode (first start only)
    hadoop namenode -format
    cd sbin
    ./stop-all.sh
    ./start-dfs.sh
    jps
    

    Verify:

    http://sparkmaster:50070
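
    Beyond the web UI, a short read/write round trip confirms that HDFS is actually serving data:

    # Write a small file into HDFS, list it, then clean up
    hdfs dfs -mkdir -p /tmp/smoke
    hdfs dfs -put ~/.bashrc /tmp/smoke/
    hdfs dfs -ls /tmp/smoke
    hdfs dfs -rm -r /tmp/smoke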

    This completes the Spark 2.0 environment setup.



