Environment setup: software preparation
CentOS-7-x86_64-Everything-1611.iso
spark-2.0.1-bin-hadoop2.7.tgz
hadoop-2.7.3.tar.gz
scala-2.11.8.tgz
jdk-8u91-linux-x64.tar.gz
Create the Linux virtual machines (all nodes)
Guest operating system: CentOS-7-x86_64.
Network and hostname settings:
General tab: check "Automatically connect to this network when it is available".
IPv4 tab settings:
hostname | Address | Netmask | Gateway
---|---|---|---
sparkmaster | 192.168.169.221 | 255.255.255.0 |
sparknode1 | 192.168.169.222 | 255.255.255.0 |
sparknode2 | 192.168.169.223 | 255.255.255.0 |
Installation type: minimal install.
Create the user (all nodes)
su root
useradd spark
passwd spark
su spark
cd ~
pwd
mkdir softwares
Switch the locale to English (all nodes)
# Show the current locale settings
locale
LANG=en_US.utf8
export LC_ALL=en_US.utf8
# Change the system default: /etc/locale.conf should contain the line below
cat /etc/locale.conf
LANG=en_US.utf8
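Alternatively, CentOS 7 provides localectl, which writes /etc/locale.conf for you; a minimal equivalent:
# systemd helper: sets the system-wide default locale in /etc/locale.conf
localectl set-locale LANG=en_US.utf8
# Confirm the current settings
localectl status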
Set the hostname (all nodes)
vi /etc/hostname
# 192.168.169.221
sparkmaster
# 192.168.169.222
sparknode1
# 192.168.169.223
sparknode2
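On CentOS 7 the same change can be made without editing the file by hand; a sketch for the master (run the matching command on each node):
# Updates /etc/hostname and the running hostname in one step
hostnamectl set-hostname sparkmaster   # use sparknode1 / sparknode2 on the other nodes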
Edit /etc/hosts (all nodes)
su root
vi /etc/hosts
192.168.169.221 sparkmaster
192.168.169.222 sparknode1
192.168.169.223 sparknode2
To reach the cluster by hostname from Windows as well, edit the Windows hosts file under C:\Windows\System32\drivers\etc.
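A quick sanity check that name resolution works (run from sparkmaster; check the other directions as well):
# Each hostname should resolve to the address configured above
ping -c 1 sparknode1
ping -c 1 sparknode2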
Configure a static IP (all nodes)
vi /etc/sysconfig/network-scripts/ifcfg-ens33
# BOOTPROTO=dhcp
BOOTPROTO=static
IPADDR0=xxx
GATEWAY0=xxx
NETMASK=xxx
DNS1=xxx
systemctl restart network
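For reference, a filled-in sketch for sparkmaster based on the address plan above; the gateway and DNS values are placeholders and must be replaced with those of your own network:
# /etc/sysconfig/network-scripts/ifcfg-ens33 on sparkmaster (sketch)
BOOTPROTO=static
ONBOOT=yes                  # bring the interface up at boot
IPADDR0=192.168.169.221     # from the address plan above
NETMASK=255.255.255.0
GATEWAY0=192.168.169.1      # placeholder: your gateway
DNS1=192.168.169.1          # placeholder: your DNS server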
Disable the firewall (all nodes)
systemctl status firewalld.service
systemctl stop firewalld.service
systemctl disable firewalld.service
Configure passwordless SSH login (all nodes)
su spark
cd ~
ssh-keygen -t rsa -P ''
- Copy the contents of the id_rsa.pub generated on each node.
- Append all of the collected public keys to the authorized_keys file under ~/.ssh on every node (see the sketch below).
- On every node, the permissions of authorized_keys must be changed to 600: chmod 600 authorized_keys
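A minimal sketch of one way to do this with ssh-copy-id (assumes password login for the spark user still works for the initial copy):
# Run on every node as the spark user: push this node's public key
# to ~/.ssh/authorized_keys on all three machines (including itself)
for HOST in sparkmaster sparknode1 sparknode2
do
    ssh-copy-id -i ~/.ssh/id_rsa.pub spark@$HOST
done
# Ensure the permissions are correct, then test a passwordless login
chmod 600 ~/.ssh/authorized_keys
ssh sparknode1 hostname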
Upload the software (master node)
Upload the JDK, Hadoop, Spark, and Scala packages prepared earlier to sparkmaster:/home/spark/softwares.
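For example, from the machine holding the downloads (a sketch; adjust the local paths to wherever the packages were saved):
# Copy the four packages to the spark user's softwares directory on the master
scp jdk-8u91-linux-x64.tar.gz hadoop-2.7.3.tar.gz \
    scala-2.11.8.tgz spark-2.0.1-bin-hadoop2.7.tgz \
    spark@sparkmaster:/home/spark/softwares/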
Install the JDK (master node)
tar -zxvf jdk-8u91-linux-x64.tar.gz
vi ~/.bashrc
export JAVA_HOME=/home/spark/softwares/jdk1.8.0_91
export PATH=$PATH:$JAVA_HOME/bin
source ~/.bashrc
which java
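A quick check that the new JDK is the one being picked up:
# Should report java version "1.8.0_91"
java -version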
Install Scala (master node)
tar -zxvf scala-2.11.8.tgz
vi ~/.bashrc
export SCALA_HOME=/home/spark/softwares/scala-2.11.8
export PATH=$PATH:$JAVA_HOME/bin:$SCALA_HOME/bin
source ~/.bashrc
which scala
Install Hadoop (master node)
tar -zxvf hadoop-2.7.3.tar.gz
The Hadoop configuration files live in /home/spark/softwares/hadoop-2.7.3/etc/hadoop:
core-site.xml
<property>
<name>fs.defaultFS</name>
<value>hdfs://sparkmaster:8082</value>
</property>
hdfs-site.xml
<property>
<name>dfs.name.dir</name>
<value>file:/home/spark/softwares/hadoop-2.7.3/hdfs/name</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>file:/home/spark/softwares/hadoop-2.7.3/hdfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>sparkmaster:9001</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
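The name and data directories referenced in hdfs-site.xml do not exist in a fresh extraction; Hadoop normally creates them on format/startup, but creating them up front is a harmless precaution (a sketch):
# Local directories backing the NameNode metadata and DataNode blocks
mkdir -p /home/spark/softwares/hadoop-2.7.3/hdfs/name
mkdir -p /home/spark/softwares/hadoop-2.7.3/hdfs/data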
masters
sparkmaster
slaves
sparkmaster
sparknode1
sparknode2
hadoop-env.sh
export JAVA_HOME=${JAVA_HOME}
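When the daemons are started over SSH, ${JAVA_HOME} from .bashrc is not always visible to them, so it is safer to hard-code the path here (assuming the JDK location used above):
# In hadoop-env.sh, prefer the explicit path over ${JAVA_HOME}
export JAVA_HOME=/home/spark/softwares/jdk1.8.0_91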
Environment variables
vi ~/.bashrc
export HADOOP_HOME=/home/spark/softwares/hadoop-2.7.3
export PATH=$PATH:$JAVA_HOME/bin:$SCALA_HOME/bin:$HADOOP_HOME/bin
source ~/.bashrc
Install Spark (master node)
tar -zxvf spark-2.0.1-bin-hadoop2.7.tgz
# /home/spark/softwares/spark-2.0.1-bin-hadoop2.7/conf
vi slaves
sparkmaster
sparknode1
sparknode2
vi spark-env.sh
export SPARK_HOME=$SPARK_HOME
export HADOOP_HOME=$HADOOP_HOME
export MASTER=spark://sparkmaster:7077
export SCALA_HOME=$SCALA_HOME
export SPARK_MASTER_IP=sparkmaster
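In a fresh unpack, slaves and spark-env.sh do not exist yet; they are created from the .template files shipped in conf. The sketch below also writes the spark-env.sh variables out explicitly instead of relying on the shell environment (paths follow this guide's layout):
# Create the config files from the shipped templates
cp slaves.template slaves
cp spark-env.sh.template spark-env.sh
# Explicit equivalents for the spark-env.sh entries above
export JAVA_HOME=/home/spark/softwares/jdk1.8.0_91
export SCALA_HOME=/home/spark/softwares/scala-2.11.8
export HADOOP_HOME=/home/spark/softwares/hadoop-2.7.3
export SPARK_MASTER_IP=sparkmaster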
vi ~/.bashrc
export SPARK_HOME=/home/spark/softwares/spark-2.0.1-bin-hadoop2.7
export PATH=$PATH:$JAVA_HOME/bin:$SCALA_HOME/bin:$HADOOP_HOME/bin:$SPARK_HOME/bin
source ~/.bashrc
Set up a local yum repository (local file method) (master node)
Mount the ISO image and copy its contents
su root
mkdir -p /mnt/CentOS /mnt/dvd
mount /dev/cdrom /mnt/dvd
df -h
cp -av /mnt/dvd/* /mnt/CentOS
umount /mnt/dvd
Back up the existing yum repo files
cd /etc/yum.repos.d
rename .repo .repo.bak *.repo
Create a new yum repo file
vi /etc/yum.repos.d/local.repo
[local]
name=CentOS-$releasever - Local
baseurl=file:///mnt/CentOS
enabled=1
gpgcheck=0
# Verify
yum list | grep mysql
Set up a local yum repository (HTTP method) (master node)
Start the httpd service
# Check whether httpd is installed
rpm -qa | grep httpd
# Install it if missing
yum install -y httpd
# Start the httpd service (SysV equivalent: service httpd start)
systemctl status httpd.service
systemctl start httpd.service
# Enable httpd at boot (SysV equivalent: chkconfig httpd on)
systemctl is-enabled httpd.service
systemctl enable httpd.service
Set up the repository content
# Create the CentOS7 directory under /var/www/html/
mkdir -p /var/www/html/CentOS7
# Move the ISO contents copied earlier into CentOS7
# (equivalent to: cp -av /mnt/CentOS/* /var/www/html/CentOS7/ && rm -rf /mnt/CentOS/*)
mv /mnt/CentOS/* /var/www/html/CentOS7/
The yum repository built from the ISO image is now ready. Verify by opening http://sparkmaster/CentOS7/ in a browser.
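A minimal install has no browser, so the same check can be done from the command line:
# Expect an "HTTP/1.1 200 OK" response header
curl -I http://sparkmaster/CentOS7/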
Use the repository
# Back up the existing repo files
# mkdir -p /etc/yum.repos.d/repo.bak
# cd /etc/yum.repos.d/
# cp *.repo *.repo.bak repo.bak/
# rm -rf *.repo *.repo.bak
cd /etc/yum.repos.d/
# Create a new file named CentOS-http.repo
vi CentOS-http.repo
[http]
name=CentOS-$releasever - http
baseurl=http://sparkmaster:80/CentOS7/
enabled=1
gpgcheck=1
gpgkey=http://sparkmaster:80/CentOS7/RPM-GPG-KEY-CentOS-7
# Disable the local-file repo built earlier by setting enabled=0 in local.repo
# Refresh the yum metadata
yum clean all
yum repolist
Cluster-wide yum configuration (HTTP method) (all nodes)
# sparknode1/sparknode2
cd /etc/yum.repos.d
rename .repo .repo.bak *.repo
# sparkmaster
scp /etc/yum.repos.d/*.repo sparknode1:/etc/yum.repos.d/
scp /etc/yum.repos.d/*.repo sparknode2:/etc/yum.repos.d/
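After copying the repo file, refresh the cache on each worker node to confirm the HTTP repository is usable:
# Run on sparknode1 and sparknode2
yum clean all
yum repolist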
File synchronization with rsync (all nodes)
Use rsync to push the software installed under /home/spark/softwares on the master node (JDK, Hadoop, Spark, Scala) to the worker nodes.
rpm -qa | grep rsync
yum list | grep rsync
yum install -y rsync
vi sync_tools.sh
#!/bin/bash
echo "-----begin to sync files to the other nodes-----"
SERVER_LIST='sparknode1 sparknode2'
for SERVER in $SERVER_LIST
do
    rsync -avz ./* $SERVER:/home/spark/softwares
done
echo "-----sync is done-----"
cd ~/softwares
chmod 700 sync_tools.sh
./sync_tools.sh
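A quick check that the sync reached the workers (run from sparkmaster):
# The listings should match ~/softwares on the master
ssh sparknode1 ls /home/spark/softwares
ssh sparknode2 ls /home/spark/softwares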
Synchronize the environment variables (all nodes)
# sparknode1/sparknode2
mv ~/.bashrc ~/.bashrc.bak
# sparkmaster
su spark
scp ~/.bashrc sparknode1:~/.bashrc
scp ~/.bashrc sparknode2:~/.bashrc
# sparknode1/sparknode2
source ~/.bashrc
Start Spark and verify
cd $SPARK_HOME
cd sbin
./stop-all.sh
./start-all.sh
jps
Verification: jps on sparkmaster should show the Master and Worker processes, and jps on sparknode1/sparknode2 should each show a Worker. The Spark master web UI is at http://sparkmaster:8080.
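A simple end-to-end check is to run the bundled SparkPi example against the standalone master (a sketch; run-example locates the examples jar shipped with the distribution):
# The driver output should contain a line like "Pi is roughly 3.14..."
$SPARK_HOME/bin/run-example --master spark://sparkmaster:7077 SparkPi 10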
Start HDFS and verify
cd $HADOOP_HOME
# Format the NameNode (first run only; the newer form of the command is hdfs namenode -format)
hadoop namenode -format
cd sbin
./stop-all.sh
./start-dfs.sh
jps
Verification: jps on sparkmaster should now also show the NameNode, SecondaryNameNode, and DataNode processes, and jps on the worker nodes should show a DataNode. The NameNode web UI is at http://sparkmaster:50070.
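A minimal HDFS smoke test (hypothetical paths):
# Create a directory, upload a file, and list it
hadoop fs -mkdir -p /user/spark
hadoop fs -put ~/.bashrc /user/spark/
hadoop fs -ls /user/spark
# Cluster-wide report; it should list three live datanodes
hdfs dfsadmin -report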
This completes the Spark 2.0 environment setup.