Hadoop Installation
Based on macOS.
Create a hadoop account: I start Hadoop with the account I log in to the machine with, so this step is skipped.
Configure SSH
cd ~/.ssh/ # if this directory does not exist, run ssh localhost once first
ssh-keygen -t rsa # accept every prompt by pressing Enter
cat ./id_rsa.pub >> ./authorized_keys # authorize the key
If SSH is not available, install it first.
Success check:
ssh localhost # logs in without asking for a password
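On macOS the OpenSSH server ships with the system, so there is usually nothing to install; it only needs Remote Login switched on. A minimal sketch (System Preferences -> Sharing -> Remote Login does the same thing):
sudo systemsetup -setremotelogin on # enable the built-in SSH server
sudo systemsetup -getremotelogin # verify: should print "Remote Login: On"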
Download Hadoop
Download URL: https://mirrors.cnnic.cn/apache/hadoop/common/
Because the Spark build I install depends on Hadoop 2.7.x, I downloaded version 2.7.7.
Go to the installation directory; for example, I install under /usr/local:
cd /usr/local/
sudo tar -zxvf ~/Downloads/hadoop-2.7.7.tar.gz # sudo is needed for write permission under /usr/local
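Because the archive is extracted with sudo, the files end up owned by root; handing the directory to the account that starts Hadoop avoids permission problems later. A sketch, assuming you start Hadoop with your login account:
sudo chown -R $(whoami) /usr/local/hadoop-2.7.7 # give your user ownership of the installation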
Pseudo-distributed configuration
Configure core-site.xml
cd /usr/local/hadoop-2.7.7
vim etc/hadoop/core-site.xml
Add the following configuration:
<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:xxxx/hadoop/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
    <property>
        <name>hadoop.native.lib</name>
        <value>false</value>
        <description>Should native hadoop libraries, if present, be used.</description>
    </property>
</configuration>
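A pseudo-distributed setup normally also configures hdfs-site.xml, which is not covered above; a minimal sketch that only lowers the replication factor to 1 for a single-node cluster:
vim etc/hadoop/hdfs-site.xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>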
Configure environment variables
export HADOOP_HOME=/usr/local/hadoop-2.7.7
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=/usr/local/hadoop-2.7.7/lib"
export PATH=$PATH:$HADOOP_HOME/bin
The Java environment variables are also needed:
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_211.jdk/Contents/Home
export PATH=$PATH:$JAVA_HOME/bin
JAVA_HOME must be configured, otherwise Hadoop cannot find the Java path.
Note: adjust the directory paths in these environment variables to match your own installation.
If the cluster reports that JAVA_HOME cannot be found
Edit /usr/local/hadoop-2.7.7/etc/hadoop/hadoop-env.sh and change
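On macOS, /usr/libexec/java_home prints the home directory of an installed JDK, which saves hard-coding the path, and the exports need to go into your shell profile to survive new terminal sessions. A sketch, assuming bash and ~/.bash_profile:
/usr/libexec/java_home -v 1.8 # print the JDK 1.8 home directory
echo 'export JAVA_HOME=$(/usr/libexec/java_home -v 1.8)' >> ~/.bash_profile # persist it
source ~/.bash_profile # reload the profile in the current shell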
export JAVA_HOME=${JAVA_HOME}
#change it to an absolute path
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_211.jdk/Contents/Home
or
add the following to /usr/local/hadoop-2.7.7/etc/hadoop/hadoop-env.sh:
. /etc/profile # load the local environment variables
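Before the very first start, HDFS normally has to be formatted once, otherwise the NameNode will not come up; a sketch, assuming the installation path used above:
cd /usr/local/hadoop-2.7.7
bin/hdfs namenode -format # one-time formatting of the NameNode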
Start Hadoop
/usr/local/hadoop-2.7.7/sbin/start-all.sh
As start-all.sh shows, this starts both DFS and YARN.
Startup errors
No permission on the log files
Create a logs directory under /usr/local/hadoop-2.7.7/ and make sure the user who starts Hadoop has permission on it.
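To confirm that everything is up, jps (shipped with the JDK) should list the HDFS and YARN daemons, roughly:
jps # expect NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager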
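For example, a sketch assuming the installation path above and that Hadoop is started with your login account:
sudo mkdir -p /usr/local/hadoop-2.7.7/logs
sudo chown -R $(whoami) /usr/local/hadoop-2.7.7/logs # let the Hadoop user write the logs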
Spark Installation
Download
Download Spark from http://spark.apache.org/downloads.html
Since I install Spark and Hadoop separately, I chose the package pre-built with user-provided Apache Hadoop.

Extract
cd /usr/local/
sudo tar -zxvf ~/Downloads/spark-2.4.5-bin-without-hadoop.tgz
Configure
cd /usr/local/spark-2.4.5-bin-without-hadoop/conf
cp spark-env.sh.template spark-env.sh
Add the following configuration:
# set the hadoop classpath
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
# configure hadoop yarn
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PYSPARK_PYTHON=/Users/pangxianhai/opt/anaconda3/bin/python3.7
If you develop with a Python environment, make sure the Python configuration above is set; otherwise the system default Python is used, which may not find third-party libraries installed with pip and may be a different Python version.
Note: with my Spark version, Python 3.8 cannot be used.
Running with Python 3.8 fails with the following error:
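If you use Anaconda, one option is a dedicated Python 3.7 environment for Spark 2.4.x, with PYSPARK_PYTHON pointing at its interpreter; a sketch with a hypothetical environment name:
conda create -n pyspark37 python=3.7 # pyspark37 is just an example name
# then point PYSPARK_PYTHON at it, e.g. export PYSPARK_PYTHON=~/opt/anaconda3/envs/pyspark37/bin/python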
Traceback (most recent call last):
File "/usr/local/spark-2.4.5-bin-without-hadoop/examples/src/main/python/pi.py", line 24, in <module>
from pyspark.sql import SparkSession
File "<frozen importlib._bootstrap>", line 991, in _find_and_load
File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 655, in _load_unlocked
File "<frozen importlib._bootstrap>", line 618, in _load_backward_compatible
File "<frozen zipimport>", line 259, in load_module
File "/usr/local/spark-2.4.5-bin-without-hadoop/python/lib/pyspark.zip/pyspark/__init__.py", line 51, in <module>
File "<frozen importlib._bootstrap>", line 991, in _find_and_load
File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 655, in _load_unlocked
File "<frozen importlib._bootstrap>", line 618, in _load_backward_compatible
File "<frozen zipimport>", line 259, in load_module
File "/usr/local/spark-2.4.5-bin-without-hadoop/python/lib/pyspark.zip/pyspark/context.py", line 31, in <module>
File "<frozen importlib._bootstrap>", line 991, in _find_and_load
File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 655, in _load_unlocked
File "<frozen importlib._bootstrap>", line 618, in _load_backward_compatible
File "<frozen zipimport>", line 259, in load_module
File "/usr/local/spark-2.4.5-bin-without-hadoop/python/lib/pyspark.zip/pyspark/accumulators.py", line 97, in <module>
File "<frozen importlib._bootstrap>", line 991, in _find_and_load
File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 655, in _load_unlocked
File "<frozen importlib._bootstrap>", line 618, in _load_backward_compatible
File "<frozen zipimport>", line 259, in load_module
File "/usr/local/spark-2.4.5-bin-without-hadoop/python/lib/pyspark.zip/pyspark/serializers.py", line 72, in <module>
File "<frozen importlib._bootstrap>", line 991, in _find_and_load
File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 655, in _load_unlocked
File "<frozen importlib._bootstrap>", line 618, in _load_backward_compatible
File "<frozen zipimport>", line 259, in load_module
File "/usr/local/spark-2.4.5-bin-without-hadoop/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 145, in <module>
File "/usr/local/spark-2.4.5-bin-without-hadoop/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 126, in _make_cell_set_template_code
TypeError: an integer is required (got type bytes)
Configure environment variables
export SPARK_HOME=/usr/local/spark-2.4.5-bin-without-hadoop
export PATH=$SPARK_HOME/bin:$PATH
Configuring these environment variables makes the spark commands available directly on the command line.
Run the example
cd /usr/local/spark-2.4.5-bin-without-hadoop/
bin/run-example SparkPi
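The example prints a lot of log output; one way to surface just the result line is to pipe it through grep (a sketch):
bin/run-example SparkPi 2>&1 | grep "Pi is roughly" # keep only the estimated value of Pi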
Result:

Run a program
spark-submit --master local --class org.apache.spark.examples.SparkPi ./examples/jars/spark-examples_2.11-2.4.5.jar
The master URL can take any of the following forms (see the example after this list):
- local: run Spark locally with one worker thread (no parallelism at all)
- local[*]: run Spark locally with as many worker threads as there are logical CPU cores
- local[K]: run Spark locally with K worker threads (ideally K should match the number of CPU cores on the machine)
- spark://HOST:PORT: connect to the given Spark standalone master. The default port is 7077.
- yarn-client: connect to a YARN cluster in client mode. The cluster location is taken from the HADOOP_CONF_DIR environment variable.
- yarn-cluster: connect to a YARN cluster in cluster mode. The cluster location is taken from the HADOOP_CONF_DIR environment variable.
- mesos://HOST:PORT: connect to the given Mesos cluster. The default port is 5050.
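For example, the same jar can be run with four local worker threads, or handed to YARN in client mode using the Spark 2.x syntax that replaces yarn-client (a sketch, assuming YARN is running and the Hadoop configuration is reachable):
spark-submit --master local[4] --class org.apache.spark.examples.SparkPi ./examples/jars/spark-examples_2.11-2.4.5.jar
spark-submit --master yarn --deploy-mode client --class org.apache.spark.examples.SparkPi ./examples/jars/spark-examples_2.11-2.4.5.jar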
Run Python
spark-submit --master local examples/src/main/python/pi.py
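pi.py also accepts an optional number of partitions as its first argument, which controls how many tasks are used for the estimate; a sketch:
spark-submit --master local[2] examples/src/main/python/pi.py 10 # estimate Pi with 10 partitions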