- Download spark-3.0.0-bin-hadoop3.2.tgz
Pick a mirror and download Spark to your machine; the package name must match the paths used in the configuration below.
- Extract the archive
tar zxf spark-3.0.0-bin-hadoop3.2.tgz
- Install Jupyter
pip3 install --user jupyterlab notebook
pip3 install --user findspark
- Create a new kernel for Jupyter
cd /usr/local/share/jupyter/kernels/
mkdir -p pyspark3
vi pyspark3/kernel.json
Paste the following into the file (adjust the py4j-*-src.zip name if your download ships a different version); a quick check that Jupyter sees the kernel follows after the JSON.
{
  "display_name": "PySpark",
  "language": "python",
  "argv": [
    "/usr/local/bin/python3",
    "-m",
    "ipykernel",
    "-f",
    "{connection_file}"
  ],
  "env": {
    "SPARK_HOME": "/Users/apple/Downloads/spark-3.0.0-bin-hadoop3.2/",
    "PYTHONPATH": "/Users/apple/Downloads/spark-3.0.0-bin-hadoop3.2/python/:/Users/apple/Downloads/spark-3.0.0-bin-hadoop3.2/python/lib/py4j-0.10.9-src.zip",
    "PYTHONSTARTUP": "/Users/apple/Downloads/spark-3.0.0-bin-hadoop3.2/python/pyspark/shell.py",
    "PYSPARK_SUBMIT_ARGS": "--master local[*] --conf spark.executor.cores=1 --conf spark.executor.memory=512m pyspark-shell"
  }
}
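To confirm that Jupyter picked up the new kernel, the short Python sketch below lists the registered kernel specs (it assumes jupyter_client, which is installed alongside notebook); "pyspark3" should appear with the directory created above.
# List the kernel specs Jupyter can see.
from jupyter_client.kernelspec import KernelSpecManager

specs = KernelSpecManager().find_kernel_specs()  # {kernel name: spec directory}
for name, path in sorted(specs.items()):
    print(name, "->", path)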
- Adjust the PySpark settings
cd /Users/apple/Downloads/spark-3.0.0-bin-hadoop3.2/conf
cp spark-env.sh.template spark-env.sh
vi spark-env.sh
Paste the following into the file:
export JAVA_HOME=$(/usr/libexec/java_home -v 1.8)
export PYSPARK_PYTHON='python3'
export PYSPARK_DRIVER_PYTHON='python3'
export PYSPARK_DRIVER_PYTHON_OPTS=' -m jupyter notebook'
- Start PySpark
Run the following in a terminal:
cd /Users/apple/Downloads/spark-3.0.0-bin-hadoop3.2/
bin/pyspark
Jupyter Notebook should now start in the browser; click New and choose "PySpark".
- Try it out. In the notebook, enter the following (a fuller Pi example follows after the expected output):
import findspark
findspark.init()  # locates Spark via SPARK_HOME and puts pyspark on sys.path
import pyspark
import random     # used by the Pi example below
sc = pyspark.SparkContext(appName="Pi")
sc
It should return the SparkContext summary:

SparkContext
Spark UI
Version: v3.0.0
Master: local[*]
AppName: Pi
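With the context running, a minimal Monte Carlo estimate of Pi (the classic example the appName hints at) can serve as a sanity check. This is only a sketch; the sample count of 100000 is arbitrary.
# Sample random points in the unit square and count those inside the quarter circle.
num_samples = 100000

def inside(_):
    x, y = random.random(), random.random()
    return x * x + y * y < 1

count = sc.parallelize(range(num_samples)).filter(inside).count()
print("Pi is roughly", 4.0 * count / num_samples)
sc.stop()  # release the local executors when done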
- Thanks