Spark01 Hadoop & Spark Environment Installation

Author: 山高月更阔 | Published 2020-05-17 18:45

    Hadoop Installation

    Based on macOS.

    Create a hadoop account. I start Hadoop with the account I already use to log in to this Mac, so this step is skipped.

    Configure SSH

    cd ~/.ssh/                             # If this directory does not exist, run `ssh localhost` once first
    ssh-keygen -t rsa                      # Press Enter at every prompt
    cat ./id_rsa.pub >> ./authorized_keys  # Authorize the key
    

    If SSH is not installed or not enabled, set that up first.
    Success check:

    ssh localhost   # should log in directly, without a password
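
    If `ssh localhost` is refused because no SSH server is running, macOS ships one that can be switched on via Remote Login; a minimal sketch, assuming an admin account:

    # Hedged: enable the built-in SSH server (Remote Login) on macOS
    sudo systemsetup -setremotelogin on
    sudo systemsetup -getremotelogin    # should print "Remote Login: On"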
    

    Download Hadoop

    Download link: https://mirrors.cnnic.cn/apache/hadoop/common/
    Because the Spark build I install depends on Hadoop 2.7.x, I downloaded version 2.7.7.
    Change to the installation directory; for example, I install under /usr/local:

    cd /usr/local/
    sudo tar -zxvf ~/Downloads/hadoop-2.7.7.tar.gz  # sudo is needed for write permission under /usr/local
    

    Pseudo-distributed Configuration

    Configure core-site.xml

    cd /usr/local/hadoop-2.7.7
    vim etc/hadoop/core-site.xml 
    

    Add the following configuration (replace the placeholder file:xxxx/hadoop/tmp with a real local directory of your own):

    <configuration>
        <property>
            <name>hadoop.tmp.dir</name>
            <value>file:xxxx/hadoop/tmp</value>
            <description>Abase for other temporary directories.</description>
        </property>
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://localhost:9000</value>
        </property>
        <property>
            <name>hadoop.native.lib</name>
            <value>false</value>
            <description>Should native hadoop libraries, if present, be used.</description>
        </property>
    </configuration>
    

    Configure Environment Variables

    export HADOOP_HOME=/usr/local/hadoop-2.7.7
    export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
    export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=/usr/local/hadoop-2.7.7/lib"
    export PATH=$PATH:$HADOOP_HOME/bin
    

    You also need to configure the Java environment variables:

    export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_211.jdk/Contents/Home
    export PATH=$PATH:$JAVA_HOME/bin
    

    JAVA_HOME must be configured, otherwise Hadoop cannot find the Java installation.
    Note: adjust the directories in these environment variables to match your own installation paths.
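
    A quick way to verify that the variables took effect, assuming they were added to ~/.bash_profile (the exact profile file depends on your shell):

    # Assumption: the export lines above live in ~/.bash_profile
    source ~/.bash_profile
    hadoop version    # should report Hadoop 2.7.7
    java -version     # should report 1.8.0_211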

    If the cluster complains that JAVA_HOME cannot be found

    Edit /usr/local/hadoop-2.7.7/etc/hadoop/hadoop-env.sh:

    export JAVA_HOME=${JAVA_HOME}
    # Change it to the absolute path:
    export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_211.jdk/Contents/Home
    

    Alternatively, add the following line to /usr/local/hadoop-2.7.7/etc/hadoop/hadoop-env.sh:

    . /etc/profile   # Load the local environment variables
    

    Start Hadoop
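
    Before the very first start, the HDFS NameNode normally has to be formatted once; this step is not shown in the original post, so treat it as a hedged sketch:

    # Run once before the first start to initialize the HDFS metadata directory
    cd /usr/local/hadoop-2.7.7
    bin/hdfs namenode -format

    Then start Hadoop: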

    /usr/local/hadoop-2.7.7/sbin/start-all.sh
    

    From start-all.sh you can see that it starts both DFS and YARN.

    Startup errors

    No permission to write the log files

    Create a logs directory under /usr/local/hadoop-2.7.7/ and make sure the user who starts Hadoop has write permission on it, for example as sketched below.
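
    A minimal sketch, assuming the Hadoop directory was extracted with sudo and is therefore owned by root:

    # Create the logs directory and hand it to the user who starts Hadoop
    sudo mkdir -p /usr/local/hadoop-2.7.7/logs
    sudo chown -R "$(whoami)" /usr/local/hadoop-2.7.7/logs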

    Spark Installation

    Download

    Download Spark from http://spark.apache.org/downloads.html
    Since I install Spark and Hadoop separately, I chose the "Pre-built with user-provided Apache Hadoop" package.


    Extract

    cd /usr/local/
    sudo tar -zxvf ~/Downloads/spark-2.4.5-bin-without-hadoop.tgz 
    

    Configuration

    cd /usr/local/spark-2.4.5-bin-without-hadoop/conf
    cp  spark-env.sh.template  spark-env.sh
    

    Add the following to spark-env.sh:

    # Set the Hadoop classpath for Spark
    export SPARK_DIST_CLASSPATH=$(hadoop classpath)

    # Point Spark at the Hadoop/YARN configuration directory
    export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
    export PYSPARK_PYTHON=/Users/pangxianhai/opt/anaconda3/bin/python3.7
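
    Since SPARK_DIST_CLASSPATH is filled in by running `hadoop classpath`, it is worth confirming that the hadoop command resolves; a quick hedged check:

    # If this prints nothing, the PATH entry for $HADOOP_HOME/bin is missing
    which hadoop
    hadoop classpath | head -c 200    # should print a long list of Hadoop jars and directories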
    

    If you develop with a separate Python environment, make sure PYSPARK_PYTHON points at it; otherwise Spark falls back to the system default Python, which may not find third-party libraries installed with pip and may not match your development Python version.
    Note: with my Spark version (2.4.5), Python 3.8 cannot be used.
    Running with Python 3.8 fails with the following error:

    Traceback (most recent call last):
      File "/usr/local/spark-2.4.5-bin-without-hadoop/examples/src/main/python/pi.py", line 24, in <module>
        from pyspark.sql import SparkSession
      File "<frozen importlib._bootstrap>", line 991, in _find_and_load
      File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
      File "<frozen importlib._bootstrap>", line 655, in _load_unlocked
      File "<frozen importlib._bootstrap>", line 618, in _load_backward_compatible
      File "<frozen zipimport>", line 259, in load_module
      File "/usr/local/spark-2.4.5-bin-without-hadoop/python/lib/pyspark.zip/pyspark/__init__.py", line 51, in <module>
      File "<frozen importlib._bootstrap>", line 991, in _find_and_load
      File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
      File "<frozen importlib._bootstrap>", line 655, in _load_unlocked
      File "<frozen importlib._bootstrap>", line 618, in _load_backward_compatible
      File "<frozen zipimport>", line 259, in load_module
      File "/usr/local/spark-2.4.5-bin-without-hadoop/python/lib/pyspark.zip/pyspark/context.py", line 31, in <module>
      File "<frozen importlib._bootstrap>", line 991, in _find_and_load
      File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
      File "<frozen importlib._bootstrap>", line 655, in _load_unlocked
      File "<frozen importlib._bootstrap>", line 618, in _load_backward_compatible
      File "<frozen zipimport>", line 259, in load_module
      File "/usr/local/spark-2.4.5-bin-without-hadoop/python/lib/pyspark.zip/pyspark/accumulators.py", line 97, in <module>
      File "<frozen importlib._bootstrap>", line 991, in _find_and_load
      File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
      File "<frozen importlib._bootstrap>", line 655, in _load_unlocked
      File "<frozen importlib._bootstrap>", line 618, in _load_backward_compatible
      File "<frozen zipimport>", line 259, in load_module
      File "/usr/local/spark-2.4.5-bin-without-hadoop/python/lib/pyspark.zip/pyspark/serializers.py", line 72, in <module>
      File "<frozen importlib._bootstrap>", line 991, in _find_and_load
      File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
      File "<frozen importlib._bootstrap>", line 655, in _load_unlocked
      File "<frozen importlib._bootstrap>", line 618, in _load_backward_compatible
      File "<frozen zipimport>", line 259, in load_module
      File "/usr/local/spark-2.4.5-bin-without-hadoop/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 145, in <module>
      File "/usr/local/spark-2.4.5-bin-without-hadoop/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 126, in _make_cell_set_template_code
    TypeError: an integer is required (got type bytes)
    
    

    Configure Environment Variables

    export SPARK_HOME=/usr/local/spark-2.4.5-bin-without-hadoop
    export PATH=$SPARK_HOME/bin:$PATH
    

    Configuring these environment variables makes the Spark commands available directly from the shell.
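
    A quick hedged sanity check after reloading the shell profile:

    # Assumption: the export lines were added to ~/.bash_profile
    source ~/.bash_profile
    spark-submit --version    # should report Spark 2.4.5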

    Run an Example

    cd /usr/local/spark-2.4.5-bin-without-hadoop/
    bin/run-example SparkPi
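
    The example produces a lot of log output; a hedged way to pick out just the result line:

    # SparkPi prints a line starting with "Pi is roughly"
    bin/run-example SparkPi 2>&1 | grep "Pi is roughly"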
    

    Result

    (Screenshot of the run output, which ends with the computed approximation of Pi.)

    Run a Program

     spark-submit --master local --class org.apache.spark.examples.SparkPi  ./examples/jars/spark-examples_2.11-2.4.5.jar 
    
    

    The master URL can take any of the following forms:

    • local: run Spark locally with a single worker thread (no parallelism at all)
    • local[*]: run Spark locally with as many worker threads as there are logical CPUs
    • local[K]: run Spark locally with K worker threads (ideally, K should match the number of CPU cores on the machine)
    • spark://HOST:PORT: connect to the given Spark standalone master; the default port is 7077
    • yarn-client: connect to a YARN cluster in client mode; the cluster location is taken from the HADOOP_CONF_DIR environment variable
    • yarn-cluster: connect to a YARN cluster in cluster mode; the cluster location is taken from the HADOOP_CONF_DIR environment variable
    • mesos://HOST:PORT: connect to the given Mesos cluster; the default port is 5050
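
    In Spark 2.x the yarn-client / yarn-cluster forms are normally written as --master yarn plus a --deploy-mode flag; a hedged sketch reusing the SparkPi example (it assumes HDFS and YARN are running):

    # Submit SparkPi to YARN in client mode
    spark-submit --master yarn --deploy-mode client \
      --class org.apache.spark.examples.SparkPi \
      /usr/local/spark-2.4.5-bin-without-hadoop/examples/jars/spark-examples_2.11-2.4.5.jar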

    Run a Python Example

     spark-submit --master local examples/src/main/python/pi.py 
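
    To check that the PYSPARK_PYTHON interpreter configured in spark-env.sh is actually the one being used, a hedged interactive check:

    # The PySpark shell banner reports the Python version it runs on
    pyspark --master local
    # >>> import sys; print(sys.version)   # expect the Anaconda 3.7 interpreter, not the system Python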
    
