[Spark development] Configuring pyspark in a CDH 5.14.2 environment (sp

Author: 粮忆雨 | Published 2019-12-11 14:46

    1. To connect to MySQL, copy the MySQL JDBC driver jar into the SPARK_HOME/jars directory:

    cp mysql-connector-java-5.1.43.jar /opt/cloudera/parcels/SPARK2/lib/spark2/jars/
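With the driver jar in place, a MySQL table can be read from pyspark over JDBC. A minimal sketch, assuming a hypothetical host `db-host`, database `testdb`, and table `t_user` (these names and the helper functions are illustrative, not from the original post):

```python
# Sketch: reading a MySQL table from pyspark via JDBC.
# Requires mysql-connector-java on SPARK_HOME/jars (step 1 above).

def mysql_jdbc_url(host, db, port=3306):
    """Build a JDBC URL for the given MySQL host and database."""
    return "jdbc:mysql://{}:{}/{}".format(host, port, db)

def read_mysql_table(spark, host, db, table, user, password):
    """Load one MySQL table into a Spark DataFrame."""
    return (spark.read.format("jdbc")
            .option("url", mysql_jdbc_url(host, db))
            .option("dbtable", table)
            .option("user", user)
            .option("password", password)
            # Driver class for the 5.1.x connector used in this post;
            # the newer 8.x connectors use com.mysql.cj.jdbc.Driver instead.
            .option("driver", "com.mysql.jdbc.Driver")
            .load())
```

On the cluster this would be driven by something like `read_mysql_table(spark, "db-host", "testdb", "t_user", "root", "***")` followed by `.show()`.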
    

    2. In Cloudera Manager, add the following to the "Spark 2 Client Advanced Configuration Snippet (Safety Valve) for spark2-conf/spark-env.sh":

    # Load the HBase jars
    for loop in `ls /opt/cloudera/parcels/CDH/jars/hbase-*.jar`; do
       export SPARK_DIST_CLASSPATH=${loop}:${SPARK_DIST_CLASSPATH}
    done
    # Load the jar containing the org.apache.spark.examples.pythonconverters... classes
    for loop in `ls /opt/cloudera/parcels/CDH/lib/spark/lib/spark-examples-*.jar`; do
       export SPARK_DIST_CLASSPATH=${loop}:${SPARK_DIST_CLASSPATH}
    done
    # Load the Hive-HBase integration jar
    for loop in `ls /opt/cloudera/parcels/CDH/lib/hive/lib/hive-hbase-handler-*.jar`; do
       export SPARK_DIST_CLASSPATH=${loop}:${SPARK_DIST_CLASSPATH}
    done
    # Add the HBase configuration directory to Spark 2's environment
    export HADOOP_CONF_DIR=${HADOOP_CONF_DIR}:/etc/hbase/conf/
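With the HBase jars and the spark-examples converter classes on the classpath, an HBase table can be scanned from pyspark through `newAPIHadoopRDD`. A sketch under those assumptions (the table name `t_demo` and ZooKeeper host are hypothetical; the converter class names come from the spark-examples jar loaded above):

```python
# Sketch: scanning an HBase table from pyspark, using the pythonconverters
# classes made available by the spark-env.sh snippet above.

KEY_CONV = "org.apache.spark.examples.pythonconverters.ImmutableBytesWritableToStringConverter"
VALUE_CONV = "org.apache.spark.examples.pythonconverters.HBaseResultToStringConverter"

def hbase_scan_conf(table, zk_quorum="localhost"):
    """Hadoop configuration dict for a full-table HBase scan."""
    return {
        "hbase.zookeeper.quorum": zk_quorum,
        "hbase.mapreduce.inputtable": table,
    }

def scan_hbase_table(sc, table, zk_quorum):
    """Return an RDD of (rowkey, result-string) pairs from HBase."""
    return sc.newAPIHadoopRDD(
        "org.apache.hadoop.hbase.mapreduce.TableInputFormat",
        "org.apache.hadoop.hbase.io.ImmutableBytesWritable",
        "org.apache.hadoop.hbase.client.Result",
        keyConverter=KEY_CONV,
        valueConverter=VALUE_CONV,
        conf=hbase_scan_conf(table, zk_quorum))
```

On a live cluster this would be invoked from a pyspark shell as `scan_hbase_table(sc, "t_demo", "zk-host").take(5)`.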
    

    3. Install a Python environment

    sh Anaconda3-2019.10-Linux-x86_64.sh
    Then configure the interpreter environment variables (e.g. in spark-env.sh or your shell profile):

    export PYSPARK_PYTHON=/opt/apps/anaconda3/bin/python3
    export PYSPARK_DRIVER_PYTHON=/opt/apps/anaconda3/bin/python3
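A quick way to confirm these variables took effect before launching jobs is to check them from Python itself. A small sketch (the expected path mirrors the exports above; the helper name is illustrative):

```python
import os

# Interpreter path from the exports above.
ANACONDA_PYTHON = "/opt/apps/anaconda3/bin/python3"

def pyspark_python_ok(env=None, expected=ANACONDA_PYTHON):
    """True if both pyspark interpreter variables point at the Anaconda python."""
    env = os.environ if env is None else env
    return (env.get("PYSPARK_PYTHON") == expected
            and env.get("PYSPARK_DRIVER_PYTHON") == expected)
```

Inside a `pyspark` shell you can additionally print `sys.executable` on the driver, and map it over a small RDD to confirm the executors picked up the same interpreter.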
    
