18. Hadoop: Spark Environment Deployment


Author: 負笈在线 | Published 2020-07-11 10:26

    Main content of this section:

    Spark environment deployment

    Spark offers the same advantages as Hadoop MapReduce, but unlike MapReduce, intermediate job output can be kept in memory instead of being written to and read back from HDFS. This makes Spark better suited to iterative algorithms such as those used in data mining and machine learning.
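
    As an illustration of that difference (not part of the original deployment steps), the spark-shell (Scala) sketch below reads an input file once, keeps the intermediate RDD in memory with cache(), and then makes several passes over it without re-reading HDFS. The input path is made up for illustration.

               val data  = sc.textFile("hdfs:///tmp/input.txt")       // read the input from HDFS once
               val words = data.flatMap(_.split("\\s+")).cache()      // keep the intermediate RDD in memory
               for (i <- 1 to 5) {                                    // every pass reuses the cached RDD;
                 println(s"pass $i: ${words.count()} words")          // no further HDFS reads or writes
               }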

    Spark consists of five packages:

    spark-core: the Spark core package

    spark-worker: scripts for running a Spark worker

    spark-master: scripts for running the Spark master

    spark-python: the Python client for Spark (PySpark)

    spark-history-server: the job history service

    1. System environment:

    OS: CentOS Linux release 7.5.1804 (Core)

    CPU: 2 cores

    Memory: 1 GB

    Run-as user: root

    JDK version: 1.8.0_252

    Hadoop version: cdh5.16.2

    2. Cluster node role planning:

    172.26.37.245 node1.hadoop.com---->namenode,zookeeper,journalnode,hadoop-hdfs-zkfc,resourcemanager,historyserver,hbase,hbase-master,hive,hive-metastore,hive-server2,hive-hbase,sqoop,impala,impala-server,impala-state-store,impala-catalog,pig,spark-core,spark-master,spark-worker,spark-python

    172.26.37.246 node2.hadoop.com---->datanode,zookeeper,journalnode,nodemanager,hadoop-client,mapreduce,hbase-regionserver,impala,impala-server,hive,spark-core,spark-worker,spark-history-server,spark-python

    172.26.37.247 node3.hadoop.com---->datanode,nodemanager,hadoop-client,mapreduce,hive,mysql-server,impala,impala-server

    172.26.37.248  node4.hadoop.com---->namenode,zookeeper,journalnode,hadoop-hdfs-zkfc,hive,hive-server2,impala-shell

    3. Environment notes:

    The roles added in this deployment:

    172.26.37.245 node1.hadoop.com---->spark-core,spark-master,spark-worker,spark-python

    172.26.37.246 node2.hadoop.com---->spark-core,spark-worker,spark-history-server,spark-python

    I. Installation

    On node1:

               # yum install -y spark-core spark-master spark-worker spark-python

    On node2:

               # yum install -y spark-core spark-worker spark-history-server spark-python
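
    (Optional) To confirm the packages were installed on each node, a plain package query such as the following can be used; this check is a suggestion, not a step from the original article:

               # rpm -qa | grep -i spark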

    II. Configuration

    On node1 and node2:

               # cp -p /etc/spark/conf/spark-env.sh /etc/spark/conf/spark-env.sh.20200705

               # vi /etc/spark/conf/spark-env.sh

    Insert the following:

    export STANDALONE_SPARK_MASTER_HOST='node1.hadoop.com'    # note: the value must be wrapped in plain (straight) single quotes

    Create the HDFS directory /user/spark/applicationHistory/ required by the Spark History Server:

               # sudo -u hdfs hadoop fs -mkdir -p /user/spark/applicationHistory

               # sudo -u hdfs hadoop fs -chown -R spark:spark /user/spark

               # sudo -u hdfs hadoop fs -chmod 1777 /user/spark/applicationHistory
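
    (Optional) The resulting ownership and permissions can be verified with a normal HDFS listing, for example:

               # sudo -u hdfs hadoop fs -ls /user/spark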

    On node1 and node2:

    On the Spark worker side, modify /etc/spark/conf/spark-defaults.conf:

               # cp -p /etc/spark/conf/spark-defaults.conf /etc/spark/conf/spark-defaults.conf.20200705

               # vi /etc/spark/conf/spark-defaults.conf

    Insert the following:

    spark.eventLog.dir  hdfs://cluser1/user/spark/applicationHistory

    spark.eventLog.enabled=true
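
    Note that the hdfs:// authority in spark.eventLog.dir must match the nameservice defined in hdfs-site.xml. Depending on the packaged defaults, the history server on node2 may also need to be pointed at the same directory via the standard Spark property spark.history.fs.logDirectory; the line below is an assumption to verify against your CDH version, not a step from the original article:

               spark.history.fs.logDirectory  hdfs://cluser1/user/spark/applicationHistory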

    On node1 and node2, copy hdfs-site.xml to /etc/spark/conf:

               # cp /etc/hadoop/conf/hdfs-site.xml /etc/spark/conf/

    III. Starting Spark

    On node1:

    # service spark-master start

    # service spark-master status

    # service spark-worker start

    # service spark-worker status

    On node2:

    # service spark-worker start

    # service spark-worker status

    # service spark-history-server start

    # service spark-history-server status

    Open http://172.26.37.245:18080 in a browser to view the Spark management web UI.

    Use the spark-shell command to enter the Spark shell:

    # sudo -u spark spark-shell

    Setting default log level to "WARN".

    scala>
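
    From the prompt you can run a small job to confirm the installation works end to end; for example, this one-liner (illustrative, not from the original article) sums the numbers 1 to 1000 on the cluster:

               scala> sc.parallelize(1 to 1000).sum()   // expected result: 500500.0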
