Getting Started with Spark 2.3.1

Author: 大奇聊数据 | Published 2019-01-24 22:54
    1. Basic Environment
    vi /etc/hosts
    192.168.74.10  host196
    192.168.74.29  host197
    192.168.74.30  host198
    
    

    Install JDK, ZooKeeper, and Hadoop first.
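    The paths below are assumptions taken from the spark-env.sh step later in this guide (JDK under /usr/local, Hadoop under /opt); a minimal sketch of the environment variables typically appended to /etc/profile on every node:

    # /etc/profile additions (sketch; adjust paths to your own installs)
    export JAVA_HOME=/usr/local/jdk1.8.0_111
    export HADOOP_HOME=/opt/hadoop-2.8.5
    export SPARK_HOME=/opt/spark-2.3.2-bin-hadoop2.7
    export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$SPARK_HOME/bin

    source /etc/profile   # reload in the current shell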

    2. Installation Steps
    tar -zxvf spark-2.3.2-bin-hadoop2.7.tgz -C /opt/
    cd /opt/spark-2.3.2-bin-hadoop2.7/
    cd conf/
    
    cp spark-env.sh.template spark-env.sh
    vi spark-env.sh
    JAVA_HOME=/usr/local/jdk1.8.0_111
    HADOOP_CONF_DIR=/opt/hadoop-2.8.5/etc/hadoop
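    # Standalone-master HA: keep master state in ZooKeeper so the standby master
    # on host197 can take over. zookeeper.url is the ensemble connection string
    # (comma-separate all members if ZooKeeper runs on more than one node) and
    # zookeeper.dir is the znode under which the recovery state is stored.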
    SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=192.168.74.10:2181 -Dspark.deploy.zookeeper.dir=/spark"
    
    cp slaves.template slaves
    vi slaves
    host196
    host197
    host198
    
    scp -r spark-2.3.2-bin-hadoop2.7/ host197:/opt/
    scp -r spark-2.3.2-bin-hadoop2.7/ host198:/opt/
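    Only the Spark directory is copied here; the guide assumes /etc/hosts, the JDK, and Hadoop are already in place on host197 and host198. If the hosts file is not, a sketch of a hypothetical extra step to push it out as well:

    scp /etc/hosts host197:/etc/hosts
    scp /etc/hosts host198:/etc/hosts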
    
    
    3. Starting and Stopping Services
    • Start (start-all.sh on host196 brings up the active master there and a worker on every host listed in slaves; start-master.sh on host197 starts the standby master)
    ssh host196
    sbin/start-all.sh
    
    ssh host197
    sbin/start-master.sh
    
    

    Master web UI: http://host196:8080
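    A quick way to verify the deployment (a sketch; ZOOKEEPER_HOME stands in for your ZooKeeper install directory, which this guide does not specify):

    # On each node, confirm the expected JVMs are up (jps ships with the JDK):
    # Master on host196 and host197, Worker on every host listed in slaves.
    jps

    # From any node, check that Spark wrote its recovery state under the /spark znode
    # (the path set by spark.deploy.zookeeper.dir). Run 'ls /spark' at the zkCli prompt.
    $ZOOKEEPER_HOME/bin/zkCli.sh -server 192.168.74.10:2181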

    • Stop
    ssh host196
    sbin/stop-all.sh
    
    ssh host197
    sbin/stop-master.sh
    
    
    4. Basic Testing

    Run a quick word count with spark-shell (note from the banner below that this session uses master = local[*], i.e. it is not running on the standalone cluster):

    [root@host198 spark-2.3.2-bin-hadoop2.7]# bin/spark-shell 
    2018-10-23 10:27:08 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Setting default log level to "WARN".
    To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
    Spark context Web UI available at http://host198:4040
    Spark context available as 'sc' (master = local[*], app id = local-1540261648120).
    Spark session available as 'spark'.
    Welcome to
          ____              __
         / __/__  ___ _____/ /__
        _\ \/ _ \/ _ `/ __/  '_/
       /___/ .__/\_,_/_/ /_/\_\   version 2.3.2
          /_/
    
    Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_111)
    Type in expressions to have them evaluated.
    Type :help for more information.
    
    scala> val file=sc.textFile("hdfs://192.168.74.10:9000/opt/hdfs_test/input/words.txt")
    file: org.apache.spark.rdd.RDD[String] = hdfs://192.168.74.10:9000/opt/hdfs_test/input/words.txt MapPartitionsRDD[1] at textFile at <console>:24
    
    scala> val rdd = file.flatMap(line => line.split(" ")).map(word => (word,1)).reduceByKey(_+_)
    rdd: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[4] at reduceByKey at <console>:25
    
    scala> rdd.collect()
    res0: Array[(String, Int)] = Array((zhangsan,1), (wangwu,1), (hello,3), (lisi,1))
    
    scala> rdd.foreach(println)
    (zhangsan,1)
    (wangwu,1)
    (hello,3)
    (lisi,1)
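
    To exercise the standalone cluster itself rather than local mode, submit a job with --master pointing at both masters. A sketch using the SparkPi example bundled with the distribution (the jar name is assumed from the Spark 2.3.2 / Scala 2.11 binary build):

    cd /opt/spark-2.3.2-bin-hadoop2.7
    bin/spark-submit \
      --master spark://host196:7077,host197:7077 \
      --class org.apache.spark.examples.SparkPi \
      examples/jars/spark-examples_2.11-2.3.2.jar 100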
    
    
    5. FAQ
    6. References
