
19. Hadoop: Spark HelloWorld Experiment

Author: 負笈在线 | Published 2020-07-12 12:43

    This section covers:

    The Spark HelloWorld experiment:

    using Spark to count the number of occurrences of each word.

    Preparing the wordcount input data

           # echo "Hello World Bye World" > /file0

           # echo "Hello Hadoop Goodbye Hadoop" > /file1

           # sudo -u hdfs hdfs dfs -mkdir -p /user/spark/wordcount/input

           # sudo -u hdfs hdfs dfs -put /file* /user/spark/wordcount/input

           # sudo -u hdfs hdfs dfs -chmod 1777 /user/spark/wordcount/input

           # sudo -u hdfs hdfs dfs -chown -R spark:spark /user/spark/wordcount/input
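    As an aside, the same input could also be created from within spark-shell instead of with echo and put. A minimal sketch, assuming the spark user may write to the target directory (the alt-input directory name is ours, chosen so it does not clobber the input prepared above):

    scala> val lines = Seq("Hello World Bye World", "Hello Hadoop Goodbye Hadoop")
    scala> // distribute the two lines as an RDD; each element becomes one line in the output files
    scala> sc.parallelize(lines).saveAsTextFile("hdfs://cluster1/user/spark/wordcount/alt-input")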

    Entering spark-shell and running the script

           # sudo -u spark spark-shell

    Setting default log level to "WARN".

    scala>

    scala> val file = sc.textFile("hdfs://cluster1/user/spark/wordcount/input")  // define file, pointing at the input directory on HDFS

    scala> val counts = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)  // split each line on spaces, map each word to (word, 1), then sum the counts per word; see the breakdown below
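    To see what each stage of the chain produces, it can be pulled apart and inspected with collect(). A sketch run against the same input (the names words and pairs are ours):

    scala> val words = file.flatMap(line => line.split(" "))  // one RDD element per word
    scala> words.collect()  // e.g. Array(Hello, World, Bye, World, Hello, Hadoop, Goodbye, Hadoop)
    scala> val pairs = words.map(word => (word, 1))  // pair each word with an initial count of 1
    scala> pairs.reduceByKey(_ + _).collect()  // e.g. Array((Bye,1), (Hello,2), (World,2), (Goodbye,1), (Hadoop,2))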

    scala> counts.saveAsTextFile("hdfs://cluster1/user/spark/wordcount/output")  // write the results back to HDFS
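    The result can also be checked directly in spark-shell, without switching to Pig:

    scala> counts.collect().foreach(println)  // prints one (word,count) pair per line

    Note that saveAsTextFile fails if the output directory already exists, so remove it first (hdfs dfs -rm -r) when re-running the job.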

    Viewing the output in Pig

           # sudo -u hdfs pig

    grunt> ls

    hdfs://cluster1/user/spark/wordcount/output/_SUCCESS<r 3> 0

    hdfs://cluster1/user/spark/wordcount/output/part-00000<r 3> 28

    hdfs://cluster1/user/spark/wordcount/output/part-00001<r 3> 23

    grunt> cat part-00000

    (Bye,1)

    (Hello,2)

    (World,2)

    grunt> cat part-00001

    (Goodbye,1)

    (Hadoop,2)

    grunt>
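    For reference, the same job can be packaged as a standalone application and launched with spark-submit rather than typed into the shell. A minimal sketch, assuming Spark's Scala API and the paths used above (the object name WordCount is ours):

           import org.apache.spark.{SparkConf, SparkContext}

           object WordCount {
             def main(args: Array[String]): Unit = {
               val conf = new SparkConf().setAppName("WordCount")
               val sc = new SparkContext(conf)
               // same pipeline as the shell session above
               sc.textFile("hdfs://cluster1/user/spark/wordcount/input")
                 .flatMap(_.split(" "))
                 .map((_, 1))
                 .reduceByKey(_ + _)
                 .saveAsTextFile("hdfs://cluster1/user/spark/wordcount/output")
               sc.stop()
             }
           }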
