Spark Starter Program: Word Count

Author: mumu_cola | Published: 2018-12-25 20:43

    This post summarizes the development workflow for word count, the "hello world" of Spark.

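    Spark supports Scala, Java, and Python as development languages. The word count program below is written in Java. Java has supported lambda expressions since version 1.8, which makes the code much shorter and quicker to write. For details on using lambda expressions, see the article "Functional Programming (1): lambda, FunctionalInterface, Method Reference".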
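    To make the point about lambdas concrete, here is a minimal sketch in plain Java (no Spark needed) that writes the same line-splitting function twice, once as an anonymous inner class and once as a lambda. The class and field names are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Illustrative only: the class and field names are hypothetical.
class LambdaComparison {
    // Pre-Java 8 style: an anonymous inner class implementing a functional interface.
    static final Function<String, List<String>> SPLIT_ANONYMOUS =
            new Function<String, List<String>>() {
                @Override
                public List<String> apply(String line) {
                    return Arrays.asList(line.split(" "));
                }
            };

    // Java 8 style: the same behavior written as a lambda expression.
    static final Function<String, List<String>> SPLIT_LAMBDA =
            line -> Arrays.asList(line.split(" "));

    public static void main(String[] args) {
        String line = "Hello darkness, my old friend,";
        // Both calls produce the same list of space-separated tokens.
        System.out.println(SPLIT_ANONYMOUS.apply(line));
        System.out.println(SPLIT_LAMBDA.apply(line));
    }
}
```

    The Spark Java API accepts functions in the same way, so each transformation in the word count program can be a one-line lambda.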

    Tools: IDEA, Maven, JDK 1.8

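    1. In IDEA, create a new Maven project (set the Project SDK to Java 1.8 or later). In this example the project is named count.
    2. Open the Maven pom.xml file and add the following below the version tag. Because the count program in this post only uses JavaRDD, adding spark-core is enough.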

        <properties>
            <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        </properties>
        <dependencies>
            <dependency>
                <groupId>com.thoughtworks.paranamer</groupId>
                <artifactId>paranamer</artifactId>
                <version>2.8</version>
            </dependency>
            <dependency>
                <groupId>org.apache.spark</groupId>
                <artifactId>spark-core_2.11</artifactId>
                <version>2.4.0</version>
            </dependency>
        </dependencies>
    

    3. Write the word count code. Here e:/word_count.txt is the text file to be counted, and the program runs in local mode.

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import scala.Tuple2;
    
    import java.util.Arrays;
    import java.util.List;
    
    public class WordCount {
        public static void main(String[] args) {
            // "local" runs Spark in-process with a single worker thread.
            SparkConf conf = new SparkConf().setAppName("WordCount").setMaster("local");
            JavaSparkContext sc = new JavaSparkContext(conf);
            // Read the input file as an RDD of lines.
            JavaRDD<String> textFile = sc.textFile("e:/word_count.txt");
            JavaPairRDD<String, Integer> counts = textFile
                    // Split each line on spaces and flatten the result into an RDD of words.
                    .flatMap(s -> Arrays.asList(s.split(" ")).iterator())
                    // Pair each word with an initial count of 1.
                    .mapToPair(word -> new Tuple2<>(word, 1))
                    // Sum the counts of identical words.
                    .reduceByKey((a, b) -> a + b);
            // Bring the results back to the driver and print them.
            List<Tuple2<String, Integer>> countList = counts.collect();
            countList.forEach(System.out::println);
            // Release Spark resources before exiting.
            sc.stop();
        }
    }
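
    To see what the flatMap, mapToPair, and reduceByKey steps compute, it can help to write the same logic over an in-memory list in plain Java. The sketch below is only an illustration of the semantics, not how Spark executes the job (Spark distributes the work across partitions); the class and method names are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a single-machine analogue of the Spark pipeline.
class LocalWordCount {
    static Map<String, Integer> count(List<String> lines) {
        // flatMap: every line becomes zero or more words in one flat list.
        List<String> words = new ArrayList<>();
        for (String line : lines) {
            words.addAll(Arrays.asList(line.split(" ")));
        }
        // mapToPair + reduceByKey: each word contributes (word, 1), and values
        // sharing the same key are combined with (a, b) -> a + b.
        Map<String, Integer> counts = new HashMap<>();
        for (String word : words) {
            counts.merge(word, 1, (a, b) -> a + b);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "People talking without speaking,",
                "People hearing without listening,");
        // Print in the same (word,count) shape as Tuple2.toString().
        count(lines).forEach((word, n) -> System.out.println("(" + word + "," + n + ")"));
    }
}
```

    As with a collected RDD, the iteration order of the HashMap is not meaningful, so the printed order of the pairs should not be relied on.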
    

    4. Partial output from a run:

    (touched,1)
    (voices,1)
    (forming.,1)
    (Because,1)
    (it,1)
    (its,2)
    (writing,1)
    (People,3)
    (old,1)
    (naked,1)
    (Hear,1)
    (Take,1)
    (arms,1)
    (fell,,1)
    (cobblestone,,1)
    (neon,2)
    (you,,1)
    ...
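
    Note that the output contains tokens such as "fell," and "People" (capitalized): splitting on a single space keeps punctuation and case, so "fell," and "fell" would be counted as different words. If that is not wanted, one option is to normalize each line before counting. The helper below is a hypothetical sketch, not part of the original program; it lower-cases the line and treats anything other than letters, digits, and apostrophes as a separator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration; the name and the rules are assumptions.
class WordNormalizer {
    static List<String> tokenize(String line) {
        // Lower-case, then collapse every run of other characters into one space.
        String cleaned = line.toLowerCase(Locale.ROOT)
                .replaceAll("[^a-z0-9']+", " ")
                .trim();
        List<String> tokens = new ArrayList<>();
        if (cleaned.isEmpty()) {
            return tokens; // nothing but punctuation or whitespace
        }
        for (String token : cleaned.split(" ")) {
            tokens.add(token);
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("But my words like silent raindrops fell,"));
    }
}
```

    In the Spark job this could stand in for the split inside flatMap, for example `.flatMap(s -> WordNormalizer.tokenize(s).iterator())`. Apostrophes are kept so that words like "I've" stay intact, at the cost of leaving the leading quote on "'Neath".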
    

    The test input is the lyrics of "The Sound of Silence", the theme song of the film The Graduate:

    Hello darkness, my old friend,
    I've come to talk with you again,
    Because a vision softly creeping,
    Left its seeds while I was sleeping,
    And the vision that was planted in my brain
    Still remains
    Within the sound of silence.
    In restless dreams I walk alone
    Narrow streets of cobblestone,
    'Neath the halo of a street lamp,
    I turned my collar to the cold and damp
    When my eyes were stabbed by the flash of a neon light
    That split the night
    And touched the sound of silence.
    And in the naked light I saw
    Ten thousand people, maybe more.
    People talking without speaking,
    People hearing without listening,
    People writing songs that voices never share
    And no one dared
    Disturb the sound of silence.
    "Fools" said I,"You do not know
    Silence like a cancer grows.
    Hear my words that I might teach you,
    Take my arms that I might reach you."
    But my words like silent raindrops fell,
    And echoed
    In the wells of silence
    And the people bowed and prayed
    To the neon god they made.
    And the sign flashed out its warning,
    In the words that it was forming.
    And the signs said, 'The words of the prophets are written on the subway walls
    And tenement halls.
    And whisper'd in the sounds of silence.

    Reference book:
    Learning Spark: Lightning-Fast Big Data Analysis (Chinese edition: Spark快速大数据分析)