Implementing WordCount in Spark with Scala


Author: 羋学僧 | Published 2020-09-28 11:29


    Create a new Project


    Select sbt 1.0.4

    Select Scala 2.11.8

    Configure the paths

    Project Sources


    Dependencies

    Create a new object

    MyScalaWordCount.scala
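The dependency step above can be sketched as a minimal `build.sbt`; the `spark-core` coordinates below are an assumption matched to the Scala 2.11.8 selection and the Spark 2.1.0 install used later in this article:

```scala
// build.sbt (sketch, assumed coordinates): Scala version and Spark core dependency.
// "provided" keeps spark-core out of the exported jar, since the
// cluster supplies Spark itself at runtime.
name := "SparkScalaWork"
scalaVersion := "2.11.8"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.1.0" % "provided"
```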

    Local mode

    import org.apache.spark.{SparkConf, SparkContext}

    object MyScalaWordCount {

      def main(args: Array[String]): Unit = {

        // Configure the application; "local" runs Spark in-process
        val conf = new SparkConf().setAppName("MyScalaWordCount").setMaster("local")

        // Create a SparkContext
        val sc = new SparkContext(conf)

        // Run WordCount
        val result = sc.textFile("hdfs://bigdata02:9000/wordcount.txt")
          .flatMap(_.split(" "))
          .map((_, 1))
          .reduceByKey(_ + _)

        // Print the result to the console
        result.foreach(println)

        // Release resources
        sc.stop()
      }

    }
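The transformation chain itself is ordinary Scala collection logic. As a sketch (no Spark needed), the same flatMap → map → reduce steps can be run on an in-memory `Seq`, using `groupBy` plus a sum as the local stand-in for `reduceByKey`:

```scala
// Sample lines standing in for wordcount.txt (hypothetical data)
val lines = Seq("hello spark", "hello scala")

// Same pipeline shape as the RDD version:
// flatMap -> map -> (groupBy + sum, the local analogue of reduceByKey)
val counts: Map[String, Int] = lines
  .flatMap(_.split(" "))
  .map((_, 1))
  .groupBy(_._1)
  .map { case (word, ones) => (word, ones.map(_._2).sum) }

counts.foreach(println)
```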
    
    

    Export the Jar and run it on the server

    MyScalaWordCount.scala

    Generate the jar

    import org.apache.spark.{SparkConf, SparkContext}

    object MyScalaWordCount {

      def main(args: Array[String]): Unit = {

        // No master is set here; spark-submit supplies it
        val conf = new SparkConf().setAppName("MyScalaWordCount")

        // Create a SparkContext
        val sc = new SparkContext(conf)

        // Run WordCount: args(0) is the input path
        val result = sc.textFile(args(0))
          .flatMap(_.split(" "))
          .map((_, 1))
          .reduceByKey(_ + _)

        // Save the result to args(1), the output path passed to spark-submit
        result.saveAsTextFile(args(1))

        // Release resources
        sc.stop()
      }

    }
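Since the cluster version reads both paths from `args`, a small guard (a hypothetical helper, not part of the original code) lets the job fail fast with a usage message when spark-submit is invoked without both paths:

```scala
// Hypothetical helper: validate the two expected arguments
// before any SparkContext is created.
def checkArgs(args: Array[String]): Either[String, (String, String)] =
  if (args.length < 2) Left("Usage: MyScalaWordCount <input> <output>")
  else Right((args(0), args(1)))

println(checkArgs(Array("hdfs://bigdata02:9000/wordcount.txt")))
println(checkArgs(Array("in", "out")))
```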
    
    

    Packaging

    Project Sources


    Artifacts
    For detailed steps, see the reference

    Build Artifacts


    Export succeeded

    Upload the Jar to the server and run it

    cd /home/bigdata/apps/spark-2.1.0-bin-hadoop2.7
    
    ./bin/spark-submit --master spark://bigdata02:7077 --class nx.MyScalaWordCount /home/bigdata/data/SparkScalaWork.jar hdfs://bigdata02:9000/wordcount.txt hdfs://bigdata02:9000/output/spark/wc0928
    
    hdfs dfs -cat /output/spark/wc0928/part-00000
    


          Original article: https://www.haomeiwen.com/subject/wkikuktx.html