美文网首页
Spark经典案例之求top值

Spark经典案例之求top值

作者: piziyang12138 | 来源:发表于2018-10-19 10:00 被阅读0次

    需求分析
    orderid,userid,payment,productid
    求topN的payment值
    a.txt
    1,9819,100,121
    2,8918,2000,111
    3,2813,1234,22
    4,9100,10,1101
    5,3210,490,111
    6,1298,28,1211
    7,1010,281,90
    8,1818,9000,20

    b.txt
    100,3333,10,100
    101,9321,1000,293
    102,3881,701,20
    103,6791,910,30
    104,8888,11,39

    scala代码

    package ClassicCase
    
    import org.apache.spark.{SparkConf, SparkContext}
    
    /**
      * 业务场景:求top值
      * Created by YJ on 2017/2/8.
      */
    
    object case6 {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setMaster("local").setAppName("reduce")
        val sc = new SparkContext(conf)
        sc.setLogLevel("ERROR")
        val six = sc.textFile("hdfs://192.168.109.130:8020//user/flume/ClassicCase/case6/*", 2)
        var idx = 0;
        val res = six.filter(x => (x.trim().length > 0) && (x.split(",").length == 4))
          .map(_.split(",")(2))
          .map(x => (x.toInt, ""))
          .sortByKey(false)    //fasle ->倒序
          .map(x => x._1).take(5)
          .foreach(x => {
            idx = idx + 1
            println(idx + "\t" + x)
          })
      }
    
    }
    
    

    结果输出:
    1 9000
    2 2000
    3 1234
    4 1000
    5 910

    rdd.map(_.split(",")).map(x=>(x(0),x(1),x(2),x(3))).sortBy(x=>x._3.toInt,false).take(3).foreach(println)

    相关文章

      网友评论

          本文标题:Spark经典案例之求top值

          本文链接:https://www.haomeiwen.com/subject/gdfyzftx.html