Spark-Streaming: Analyzing Tomcat Logs

Author: yonggang_sun | Published 2016-05-18 21:14

    Task: find the top 100 IPs by request count

    1. Use Spark Streaming to compute (ip, ip_count) pairs, then take the 100 largest by ip_count in descending order
    2. Program:
    package io.github.sparkstream
    
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    
    /**
     * Created by sunyonggang on 16/5/10.
     */
    object TomcatLog {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("wordcount")
        val isDebug = true
        val duration = 5 // batch interval in seconds
        if (isDebug) {
          // Run locally when debugging; a file stream needs no receiver, so one thread is enough
          conf.setMaster("local")
        }
        val ssc = new StreamingContext(conf, Seconds(duration))
        // textFileStream creates an input stream that monitors a Hadoop-compatible
        // file system for new files and reads them as text files
        val lines = ssc.textFileStream("/Users/sunyonggang/Downloads/softwareforspark/tenten")
        // The IP is the first space-separated field; count its occurrences within each batch
        val ips = lines.map(line => (line.split(" ")(0), 1)).reduceByKey(_ + _)
        // Save all (ip, ip_count) pairs; each batch is written to its own result-<timestamp> directory
        ips.saveAsTextFiles("/Users/sunyonggang/Downloads/spark-1.5.2/result")
        ssc.start()
        ssc.awaitTermination()
      }
    }
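
The program above only counts; the ranking is done afterwards in the shell. As a rough sketch of the same count-then-rank logic, here is a plain Scala version that runs on ordinary collections without Spark. The names `TopIpSketch`, `countByIp` and `topN` are hypothetical, and the sample lines are made up for illustration.

```scala
object TopIpSketch {
  // Count requests per IP. The IP is assumed to be the first space-separated
  // field, mirroring line.split(" ")(0) in the streaming job.
  def countByIp(lines: Seq[String]): Map[String, Int] =
    lines
      .map(line => line.split(" ")(0))
      .groupBy(identity)
      .map { case (ip, hits) => (ip, hits.size) }

  // Rank by count, highest first, and keep the first n entries.
  def topN(counts: Map[String, Int], n: Int): Seq[(String, Int)] =
    counts.toSeq.sortBy { case (_, count) => -count }.take(n)

  def main(args: Array[String]): Unit = {
    // Made-up sample lines, for illustration only
    val sample = Seq(
      "10.0.0.1 - - GET /a",
      "10.0.0.2 - - GET /b",
      "10.0.0.1 - - GET /c"
    )
    topN(countByIp(sample), 100).foreach(println)
  }
}
```

Inside Spark, a comparable ranking could probably be done per batch with `RDD.top` or `RDD.sortBy` followed by `take`, which would avoid the shell post-processing step; that variant is not shown in the original post.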
    This produces several directories, one per batch:
    [sunyonggang@sunyongangdeMBP ~/Downloads/spark-1.5.2]$ du -sh result*
      0B    result
    8.0K    result-1462926380000
     20K    result-1462926385000
    8.0K    result-1462926390000
    8.0K    result-1462926395000
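
The numeric suffix of each directory is the batch time in milliseconds since the Unix epoch, which is how `saveAsTextFiles` names its per-batch output. A small sketch, with the hypothetical helper `batchMillis`, that decodes the suffixes listed above and shows the gap between consecutive batches:

```scala
import java.time.Instant

object BatchDirSketch {
  // Extract the millisecond timestamp that follows the last '-' in the directory name.
  def batchMillis(dirName: String): Long =
    dirName.substring(dirName.lastIndexOf('-') + 1).toLong

  def main(args: Array[String]): Unit = {
    // Directory names taken from the du listing above
    val dirs = Seq("result-1462926380000", "result-1462926385000", "result-1462926390000")
    val times = dirs.map(batchMillis)
    // Print each batch time in UTC
    times.foreach(ms => println(Instant.ofEpochMilli(ms)))
    // Print the gap between consecutive batches, in seconds
    times.sliding(2).foreach { case Seq(a, b) => println((b - a) / 1000) }
  }
}
```

The gaps come out as 5 seconds, which matches `duration = 5` in the program.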
    Inspect the output in result-1462926385000:
    [sunyonggang@sunyongangdeMBP ~/Downloads/spark-1.5.2/result-1462926385000]$ head part-00000
    (220.181.108.157,1)
    (207.46.13.95,2)
    (1.59.65.67,2)
    (192.250.46.129,87)
    (66.249.71.137,14)
    (117.136.30.147,3)
    (72.14.202.87,3)
    (117.136.11.190,1)
    (159.226.202.13,2)
    (183.9.112.2,25)
    After sorting, print the top 100 (only part of the listing is shown):
    [sunyonggang@sunyongangdeMBP ~/Downloads/spark-1.5.2/result-1462926385000]$ cat part-00000 | tr -d '()' | sort -t ',' -k2nr,2 | head -100
    218.20.24.203,4597
    221.194.180.166,4576
    119.146.220.12,1850
    117.136.31.144,1647
    121.28.95.48,1597
    113.109.183.126,1596
    182.48.112.2,870
    120.84.24.200,773
    61.144.125.162,750
    27.115.124.75,470
    115.236.48.226,439
    59.41.62.100,339
    89.126.54.40,305
    114.247.10.132,243
    125.46.45.78,236
    220.181.94.221,205
    218.19.42.168,181
    118.112.183.164,179
    116.235.194.89,171
    114.43.237.117,167
    61.155.206.81,165
    202.108.18.253,164
    218.107.55.254,164
    14.213.176.184,133
    121.14.162.28,125
    123.150.182.147,125
    121.14.162.124,124
    123.150.182.180,124
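
For comparison, the `tr | sort | head` pipeline can be approximated in Scala. This is a minimal sketch, assuming each saved line has the form `(key,count)` as in the `head part-00000` output; `RankResultSketch`, `parse` and `top` are hypothetical names. Unlike the shell version, it takes the count after the last comma and strips only the outer parentheses, so keys that themselves contain commas or parentheses should still rank correctly.

```scala
object RankResultSketch {
  // Parse a saved line such as "(192.250.46.129,87)" into (key, count).
  // The count is read after the LAST comma; malformed lines yield None.
  def parse(line: String): Option[(String, Int)] = {
    val t = line.trim
    val comma = t.lastIndexOf(',')
    if (t.startsWith("(") && t.endsWith(")") && comma > 0) {
      val key = t.substring(1, comma)
      scala.util.Try(t.substring(comma + 1, t.length - 1).trim.toInt)
        .toOption
        .map(count => (key, count))
    } else None
  }

  // Sort by count, highest first, and keep the first n entries.
  def top(lines: Seq[String], n: Int): Seq[(String, Int)] =
    lines.flatMap(l => parse(l)).sortBy { case (_, count) => -count }.take(n)

  def main(args: Array[String]): Unit = {
    // Lines copied from the head output above
    val lines = Seq("(220.181.108.157,1)", "(192.250.46.129,87)", "(183.9.112.2,25)")
    top(lines, 2).foreach { case (key, count) => println(key + "," + count) }
  }
}
```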
    

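    Task: find the top 50 pages by page views (PV)

    1. Same approach as the first task, except that the key is now the requested (destination) URL
    2. Output: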
    [sunyonggang@sunyongangdeMBP ~/Downloads/spark-1.5.2]$ du -sh result*
      0B    result
    172K    result-1462935920000
    8.0K    result-1462935925000
    8.0K    result-1462935930000
    [sunyonggang@sunyongangdeMBP ~/Downloads/spark-1.5.2]$ cd result-1462935920000
    [sunyonggang@sunyongangdeMBP ~/Downloads/spark-1.5.2/result-1462935920000]$ ls
    _SUCCESS    part-00000
    [sunyonggang@sunyongangdeMBP ~/Downloads/spark-1.5.2/result-1462935920000]$ head part-00000
    (/home.php?mod=misc&ac=sendmail&rand=1327969460,1)
    (/static/js/smilies.js?AZH,194)
    (/home.php?mod=misc&ac=sendmail&rand=1328006543,1)
    (/space-username-Dafuyang.html?ajaxmenu=1&inajax=1&ajaxtarget=aaiqJdWSScgksYAgcXYJYOLWYaWQOaNJ_menu_content,1)
    (/forum.php?mod=ajax&action=forumchecknew&fid=53&time=1328023418&inajax=yes,3)
    (/home.php?mod=space&do=pm,1)
    (/group.php?sgid=25,1)
    (/home.php?mod=space&uid=35,1)
    (/static/image/smiley/qq/tsh.gif,2)
    (/forum.php?mod=ajax&action=forumchecknew&fid=46&time=1328005619&inajax=yes,10)
    
    Sorted output:
    [sunyonggang@sunyongangdeMBP ~/Downloads/spark-1.5.2/result-1462935920000]$ cat part-00000 | tr -d '()' | sort -t ',' -k2nr,2 | head -50
    /static/js/floating-jf.js,1329
    /static/js/jquery-1.6.js,1263
    /data/cache/style_2_common.css?AZH,657
    /data/cache/style_2_widthauto.css?AZH,615
    /static/js/common.js?AZH,570
    /static/js/forum.js?AZH,495
    /forum-58-1.html,462
    /popwin_js.php?fid=58,387
    /static/image/common/arrwd.gif,373
    /static/image/common/scrolltop.png,345
    /data/cache/style_2_forum_forumdisplay.css?AZH,332
    /ads/banner-01.gif,308
    /static/image/common/logo.png,308
    /static/image/common/nv_a.png,296
    /static/image/common/qmenu.png,291
    /popwin_js.php?fid=,281
    /static/image/common/house.gif,279
    /static/image/common/pt_item.png,275
    /static/js/seditor.js?AZH,261
    /static/image/common/pn_post.png,239
    /static/image/common/fav.gif,234
    /static/image/common/arw_l.gif,233
    /popwin_js.php?fid=53,230
    /static/js/share_icon.js,230
    /data/cache/common_smilies_var.js?AZH,226
    /static/image/common/user_online.gif,219
    /forum.php,218
    /static/image/common/folder_common.gif,211
    /static/image/common/pin_3.gif,209
    /static/image/common/feed.gif,208
    /static/image/filetype/image_s.gif,207
    /static/image/common/px.png,203
    /static/image/common/atarget.png,202
    /static/image/common/refresh.png,202
    /static/js/smilies.js?AZH,194
    /,181
    /static/image/common/tip_bottom.png,178
    /static/image/editor/editor.gif,174
    /static/image/filetype/common.gif,170
    /static/image/common/login.gif,162
    /data/cache/style_2_forum_index.css?AZH,157
    /popwin_js.php?fid=0,154
    /static/image/common/notice.gif,127
    /data/cache/style_2_forum_viewthread.css?AZH,113
    /static/image/common/arw_r.gif,112
    /static/js/forum_viewthread.js?AZH,109
    /popwin_js.php?fid=46,101
    /static/image/common/collapsed_no.gif,101
    /static/image/common/forum.gif,101
    /static/image/common/16x16.gif,92
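
The post does not show how the URL is extracted. A minimal sketch, assuming the log follows the common/combined access log layout (`ip - user [date zone] "METHOD url PROTOCOL" status bytes ...`), in which case the URL is field 6 after splitting on spaces. The names `UrlFieldSketch`, `requestUrl` and `path`, and the sample line, are hypothetical.

```scala
object UrlFieldSketch {
  // Assumed layout: ip - user [date zone] "METHOD url PROTOCOL" status bytes ...
  // After splitting on spaces, the request URL is field 6 (0-based).
  def requestUrl(line: String): Option[String] = {
    val fields = line.split(" ")
    if (fields.length > 6) Some(fields(6)) else None
  }

  // Optionally drop the query string so that /a?x=1 and /a?x=2 count as one page.
  def path(url: String): String = url.takeWhile(_ != '?')

  def main(args: Array[String]): Unit = {
    // Made-up sample line in the assumed layout
    val sample =
      "1.2.3.4 - - [30/Jan/2012:17:38:20 +0800] \"GET /forum.php?mod=ajax HTTP/1.1\" 200 1127"
    requestUrl(sample).foreach { url =>
      println(url)
      println(path(url))
    }
  }
}
```

In the streaming job this would roughly correspond to replacing `line.split(" ")(0)` with the URL field. Whether to strip the query string is a design choice: the output above keeps it, which is why entries such as `/home.php?mod=misc&ac=sendmail&rand=...` appear once each.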
    

    Task: count browser types and versions

    1. Same approach as above
    2. Details:
    [sunyonggang@sunyongangdeMBP ~/Downloads/spark-1.5.2/result-1462937565000]$ cat part-00000 | tr -d '"()' | sort -t ',' -k2nr,2
    Mozilla/5.0,14519
    Mozilla/4.0,10191
    MQQBrowser/2.9/ZTE-TU880_TD/1.0,1724
    MQQBrowser/2.9/Adr,526
    -,252
    Sogou,205
    JUCLinux;U;2.2.2;Zh_cn;TCL,165
    JUC,125
    JUCLinux;,124
    ZTE-TU880_TD/1.0,93
    Opera/9.80,90
    DoCoMo/2.0,28
    Sosospider++http://help.soso.com/webspider.htm,20
    Huaweisymantecspider,13
    ia_archiver,13
    AdsBot-Google-Mobile,9
    HuaweiSymantecSpider/1.0+DSE-support@huaweisymantec.com+compatible;,5
    Shockwave,5
    Dalvik/1.4.0,4
    libwww-perl/5.834,4
    Mozilla/0.6,2
    Mozilla/4.0compatible;,2
    Mozilla/4.7,2
    Yahoo!,2
    curl/7.15.5,2
    myCrawl/Nutch-1.3,2
    360se,1
    AdsBot-Google,1
    Baiduspider++http://www.baidu.com/search/spider.htm,1
    MOT-MT620_TD/1.0,1
    TencentTraveler,1
    milodns,1
    mozilla/4.0,1
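
The keys above look like the first space-separated token of the user-agent string, so most browsers end up in the `Mozilla/5.0` and `Mozilla/4.0` buckets. As a hedged sketch of one way to get at the full string, assuming the combined log format where the user agent is the third double-quoted field, the line can be split on double quotes. `UserAgentSketch`, `userAgent` and `product` are hypothetical names, and the sample line is made up.

```scala
object UserAgentSketch {
  // Assumed layout: ... "request" status bytes "referer" "user-agent"
  // Splitting on double quotes puts the user agent at index 5.
  def userAgent(line: String): Option[String] = {
    val quoted = line.split("\"")
    if (quoted.length > 5) Some(quoted(5)) else None
  }

  // First token of the user agent, e.g. "Mozilla/5.0"
  def product(ua: String): String = ua.split(" ")(0)

  def main(args: Array[String]): Unit = {
    // Made-up sample line in the assumed layout
    val sample =
      "1.2.3.4 - - [30/Jan/2012:17:38:20 +0800] \"GET / HTTP/1.1\" 200 1127 \"-\" " +
        "\"Mozilla/5.0 (Windows NT 6.1) Firefox/9.0\""
    userAgent(sample).foreach { ua =>
      println(ua)
      println(product(ua))
    }
  }
}
```

Counting on the full string, or on a later token such as `Firefox/9.0`, would separate actual browsers more finely than the first token does.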
    
