美文网首页脚本殿堂玩耍PythonPython 运维
通过Python脚本快速统计分析日志

通过Python脚本快速统计分析日志

作者: 科斯莫耗子 | 来源:发表于2016-07-21 20:24 被阅读5026次

    从朋友那里拿到一个需求,根据日志分析统计并发情况,统计自定义时间段的用户流量,具体需求如下:

    1. 晒选某个时间点的数据:查找某天下午1:30到2:30这个时间段或者自己定义,看看到底有多少
    2. 同时筛选出两个项目标号的文档,统计和去重统计17/18
    3. babyhealth:[2016-05-03 19:21:23] INFO orderinfo:472 - v41/nbCode/getauth||1508031:17

    上面第三行的内容就是日志的格式,简单分析了一下需求,shell脚本、Excel 分列筛选透视、Python都 可以解决,考虑到日志数量较大,通过 Excel 效率可能偏低,因此这里采用 Python 进行处理。

    处理的主要思路是:

    • 对日志文件内容进行分解。每条日志的关键信息有日期(2016-05-03)、时间(19:21:23)、值(1508031)、块(17),应用正则对日志进行逐行的匹配,提取关键信息,放到数据库中
    • 在数据库中通过SQL语句进行统计

    对于第一部分,代码如下:

    #! /usr/bin/env python
    # -*- coding: utf-8 -*-
    import re
    import sqlite3
    #说明
    print u''
    print u'-----------------使用说明-----------------'
    print u'将脚本放入日志所在目录,运行脚本,输入日志\n文件名(含扩展名),将在相同目录生成log.db\n数据库'
    print u'------------------------------------------'
    print u''
    #读取日志文件、连接数据库
    name = raw_input('Please input the log name:')
    log_db = sqlite3.connect('./log.db')
    cursor = log_db.cursor()
    #建表
    cursor.execute('DROP TABLE IF EXISTS log')
    cursor.execute('CREATE TABLE log (front, log_date, log_time, log_path, value, level)')
    #设定正则规则
    data_re = re.compile(r'正则规则不放了,会和代码编辑器冲突',re.S|re.M)
    #读取日志文件
    log = open('./%s' %name)
    index = 0
    #进行转换
    try:
        for line in log:
            found = data_re.findall(line)
            if found != None and len(found) != 0:
                index = index + 1
                cursor.execute('INSERT INTO log (front, log_date, log_time, log_path, value, level) VALUES (?,?,?,?,?,?)', (found[0][0], found[0][1], found[0][2], found[0][3], found[0][4], found[0][5]))
                if index == 100:
                    log_db.commit()
                    #print '100 records has been submitted.'
        log_db.commit()
        print 'job done!'
        input()
    except:
        pass
    

    以上程序在 Linux 和 Windows 下都可以执行,生成 log.db 数据库, Linux 下可以直接使用 Sqlite3 对数据库进行操作, Windows 下可以安装 SQLite Expert 进行操作和查询,查询语句如下:

    SELECT COUNT(*) FROM log WHERE log_date = '2016-05-03' and log_time between '13:30:00' and '14:30:00';
    SELECT COUNT(*) FROM log WHERE log_date = '2016-05-03' and log_time between '13:30:00' and '14:30:00' and level = '17';
    SELECT COUNT(*) FROM log WHERE log_date = '2016-05-03' and log_time between '13:30:00' and '14:30:00' and level = '18';
    SELECT COUNT(DISTINCT value)FROM log WHERE log_date = '2016-05-03' and log_time between '13:30:00' and '14:30:00';
    SELECT COUNT(DISTINCT value) FROM log WHERE log_date = '2016-05-03' and log_time between '13:30:00' and '14:30:00' and level = '17';
    SELECT COUNT(DISTINCT value) FROM log WHERE log_date = '2016-05-03' and log_time between '13:30:00' and '14:30:00' and level = '18';
    

    为了提高方便程度,可以用 Python 操作 SQLite3 进行查询:

    #! /usr/bin/env python
    # -*- coding: utf-8 -*-
    import sqlite3
    
    date = raw_input('Please input the date(eg.2016-05-03):')
    start_time = raw_input('Please input the start time(eg.08:30:00):')
    end_time = raw_input('Please input the end time(eg.09:30:00):')
    stat_db = sqlite3.connect('./log.db')
    cursor = stat_db.cursor()
    cursor.execute('SELECT COUNT(*) FROM log WHERE log_date = \'%s\' and log_time between \'%s\' and \'%s\'' %(date,start_time,end_time))
    amount_all = cursor.fetchall()[0][0]
    cursor.execute('SELECT COUNT(DISTINCT value) FROM log WHERE log_date = \'%s\' and log_time between \'%s\' and \'%s\'' %(date,start_time,end_time))
    amount_na = cursor.fetchall()[0][0]
    cursor.execute('SELECT COUNT(*)  FROM log WHERE log_date = \'%s\' and log_time between \'%s\' and \'%s\' and level = \'17\'' %(date,start_time,end_time))
    month_all = cursor.fetchall()[0][0]
    cursor.execute('SELECT COUNT(DISTINCT value) FROM log WHERE log_date = \'%s\' and log_time between \'%s\' and \'%s\' and level = \'17\'' %(date,start_time,end_time))
    month_na = cursor.fetchall()[0][0]
    cursor.execute('SELECT COUNT(*) FROM log WHERE log_date = \'%s\' and log_time between \'%s\' and \'%s\' and level = \'18\'' %(date,start_time,end_time))
    year_all = cursor.fetchall()[0][0]
    cursor.execute('SELECT COUNT(DISTINCT value)  FROM log WHERE log_date = \'%s\' and log_time between \'%s\' and \'%s\' and level = \'18\'' %(date,start_time,end_time))
    year_na = cursor.fetchall()[0][0]
    print(u'')
    print(u'------------统计------------')
    print(u'%s %s~%s\n----------------------------\n总点击次数/人:%s|%s\n年用户:%s|%s\n月用户:%s|%s' % (date, start_time, end_time, amount_all, amount_na, year_all, year_na, month_all, month_na))
    print(u'')
    input('Press any key to exit...')
    

    以上程序均基于 Python 2.7+ 环境。

    相关文章

      网友评论

        本文标题:通过Python脚本快速统计分析日志

        本文链接:https://www.haomeiwen.com/subject/wqnbjttx.html