Python Web Crawler in Practice: Notes 3-3

Author: Sugeei | Published 2016-07-13 19:18
    Advanced usage of find()
    • How the arguments of find() map onto SQL:
      table.find({'key': value}, {'column1': 1, 'column2': 1})
      The SQL equivalent is:
    select column1, column2
    from table
    where key = value
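To make the mapping concrete, here is a minimal pure-Python sketch of what the two arguments do. It is not pymongo: `toy_find` and the sample rows are hypothetical stand-ins, and only equality filters and inclusion projections are handled. It assumes `table` is a MongoDB collection, as the query syntax suggests. One detail the SQL analogy hides is that MongoDB also returns `_id` unless the projection sets `'_id': 0`, and the sketch mirrors that.

```python
def toy_find(rows, query, projection=None):
    """Hypothetical stand-in for collection.find(query, projection).

    Supports only equality filters (the WHERE clause) and
    inclusion projections (the SELECT column list).
    """
    results = []
    for row in rows:
        # WHERE key = value: every query pair must match the row exactly.
        if any(row.get(key) != value for key, value in query.items()):
            continue
        if projection is None:
            results.append(dict(row))  # behaves like SELECT *
            continue
        wanted = {key for key, flag in projection.items() if flag}
        # Like MongoDB, keep _id unless it is switched off explicitly.
        if projection.get('_id', 1):
            wanted.add('_id')
        results.append({key: val for key, val in row.items() if key in wanted})
    return results


rows = [
    {'_id': 1, 'category': 'phone', 'title': 'A', 'price': 100},
    {'_id': 2, 'category': 'laptop', 'title': 'B', 'price': 900},
    {'_id': 3, 'category': 'phone', 'title': 'C', 'price': 250},
]

# select title, price from rows where category = 'phone'
phones = toy_find(rows, {'category': 'phone'}, {'title': 1, 'price': 1, '_id': 0})
print(phones)
```

The first dictionary plays the role of the WHERE clause and the second the role of the column list. Leaving out the second argument returns whole records.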
    
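    Source code
    • getdaterange() takes a start date and an end date and returns the list of dates from the start date up to, but not including, the end date, each formatted like '2016-07-10'.
    • get_statistic_by_daterange() takes data, the result of querying the collection for one category of goods, and returns that category's number of posts on each day of the given date range as a list.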
    import time

    def getdaterange(startfrom, enddate):
        #  input: ('2016-01-01', '2016-01-07')
        #  output: ['2016-01-01', '2016-01-02', '2016-01-03', '2016-01-04', '2016-01-05', '2016-01-06']
        #  (the end date itself is not included)
        # Convert both dates to timestamps at local midnight.
        stastamp = time.mktime(time.strptime(startfrom, "%Y-%m-%d"))
        endstamp = time.mktime(time.strptime(enddate, "%Y-%m-%d"))
        datelist = []
        # One entry per whole day between the two timestamps; assumes 24-hour local days.
        for i in range(int((endstamp - stastamp) / 3600 / 24)):
            datelist.append(time.strftime("%Y-%m-%d", time.localtime(stastamp + i * 3600 * 24)))
        return datelist
        
    def get_statistic_by_daterange(data, daterange):
        #  input: data = iterable of post records, each holding its date string in item['pubtime'][0];
        #         daterange = ['2016-07-02', '2016-07-03', '2016-07-04', '2016-07-05', '2016-07-06', '2016-07-07', '2016-07-08']
        #  prints: {'2016-07-07': 15, '2016-07-04': 7, '2016-07-02': 35, '2016-07-08': 36, '2016-07-06': 13, '2016-07-05': 10, '2016-07-03': 9}
        #  output: [35, 9, 7, 10, 13, 15, 36]  (counts ordered by date)
        statistic = {}
        for day in daterange:  # start every day at 0 so days without posts still appear
            statistic[day] = 0
        for item in data:
            itdate = item['pubtime'][0]
            if itdate in statistic:
                statistic[itdate] += 1
        print(statistic)
        # Sort by date string and keep only the counts.
        return [item[1] for item in sorted(statistic.items())]
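The two helpers above can also be written with the standard library's `datetime` and `collections.Counter`. The sketch below is an alternative, not the code used for this post. `date_range` and `count_by_day` are hypothetical names, the end date is excluded, and each record is assumed to store its date string as the first element of `pubtime`, as in the code above.

```python
from collections import Counter
from datetime import date, timedelta


def date_range(start, end):
    """Return 'YYYY-MM-DD' strings from start up to, but not including, end."""
    first = date.fromisoformat(start)
    days = (date.fromisoformat(end) - first).days
    return [(first + timedelta(days=i)).isoformat() for i in range(days)]


def count_by_day(records, days):
    """Count records per day; days without any record count as 0."""
    counts = Counter(record['pubtime'][0] for record in records)
    return [counts[day] for day in days]


days = date_range('2016-07-02', '2016-07-05')
records = [
    {'pubtime': ['2016-07-02']},
    {'pubtime': ['2016-07-04']},
    {'pubtime': ['2016-07-04']},
    {'pubtime': ['2016-06-30']},  # outside the range, ignored
]
print(days)
print(count_by_day(records, days))
```

Because `date` arithmetic works on calendar days rather than on seconds, the result does not depend on the local time zone.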
    
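    • The start date is 7 days before the current date,
      the end date is the current date.
      Fetch all posts for each of the three chosen categories, then use get_statistic_by_daterange() to compute each category's post count per day over the last 7 days.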
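    stadate = time.strftime("%Y-%m-%d", time.localtime(time.time() - 7 * 3600 * 24))
    enddate = time.strftime("%Y-%m-%d", time.localtime(time.time()))
    datelist = getdaterange(stadate, enddate)
    print(datelist)
    # tinfo is the collection of scraped posts; it is assumed to be defined already.
    datalist = []  # one list of daily counts per category
    # Categories: laptops, mobile phones, desktop PCs
    for item in ['笔记本电脑', '手机', '台式电脑整机']:
        datalist.append(get_statistic_by_daterange(tinfo.find({'category': item}), datelist))
    # print(datalist)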
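As a variation on the loop above, the sketch below keeps each series labeled by its category and runs on an in-memory list instead of the `tinfo` collection, so it can be tried without a database. `last_n_days`, `series_by_category`, and the sample posts are hypothetical; the `today` parameter exists only to make the window reproducible.

```python
from datetime import date, timedelta


def last_n_days(n, today=None):
    """Return the n days before today (today excluded) as 'YYYY-MM-DD' strings."""
    today = today or date.today()
    return [(today - timedelta(days=back)).isoformat() for back in range(n, 0, -1)]


def series_by_category(posts, categories, days):
    """Map each category to its per-day post counts, in the order of days."""
    series = {category: [0] * len(days) for category in categories}
    position = {day: index for index, day in enumerate(days)}
    for post in posts:
        day = post['pubtime'][0]
        if post['category'] in series and day in position:
            series[post['category']][position[day]] += 1
    return series


days = last_n_days(3, today=date(2016, 7, 9))
posts = [
    {'category': '手机', 'pubtime': ['2016-07-08']},
    {'category': '手机', 'pubtime': ['2016-07-06']},
    {'category': '笔记本电脑', 'pubtime': ['2016-07-08']},
    {'category': '手机', 'pubtime': ['2016-07-09']},  # today, outside the window
]
print(series_by_category(posts, ['笔记本电脑', '手机'], days))
```

This version makes a single pass over all posts rather than one query per category. Which is preferable depends on how large the collection is and whether `category` is indexed.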
    
    Results
    • Post counts over the last 7 days for the three chosen categories

      [Screenshot: Screen Shot 2016-07-09 at 3.03.24 PM.png]
