1 Interview question
"""
Write a Python script that analyzes the file site.log and counts the number of visits per domain.
site.log contains the following:
https://www.sogo.com/ale.html
https://www.qq.com/3asd.html
https://www.sogo.com/teoans.html
https://www.bilibili.com/2
https://www.sogo.com/asd_sa.html
https://y.qq.com/
https://www.bilibili.com/1
https://dig.chouti.com/
https://www.bilibili.com/imd.html
https://www.bilibili.com/
Expected script output:
4 www.bilibili.com
3 www.sogo.com
1 www.qq.com
1 y.qq.com
1 dig.chouti.com
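Both methods below pick the domain out with a regular expression. As an alternative, here is a minimal sketch that lets the standard library's urllib.parse.urlparse extract the host (its netloc) instead. The function names count_domains and format_report and the script name count_sites.py are hypothetical, chosen only for illustration; they are not part of the original answer.

```python
import sys
from collections import Counter
from urllib.parse import urlparse


def count_domains(lines):
    """Count visits per host in an iterable of URL lines (e.g. an open file)."""
    counts = Counter()
    for line in lines:
        url = line.strip()
        if not url:  # skip blank lines, such as a trailing empty line
            continue
        # netloc is the "www.sogo.com" part of "https://www.sogo.com/ale.html"
        counts[urlparse(url).netloc] += 1
    return counts


def format_report(counts):
    """Render "count domain" rows, most visited first."""
    # most_common() sorts by count, descending; ties keep first-seen order
    return [f"{n} {domain}" for domain, n in counts.most_common()]


if __name__ == "__main__":
    # Usage (hypothetical script name): python count_sites.py site.log
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as f:
            for row in format_report(count_domains(f)):
                print(row)
```

If the log may contain ports or credentials (e.g. `https://user@host:8080/`), `urlparse(url).hostname`, which drops both and lowercases the name, may be a better key than `.netloc`.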
Code, method 1:
import re
from collections import Counter  # used for counting

with open("site.log") as f:
    data = f.read()
# Capture the host between "https://" and the first "/" after it
ym_list = re.findall(r'https://(.*?)/.*?', data)
ym_dict = dict(Counter(ym_list))
# Sort by visit count, highest first
ret = sorted(ym_dict.items(), key=lambda x: x[1], reverse=True)
print(ret)  # [('www.bilibili.com', 4), ('www.sogo.com', 3), ('www.qq.com', 1), ('y.qq.com', 1), ('dig.chouti.com', 1)]
for i in ret:
    print(i[1], i[0])
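Counter.most_common() already returns (element, count) pairs sorted by count in descending order, with equal counts kept in the order first encountered, so the dict() conversion and the sorted() call in method 1 can likely be folded into one call. The sketch below assumes the log text is passed in as a string (the sample lines are inlined as LOG so it runs without site.log); domain_counts is a hypothetical name. The trailing `.*?` of the original pattern always matches the empty string, so it is omitted here without changing the matches.

```python
import re
from collections import Counter

# The sample lines of site.log, inlined so the sketch runs without the file
LOG = """\
https://www.sogo.com/ale.html
https://www.qq.com/3asd.html
https://www.sogo.com/teoans.html
https://www.bilibili.com/2
https://www.sogo.com/asd_sa.html
https://y.qq.com/
https://www.bilibili.com/1
https://dig.chouti.com/
https://www.bilibili.com/imd.html
https://www.bilibili.com/
"""


def domain_counts(text):
    """Return (host, count) pairs, most visited first."""
    # Everything between "https://" and the next "/" is the host
    hosts = re.findall(r'https://(.*?)/', text)
    # most_common() sorts by count, descending; ties stay in first-seen order
    return Counter(hosts).most_common()


for host, n in domain_counts(LOG):
    print(n, host)
```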
Code, method 2:
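import re

dic = {}
with open('site.log') as f:
    for i in f:
        # Find a host shaped like "www.sogo.com"; raw string, dots escaped
        m = re.search(r'\w+\.\w+\.com', i)
        if m is None:  # skip blank or non-matching lines instead of crashing on None.group()
            continue
        r = m.group()
        if r in dic:
            dic[r].append(r)
        else:
            dic[r] = [r]
res = {}
for k, v in dic.items():
    res[k] = len(v)
# Sort by visit count, highest first
fin_res = sorted(res.items(), key=lambda x: x[1], reverse=True)
for k, v in fin_res:
    print(v, k)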
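Method 2 stores a list of matches per domain only to take its len() afterward. A plain integer per key does the same job without keeping every match; this sketch assumes collections.defaultdict(int) for that, escapes the dots so `.` matches only a literal dot, and skips lines with no match rather than calling .group() on None. count_hosts and HOST_RE are hypothetical names.

```python
import re
from collections import defaultdict

# Host pattern with escaped dots: matches hosts like "www.sogo.com"
# (label.label.com), the same shape method 2 looks for
HOST_RE = re.compile(r'\w+\.\w+\.com')


def count_hosts(lines):
    """Count hosts with an int per key instead of a list of matches."""
    counts = defaultdict(int)  # a missing key starts at 0
    for line in lines:
        m = HOST_RE.search(line)
        if m:  # skip blank lines and lines with no matching host
            counts[m.group()] += 1
    # Same ordering as method 2: by count, highest first
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
```

To use it on the log: `with open('site.log') as f:` then `for host, n in count_hosts(f): print(n, host)`. Like method 2, this only recognizes hosts of the form label.label.com; a line such as `https://example.cn/` is skipped, so parsing each URL with urllib.parse.urlparse is the more general choice.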