统计所有邮件的发件人所属机构的出现次数,按照邮件域名计算:
- 提取数据部分是常规的文件处理方法
file.split()
- 不同点在于把数据插入数据库,需要调用
cursor()
相关指令,注意插入的数据用占位符?
替代。 -
commit()
放在循坏外,提高速度。
import sqlite3
conn = sqlite3.connect('testdb.sqlite')
cur = conn.cursor()
cur.execute('DROP TABLE IF EXISTS Counts')
cur.execute('''
CREATE TABLE Counts (org TEXT, count INTEGER)''')
fname = input('Enter file name: ')
if (len(fname) < 1): fname = 'mbox-short.txt'
# fname = 'mbox.txt'
fh = open(fname)
for line in fh:
if not line.startswith('From: '): continue
pieces = line.split()
email_full = pieces[1]
org = email_full.split('@')[1] #organization name
cur.execute('SELECT count FROM Counts WHERE org = ? ', (org,))
row = cur.fetchone()
# counting times of the org ocurrence
if row is None:
cur.execute('''INSERT INTO Counts (org, count)
VALUES (?, 1)''', (org,))
else:
cur.execute('UPDATE Counts SET count = count + 1 WHERE org = ?',
(org,))
conn.commit()
# have a check about the results
sqlstr = 'SELECT org, count FROM Counts ORDER BY count DESC LIMIT 10'
for row in cur.execute(sqlstr):
print(str(row[0]), row[1])
cur.close()
2018.4.25
网友评论