Extracting Basic Email Header Information with Python

Author: Tim_Lee | Published: 2017-07-17 16:17

    1 Email content

    Suppose the email is saved in a file named "1.txt" with the following content:

    From:   Justin-Bieber@entertain.org on behalf of Bieber
    Leader [leader@hello.org]
    Sent:   2017-07-01 12:48
    To: 'staff@hello.org'; custom@hello.org;
    Willim Johnson; John Snow
    Subject:    The battlefield in Winterfell
    
    
    I have just met then. More details as soon as possible. So far, so good.
    
    Sent via iPhone 7 plus
    

    2 Extraction approach

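    • Goal: extract the header information from the email. The fields needed are:
      • sender (From:), sent time (Sent:), recipients (To:), and subject (Subject:)
    • For a first pass, it is enough to pull out the lines that carry these fields.
    • Use a single extraction function: put the four keywords in a list and match each one with a regular expression.
    • Keep a global counter for each of the four fields. When a field is matched, its counter is incremented by 1 and serves as a flag.
    • If one field has already been matched and the next field has not, the current line is a continuation of the matched field and must be output as well.
    • If the extraction function returns None, the line is skipped.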
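
    import re

    # Global counters used as "already seen" flags for each header field.
    from_count = 0
    sent_count = 0
    to_count = 0
    subject_count = 0


    def inspect_string(string):
        """Return the header text carried by one line, or None if there is none."""
        global from_count, sent_count, to_count, subject_count

        # Update each counter once per line, before looking for a match.
        if re.match(".*(From:.*)", string):
            from_count += 1
        if re.match(".*(Sent:.*)", string):
            sent_count += 1
        if re.match(".*(To:.*)", string):
            to_count += 1
        if re.match(".*(Subject:.*)", string):
            subject_count += 1

        # A line containing a keyword: return the text from the keyword onward.
        for keyword in ['From:', 'Sent:', 'To:', 'Subject:']:
            match_obj = re.match(".*({0}.*)".format(keyword), string)
            if match_obj:
                return match_obj.group(1)

        # No keyword: the line continues the most recent field as long as
        # the next field has not been seen yet.
        if from_count > 0 and sent_count < 1:
            return string
        if sent_count > 0 and to_count < 1:
            return string
        if to_count > 0 and subject_count < 1:
            return string
        return None


    # Open in text mode so each line is a str (Python 3), and drop the line
    # ending so print() does not emit an extra blank line.
    with open('1.txt', encoding='utf-8') as f:
        for line in f:
            result = inspect_string(line.rstrip('\r\n'))
            if result is None:
                continue
            print(result)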
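
The same idea (remember which field was seen last and attach keyword-less lines to it) can also be sketched without global counters. The version below is only an illustration, not the author's code: `HEADER_RE` and `parse_headers` are hypothetical names, and it assumes that every field starts at the beginning of a line, that continuation lines directly follow their field, and that Subject is the last header field.

```python
import re

# Hypothetical helper, not from the original post: matches a line that
# starts with one of the four header keywords.
HEADER_RE = re.compile(r"\s*(From|Sent|To|Subject):\s*(.*)")


def parse_headers(lines):
    """Collect the four header fields into a dict of field name -> text."""
    headers = {}
    current = None  # name of the field that continuation lines belong to
    for raw in lines:
        line = raw.strip()
        match = HEADER_RE.match(raw)
        if match:
            current = match.group(1)
            headers[current] = match.group(2).strip()
            if current == "Subject":
                break  # assumption: Subject is the last header field
        elif current is not None and line:
            # No keyword on this line: treat it as a continuation.
            headers[current] += " " + line
    return headers


sample = """From:   Justin-Bieber@entertain.org on behalf of Bieber
Leader [leader@hello.org]
Sent:   2017-07-01 12:48
To: 'staff@hello.org'; custom@hello.org;
Willim Johnson; John Snow
Subject:    The battlefield in Winterfell

I have just met then. More details as soon as possible. So far, so good.
"""

headers = parse_headers(sample.splitlines())
for name, value in headers.items():
    print(name + ": " + value)
```

Returning a dict keeps each field and its continuation lines together, which may be easier to work with than printed lines. Stopping at Subject mirrors the counter logic of the original script, so a subject that wraps onto a second line would be cut off.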
    

    3 Output

    From:   Justin-Bieber@entertain.org on behalf of Bieber
    Leader [leader@hello.org]
    Sent:   2017-07-01 12:48
    To: 'staff@hello.org'; custom@hello.org;
    Willim Johnson; John Snow
    Subject:    The battlefield in Winterfell
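
Once the header lines have been extracted, the Sent and To values can be turned into structured data. The following is a minimal sketch, assuming the timestamp always uses the `YYYY-MM-DD HH:MM` layout seen in the sample and that recipients are separated by semicolons. `parse_sent` and `split_recipients` are hypothetical helper names, not part of the original script.

```python
from datetime import datetime


def parse_sent(value):
    """Parse a 'Sent:' value such as '2017-07-01 12:48' into a datetime."""
    # Assumption: the layout is always year-month-day hour:minute.
    return datetime.strptime(value.strip(), "%Y-%m-%d %H:%M")


def split_recipients(value):
    """Split a 'To:' value on semicolons and drop quotes and empty entries."""
    parts = (part.strip().strip("'\"") for part in value.split(";"))
    return [part for part in parts if part]


sent = parse_sent("2017-07-01 12:48")
recipients = split_recipients(
    "'staff@hello.org'; custom@hello.org; Willim Johnson; John Snow"
)
print(sent.isoformat())
print(recipients)
```

A timestamp in any other layout would make `datetime.strptime` raise `ValueError`, so real mail exports may need more than one format string.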
    

    Article link: https://www.haomeiwen.com/subject/htfqkxtx.html