Counting the Lines of a File in Python

Author: kakarotto | Published: 2020-11-05 10:59

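    This post counts the lines of a file in Python, similar to wc -l. For small files the choice of method hardly matters, but for large files (around 5 GB) it pays to pick a faster one.

    The methods compared are: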

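    1. readlines: read all lines into a list

    def readline_count(file_name):
        # Loads every line into memory at once, so memory use grows with file size.
        with open(file_name) as f:
            return len(f.readlines())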
    

    2. Iterate over the file line by line and count

    def simple_count(file_name):
        lines = 0
        # Iterating a file object yields one line at a time.
        with open(file_name) as f:
            for _ in f:
                lines += 1
        return lines
    

    3. sum count

    def sum_count(file_name):
        # Add 1 for every line produced by the file iterator.
        with open(file_name) as f:
            return sum(1 for _ in f)
    

    4. enumerate count

    def enumerate_count(file_name):
        count = 0  # stays 0 for an empty file, where the loop body never runs
        with open(file_name) as f:
            for count, _ in enumerate(f, 1):
                pass
        return count
    

    5. buff count: read fixed-size chunks and count the newlines in each

    def buff_count(file_name):
        with open(file_name, 'rb') as f:
            count = 0
            buf_size = 1024 * 1024
            buf = f.read(buf_size)
            while buf:
                count += buf.count(b'\n')
                buf = f.read(buf_size)
            return count
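
    Counting newline bytes follows wc -l semantics, so a file whose last line has no trailing newline is reported as one line fewer than the line-iteration methods report. The sketch below illustrates the difference; newline_count, iterated_count and write_temp are hypothetical names introduced only for this illustration.

```python
import os
import tempfile


def newline_count(file_name):
    """Count newline bytes in 1 MiB chunks, as buff_count does."""
    count = 0
    with open(file_name, 'rb') as f:
        while True:
            buf = f.read(1024 * 1024)
            if not buf:
                break
            count += buf.count(b'\n')
    return count


def iterated_count(file_name):
    """Count lines by iterating over the file, as simple_count does."""
    with open(file_name, 'rb') as f:
        return sum(1 for _ in f)


def write_temp(data):
    """Hypothetical helper: write bytes to a temporary file and return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, 'wb') as f:
        f.write(data)
    return path


path = write_temp(b'a\nb\nc')  # the last line has no trailing newline
try:
    print(newline_count(path), iterated_count(path))  # prints "2 3"
finally:
    os.remove(path)
```

    If the two kinds of methods must agree, one option is to add 1 when the file is non-empty and its last byte is not a newline.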
    

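    6. Call the wc command to count lines

    def wc_count(file_name):
        import subprocess
        # Pass the arguments as a list so file names containing spaces or
        # shell metacharacters are handled safely (requires Python 3.7+).
        out = subprocess.run(["wc", "-l", file_name], capture_output=True,
                             text=True, check=True).stdout
        return int(out.split()[0])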
    

    7. partial count: buff_count rewritten with functools.partial

    def partial_count(file_name):
        from functools import partial
        buffer = 1024 * 1024
        with open(file_name) as f:
            return sum(x.count('\n') for x in iter(partial(f.read, buffer), ''))
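
    partial_count opens the file in text mode, so every chunk is decoded before it is counted, and a file that is not valid in the default encoding may raise UnicodeDecodeError. A minimal sketch of a binary-mode variant follows, assuming only the newline count is needed; partial_count_bytes is a hypothetical name, not part of the original post, and it was not included in the timings.

```python
import os
import tempfile
from functools import partial


def partial_count_bytes(file_name, buf_size=1024 * 1024):
    """Binary-mode variant of partial_count (hypothetical name)."""
    with open(file_name, 'rb') as f:
        # iter(callable, sentinel) keeps calling f.read(buf_size)
        # until it returns b'', which signals end of file.
        chunks = iter(partial(f.read, buf_size), b'')
        return sum(chunk.count(b'\n') for chunk in chunks)


# Demonstration on a file containing bytes that are not valid UTF-8.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as f:
    f.write(b'\xff\xfe\nsecond line\n')
try:
    print(partial_count_bytes(path))  # prints 2
finally:
    os.remove(path)
```

    Note that the sentinel must be b'' in binary mode; with the text-mode sentinel '' the loop would never terminate, because read() returns bytes.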
    

    8. iter count: buff_count rewritten with the itertools module

    def iter_count(file_name):
        from itertools import (takewhile, repeat)
        buffer = 1024 * 1024
        with open(file_name) as f:
            buf_gen = takewhile(lambda x: x, (f.read(buffer) for _ in repeat(None)))
            return sum(buf.count('\n') for buf in buf_gen)
    
    Timing results by file size (the source does not state the unit; presumably seconds):

    Method            100M    500M    1G      5G
    readline_count    0.25    1.82    3.27    45.04
    simple_count      0.13    0.85    1.58    13.54
    sum_count         0.15    0.77    1.59    14.07
    enumerate_count   0.15    0.8     1.6     13.37
    buff_count        0.13    0.62    1.18    10.21
    wc_count          0.09    0.53    0.99    9.47
    partial_count     0.12    0.55    1.11    8.92
    iter_count        0.08    0.42    0.83    8.33
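
    The post does not say how the figures above were measured. One way to run a comparable measurement is sketched below, assuming wall-clock timing with time.perf_counter and a generated test file; time_it and make_test_file are hypothetical helpers, and only two of the methods are repeated here to keep the example short.

```python
import os
import tempfile
import time


def sum_count(file_name):
    with open(file_name) as f:
        return sum(1 for _ in f)


def buff_count(file_name, buf_size=1024 * 1024):
    count = 0
    with open(file_name, 'rb') as f:
        buf = f.read(buf_size)
        while buf:
            count += buf.count(b'\n')
            buf = f.read(buf_size)
    return count


def time_it(func, file_name, repeat=3):
    """Return (result, best wall-clock seconds over `repeat` runs)."""
    best = float('inf')
    result = None
    for _ in range(repeat):
        start = time.perf_counter()
        result = func(file_name)
        best = min(best, time.perf_counter() - start)
    return result, best


def make_test_file(n_lines):
    """Hypothetical helper: write n_lines short lines to a temporary file."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, 'w') as f:
        for i in range(n_lines):
            f.write('line %d\n' % i)
    return path


path = make_test_file(200000)
try:
    for func in (sum_count, buff_count):
        lines, seconds = time_it(func, path)
        print('%-12s %8d lines  %.4f s' % (func.__name__, lines, seconds))
finally:
    os.remove(path)
```

    Results depend heavily on disk speed and on whether the file is already in the operating system's page cache, so repeated runs on the same file are usually faster than the first one.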
