Learning Python parallel multi-core processing: compressing files


Author: 小黑佬 | Published: 2020-04-14 17:55

    GNU Parallel examples:


    To recursively compress or decompress every file under a directory:

    find . -type f | parallel gzip
    find . -type f -name '*.gz' | parallel gzip -d
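For readers who want the same per-file behaviour without GNU Parallel, here is a minimal sketch (not the author's code) of what the two commands do: collect regular files roughly like `find . -type f`, then gzip or gunzip each one as an independent job. The helper names `list_files`, `gzip_one`, `gunzip_one`, `gzip_tree` and `gunzip_tree` are hypothetical, and the mapping function is a parameter so the same code can run serially or on a process pool.

```python
import gzip
import os
import shutil
from typing import Callable, List


def list_files(root: str) -> List[str]:
    # Roughly the list `find root -type f` prints
    return [os.path.join(dp, name) for dp, _, names in os.walk(root) for name in names]


def gzip_one(path: str) -> str:
    # Like `gzip path`: write path.gz, delete the original, return the new name
    with open(path, 'rb') as src, gzip.open(path + '.gz', 'wb') as dst:
        shutil.copyfileobj(src, dst)
    os.remove(path)
    return path + '.gz'


def gunzip_one(path: str) -> str:
    # Like `gzip -d path`: write the file without its .gz suffix, delete the .gz
    out = path[:-3]
    with gzip.open(path, 'rb') as src, open(out, 'wb') as dst:
        shutil.copyfileobj(src, dst)
    os.remove(path)
    return out


def gzip_tree(root: str, map_fn: Callable = map) -> List[str]:
    # One independent job per file; map_fn decides serial vs parallel execution
    return list(map_fn(gzip_one, list_files(root)))


def gunzip_tree(root: str, map_fn: Callable = map) -> List[str]:
    # Only *.gz files are handed to the decompressor
    return list(map_fn(gunzip_one, [p for p in list_files(root) if p.endswith('.gz')]))
```

To spread the jobs across cores the way GNU Parallel does, pass a pool's `map`, e.g. `with ProcessPoolExecutor() as ex: gzip_tree('.', ex.map)` (from `concurrent.futures`), called from inside an `if __name__ == '__main__':` block.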
    To gzip every file in a directory and then bundle the resulting .gz files into one .tar:

    find dir/to/compress -type f | parallel gzip && tar -cvf archive.tar dir/to/compress
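Gzipping first and tarring second yields a plain, uncompressed tar whose members are individual .gz files, rather than a conventional .tar.gz in which a single gzip stream wraps the whole archive. Independent per-file streams are what make the work easy to split across cores, at the cost of giving up any redundancy shared between files. The sketch below uses only Python's `tarfile` and `gzip` modules to build both layouts in memory so the difference is visible; `tar_of_gz`, `tar_gz` and `member_names` are hypothetical helper names.

```python
import gzip
import io
import tarfile
from typing import Dict, List


def tar_of_gz(files: Dict[str, bytes]) -> bytes:
    # Layout from "gzip each file, then tar": every member is its own .gz stream
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode='w') as tar:
        for name, data in files.items():
            payload = gzip.compress(data)
            info = tarfile.TarInfo(name + '.gz')
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def tar_gz(files: Dict[str, bytes]) -> bytes:
    # Conventional .tar.gz: plain members, one gzip stream around the whole tar
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode='w:gz') as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def member_names(archive: bytes) -> List[str]:
    # 'r:*' lets tarfile detect whether the archive as a whole is compressed
    with tarfile.open(fileobj=io.BytesIO(archive), mode='r:*') as tar:
        return tar.getnames()
```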

    Scripting GNU Parallel with Python:


    The following Python script builds bash commands that use GNU Parallel to recursively compress or decompress a given path.

    To gzip all files in a directory and pack them into a tar named after that folder:
    ./gztar.py -c /dir/to/compress

    To unpack a tar into a folder named after it and decompress all of its files:
    ./gztar.py -d /tar/to/decompress

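    #!/usr/bin/env python3
    # This script builds bash commands that compress files in parallel with GNU Parallel
    import argparse
    import os
    import shlex
    
    def compress(dir):
        # gzip every file under dir in parallel, then tar the .gz files into <dirname>.tar
        dir = os.path.normpath(dir)  # so 'dir/' still yields 'dir.tar'
        q = shlex.quote(dir)
        tar_name = shlex.quote(os.path.basename(dir) + '.tar')
        os.system('find ' + q + ' -type f | parallel gzip -q && tar -cf '
                  + tar_name + ' -C ' + q + ' .')
    
    def decompress(tar):
        # Extract <name>.tar into <name>/, then gunzip the .gz files in parallel
        d = shlex.quote(os.path.splitext(tar)[0])
        # '*.gz' is quoted so the shell hands the pattern to find instead of expanding it
        os.system('mkdir ' + d + ' && tar -xf ' + shlex.quote(tar) + ' -C ' + d +
                  ' && find ' + d + " -name '*.gz' -type f | parallel gzip -qd")
    
    p = argparse.ArgumentParser()
    p.add_argument('-c', '--compress', metavar='/DIR/TO/COMPRESS', nargs=1)
    p.add_argument('-d', '--decompress', metavar='/TAR/TO/DECOMPRESS.tar', nargs=1)
    args = p.parse_args()
    
    # nargs=1 stores a one-element list, so take its first item
    if args.compress:
        compress(args.compress[0])
    if args.decompress:
        decompress(args.decompress[0])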
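The script above splices paths into a shell string for `os.system`, so a directory name containing spaces or shell metacharacters can break the command (quoting each path with `shlex.quote` is one fix). An alternative sketch, not the author's version, avoids the shell entirely: build each command as an argument list and connect `find` to GNU Parallel with `subprocess` pipes. It assumes GNU find, GNU Parallel and tar are on `PATH`; `compress_commands` and `run_compress` are hypothetical names.

```python
import os
import subprocess
from typing import List, Tuple


def compress_commands(directory: str) -> Tuple[List[str], List[str], List[str]]:
    # find prints NUL-terminated paths and `parallel -0` splits on NUL, so
    # spaces, quotes or newlines in file names need no shell escaping
    find_cmd = ['find', directory, '-type', 'f', '-print0']
    parallel_cmd = ['parallel', '-0', 'gzip', '-q']
    tar_name = os.path.basename(os.path.normpath(directory)) + '.tar'
    tar_cmd = ['tar', '-cf', tar_name, '-C', directory, '.']
    return find_cmd, parallel_cmd, tar_cmd


def run_compress(directory: str) -> int:
    # Same as `find ... | parallel ... && tar ...`, with no /bin/sh involved
    find_cmd, parallel_cmd, tar_cmd = compress_commands(directory)
    finder = subprocess.Popen(find_cmd, stdout=subprocess.PIPE)
    worker = subprocess.Popen(parallel_cmd, stdin=finder.stdout)
    finder.stdout.close()  # lets find receive SIGPIPE if parallel exits early
    worker.wait()
    finder.wait()
    if worker.returncode != 0:
        return worker.returncode
    return subprocess.run(tar_cmd).returncode
```

Only `compress_commands` is side-effect free; `run_compress('/dir/to/compress')` actually runs the three tools and, like the script, writes `compress.tar` into the current working directory.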
    

    Parallel compression in pure Python (multiprocessing):


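    If for some reason you don't want to use GNU Parallel to queue the commands, I wrote a small script that does the parallel compression in Python alone (no bash calls). Because the Python GIL is notorious for keeping threads from running Python code on more than one core at once, the script uses processes via multiprocessing instead (each worker has its own interpreter and GIL), which calls for some extra care. The implementation also offers a CPU-throttle flag, a flag that deletes the source after compression or decompression, and a progress bar for both steps.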
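In practice the extra care with `multiprocessing` comes down to two rules: the function handed to the pool is pickled by reference, so it must be a module-level `def` rather than a lambda or nested function, and the code that creates the pool must sit under `if __name__ == '__main__':`, because with the spawn start method each worker re-imports the main module. A minimal sketch of both rules plus a 75% throttle like the script's `-t` flag; `worker_count`, `square`, `is_picklable` and `run` are hypothetical names for illustration.

```python
import multiprocessing as mp
import pickle
from typing import Iterable, List


def worker_count(throttle: bool) -> int:
    # About 75% of the cores when throttled, but never fewer than one
    cpus = mp.cpu_count()
    return max(1, round(cpus * 0.75)) if throttle else cpus


def square(x: int) -> int:
    # Module-level function: workers can look it up by name after unpickling
    return x * x


def is_picklable(obj) -> bool:
    # Pool sends the target function to workers via pickle
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


def run(values: Iterable[int], throttle: bool = False) -> List[int]:
    with mp.Pool(worker_count(throttle)) as pool:
        return list(pool.imap(square, values))


if __name__ == '__main__':
    # Without this guard, spawned workers would try to start a new pool on import
    print(run(range(5)))  # prints [0, 1, 4, 9, 16]
```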

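    1. First, make sure the one module outside the standard library, tqdm, is installed:
      pip install tqdm
    2. Next, make gztar.py executable and symlink it into /usr/bin:
      chmod +x /path/to/gztar.py
      sudo ln -s /path/to/gztar.py /usr/bin/gztar
    3. Now compress or decompress a directory with the new gztar command (-r removes the source folder or tar afterwards, -t limits the pool to about 75% of the cores):
      gztar -c /dir/to/compress -r -t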
    #!/usr/bin/env python3
    ## A pure python implementation of parallel gzip compression using multiprocessing
    import os, gzip, tarfile, shutil, argparse, tqdm
    import multiprocessing as mp
    
    #######################
    ### Base Functions
    ###################
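    def search_fs(path):
        # Recursively collect the path of every file under path
        file_list = [os.path.join(dp, f) for dp, dn, fn in os.walk(os.path.expanduser(path)) for f in fn]
        return file_list
    
    def gzip_compress_file(path):
        # Write path + '.gz', then delete the uncompressed original
        with open(path, 'rb') as f:
            with gzip.open(path + '.gz', 'wb') as gz:
                shutil.copyfileobj(f, gz)
        os.remove(path)
    
    def gzip_decompress_file(path):
        # Write path minus its '.gz' suffix, then delete the .gz file
        with gzip.open(path, 'rb') as gz:
            with open(path[:-3], 'wb') as f:
                shutil.copyfileobj(gz, f)
        os.remove(path)
    
    def tar_dir(path):
        # Pack every file under path into path + '.tar', storing names relative to path
        with tarfile.open(path + '.tar', 'w') as tar:
            for f in search_fs(path):
                tar.add(f, os.path.relpath(f, path))
    
    def untar_dir(path):
        # Extract NAME.tar into a folder called NAME
        with tarfile.open(path, 'r:') as tar:
            tar.extractall(path[:-4])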
    
    #######################
    ### Core gztar commands
    ###################
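    def gztar_c(dir, queue_depth, rmbool):
        # Normalize first: 'dir/' + '.tar' would otherwise land inside dir as 'dir/.tar'
        dir = os.path.normpath(os.path.expanduser(dir))
        files = search_fs(dir)
        with mp.Pool(queue_depth) as pool:
            # imap yields results lazily, so tqdm can advance as files finish
            list(tqdm.tqdm(pool.imap(gzip_compress_file, files),
                           total=len(files), desc='Compressing Files'))
        print('Adding Compressed Files to TAR....')
        tar_dir(dir)
        if rmbool:
            shutil.rmtree(dir)
    
    def gztar_d(tar, queue_depth, rmbool):
        tar = os.path.expanduser(tar)
        print('Extracting Files From TAR....')
        untar_dir(tar)
        if rmbool:
            os.remove(tar)
        # Only .gz files can be decompressed; skip anything else in the folder
        files = [f for f in search_fs(tar[:-4]) if f.endswith('.gz')]
        with mp.Pool(queue_depth) as pool:
            list(tqdm.tqdm(pool.imap(gzip_decompress_file, files),
                           total=len(files), desc='Decompressing Files'))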
    
    #######################
    ### Parse Args
    ###################
    # Guard the CLI: with the 'spawn' start method (Windows, macOS) every worker
    # re-imports this module and must not re-run argument parsing or start pools
    if __name__ == '__main__':
        p = argparse.ArgumentParser(description='A pure python implementation of parallel gzip compression archives.')
        p.add_argument('-c', '--compress', metavar='/DIR/TO/COMPRESS', nargs=1, help='Recursively gzip files in a dir then place in tar.')
        p.add_argument('-d', '--decompress', metavar='/TAR/TO/DECOMPRESS.tar', nargs=1, help='Untar archive then recursively decompress gzip\'ed files')
        p.add_argument('-t', '--throttle', action='store_true', help='Throttle compression to only 75%% of the available cores.')
        p.add_argument('-r', '--remove', action='store_true', help='Remove TAR/Folder after process.')
        arg = p.parse_args()
        ### Flags
        if arg.throttle:
            qd = max(1, round(mp.cpu_count() * .75))
        else:
            qd = mp.cpu_count()
        ### Main Args (nargs=1 stores a one-element list)
        if arg.compress:
            gztar_c(arg.compress[0], qd, arg.remove)
        if arg.decompress:
            gztar_d(arg.decompress[0], qd, arg.remove)
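Because the script deletes originals as it goes (and `-r` removes the source folder or tar), it is worth checking that a compress/decompress round trip really restores every file. One minimal way, sketched here with a hypothetical `tree_digest` helper, is to record a SHA-256 per relative path before compressing and compare it after decompressing.

```python
import hashlib
import os
from typing import Dict


def tree_digest(root: str) -> Dict[str, str]:
    # Map each file's path (relative to root) to the SHA-256 of its contents
    digest = {}
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            h = hashlib.sha256()
            with open(full, 'rb') as f:
                # Read in 1 MiB chunks so large files are not loaded at once
                for chunk in iter(lambda: f.read(1 << 20), b''):
                    h.update(chunk)
            digest[os.path.relpath(full, root)] = h.hexdigest()
    return digest
```

For example, take `before = tree_digest('/dir/to/compress')`, run `gztar -c /dir/to/compress` followed by `gztar -d /dir/to/compress.tar`, and check that `tree_digest('/dir/to/compress') == before`.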
    

    Source article:
    tar.gz vs zip


          Article link: https://www.haomeiwen.com/subject/ytntvhtx.html