A Simple Python Multiprocessing Example

Author: 京樂春水 | Published: 2020-04-11 15:29

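When filtering log data across many files at once, I turned to Python multiprocessing to improve throughput. Compared with a single process, I saw little improvement on my local machine (a Pentium processor), but the speedup was clear once the script ran on a server (6 cores, 6 threads).
These are brief notes on multiprocessing usage, kept for future reference.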

  1. Pool
from multiprocessing import Pool, cpu_count
import os
import time

def test(num):
    # Print the ID of the process running this task
    print("{} is running...".format(os.getpid()))
    print(num)
    # Sleep for 5 seconds
    time.sleep(5)

if __name__ == "__main__":
    start = time.time()
    # Print the number of CPU cores
    print("CPU counters: {}".format(cpu_count()))
    if cpu_count() > 1:
        p = Pool()
        p.apply_async(test, args=(1, ))
        p.apply_async(test, args=(2, ))
        # Stop accepting new tasks, then wait for the workers to finish
        p.close()
        p.join()
    print("Multi Cost: {}".format(time.time() - start ))

    start = time.time()
    test(1)
    test(2)
    print("Single Cost: {}".format(time.time() - start ))

The output is as follows:

CPU counters: 2
22168 is running...
25172 is running...
2
1
Multi Cost: 7.875996828079224

21360 is running...
1
21360 is running...
2
Single Cost: 10.007039546966553

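As the timings show, running the two tasks in separate processes helps: about 7.9 seconds, versus about 10.0 seconds when they run one after the other. Each task sleeps for 5 seconds, so the ideal parallel time would be close to 5 seconds; the extra time is most likely the cost of starting the worker processes.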
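The tasks above only print, so nothing comes back to the parent process. As a minimal sketch of how return values could be collected, the example below keeps the `AsyncResult` objects returned by `apply_async` and calls `get()` on each one. The function names `square` and `run_tasks` and the worker count are assumptions made for illustration, not part of the original script.

```python
from multiprocessing import Pool


def square(num):
    # Hypothetical stand-in for real work, such as filtering one log file.
    return num * num


def run_tasks(numbers, workers=2):
    # Leaving the "with" block terminates the pool, so every result is
    # fetched before the block ends.
    with Pool(processes=workers) as pool:
        pending = [pool.apply_async(square, args=(n,)) for n in numbers]
        # get() blocks until the task is done and re-raises any exception
        # that was raised inside the worker process.
        return [result.get(timeout=30) for result in pending]


if __name__ == "__main__":
    print(run_tasks([1, 2, 3]))
```

Calling `get()` is also the simplest way to notice a failing task: without it, an exception raised inside a worker is stored in the result object and never shown.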

  2. Sharing data between processes
from multiprocessing import Pool, Manager, Process, cpu_count
import os
import time

def test(num, l, d):
    print("{} is running...".format(os.getpid()))
    print(num)
    l.append(num * 2)
    d[num] = num * 2
    # Sleep for 5 seconds
    time.sleep(5)

if __name__ == "__main__":
    start = time.time()
    # Print the number of CPU cores
    print("CPU counters: {}".format(cpu_count()))
    if cpu_count() > 1:
        l = Manager().list()
        d = Manager().dict()
        p = Pool()
        p.apply_async(test, args=(1, l, d))
        p.apply_async(test, args=(2, l, d))
        # Stop accepting new tasks, then wait for the workers to finish
        p.close()
        p.join()
        print("List: {}".format(l))
        print("Dict: {}".format(d))
    print("Multi Cost: {}".format(time.time() - start ))

The output is as follows:

CPU counters: 2
13924 is running...
1
27060 is running...
2
List: [2, 4]
Dict: {1: 2, 2: 4}
Multi Cost: 11.537004232406616

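One benefit of Manager is that single operations such as l.append(...) or d[key] = value need no explicit lock, because each call is carried out inside the manager process. Compound read-modify-write operations, for example d[key] += 1, are two separate calls and are not atomic, so they still need a lock.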
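To illustrate where a lock is still needed, here is a small sketch in which several processes increment one counter stored in a managed dict. The names `add_many` and `run_counter`, the key "total", and the worker and iteration counts are all assumptions chosen for the example.

```python
from multiprocessing import Manager, Process


def add_many(counter, lock, times):
    for _ in range(times):
        # "+=" on a proxy is a read followed by a write, so the pair is
        # wrapped in a lock to keep other processes from interleaving.
        with lock:
            counter["total"] += 1


def run_counter(workers=4, times=200):
    with Manager() as manager:
        counter = manager.dict({"total": 0})
        lock = manager.Lock()
        procs = [
            Process(target=add_many, args=(counter, lock, times))
            for _ in range(workers)
        ]
        for proc in procs:
            proc.start()
        for proc in procs:
            proc.join()
        # Read the value before the manager shuts down.
        return counter["total"]


if __name__ == "__main__":
    print(run_counter())
```

With the lock in place the final count equals workers * times. Removing the `with lock:` line can lose updates, although how often that happens depends on timing.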

  3. A pitfall when sharing data between processes
    Incorrect code:
from multiprocessing import Pool, Manager, Process, cpu_count
import os
import time

def test(l):
    print("{} is running...".format(os.getpid()))
    print("inner list: {}".format(l))
    l[0][1] = 9999
    print("inner has changed")

if __name__ == "__main__":
    start = time.time()
    # Print the number of CPU cores
    print("CPU counters: {}".format(cpu_count()))
    if cpu_count() > 1:
        l = Manager().list()
        l.append({1: 2})
        p1 = Process(target=test, args=(l, ))
        p1.start()
        p1.join()
        print("Outter list: {}".format(l[0]))
    print("Multi Cost: {}".format(time.time() - start ))

The output is as follows:

CPU counters: 2
27608 is running...
inner list: [{1: 2}]
inner has changed
Outter list: {1: 2}
Multi Cost: 4.526983976364136

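Note that the value in the list was not changed to 9999: the parent process still sees {1: 2}. The reason is that l[0] returns a copy of the nested dict, so l[0][1] = 9999 modifies only that temporary copy.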
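The cause can be checked without starting a second worker process. The sketch below (the function name `show_copy_semantics` is made up for this example) reads an element from a managed list and inspects what comes back.

```python
from multiprocessing import Manager


def show_copy_semantics():
    with Manager() as manager:
        shared = manager.list()
        shared.append({1: 2})
        # Indexing the proxy returns a plain dict that was copied out of
        # the manager process.
        first = shared[0]
        # This changes the local copy only.
        first[1] = 9999
        # Reading again fetches a fresh copy of the unchanged original.
        return type(first).__name__, shared[0]


if __name__ == "__main__":
    print(show_copy_semantics())
```

The returned type name is `dict` rather than a proxy type, which is why the manager never learns about the in-place change.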

Correct code:

from multiprocessing import Pool, Manager, Process, cpu_count
import os
import time

def test(l):
    print("{} is running...".format(os.getpid()))
    print("inner list: {}".format(l))
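    # Wrong: l[0] returns a copy of the dict, so this item assignment
    # changes only that temporary copy
    temp = l[0][1]
    temp = 9999
    l[0][1] = temp
    # Right: fetch the element, modify it, then assign it back to l[0]
    # so the change goes through the proxy
    temp = l[0]
    temp[1] = 9999
    l[0] = temp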
    print("inner has changed")

if __name__ == "__main__":
    start = time.time()
    # Print the number of CPU cores
    print("CPU counters: {}".format(cpu_count()))
    if cpu_count() > 1:
        l = Manager().list()
        l.append({1: 2})
        p1 = Process(target=test, args=(l, ))
        p1.start()
        p1.join()
        print("Outter list: {}".format(l))
    print("Multi Cost: {}".format(time.time() - start ))

The output is:

CPU counters: 2
7784 is running...
inner list: [{1: 2}]
inner has changed
Outter list: [{1: 9999}]
Multi Cost: 4.427980184555054

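This finally produces the expected result; the code also includes a demonstration of the wrong approach. In short, a Manager cannot detect in-place changes to a plain mutable object nested inside a shared list. The element has to be read into a temporary variable, modified, and assigned back so that the update goes through the proxy.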
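An alternative worth knowing, available since Python 3.6, is to nest a managed dict inside the managed list instead of a plain dict. The inner object is then a proxy as well, so item assignment reaches the manager directly. The sketch below assumes both objects come from the same manager; `update_inner` and `run_nested` are hypothetical names.

```python
from multiprocessing import Manager, Process


def update_inner(shared):
    # shared[0] is a dict proxy here, not a copy, so the write is sent
    # to the manager process.
    shared[0][1] = 9999


def run_nested():
    with Manager() as manager:
        shared = manager.list()
        # Store a managed dict rather than a plain dict.
        shared.append(manager.dict({1: 2}))
        worker = Process(target=update_inner, args=(shared,))
        worker.start()
        worker.join()
        # copy() returns an ordinary dict snapshot of the shared state.
        return shared[0].copy()


if __name__ == "__main__":
    print(run_nested())
```

The trade-off is that every access to the inner dict becomes a round trip to the manager process, so the read-modify-assign approach may be cheaper when many fields change at once.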

References:
https://www.jianshu.com/p/52676b93430d
https://blog.csdn.net/qhd1994/article/details/79864087
