How numpy does parallel computation

Author: ThomasYoungK | Posted 2018-07-30 23:17

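    Because of the GIL, multithreaded Python code can normally use only one processor at a time, but numpy is an exception:
    http://scipy-cookbook.readthedocs.io/items/ParallelProgramming.html discusses parallel computation with numpy. Here is a summary of my understanding: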

    numpy's own array operations can bypass the GIL

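    Because numpy's core is written in C, an array operation runs in compiled code rather than in the Python interpreter, so it does not need to hold the GIL while it computes. In addition, numpy uses BLAS (the Basic Linear Algebra Subprograms) for linear algebra where it can, and some BLAS implementations are multithreaded, so those operations can use several cores and run faster still.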
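To make the contrast concrete, here is a minimal sketch that computes the same matrix product twice: once with a triple loop executed by the interpreter, and once with a single numpy call that runs in compiled code. The function names `matmul_python` and `matmul_numpy` are hypothetical, chosen only for this illustration. Whether the numpy call actually uses several cores depends on the BLAS library your numpy build is linked against, which this sketch does not check.

```python
import numpy as np


def matmul_python(a, b):
    """Naive matrix product: every multiply-add goes through the interpreter."""
    n, k, m = len(a), len(b), len(b[0])
    out = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            total = 0.0
            for p in range(k):
                total += a[i][p] * b[p][j]
            out[i][j] = total
    return out


def matmul_numpy(a, b):
    """Same product as one array operation, executed in compiled code."""
    return np.asarray(a, dtype=np.float64) @ np.asarray(b, dtype=np.float64)


if __name__ == "__main__":
    a = [[1.0, 2.0], [3.0, 4.0]]
    b = [[5.0, 6.0], [7.0, 8.0]]
    print(matmul_python(a, b))
    print(matmul_numpy(a, b))
```

Calling `np.show_config()` prints which BLAS/LAPACK libraries a given numpy installation was built with.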

    Threads: numpy array operations release the GIL, just as IO does

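    As I understand it, numpy's compiled code does not depend on the interpreter, so it keeps running after the GIL is released. Meanwhile other threads can use the interpreter, and if those threads also run numpy code, that code releases the GIL in the same way.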

    while a thread is waiting for IO (for you to type something, say, or for something to come in the network) python releases the GIL so other threads can run. And, more importantly for us, while numpy is doing an array operation, python also releases the GIL. Thus if you tell one thread to do (where A, B, and C are all numpy arrays):

    >>> A = B + C
    >>> print(A)
    

    During the print operations and the % formatting operation, no other thread can execute. But during the A = B + C, another thread can run - and if you've written your code in a numpy style, much of the calculation will be done in a few array operations like A = B + C. Thus you can actually get a speedup from using multiple threads.
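The idea in the quoted passage can be sketched with the standard library's `concurrent.futures.ThreadPoolExecutor`: split an array into chunks and let each thread run one array operation on its chunk. This is only an illustration under stated assumptions. The function name `exp_threaded` and the chunking strategy are hypothetical, and whether it is faster than a single `np.exp` call depends on the array size, the number of cores, and the numpy version, so no timing is claimed here.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def exp_threaded(x, n_threads=2):
    """Apply np.exp to x chunk by chunk, one chunk per worker thread.

    Each np.exp call is a single array operation, so per the passage above
    the GIL is released while it runs and the chunks can overlap in time.
    """
    chunks = np.array_split(x, n_threads)
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        parts = list(pool.map(np.exp, chunks))  # results keep chunk order
    return np.concatenate(parts)


if __name__ == "__main__":
    x = np.linspace(0.0, 1.0, 1_000_000)
    y = exp_threaded(x, n_threads=2)
    print(np.allclose(y, np.exp(x)))
```

Splitting into large chunks keeps the Python-level work per thread small, so most of the time is spent inside the array operation rather than in the interpreter.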

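    Processes, naturally, are an even more complete answer to the parallelism problem

    numpy arrays can also be shared between processes; exactly how is a topic for another time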

    It is possible to share memory between processes, including numpy arrays
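One way to do this today, offered as a sketch and not as the method the cookbook describes, is the standard library's `multiprocessing.shared_memory` module. Note the assumption: it requires Python 3.8 or later, which is newer than the Python 3.6 mentioned in this post. The helper names `create_shared_array` and `double_in_place` are hypothetical. To keep the example short, the "worker" function is called directly in the same process; in a real program it would be the `target` of a `multiprocessing.Process`, receiving only the block's name, shape, and dtype.

```python
from multiprocessing import shared_memory

import numpy as np


def create_shared_array(values):
    """Copy values into a new shared-memory block and return (block, view)."""
    values = np.asarray(values, dtype=np.float64)
    shm = shared_memory.SharedMemory(create=True, size=values.nbytes)
    view = np.ndarray(values.shape, dtype=values.dtype, buffer=shm.buf)
    view[:] = values
    return shm, view


def double_in_place(name, shape, dtype):
    """What a worker would run: attach to the block by name, modify in place."""
    shm = shared_memory.SharedMemory(name=name)
    arr = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
    arr *= 2   # writes go straight to the shared block, no copy is sent back
    del arr    # drop the view before closing, or close() raises BufferError
    shm.close()


if __name__ == "__main__":
    shm, view = create_shared_array([1.0, 2.0, 3.0])
    double_in_place(shm.name, view.shape, view.dtype)
    print(view)
    del view
    shm.close()
    shm.unlink()  # the creator frees the block once everyone is done
```

Only the small tuple of name, shape, and dtype needs to cross the process boundary; the array data itself stays in one place.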

    The final example is especially good:

    Comparison

    Here is a very basic comparison which illustrates the effect of the GIL (on a dual core machine).

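    # timings.py
    import math

    import numpy as np

    def f(x):
        # Pure-Python loop: the interpreter holds the GIL throughout.
        print(x)
        y = [1] * 10000000
        [math.exp(i) for i in y]

    def g(x):
        # A single numpy array operation: the GIL is released while it runs.
        print(x)
        y = np.ones(10000000)
        np.exp(y)

    # IPython session
    from handythread import foreach   # helper module from the SciPy Cookbook
    from multiprocessing import Pool  # the original used the older `processing` package
    from timings import f, g

    def fornorm(f, l):
        # Plain serial loop, used as the baseline.
        for i in l:
            f(i)

    %time fornorm(g, range(100))
    %time fornorm(f, range(10))
    %time foreach(g, range(100), threads=2)
    %time foreach(f, range(10), threads=2)
    p = Pool(2)
    %time p.map(g, range(100))
    %time p.map(f, range(10))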
    
    
                   100 * g()    10 * f()
    normal         43.5s        48s
    2 threads      31s          71.5s
    2 processes    27s          31.23s

    For function f(), which does not release the GIL, threading actually performs worse than serial code, presumably due to the overhead of context switching. However, using 2 processes does provide a significant speedup. For function g() which uses numpy and releases the GIL, both threads and processes provide a significant speed up, although multiprocesses is slightly faster.
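For readers who want to try the thread half of this comparison on Python 3 without the cookbook's `handythread` module, here is a minimal sketch that swaps in `concurrent.futures.ThreadPoolExecutor`. Several things are assumptions made for this illustration: the problem size `N` is far smaller than the 10,000,000 used above so the script finishes quickly, `f` and `g` return a sum so the results can be compared, and the helpers `run_serial`, `run_threads`, and `timed` are hypothetical names. The timings it prints will vary by machine and are not expected to match the table.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor

import numpy as np

N = 200_000  # assumed size for a quick run; the quoted benchmark used 10,000,000


def f(x, n=N):
    """Pure-Python work: the interpreter holds the GIL throughout."""
    return sum(math.exp(i) for i in [1] * n)


def g(x, n=N):
    """numpy work: np.exp is one array operation in compiled code."""
    return float(np.exp(np.ones(n)).sum())


def run_serial(func, items):
    """Baseline: call func on each item, one after another."""
    return [func(i) for i in items]


def run_threads(func, items, threads=2):
    """Call func on each item using a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(func, items))


def timed(runner, *args):
    """Return (result, elapsed seconds) for runner(*args)."""
    start = time.perf_counter()
    result = runner(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    for name, func in (("f", f), ("g", g)):
        _, t_serial = timed(run_serial, func, range(8))
        _, t_threads = timed(run_threads, func, range(8))
        print(f"{name}: serial {t_serial:.3f}s, 2 threads {t_threads:.3f}s")
```

Replacing `ThreadPoolExecutor` with `concurrent.futures.ProcessPoolExecutor` inside `run_threads` would give the process variant, provided the script is run as a file so that the worker processes can import `f` and `g`.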

    I wrote an example of my own along the same lines that can be run directly (Python 3.6): https://gist.github.com/miniyk2012/4a2edf98493d91c60af06232b6c69582
