Easy parallel processing in Python with multiprocessing.Pool

On a machine with a 2-core, 4-thread CPU, Python never pushed CPU usage above 25%, and I kept thinking things could run faster if I could use all 100%.

To get to 100% CPU usage, you can run the work in parallel with the multiprocessing module.

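Parallel processing has a reputation for being a hassle: you have to think about things like mutual exclusion for resources shared between processes. The multiprocessing module does let you launch processes yourself and manage shared memory, but that is not easy and is not a great design either, so I won't cover it here.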

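Here I consider a simpler case and use Pool with a function that runs independently of the other processes. The argument to Pool() is the number of worker processes to start; with no argument it defaults to the number of CPUs the OS reports (4 in my environment). Pool has several methods, and the one that works almost like the built-in map() is especially handy.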
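
To make the "almost like map()" part concrete, here is a minimal sketch. The names square and squares are made up for this illustration and are not from the code below. The same helper takes either the built-in map or a Pool's map method, which shows that the call shape is the same. One difference: in Python 3 the built-in map returns an iterator, while Pool.map returns a list.

```python
from multiprocessing import Pool
import os

def square(x):
    # Module-level function so that worker processes can find it by name.
    return x * x

def squares(mapper, values):
    # `mapper` is either the built-in map or a Pool's map method (hypothetical helper).
    return list(mapper(square, values))

if __name__ == '__main__':
    print('logical CPUs :', os.cpu_count())
    print(squares(map, range(5)))            # serial, built-in map
    with Pool(processes=2) as pool:          # two worker processes
        print(squares(pool.map, range(5)))   # same call shape, run by the workers
```

Passing processes=2 is only there to show the argument; leaving it out gives the default described above.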

Example 1

Given slowf(x), which takes 1 second to return x*x, compute slowf(0)+slowf(1)+...+slowf(9).

Source code

from multiprocessing import Pool, cpu_count, current_process
import time

def slowf(x):
    print(current_process().name,
        ': This started at %s.' % time.ctime().split()[3])
    
    time.sleep(1)
    return x*x

if __name__ == '__main__':
    # Serial baseline: the built-in map runs every call in the main process.
    print('cpu : %d' % 1)
    st = time.time()
    print('answer : %d' % sum(map(slowf, range(10))))
    print('time : %.3f s' % (time.time()-st))

    # Parallel version: Pool() starts one worker process per logical CPU.
    print('\ncpu : %d' % cpu_count())
    st = time.time()
    p = Pool()
    print('answer : %d' % sum(p.map(slowf, range(10))))
    print('time : %.3f s' % (time.time()-st))

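On Windows, the function passed to Pool.map has to be defined at module level, outside the main block, and the Pool has to be created and used inside the if __name__ == '__main__' block.
* For why the if __name__ == '__main__' guard is needed (on Windows), see the "17.2. multiprocessing" section (Process-based parallelism) of the Python 3.3.3 documentation.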
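
As a rough sketch of that layout (work and main are placeholder names, not from the code above): on Windows each worker is a fresh Python process that imports the script again, so anything outside the guard runs in every worker, while the guarded part runs only in the parent.

```python
import multiprocessing as mp

def work(x):
    # Top level of the module: a worker that re-imports this script
    # can look the function up by name.
    return x + 1

def main():
    # Called only from the guarded block below, so only the parent
    # process ever creates the pool.
    with mp.Pool(2) as pool:
        return pool.map(work, [1, 2, 3])

if __name__ == '__main__':
    # In a spawned worker the module is not imported as '__main__',
    # so this block is skipped there.
    print('start method :', mp.get_start_method())
    print(main())
```

On Windows the start method is spawn, which is why the guard matters there. On platforms that fork the worker instead, the script is not re-imported, but keeping the guard makes the same file work everywhere.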

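Results

Without parallel processing, each slowf() call takes one second and there are 10 calls, so naturally it takes 10 seconds.
With parallel processing, the calls are handled by several processes at the same time, and it runs about three times faster.
Environment: Python 3.4.0, Windows8, Core i7 4650U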

cpu : 1
MainProcess : This started at 13:01:27.
MainProcess : This started at 13:01:28.
MainProcess : This started at 13:01:29.
MainProcess : This started at 13:01:30.
MainProcess : This started at 13:01:31.
MainProcess : This started at 13:01:32.
MainProcess : This started at 13:01:33.
MainProcess : This started at 13:01:34.
MainProcess : This started at 13:01:35.
MainProcess : This started at 13:01:36.
answer : 285
time : 10.006 s

cpu : 4
SpawnPoolWorker-3 : This started at 13:01:37.
SpawnPoolWorker-2 : This started at 13:01:37.
SpawnPoolWorker-4 : This started at 13:01:37.
SpawnPoolWorker-1 : This started at 13:01:37.
SpawnPoolWorker-3 : This started at 13:01:38.
SpawnPoolWorker-2 : This started at 13:01:38.
SpawnPoolWorker-1 : This started at 13:01:38.
SpawnPoolWorker-4 : This started at 13:01:38.
SpawnPoolWorker-3 : This started at 13:01:39.
SpawnPoolWorker-2 : This started at 13:01:39.
answer : 285
time : 3.221 s

Example 2

$\sum_{a=1}^{n} \sum_{b=a}^{n} \sum_{c=b}^{n} \gcd(a, b, c)$
Compute this sum for n = 1000.

There are plenty of ways to make the calculation smarter, but here I use a naive implementation to see the effect of parallel processing.
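
Before running n = 1000, the sum can be checked by hand for tiny n. The sketch below is the plain triple loop. It uses math.gcd from the standard library, which assumes Python 3.5 or later (the code in this post defines its own gcd instead). The function name triple_gcd_sum is made up for this check.

```python
from math import gcd

def triple_gcd_sum(n):
    # Sum gcd(a, b, c) over all 1 <= a <= b <= c <= n.
    total = 0
    for a in range(1, n + 1):
        for b in range(a, n + 1):
            for c in range(b, n + 1):
                total += gcd(gcd(a, b), c)
    return total

# n = 2 has four triples: (1,1,1), (1,1,2), (1,2,2), (2,2,2),
# with gcds 1, 1, 1, 2.
print(triple_gcd_sum(2))
```

For n = 2 the four gcds add up to 5, which is small enough to verify on paper.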

Source code

from multiprocessing import Pool, cpu_count
import time

def gcd(a, b):
    if b == 0:
        return a
    else:
        return gcd(b, a % b)
        
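def f(an):
    # Inner two loops of the sum for one fixed a. (a, n) arrive as one tuple
    # because Pool.map passes a single argument to the function.
    a, n = an
    t = 0
    for b in range(a, n+1):
        for c in range(b, n+1):
            t += gcd(gcd(a,b), c)
    return t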

if __name__ == '__main__':
    n = 1000

    print('cpu : %d' % 1)
    st = time.time()    
    ans = 0
    for a in range(1, n+1):
        for b in range(a, n+1):
            for c in range(b, n+1):
                ans += gcd(gcd(a, b), c)
    print('answer : %d' % ans)
    print('time : %.3f s' % (time.time()-st))
    
    
    print('\ncpu : %d' % cpu_count())
    st = time.time()
    p = Pool()
    ans = sum(p.map(f, [(a,n) for a in range(1, n+1)]))
    print('answer : %d' % ans)
    print('time : %.3f s' % (time.time()-st))
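
One thing that might be worth trying with this code: the work for one value of a covers (n-a+1)(n-a+2)/2 pairs of (b, c), so small a is far more expensive than large a. Pool.map splits the input list into chunks and sends each chunk to a worker as one task, and its chunksize argument controls the chunk size. The sketch below is an untested idea, not something I measured for this post. It passes chunksize=1 so that values of a are handed out one at a time. It uses math.gcd (Python 3.5+) and a small n to stay quick, and total is a made-up helper.

```python
from functools import partial
from math import gcd
from multiprocessing import Pool

def f(an):
    # Same per-a work as f in the post, written with math.gcd.
    a, n = an
    return sum(gcd(gcd(a, b), c)
               for b in range(a, n + 1)
               for c in range(b, n + 1))

def total(n, mapper=map):
    # mapper is the built-in map or anything with the same call shape.
    return sum(mapper(f, [(a, n) for a in range(1, n + 1)]))

if __name__ == '__main__':
    n = 100  # kept small so the sketch finishes quickly
    with Pool() as pool:
        # chunksize=1: each task is a single value of a, so a worker that
        # finishes a cheap one immediately picks up the next.
        one_by_one = partial(pool.map, chunksize=1)
        print(total(n, one_by_one) == total(n))
```

Smaller chunks mean more messages between processes, so whether this helps or hurts depends on how heavy each task is.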

Results

On both Python and PyPy, the parallel version was about 1.5 times faster.
Environment: Python 3.4.0, Windows8, Core i7 4650U

cpu : 1
answer : 229638478
time : 363.790 s

cpu : 4
answer : 229638478
time : 237.350 s

Environment: PyPy 2.3.1, Windows8, Core i7 4650U

cpu : 1
answer : 229638478
time : 17.393 s

cpu : 4
answer : 229638478
time : 11.338 s

PyPy is fast!
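
For reference, the speedup figures are just the serial time divided by the parallel time. A small sketch using the times printed in the results of this post (the dictionary layout and the speedup helper are only for this illustration):

```python
# Wall-clock times in seconds, copied from the results in this post:
# (serial, parallel)
times = {
    'Example 1, CPython': (10.006, 3.221),
    'Example 2, CPython': (363.790, 237.350),
    'Example 2, PyPy': (17.393, 11.338),
}

def speedup(serial, parallel):
    # How many times faster the parallel run was.
    return serial / parallel

for label, (serial, parallel) in sorted(times.items()):
    print('%-20s %.2fx' % (label, speedup(serial, parallel)))
```

Example 1 comes out a little above 3x and both Example 2 runs come out a little above 1.5x.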

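Summary

CPU usage did reach 100%, but the speedup was smaller than I expected, especially in Example 2. Example 2 is CPU-bound and the 4 logical CPUs share only 2 physical cores, which probably limits the gain.
Creating and managing multiple processes and passing data between them has overhead, so depending on the design, things can even get slower.
Since it is so easy to use, it may be worth a try when you want to make brute-force code even a little faster.
At this point I start wanting to run this on something like a 6-core, 12-thread machine...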
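
To see the overhead mentioned above, one can time a task that is too cheap to be worth sending to another process. This is a minimal sketch, not a measurement from this post. The names inc and timed are made up, and the pool is created before the timer starts, so only the cost of passing data back and forth is counted.

```python
from multiprocessing import Pool
import time

def inc(x):
    # Deliberately trivial: shipping x to a worker costs more than the work.
    return x + 1

def timed(mapper, values):
    # Returns (results, elapsed seconds) for mapper(inc, values).
    start = time.perf_counter()
    results = list(mapper(inc, values))
    return results, time.perf_counter() - start

if __name__ == '__main__':
    values = range(100000)
    _, serial = timed(map, values)
    with Pool() as pool:
        _, parallel = timed(pool.map, values)
    print('serial   : %.3f s' % serial)
    print('parallel : %.3f s' % parallel)
```

I would expect the parallel line to be the slower one here, but the actual numbers depend on the machine.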

