Issue
I need to create many large numpy arrays of shape (4e6, 100), filled with random numbers from a standard normal distribution, and I'm trying to speed this up. I tried generating different parts of each array on multiple cores, but I'm not seeing the expected speed improvement. Is there something I'm doing wrong, or am I wrong to expect speed improvements in this way?
from numpy.random import default_rng
from multiprocessing import Pool
from time import time

def rng_mp(rng):
    return rng.standard_normal((250000, 100))

if __name__ == '__main__':
    n_proc = 4
    rngs = [default_rng(n) for n in range(n_proc)]
    rng_all = default_rng(1)

    start = time()
    result = rng_all.standard_normal((int(1e6), 100))
    print(f'Single process: {time() - start:.3f} seconds')

    start = time()
    with Pool(processes=n_proc) as p:
        result = p.map_async(rng_mp, rngs).get()
    print(f'MP: {time() - start:.3f} seconds')

# Single process: 1.114 seconds
# MP: 2.634 seconds
Solution
I suspected that the slowdown comes simply from having to move a lot of data from the address spaces of the subprocesses back to the main process: each (250000, 100) float64 chunk is roughly 200 MB that has to be pickled and copied back to the parent. I also suspected that the C-language code numpy uses for random number generation releases the Global Interpreter Lock, and that using multithreading instead of multiprocessing would therefore solve your performance problem:
from numpy.random import default_rng
from multiprocessing.pool import ThreadPool
from time import time

def rng_mp(rng):
    return rng.standard_normal((250000, 100))

if __name__ == '__main__':
    n_proc = 4
    rngs = [default_rng(n) for n in range(n_proc)]
    rng_all = default_rng(1)

    start = time()
    result = rng_all.standard_normal((int(1e6), 100))
    print(f'Single process: {time() - start:.3f} seconds')

    start = time()
    with ThreadPool(processes=n_proc) as p:
        result = p.map_async(rng_mp, rngs).get()
    print(f'MT: {time() - start:.3f} seconds')
Prints:
Single process: 1.210 seconds
MT: 0.413 seconds
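If you ultimately need one big array rather than a list of chunks, a further variation (not part of the original answer) is to preallocate the output and let each thread write its slice in place via the out= argument of standard_normal, which avoids holding separate chunk arrays and concatenating or copying them afterwards. A minimal sketch, assuming a recent NumPy where Generator.standard_normal accepts out=, and using SeedSequence.spawn for independent streams; the names and sizes below are illustrative:

import numpy as np
from numpy.random import default_rng, SeedSequence
from multiprocessing.pool import ThreadPool
from time import time

N_ROWS, N_COLS, N_THREADS = 1_000_000, 100, 4
CHUNK = N_ROWS // N_THREADS

# SeedSequence.spawn gives statistically independent child streams,
# which is safer than seeding the generators 0..3 by hand.
rngs = [default_rng(s) for s in SeedSequence(12345).spawn(N_THREADS)]
out = np.empty((N_ROWS, N_COLS))

def fill_chunk(args):
    i, rng = args
    # standard_normal(out=...) writes directly into the preallocated slice,
    # so no per-chunk arrays need to be returned or concatenated afterwards.
    rng.standard_normal(out=out[i * CHUNK:(i + 1) * CHUNK])

if __name__ == '__main__':
    start = time()
    with ThreadPool(N_THREADS) as pool:
        pool.map(fill_chunk, enumerate(rngs))
    print(f'MT into preallocated array: {time() - start:.3f} seconds')

Row slices of a C-contiguous float64 array are themselves contiguous, which is what out= requires, so each thread fills its own non-overlapping region of the shared array while the GIL is released.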
Answered By - Booboo