Issue
I have set up the following test program (Python 3.9.5, numpy 1.20.2):
import asyncio
from datetime import datetime

import numpy as np


async def calculate():
    # Each call runs in a worker thread so the event loop can keep going.
    print("=== unique")
    await asyncio.to_thread(lambda: np.unique(np.ones((2000, 50000)), axis=0))
    print("=== sort")
    await asyncio.to_thread(lambda: np.sort(np.ones((2000, 50000)), axis=0))
    print("=== cumsum")
    await asyncio.to_thread(lambda: np.cumsum(np.ones((2000, 100000)), axis=0))


async def ping():
    # Heartbeat: prints every 0.2 s as long as the event loop gets to run.
    while True:
        print("async", datetime.utcnow())
        await asyncio.sleep(0.2)


async def main():
    p1 = asyncio.create_task(ping())
    c = asyncio.create_task(calculate())
    await asyncio.wait([p1, c], return_when=asyncio.FIRST_COMPLETED)
    p1.cancel()


asyncio.run(main())
The output is as follows:
async 2021-05-21 13:20:16.308948
=== unique
async 2021-05-21 13:20:16.531135
async 2021-05-21 13:20:40.142323
=== sort
async 2021-05-21 13:20:40.343306
async 2021-05-21 13:20:40.543658
async 2021-05-21 13:20:40.743989
async 2021-05-21 13:20:40.944312
async 2021-05-21 13:20:41.144664
async 2021-05-21 13:20:41.345007
=== cumsum
async 2021-05-21 13:20:41.545523
async 2021-05-21 13:20:41.745901
async 2021-05-21 13:20:41.946271
async 2021-05-21 13:20:42.146651
async 2021-05-21 13:20:42.347021
async 2021-05-21 13:20:42.547396
It is evident that np.unique takes ~23 seconds and, unlike np.cumsum and np.sort, is never interrupted while it runs.
If my understanding of asyncio.to_thread and the GIL is correct, anything that runs in a thread should be periodically interrupted to enable at least some degree of multitasking with threaded programs. This is supported by the behavior of np.sort and np.cumsum. What happens in np.unique that prevents that thread from being interrupted?
Solution
This was a tricky one ;-)
The problem is that the GIL is not actually released in the np.unique call. The reason is the axis=0 parameter (you can verify that without it, the call to np.unique releases the GIL and is interleaved with the ping calls).
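One way to check this yourself is to measure how long the event loop goes without getting a turn while a function runs in a worker thread. The snippet below is only a rough sketch: max_heartbeat_gap and demo are names I made up for illustration, the array size is shrunk so it finishes quickly, and the actual numbers will depend on your machine and NumPy version.

```python
import asyncio
import time

import numpy as np


async def max_heartbeat_gap(func, interval=0.01):
    """Run func in a worker thread and return the longest pause (in seconds)
    between two consecutive wake-ups of the event loop."""
    worker = asyncio.create_task(asyncio.to_thread(func))
    longest = 0.0
    last = time.perf_counter()
    while not worker.done():
        await asyncio.sleep(interval)
        now = time.perf_counter()
        longest = max(longest, now - last)
        last = now
    await worker  # re-raise any exception from the worker thread
    return longest


def demo():
    # Hypothetical, scaled-down version of the arrays from the question.
    data = np.ones((300, 5000))
    cases = [
        ("unique, axis=0", lambda: np.unique(data, axis=0)),
        ("unique, no axis", lambda: np.unique(data)),
        ("sort, axis=0", lambda: np.sort(data, axis=0)),
    ]
    for label, func in cases:
        gap = asyncio.run(max_heartbeat_gap(func))
        print(f"{label:>16}: longest heartbeat gap {gap:.3f}s")
```

Calling demo() prints one line per case. If the explanation here is right, the gap for the axis=0 call to np.unique should be roughly as long as the call itself, while the other two should stay close to the heartbeat interval.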
TL;DR: The semantics of the axis argument differ between np.sort/np.cumsum and np.unique. For np.sort/np.cumsum, the operation is vectorized "in" that axis (i.e., several 1-D arrays are sorted independently). For np.unique, the operation is performed on slices "along" that axis, and these slices are non-trivial data types, hence they require Python methods.
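A tiny example may make the difference in axis semantics more concrete. The array below is made up for illustration and is not taken from the question.

```python
import numpy as np

a = np.array([[3, 1],
              [1, 2],
              [3, 1]])

# np.sort and np.cumsum treat every column as an independent 1-D array.
sorted_cols = np.sort(a, axis=0)
running = np.cumsum(a, axis=0)

# np.unique treats every row as one indivisible item and compares whole rows.
unique_rows = np.unique(a, axis=0)

print(sorted_cols)   # [[1 1] [3 1] [3 2]]
print(running)       # [[3 1] [4 3] [7 4]]
print(unique_rows)   # [[1 2] [3 1]]
```

Note that the sorted result contains the row [1, 1], which never appears in the input, because each column was sorted on its own. The unique result only ever contains rows that exist in the input.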
With axis=0, what numpy does is "slice" the array along the first axis, creating an ndarray with shape (2000, 1) in which each element is an "n-tuple of values" (its dtype is a structured dtype with one field per element of the original row); this happens at https://github.com/numpy/numpy/blob/7de0fa959e476900725d8a654775e0a38745de08/numpy/lib/arraysetops.py#L282-L294 .
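The consolidation step can be reproduced on a small array. This is only a sketch modelled on the linked lines of arraysetops.py, not a copy of them; the variable names and the 4x3 shape are my own choices.

```python
import numpy as np

# Small stand-in for the (2000, 50000) array from the question.
ar = np.ascontiguousarray(np.ones((4, 3)))

# One field per column, all with the original element dtype.
record_dtype = [(f"f{i}", ar.dtype) for i in range(ar.shape[1])]

# Reinterpret each row as a single structured record (no data is copied).
consolidated = ar.view(record_dtype)

print(consolidated.shape)        # (4, 1)
print(consolidated.dtype.names)  # ('f0', 'f1', 'f2')
```

Each of the 4 rows is now one element whose type is a record of 3 floats, so sorting this array means comparing whole records rather than plain numbers.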
Then the ndarray.sort method is called at https://github.com/numpy/numpy/blob/7de0fa959e476900725d8a654775e0a38745de08/numpy/lib/arraysetops.py#L333. That eventually calls https://github.com/numpy/numpy/blob/7de0fa959e476900725d8a654775e0a38745de08/numpy/core/src/multiarray/item_selection.c#L1236, which tries to release the GIL at https://github.com/numpy/numpy/blob/7de0fa959e476900725d8a654775e0a38745de08/numpy/core/src/multiarray/item_selection.c#L979 through a macro defined at https://github.com/numpy/numpy/blob/7de0fa959e476900725d8a654775e0a38745de08/numpy/core/include/numpy/ndarraytypes.h#L1004-L1006 . The GIL is therefore released only if the dtype does not set the NPY_NEEDS_PYAPI flag. However, given that the individual array elements are at this point non-trivial types, I assume they do set NPY_NEEDS_PYAPI (I would expect, for example, comparisons to go through Python), and so the GIL is not released.
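If you want to test that assumption rather than take my word for it, the flags of a dtype are visible from Python through dtype.flags. The sketch below assumes that NPY_NEEDS_PYAPI has the value 0x10, as in the ndarraytypes.h header; needs_pyapi is a helper name I made up, and you should double-check the constant against your NumPy version.

```python
import numpy as np

# Assumed value of the C-level NPY_NEEDS_PYAPI flag (check ndarraytypes.h).
NPY_NEEDS_PYAPI = 0x10


def needs_pyapi(dtype):
    """Return True if the dtype's flags include the assumed NPY_NEEDS_PYAPI bit."""
    return bool(np.dtype(dtype).flags & NPY_NEEDS_PYAPI)


# The kind of record dtype that np.unique builds when axis=0 is given.
record_dtype = np.dtype([(f"f{i}", np.float64) for i in range(3)])

print("float64:", needs_pyapi(np.float64))
print("object: ", needs_pyapi(object))
print("record: ", needs_pyapi(record_dtype))
```

Plain float64 should not carry the flag and object arrays should. If the record dtype also reports it on your installation, that matches the explanation above for why the sort keeps holding the GIL.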
Cheers.
Answered By - Milan Straka