Issue
I have a large 3D NumPy array:
x = np.random.rand(1_000_000_000).reshape(500, 1000, 2000)
And for each of the 500 2D arrays, I need to keep only the largest 800 elements within each column of each 2D array. To avoid costly sorting, I decided to use np.argpartition
:
k = 800
idx = np.argpartition(x, -k, axis=1)[:, -k:]
result = x[np.arange(x.shape[0])[:, None, None], idx, np.arange(x.shape[2])]
While np.argpartition
is reasonably fast, using idx
to index back into x
is really slow. Is there a faster (and memory efficient) way to perform this indexing?
Note that the results do not need to be in ascending sorted order. They just need to be the top 800
Solution
cutting the size by 10 to fit my memory, here are times for the various steps:
Creationg:
In [65]: timeit x = np.random.rand(1_000_000_00).reshape(500, 1000, 200)
1.89 s ± 82 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [66]: x = np.random.rand(1_000_000_00).reshape(500, 1000, 200)
In [67]: k=800
sort:
In [68]: idx = np.argpartition(x, -k, axis=1)[:, -k:]
In [69]: timeit idx = np.argpartition(x, -k, axis=1)[:, -k:]
2.52 s ± 292 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
the indexing:
In [70]: result = x[np.arange(x.shape[0])[:, None, None], idx, np.arange(x.shape[2])]
In [71]: timeit result = x[np.arange(x.shape[0])[:, None, None], idx, np.arange(x.shape[2])]
The slowest run took 4.11 times longer than the fastest. This could mean that an intermediate result is being cached.
2.6 s ± 1.87 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
All three steps take about the same time. I don't see anything unusual about the last indexing. This .8 GB.
A simple copy, without indexing is nearly 1 sec.
In [75]: timeit x.copy()
980 ms ± 231 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
and full copy with advanced indexing:
In [77]: timeit x[np.arange(x.shape[0])[:, None, None], np.arange(x.shape[1])[:,
...: None], np.arange(x.shape[2])]
1.47 s ± 37.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
trying the idx
again:
In [78]: timeit result = x[np.arange(x.shape[0])[:, None, None], idx, np.arange(x.shape[2])]
1.71 s ± 42.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Keep in mind that when the operations start using nearly all the memory, and/or start requiring swapping and special memory requests to the OS, timings can really go to pot.
edit
You don't need the two step process. Just use partition
:
out = np.partition(x, -k, axis=1)[:, -k:]
This is the same as result
, and takes the same time as the idx
step.
Answered By - hpaulj
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.