Wednesday, January 12, 2022

[FIXED] Faster Indexing of 3D NumPy Array From np.argpartition Index

January 12, 2022 arrays, numpy, python No comments

Issue

I have a large 3D NumPy array:

x = np.random.rand(1_000_000_000).reshape(500, 1000, 2000)

And for each of the 500 2D arrays, I need to keep only the largest 800 elements within each column of each 2D array. To avoid costly sorting, I decided to use np.argpartition:

k = 800
idx = np.argpartition(x, -k, axis=1)[:, -k:]
result = x[np.arange(x.shape[0])[:, None, None], idx, np.arange(x.shape[2])]

While np.argpartition is reasonably fast, using idx to index back into x is really slow. Is there a faster (and memory efficient) way to perform this indexing?

Note that the results do not need to be in ascending sorted order. They just need to be the top 800

Solution

cutting the size by 10 to fit my memory, here are times for the various steps:

Creationg:

In [65]: timeit x = np.random.rand(1_000_000_00).reshape(500, 1000, 200)
1.89 s ± 82 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [66]: x = np.random.rand(1_000_000_00).reshape(500, 1000, 200)
In [67]: k=800

sort:

In [68]: idx = np.argpartition(x, -k, axis=1)[:, -k:]
In [69]: timeit idx = np.argpartition(x, -k, axis=1)[:, -k:]

2.52 s ± 292 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

the indexing:

In [70]: result = x[np.arange(x.shape[0])[:, None, None], idx, np.arange(x.shape[2])]
In [71]: timeit result = x[np.arange(x.shape[0])[:, None, None], idx, np.arange(x.shape[2])]
The slowest run took 4.11 times longer than the fastest. This could mean that an intermediate result is being cached.
2.6 s ± 1.87 s per loop (mean ± std. dev. of 7 runs, 1 loop each)

All three steps take about the same time. I don't see anything unusual about the last indexing. This .8 GB.

A simple copy, without indexing is nearly 1 sec.

In [75]: timeit x.copy()
980 ms ± 231 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

and full copy with advanced indexing:

In [77]: timeit x[np.arange(x.shape[0])[:, None, None], np.arange(x.shape[1])[:,
    ...: None], np.arange(x.shape[2])]
1.47 s ± 37.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

trying the idx again:

In [78]: timeit result = x[np.arange(x.shape[0])[:, None, None], idx, np.arange(x.shape[2])]
1.71 s ± 42.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Keep in mind that when the operations start using nearly all the memory, and/or start requiring swapping and special memory requests to the OS, timings can really go to pot.

edit

You don't need the two step process. Just use partition:

out = np.partition(x, -k, axis=1)[:, -k:]

This is the same as result, and takes the same time as the idx step.

Answered By - hpaulj

This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0

Wednesday, January 12, 2022

[FIXED] Faster Indexing of 3D NumPy Array From np.argpartition Index

Issue

Solution

edit

0 comments:

Post a Comment

Popular Posts

Labels