Issue
In numpy, how to efficiently build a mapping from each unique value to its indices, without using a for loop
I considered the following alternatives, but they are not efficient enough for my use case because I use large arrays.
The first alternative, requires traversing the array with for
loop, which may be slow when considering large numpy arrays.
import numpy as np
from collections import defaultdict
a = np.array([1, 2, 6, 4, 2, 3, 2])
inv = defaultdict(list)
for i, x in enumerate(a):
inv[x].append(i)
The second alternative is non-efficient because it requires travesing the array multiple times:
import numpy as np
a = np.array([1, 2, 6, 4, 2, 3, 2])
inv = {}
for x in np.unique(a):
inv[x] = np.flatnonzero(a == x)
EDIT: My numpy array consists of integers and the usage is for image segmentation. I was also looking for a method in skimage, but did not find any.
Solution
This should work.
a = np.array((1, 2, 6, 2, 4, 7, 25, 6))
fwd = np.argsort(a)
asorted = a[fwd]
keys = np.unique(asorted)
lower = np.searchsorted(asorted, keys)
# or higher = itertools.chain(lower[1:], (len(asorted),))
higher = np.append(lower[1:], len(asorted))
inv = {key: fwd[lower_i:higher_i]
for key, lower_i, higher_i
in zip(keys, lower, higher)}
assert all(np.all(a[indices] == key)
for key, indices in inv.items())
It runs in something like O(n log(n)). The only loop that remains is the one to build a dictionary. That step is optional, of course.
From a purely algorithmic standpoint, your first approach (defaultdict(list)) would be better since it runs in aggregated linear time but of course the python overhead may be significant.
Answered By - Homer512
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.