Issue
I want to replace values in a 2D NumPy array based on the following dictionary in Python:
code region
334 0
4 22
8 31
12 16
16 17
24 27
28 18
32 21
36 1
I want to find the cells in the 2D NumPy array that match a value in the code column and replace them with the corresponding value from the region column. The issue is that doing this one pair at a time first replaces code = 12 with region = 16, and on the next line all cells with a value of 16 (including the ones that were just assigned 16) are replaced by 17. How do I prevent that?
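To make the problem concrete, here is a minimal sketch of the chaining effect. The two-entry `mapping` and the small `arr` are hypothetical inputs chosen for illustration, not data from the question.

```python
import numpy as np

# Hypothetical two-entry mapping and input, chosen to trigger the chaining
mapping = {12: 16, 16: 17}
arr = np.array([[12, 16], [16, 12]])

# Naive in-place loop: each pass sees the results of the previous pass,
# so cells that became 16 are then rewritten to 17
chained = arr.copy()
for code, region in mapping.items():
    chained[chained == code] = region

# Building each mask from the untouched original avoids the chaining
safe = arr.copy()
for code, region in mapping.items():
    safe[arr == code] = region

print(chained)  # every cell ends up as 17
print(safe)     # 12 -> 16 and 16 -> 17, applied independently
```

Comparing against the original array fixes the result, but it still makes one full pass over the array per dictionary entry, which gets slow for large dictionaries.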
Solution
Here's a vectorized approach based on np.searchsorted: it locates each array element among the dictionary keys and then substitutes the corresponding value in a single step, so no replacement can feed into another (and please excuse the function name here, couldn't help it though) -
import numpy as np

def replace_with_dict(ar, dic):
    # Extract out keys and values
    k = np.array(list(dic.keys()))
    v = np.array(list(dic.values()))

    # Get argsort indices
    sidx = k.argsort()

    # Drop the magic bomb with searchsorted to get the corresponding
    # places for ar in keys (using sorter since the keys are not
    # necessarily sorted).
    # Then trace it back to original order with indexing into sidx
    # Finally index into values for desired output.
    return v[sidx[np.searchsorted(k, ar, sorter=sidx)]]
Sample run (recorded under Python 2; on Python 3, pass list(dic.keys()) to np.random.choice) -
In [82]: dic ={334:0, 4:22, 8:31, 12:16, 16:17, 24:27, 28:18, 32:21, 36:1}
...:
...: np.random.seed(0)
...: a = np.random.choice(dic.keys(), 20)
...:
In [83]: a
Out[83]:
array([ 28,  16,  32,  32, 334,  32,  28,   4,   8, 334,  12,  36,  36,
        24,  12, 334, 334,  36,  24,  28])
In [84]: replace_with_dict(a, dic)
Out[84]:
array([18, 17, 21, 21,  0, 21, 18, 22, 31,  0, 16,  1,  1, 27, 16,  0,  0,
        1, 27, 18])
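Since the question is about a 2D array, it may help to see the intermediate steps on one. The following is a sketch using the same dictionary; the small 2D `ar` and the names `pos_sorted` and `pos_orig` are illustrative choices, not part of the original answer.

```python
import numpy as np

dic = {334: 0, 4: 22, 8: 31, 12: 16, 16: 17, 24: 27, 28: 18, 32: 21, 36: 1}
k = np.array(list(dic.keys()))    # keys in dictionary order (unsorted)
v = np.array(list(dic.values()))  # values in the same order
sidx = k.argsort()                # permutation that sorts the keys

# Hypothetical 2D input; every element is a key of dic
ar = np.array([[12, 16], [334, 4]])

# Position of each element within the *sorted* keys
pos_sorted = np.searchsorted(k, ar, sorter=sidx)
# Map those positions back to positions in the *original* key order
pos_orig = sidx[pos_sorted]
# Look up the values; the result keeps the shape of ar
out = v[pos_orig]

print(pos_sorted)  # [[2 3] [8 0]]
print(out)         # [[16 17] [ 0 22]]
```

Every lookup reads from the original `ar`, so the 12 that maps to 16 is never looked up a second time.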
Improvement
A faster approach for big arrays is to sort the keys and values arrays up front and then use searchsorted without the sorter argument, like so -
def replace_with_dict2(ar, dic):
    # Extract out keys and values
    k = np.array(list(dic.keys()))
    v = np.array(list(dic.values()))

    # Get argsort indices
    sidx = k.argsort()

    # Sort keys and values together so searchsorted needs no sorter
    ks = k[sidx]
    vs = v[sidx]
    return vs[np.searchsorted(ks, ar)]
Runtime test -
In [91]: dic ={334:0, 4:22, 8:31, 12:16, 16:17, 24:27, 28:18, 32:21, 36:1}
...:
...: np.random.seed(0)
...: a = np.random.choice(dic.keys(), 20000)
In [92]: out1 = replace_with_dict(a, dic)
...: out2 = replace_with_dict2(a, dic)
...: print np.allclose(out1, out2)
True
In [93]: %timeit replace_with_dict(a, dic)
1000 loops, best of 3: 453 µs per loop
In [95]: %timeit replace_with_dict2(a, dic)
1000 loops, best of 3: 341 µs per loop
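For readers who want to repeat the comparison outside IPython, here is a rough Python 3 sketch using the standard timeit module. The helper names `with_sorter` and `presorted` are hypothetical stand-ins for the two functions above, and the random input will not match the transcript, since it comes from a different generator. Timings depend on the machine and NumPy version, so no particular result is implied.

```python
import timeit
import numpy as np

def with_sorter(ar, k, v):
    # Variant 1: search the unsorted keys through a sorter permutation
    sidx = k.argsort()
    return v[sidx[np.searchsorted(k, ar, sorter=sidx)]]

def presorted(ar, k, v):
    # Variant 2: sort keys and values first, then search directly
    sidx = k.argsort()
    return v[sidx][np.searchsorted(k[sidx], ar)]

dic = {334: 0, 4: 22, 8: 31, 12: 16, 16: 17, 24: 27, 28: 18, 32: 21, 36: 1}
k = np.array(list(dic.keys()))
v = np.array(list(dic.values()))

rng = np.random.default_rng(0)
a = rng.choice(k, size=20000)  # every element is a valid key

out1 = with_sorter(a, k, v)
out2 = presorted(a, k, v)
same = np.array_equal(out1, out2)
print(same)

# Total seconds for 200 calls of each variant
t1 = timeit.timeit(lambda: with_sorter(a, k, v), number=200)
t2 = timeit.timeit(lambda: presorted(a, k, v), number=200)
print(f"with sorter: {t1:.4f} s, presorted: {t2:.4f} s")
```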
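Generic case when not all array elements are in the dictionary
If some elements of the input array are not guaranteed to be in the dictionary, we need a bit more work: out-of-range indices must be handled, and elements without a matching key are left unchanged, as listed below -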
def replace_with_dict2_generic(ar, dic, assume_all_present=True):
    # Extract out keys and values
    k = np.array(list(dic.keys()))
    v = np.array(list(dic.values()))

    # Get argsort indices
    sidx = k.argsort()

    ks = k[sidx]
    vs = v[sidx]
    idx = np.searchsorted(ks, ar)

    if not assume_all_present:
        # Elements larger than every key land at len(vs); point them at 0
        # so the indexing below stays in bounds
        idx[idx == len(vs)] = 0
        # Replace only where the element really is a key; keep the rest
        mask = ks[idx] == ar
        return np.where(mask, vs[idx], ar)
    else:
        return vs[idx]
Sample run -
In [163]: dic ={334:0, 4:22, 8:31, 12:16, 16:17, 24:27, 28:18, 32:21, 36:1}
...:
...: np.random.seed(0)
...: a = np.random.choice(dic.keys(), (20))
...: a[-1] = 400
In [165]: a
Out[165]:
array([ 28,  16,  32,  32, 334,  32,  28,   4,   8, 334,  12,  36,  36,
        24,  12, 334, 334,  36,  24, 400])
In [166]: replace_with_dict2_generic(a, dic, assume_all_present=False)
Out[166]:
array([ 18,  17,  21,  21,   0,  21,  18,  22,  31,   0,  16,   1,   1,
        27,  16,   0,   0,   1,  27, 400])
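The sample above only covers a value larger than every key. A value that falls between two keys, or below the smallest one, is the other case the mask has to catch, because searchsorted still returns a valid index for it. Here is a small sketch of those cases; the input values 10, 400 and 2 are hypothetical non-keys chosen for illustration.

```python
import numpy as np

dic = {334: 0, 4: 22, 8: 31, 12: 16, 16: 17, 24: 27, 28: 18, 32: 21, 36: 1}
k = np.array(list(dic.keys()))
v = np.array(list(dic.values()))
order = k.argsort()
ks, vs = k[order], v[order]

# 10 lies between two keys, 400 is above all keys, 2 is below all keys
ar = np.array([10, 12, 400, 2])

raw_idx = np.searchsorted(ks, ar)  # insertion points, not matches
idx = raw_idx.copy()
idx[idx == len(ks)] = 0            # keep the index for 400 in bounds
mask = ks[idx] == ar               # True only where ar holds a real key
out = np.where(mask, vs[idx], ar)

print(raw_idx)  # [2 2 9 0]
print(mask)     # [False  True False False]
print(out)      # [ 10  16 400   2]
```

Without the mask, 10 would be silently mapped to 16 (the value for key 12), which is why the in-bounds fix alone is not enough.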
Answered By - Divakar