Issue
I have a very large numpy array (containing up to a million elements) like the one below:
[0,1,6,5,1,2,7,6,2,3,8,7,3,4,9,8,5,6,11,10,6,7,12,11,7,
8,13,12,8,9,14,13,10,11,16,15,11,12,17,16,12,13,18,17,13,
14,19,18,15,16,21,20,16,17,22,21,17,18,23,22,18,19,24,23]
and a small dictionary map for replacing some of the elements in the above array
{4: 0, 9: 5, 14: 10, 19: 15, 20: 0, 21: 1, 22: 2, 23: 3, 24: 0}
I would like to replace some of the elements according to the map above. The numpy array is really large, and only a small subset of the elements (occurring as keys in the dictionary) will be replaced with the corresponding values. What is the fastest way to do this?
Solution
I believe there's an even more efficient method, but for now, try:
from numpy import copy
newArray = copy(theArray)
for k, v in d.iteritems(): newArray[theArray==k] = v
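The snippet above is written for Python 2 (dict.iteritems). For readers on Python 3, the same idea can be sketched roughly as follows. The function name replace_with_masks and the short sample array are made up for illustration and are not part of the original answer.

```python
import numpy as np

def replace_with_masks(arr, mapping):
    """Return a copy of arr with each key of mapping replaced by its value."""
    out = arr.copy()
    for key, value in mapping.items():
        # The mask is built from the original array, not from out, so a value
        # written by one replacement is never matched again by a later key.
        out[arr == key] = value
    return out

# Hypothetical sample data, shortened from the array in the question.
the_array = np.array([0, 1, 6, 5, 4, 9, 24, 23])
d = {4: 0, 9: 5, 14: 10, 19: 15, 20: 0, 21: 1, 22: 2, 23: 3, 24: 0}
new_array = replace_with_masks(the_array, d)
print(new_array)  # [0 1 6 5 0 5 0 3]
```

Each key costs one full pass over the array, so this approach suits the case in the question, where the dictionary is small compared to the array.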
Microbenchmark and test for correctness:
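#!/usr/bin/env python2.7
# Saved as w.py so that the timeit commands can import from it.
from numpy import copy, random, arange

random.seed(0)
data = random.randint(30, size=10**5)

d = {4: 0, 9: 5, 14: 10, 19: 15, 20: 0, 21: 1, 22: 2, 23: 3, 24: 0}
dk = d.keys()
dv = d.values()

def f1(a, d):
    # One boolean mask per key, always computed against the original array.
    b = copy(a)
    for k, v in d.iteritems():
        b[a==k] = v
    return b

def f2(a, d):
    # Plain Python loop that modifies a in place; used as the reference result.
    for i in xrange(len(a)):
        a[i] = d.get(a[i], a[i])
    return a

def f3(a, dk, dv):
    # Lookup table: mp[x] holds the replacement for x (x itself by default).
    mp = arange(0, max(a)+1)
    mp[dk] = dv
    return mp[a]

a = copy(data)
res = f2(a, d)
assert (f1(data, d) == res).all()
assert (f3(data, dk, dv) == res).all()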
Result:
$ python2.7 -m timeit -s 'from w import f1,f3,data,d,dk,dv' 'f1(data,d)'
100 loops, best of 3: 6.15 msec per loop
$ python2.7 -m timeit -s 'from w import f1,f3,data,d,dk,dv' 'f3(data,dk,dv)'
100 loops, best of 3: 19.6 msec per loop
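In these timings f3 is the slower of the two. One possible contributor is that f3 calls Python's built-in max on a numpy array, which walks the array element by element. This has not been measured here. Below is a minimal Python 3 sketch of the same lookup-table idea that uses the array's own max method instead. The name replace_with_lookup is hypothetical, and the sketch assumes a non-empty array of non-negative integers and a non-empty dictionary.

```python
import numpy as np

def replace_with_lookup(arr, mapping):
    """Replace values through a lookup table.

    Assumes arr is a non-empty array of non-negative integers and mapping
    is a non-empty dict of non-negative integer keys and integer values.
    """
    keys = np.fromiter(mapping.keys(), dtype=arr.dtype, count=len(mapping))
    values = np.fromiter(mapping.values(), dtype=arr.dtype, count=len(mapping))
    # The table must be large enough for every array value and every key.
    size = max(int(arr.max()), int(keys.max())) + 1
    table = np.arange(size, dtype=arr.dtype)  # identity mapping by default
    table[keys] = values                      # overwrite only the mapped keys
    return table[arr]                         # one indexing pass over arr

rng = np.random.RandomState(0)
data = rng.randint(30, size=10**5)
d = {4: 0, 9: 5, 14: 10, 19: 15, 20: 0, 21: 1, 22: 2, 23: 3, 24: 0}
result = replace_with_lookup(data, d)
```

The table holds one entry per possible value, so this only makes sense when the largest value in the array is reasonably small. To compare it against the other variants, time it on your own data with timeit.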
Answered By - kennytm