Issue
In a pylab program (which could probably be a matlab program as well) I have a numpy array of numbers representing distances: d[t] is the distance at time t (and the timespan of my data is len(d) time units).
The events I'm interested in are when the distance is below a certain threshold, and I want to compute the duration of these events. It's easy to get an array of booleans with b = d<threshold, and the problem comes down to computing the sequence of the lengths of the True-only words in b. But I do not know how to do that efficiently (i.e. using numpy primitives), so I resorted to walking the array and doing manual change detection (i.e. initialize a counter when the value goes from False to True, increase the counter as long as the value is True, and output the counter to the sequence when the value goes back to False). But this is tremendously slow.
How can I efficiently detect that sort of sequence in numpy arrays?
Below is some Python code that illustrates my problem: the fourth dot takes a very long time to appear (if not, increase the size of the array).
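from pylab import *

threshold = 7

print('.')
d = 10*rand(10000000)

print('.')
b = d<threshold

print('.')
durations = []
for i in range(len(b)):
    # a run starts: reset the counter
    if b[i] and (i == 0 or not b[i-1]):
        counter = 1
    # a run continues: extend it
    if i > 0 and b[i-1] and b[i]:
        counter += 1
    # a run ends, or is still open at the last element: record its length
    if (i > 0 and b[i-1] and not b[i]) or (b[i] and i == len(b)-1):
        durations.append(counter)
print('.')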
Solution
While not numpy primitives, itertools functions are often very fast, so do give this one a try (and measure times for various solutions including this one, of course):
import itertools

def runs_of_ones(bits):
    # groupby yields one (value, group) pair per run of equal consecutive values
    for bit, group in itertools.groupby(bits):
        if bit:
            yield sum(group)
If you do need the values in a list, you can just use list(runs_of_ones(bits)), of course; but a list comprehension might be marginally faster still:
def runs_of_ones_list(bits):
    return [sum(g) for b, g in itertools.groupby(bits) if b]
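To see what groupby actually hands back, here is a minimal sketch on a small hand-made list. The sample values and the names `runs` and `durations` are made up for illustration; they are not part of the original answer.

```python
import itertools

# Hypothetical sample data: 1 means "below threshold", 0 means "above".
bits = [1, 1, 0, 1, 0, 0, 1, 1, 1]

# groupby yields one (key, group) pair per run of equal consecutive values.
runs = [(bool(key), len(list(group))) for key, group in itertools.groupby(bits)]
print(runs)

# Keeping only the truthy runs gives the event durations.
durations = [length for is_event, length in runs if is_event]
print(durations)
```

Each run is visited exactly once, so the work is linear in the length of the input, but the iteration itself still happens element by element in Python objects rather than inside numpy.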
Moving to "numpy-native" possibilities, what about:
import numpy

def runs_of_ones_array(bits):
    # make sure all runs of ones are well-bounded
    bounded = numpy.hstack(([0], bits, [0]))
    # get 1 at run starts and -1 at run ends
    difs = numpy.diff(bounded)
    run_starts, = numpy.where(difs > 0)
    run_ends, = numpy.where(difs < 0)
    return run_ends - run_starts
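The padding-and-diff idea is easier to follow when traced on a tiny array. The sketch below repeats the same steps on made-up sample data (the array `b` here is hypothetical, chosen so that runs touch both ends) and prints the intermediate values.

```python
import numpy as np

# Hypothetical sample data: True means "below threshold".
b = np.array([True, True, False, True, False, False, True, True, True])

# Pad with zeros so runs touching either end still get a start and an end.
bounded = np.hstack(([0], b.astype(int), [0]))
difs = np.diff(bounded)

run_starts, = np.where(difs > 0)   # index in b where each run begins
run_ends, = np.where(difs < 0)     # index in b just past each run's last element
durations = run_ends - run_starts

print(difs)
print(run_starts, run_ends, durations)
```

Because `run_starts` holds the position of each event in the original array, the same intermediate values also tell you when each event happened, not only how long it lasted.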
Again: be sure to benchmark the solutions against each other on realistic-for-you examples!
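One way to run such a comparison is with the standard timeit module. This is only a sketch under assumptions: the function names `runs_groupby` and `runs_diff`, the array size, and the random test data are all chosen here for illustration, and the timings will depend on your machine and your real data.

```python
import itertools
import timeit

import numpy as np


def runs_groupby(bits):
    # itertools-based version: one Python-level step per element
    return [sum(group) for bit, group in itertools.groupby(bits) if bit]


def runs_diff(bits):
    # numpy-based version: pad, diff, then locate run starts and ends
    bounded = np.hstack(([0], bits, [0]))
    difs = np.diff(bounded)
    run_starts, = np.where(difs > 0)
    run_ends, = np.where(difs < 0)
    return run_ends - run_starts


# Assumed test data: smaller than the question's array so the run stays short.
rng = np.random.default_rng(0)
b = rng.random(200_000) * 10 < 7

# Both approaches should agree before their timings are worth comparing.
same = runs_groupby(b) == runs_diff(b).tolist()
print("results agree:", same)

for name, func in [("groupby", runs_groupby), ("diff", runs_diff)]:
    seconds = min(timeit.repeat(lambda: func(b), number=1, repeat=3))
    print(f"{name}: {seconds:.4f} s")
```

Taking the minimum over a few repeats reduces noise from other processes; swap in an array shaped like your own data before drawing conclusions.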
Answered By - Alex Martelli