Issue
Imagine I have a numpy array and I need to find the spans/ranges where a condition is True. For example, in the following array I'm trying to find the spans where items are greater than 1:
[0, 0, 0, 2, 2, 0, 2, 2, 2, 0]
I would need to find indices (start, stop):
(3, 5)
(6, 9)
The fastest thing I've been able to implement is making a boolean array of:
truth = data > threshold
and then looping through the array using numpy.argmin and numpy.argmax to find start and end positions.
pos = 0
truth = container[RATIO,:] > threshold
while pos < len(truth):
    start = numpy.argmax(truth[pos:]) + pos + offset
    end = numpy.argmin(truth[start:]) + start + offset
    if not truth[start]:  # nothing more
        break
    if start == end:  # goes to the end
        end = len(truth)
    pos = end
But this has been too slow for the billions of positions in my arrays, especially since the spans I'm finding are usually just a few positions long. Does anyone know a faster way to find these spans?
Solution
Here's one way. First take the boolean array you have:
In [11]: a
Out[11]: array([0, 0, 0, 2, 2, 0, 2, 2, 2, 0])
In [12]: a1 = a > 1
Shift it one to the right (to get the previous state at each index) using roll:
In [13]: a1_rshifted = np.roll(a1, 1)
In [14]: starts = a1 & ~a1_rshifted # it's True but the previous isn't
In [15]: ends = ~a1 & a1_rshifted
Where these are non-zero is the start of each True batch (or, respectively, the index just past its end):
In [16]: np.nonzero(starts)[0], np.nonzero(ends)[0]
Out[16]: (array([3, 6]), array([5, 9]))
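One caveat worth checking: np.roll wraps around, so the last element becomes the "previous" value for index 0. If a run of True touches either edge of the array, the starts and ends above can come out mismatched. The sketch below is one possible patch, not part of the original answer; the array a1 and the name prev are made up for illustration.

```python
import numpy as np

# Illustrative mask whose True runs touch both edges of the array.
a1 = np.array([True, True, False, True])

prev = np.roll(a1, 1)
prev[0] = False  # nothing precedes index 0, so discard the wrapped-around value

starts = np.flatnonzero(a1 & ~prev)
ends = np.flatnonzero(~a1 & prev)
if a1[-1]:
    # The last run is still open at the end of the array; close it there.
    ends = np.append(ends, a1.size)

print(starts.tolist(), ends.tolist())  # [0, 3] [2, 4]
```

With these two adjustments every start has a matching (exclusive) end, whether or not a run reaches the boundary.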
And zipping these together:
In [17]: list(zip(np.nonzero(starts)[0], np.nonzero(ends)[0]))
Out[17]: [(3, 5), (6, 9)]
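For arrays with billions of positions, building a Python list of tuples may itself become a bottleneck, so it can help to keep the result as a NumPy array. The following is a minimal sketch of that idea, not the answer's own method: it swaps np.roll for padding plus np.diff, and the function name find_spans is hypothetical.

```python
import numpy as np

def find_spans(mask):
    """Return an (n, 2) array of [start, stop) pairs, one per run of True."""
    mask = np.asarray(mask, dtype=bool)
    # Pad with False on both sides so runs touching either edge still
    # produce both a rising and a falling transition.
    padded = np.concatenate(([False], mask, [False]))
    # +1 marks a False -> True step, -1 marks a True -> False step.
    changes = np.diff(padded.astype(np.int8))
    starts = np.flatnonzero(changes == 1)
    stops = np.flatnonzero(changes == -1)
    return np.column_stack((starts, stops))

data = np.array([0, 0, 0, 2, 2, 0, 2, 2, 2, 0])
spans = find_spans(data > 1)
print(spans.tolist())                        # [[3, 5], [6, 9]]
print((spans[:, 1] - spans[:, 0]).tolist())  # run lengths: [2, 3]
```

Keeping the spans in one array means follow-up steps, such as filtering by run length, can stay vectorized as well.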
Answered By - Andy Hayden