Issue
For a quick backstory: I'm working with daily temperature data. If a day is above a threshold, it's assigned '1'; if it's below a (separate) threshold, it's '-1'; if it's in between, it's '0'. I want to count consecutive above-threshold days and consecutive below-threshold days.
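The coding step described above can be sketched with `np.select`. The temperatures and the `HIGH`/`LOW` thresholds below are made-up values for illustration, since the question doesn't give them.

```python
import numpy as np

# Hypothetical daily temperatures and thresholds (not from the question).
temps = np.array([12.0, 31.5, 33.2, 20.1, -3.4, -5.0, 18.7])
HIGH = 30.0   # assumed "above" threshold
LOW = 0.0     # assumed "below" threshold

# 1 above HIGH, -1 below LOW, 0 for anything in between.
codes = np.select([temps > HIGH, temps < LOW], [1, -1], default=0)
print(codes)  # [ 0  1  1  0 -1 -1  0]
```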
array([ 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, -1, -1, -1, -1,
0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, -1, -1, -1, 0])
I am using the code below to try to count:
aok = year['threshold'].values
for i in range(1, len(aok)):
    if aok[i] == 1:
        aok[i] += aok[i - 1]
    if aok[i] == -1:
        aok[i] += aok[i - 1]
It results in the following:
array([ 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, -2, -3, -4, -5,
0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 8, 7, 6, 0])
It works fine in the first two rows, counting consecutive 1's and -1's respectively. However, I ran into a problem in the third row, where a run of -1's directly follows a run of 1's. I do not want the 1's and -1's to affect one another. This is what I want:
array([ 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, -2, -3, -4, -5,
0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, -1, -2, -3, 0])
Do I need some additional statements in my for-loop to safeguard against the numbers mixing? Something like:
if aok[i] == -1 and aok[i-1] == 1:
    pass
if aok[i] == 1 and aok[i-1] == -1:
    pass
Any help/insight would be appreciated.
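For the loop approach asked about, one possible fix (a sketch, not taken from the answer below) is to continue a run only when the current code has the same sign as the running count just before it. `run_counts_loop` is a hypothetical helper name. It works on a copy, because `year['threshold'].values` may share memory with the DataFrame, in which case the original loop would overwrite the column.

```python
import numpy as np

def run_counts_loop(codes):
    """Count consecutive 1's upward and consecutive -1's downward,
    restarting whenever the sign changes or a 0 appears."""
    out = np.array(codes, dtype=int)  # a copy, so the input is left untouched
    for i in range(1, len(out)):
        # out[i-1] already holds the running count, whose sign matches its code.
        # A positive product means the same non-zero sign continues the run.
        if out[i] * out[i - 1] > 0:
            out[i] += out[i - 1]
    return out

print(run_counts_loop([0, 1, 1, 1, -1, -1, 0, -1, 1]))
# [ 0  1  2  3 -1 -2  0 -1  1]
```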
Solution
This can be done purely with numpy. Basically, we just need to implement a cumulative sum that resets whenever a 0 value is hit. First we take the cumsum:
arr = np.array([ 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, -1, -1, -1, -1,
0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, -1, -1, -1, 0])
cms = arr.cumsum()
# array([ 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,
# 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 12, 11, 10, 9, 8,
# 8, 8, 8, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 16, 15, 14, 14])
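Then we take the values of cms at the positions where arr is 0, and calculate the difference between consecutive ones (starting from 0). Each difference is how much the running total changed since the previous 0, so it is the correction to subtract at that 0 to cancel the earlier cumsum.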
correction = np.diff(np.hstack(((0,), cms[arr == 0])))
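To make this step concrete, the sketch below prints `correction` for the example array and checks it against an equivalent spelling using the `prepend` argument of `np.diff` (available since NumPy 1.16). The array is rebuilt from list pieces only to keep the block short; it is the same 51 values as above.

```python
import numpy as np

arr = np.array([0, 0, 0, 0] + [1] * 13 + [0] * 12 + [-1] * 5
               + [0] * 4 + [1] * 9 + [-1] * 3 + [0])
cms = arr.cumsum()

correction = np.diff(np.hstack(((0,), cms[arr == 0])))
same = np.diff(cms[arr == 0], prepend=0)  # equivalent spelling

# One entry per 0 in arr: how far the running total moved since the previous 0.
print(correction)
# [ 0  0  0  0 13  0  0  0  0  0  0  0  0  0  0  0 -5  0  0  0  6]
print(np.array_equal(correction, same))  # True
```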
After that, we copy arr and subtract the correction at the 0 positions:
arr_copy = np.copy(arr)
arr_copy[arr == 0] -= correction
# array([ 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1,
# 1, 1, 1, 1, -13, 0, 0, 0, 0, 0, 0, 0, 0,
# 0, 0, 0, -1, -1, -1, -1, -1, 5, 0, 0, 0, 1,
# 1, 1, 1, 1, 1, 1, 1, 1, -1, -1, -1, -6])
Finally, we take the cumsum of arr_copy, which now restarts at every 0:
arr_copy.cumsum()
# array([ 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,
# 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, -2, -3, -4, -5,
# 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 8, 7, 6, 0])
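The result above restarts only at 0, so the third row still reads 9, 8, 7, 6 where the question asks for -1, -2, -3. A minimal sketch of one way to also restart at a sign change (a different technique from the answer's, not part of it) is to split the array into runs of equal values and count the position within each run. `run_counts` is a hypothetical name.

```python
import numpy as np

def run_counts(arr):
    """Signed length-so-far of each run of equal values."""
    arr = np.asarray(arr)
    n = arr.size
    if n == 0:
        return arr.copy()
    # True at index 0 and wherever the value differs from its predecessor.
    starts = np.empty(n, dtype=bool)
    starts[0] = True
    starts[1:] = arr[1:] != arr[:-1]
    # Index of each run's first element, carried forward across the run.
    run_start = np.maximum.accumulate(np.where(starts, np.arange(n), 0))
    # Position inside the run: 0, 1, 2, ... restarting at every run start.
    offset = np.arange(n) - run_start
    # Every value in a run is the same, so its running sum is value * (offset + 1).
    return arr * (offset + 1)

arr = np.array([0, 0, 0, 0] + [1] * 13 + [0] * 12 + [-1] * 5
               + [0] * 4 + [1] * 9 + [-1] * 3 + [0])
print(run_counts(arr)[34:])
# [ 0  0  0  0  1  2  3  4  5  6  7  8  9 -1 -2 -3  0]
```

Multiplying by the position works only because the codes are constant within a run. If the data stay in pandas, a common equivalent idiom is `s.groupby((s != s.shift()).cumsum()).cumsum()`, where `s` would be the `year['threshold']` Series.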
Answered By - Péter Leéh