Issue
I have two arrays of 1s and 0s:
a = [1 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0]
b = [0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1]
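I want to make sure that the "1" always "jumps" to the other array as I go from left to right, never appearing in the same array twice in a row before appearing in the other array. When that happens, the earlier 1 should be zeroed out (here index 16 of b), so the result should be: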
a = [1 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0]
b = [0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1]
I can do it using pandas and iteration:
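import pandas as pd

df = pd.DataFrame({"A": a, "B": b})
# Rows that contain a 1 in either column
df2 = df[(df.A > 0) | (df.B > 0)]
i = 0
for idx in df2.index:
    try:
        # If the next such row has the same value in a column, clear that column here
        if df2.at[idx, 'A'] == df2.at[df2.index[i + 1], 'A']:
            df.at[idx, 'A'] = 0
        if df2.at[idx, 'B'] == df2.at[df2.index[i + 1], 'B']:
            df.at[idx, 'B'] = 0
        i += 1
    except IndexError:
        # The last row has no successor
        pass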
But it is not efficient. How can I vectorize it to make it faster?
Solution
IIUC, try:
df2 = df[(df.A > 0) | (df.B > 0)]
df2 = df2[df2.B != df2.B.shift(-1)]
df.loc[~df.index.isin(df2.index)] = 0
print(df)
Prints:
    A  B
0   1  0
1   0  1
2   0  0
3   1  0
4   0  1
5   0  0
6   0  0
7   0  0
8   0  0
9   1  0
10  0  0
11  0  0
12  0  0
13  0  0
14  0  0
15  0  0
16  0  0
17  0  0
18  0  0
19  0  1
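If you want to run this end to end, a minimal sketch that wraps the same three steps in a function could look like the following. The function name `keep_alternating` is made up for illustration, and the sketch assumes the two arrays have the same length and never both hold a 1 at the same position.

```python
import pandas as pd


def keep_alternating(a, b):
    """Zero out every 1 that is followed by another 1 in the same array.

    Hypothetical helper wrapping the three lines above; assumes a and b have
    equal length and never both hold a 1 at the same position.
    """
    df = pd.DataFrame({"A": a, "B": b})
    # Rows that contain a 1 in either column.
    hits = df[(df.A > 0) | (df.B > 0)]
    # Keep a row only if the next hit is in the other column
    # (the last hit is compared against NaN and is always kept).
    hits = hits[hits.B != hits.B.shift(-1)]
    # Clear every row that did not survive the filter.
    df.loc[~df.index.isin(hits.index)] = 0
    return df


a = [1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
b = [0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1]
out = keep_alternating(a, b)
print(out.A.tolist())
print(out.B.tolist())
```

With the arrays from the question, the 1 at index 16 of `b` is cleared and everything else is left as it was.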
EDIT: Step-by-step explanation:
Keep only the indices where there is a 1 anywhere in the row:
df2 = df[(df.A > 0) | (df.B > 0)]
print(df2)
Prints:
    A  B
0   1  0
1   0  1
3   1  0
4   0  1
9   1  0
16  0  1
19  0  1
We see that we want only the indices where the 1 alternates between A and B (so we need to get rid of index 16 here).
We know that B must alternate, so we shift the B column up by one row and compare each value with the next one:
df2["tmp"] = df2.B != df2.B.shift(-1)
print(df2)
Prints:
    A  B    tmp
0   1  0   True
1   0  1   True
3   1  0   True
4   0  1   True
9   1  0   True
16  0  1  False
19  0  1   True
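The last row is True even though it has no successor: `shift(-1)` fills the vacated last slot with NaN, and `1 != NaN` evaluates to True. A small sketch on just the B values of df2 shows this (the variable names `b_hits`, `nxt` and `keep` are mine, not from the answer):

```python
import pandas as pd

# B values of the rows that contain a 1, indexed as in df2 above.
b_hits = pd.Series([0, 1, 0, 1, 0, 1, 1], index=[0, 1, 3, 4, 9, 16, 19])

nxt = b_hits.shift(-1)   # each row now holds the B value of the next row; last is NaN
keep = b_hits != nxt     # True where the next row has a different B value

print(nxt.tolist())
print(keep.tolist())     # [True, True, True, True, True, False, True]
```

This also means that for a longer run of 1s in the same column, only the last one of the run is kept.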
Index 16 has a False value, so we keep only the other indices:
df2 = df2[df2.B != df2.B.shift(-1)]
df.loc[~df.index.isin(df2.index)] = 0
print(df)
Prints the final df.
Answered By - Andrej Kesely
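Not part of the answer above: if the data already lives in NumPy arrays, the same idea can probably be expressed without pandas. This is only a sketch under the same assumption (no position has a 1 in both arrays), and the function name `keep_alternating_np` is hypothetical.

```python
import numpy as np


def keep_alternating_np(a, b):
    """Hypothetical NumPy variant: keep only the last 1 of each same-array run."""
    a = np.array(a)  # np.array copies, so the inputs are not modified
    b = np.array(b)
    # Positions where either array holds a 1.
    pos = np.flatnonzero((a > 0) | (b > 0))
    # True where the hit belongs to b, False where it belongs to a.
    in_b = b[pos] > 0
    # A hit survives if it is the last one or the next hit is in the other array.
    keep = np.ones(pos.size, dtype=bool)
    keep[:-1] = in_b[:-1] != in_b[1:]
    # Clear the hits that did not survive, in both arrays.
    drop = pos[~keep]
    a[drop] = 0
    b[drop] = 0
    return a, b


a = [1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
b = [0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1]
new_a, new_b = keep_alternating_np(a, b)
print(new_a)
print(new_b)
```

The comparison `in_b[:-1] != in_b[1:]` plays the role of `df2.B != df2.B.shift(-1)`, with the last hit kept explicitly instead of relying on the NaN comparison.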