Issue
import numpy as np
import pandas as pd
data = pd.DataFrame(
{"categ": [0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1],
"value": [0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1],
"Run_count": [0, 1, 1, 0, 0, 2, 0, 3, 0, 0, 0, 4, 4, 0, 0, 0, 0, 5, 5, 0, 0, 6],
"currentResults": [0, 1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 0, 0, 5, 5, 5, 5, 6, 6, 6],
"desiredResults": [0, 1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 0, 0, 0, 0, 1, 1, 2, 2, 2],
})
# label each run of consecutive 1s in "value" (NaN outside the runs)
runs = data['value'].where(data['value'].eq(1))
data['Run_count'] = runs.dropna().groupby(runs.isna().cumsum()).ngroup() + 1
# backfill the run labels, zeroing out rows where categ != 1
data['currentResults'] = np.where(data['categ'].eq(1), data['Run_count'].bfill(), 0)
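(For context, the reason the count carries across the gap is that Run_count is NaN on the rows between runs, and bfill pulls the next run's label backward over them. A minimal sketch of that behaviour, with made-up values:)

```python
import pandas as pd

# NaNs between two run labels, as in the Run_count column
s = pd.Series([4.0, None, None, 5.0])

# bfill fills each NaN from the next non-NaN value below it
print(s.bfill().tolist())  # [4.0, 5.0, 5.0, 5.0]
```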
I am able to accomplish most of what I am trying to do: data['currentResults'] is what my current code produces.
What I am trying to accomplish is shown in data['desiredResults']: at id 15, where df['categ'] == 1 again, I would like to restart the count, since df['categ'][13:14] holds the value 0. Currently, at id 15, data['currentResults'] continues the count at 5, whereas I want it to restart/reset at 1.
I was thinking the easiest way to accomplish this might be to reset/restart the counter in the "Run_count" column, based on my current code.
Attached below is an image of what I'm trying to accomplish in "desiredResults".
Solution
If I understand correctly, you can set up boolean Series, then perform a custom groupby.transform:
m1 = data['value'].ne(1)
m2 = data['categ'].ne(1)
data['out'] = (m1[~m2]
.groupby(m2.cumsum())
# increment 1 for each non-1 following a 1
.transform(lambda g: (g&~g.shift(fill_value=False)).cumsum()
# add 1 if group starts with non-1
+(1-g.iloc[0]))
.reindex(data.index, fill_value=0) # 0 if categ is non-1
)
Output:
categ value out
0 0 0 0
1 1 1 1
2 1 1 1
3 1 0 2
4 1 0 2
5 1 1 2
6 1 0 3
7 1 1 3
8 1 0 4
9 1 0 4
10 1 0 4
11 1 1 4
12 1 1 4
13 0 0 0
14 0 0 0
15 1 0 1
16 1 0 1
17 1 1 1
18 1 1 1
19 1 0 2
20 1 0 2
21 1 1 2
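For readers who want to sanity-check the table above, here is a self-contained rerun of the snippet on the question's sample data (only the two input columns are needed):

```python
import pandas as pd

# sample data from the question
data = pd.DataFrame({
    "categ": [0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1],
    "value": [0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1],
})

m1 = data['value'].ne(1)  # True where value != 1
m2 = data['categ'].ne(1)  # True where categ != 1

data['out'] = (m1[~m2]                 # keep only rows where categ == 1
               .groupby(m2.cumsum())   # new group after each categ != 1 block
               .transform(lambda g: (g & ~g.shift(fill_value=False)).cumsum()
                          + (1 - g.iloc[0]))
               .reindex(data.index, fill_value=0)  # 0 where categ != 1
               )

print(data['out'].tolist())
# [0, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 4, 4, 0, 0, 1, 1, 1, 1, 2, 2, 2]
```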
Intermediates:
categ value m1 m2 m2.cumsum() g&~g_shift out
0 0 0 True True 1 NaN 0
1 1 1 False False 1 False 1
2 1 1 False False 1 False 1
3 1 0 True False 1 True 2
4 1 0 True False 1 False 2
5 1 1 False False 1 False 2
6 1 0 True False 1 True 3
7 1 1 False False 1 False 3
8 1 0 True False 1 True 4
9 1 0 True False 1 False 4
10 1 0 True False 1 False 4
11 1 1 False False 1 False 4
12 1 1 False False 1 False 4
13 0 0 True True 2 NaN 0
14 0 0 True True 3 NaN 0
15 1 0 True False 3 True 1
16 1 0 True False 3 False 1
17 1 1 False False 3 False 1
18 1 1 False False 3 False 1
19 1 0 True False 3 True 2
20 1 0 True False 3 False 2
21 1 1 False False 3 False 2
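The core trick in the intermediates above is g & ~g.shift(fill_value=False): it is True only at the first element of each run of Trues, so its cumsum numbers the runs. A minimal illustration on a toy Series:

```python
import pandas as pd

s = pd.Series([False, True, True, False, True])

# True only at the first element of each run of Trues
starts = s & ~s.shift(fill_value=False)
print(starts.tolist())           # [False, True, False, False, True]

# cumulative count numbers each run (0 before the first run)
print(starts.cumsum().tolist())  # [0, 1, 1, 1, 2]
```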
updated question
For your updated question, it looks like you could just change the m1 condition and remove the +(1-g.iloc[0]) correction factor:
m1 = data['value'].eq(1)
m2 = data['categ'].ne(1)
data['out'] = (m1[~m2]
.groupby(m2.cumsum())
# increment 1 for each 1 following a non-1
.transform(lambda g: (g&~g.shift(fill_value=False)).cumsum())
.reindex(data.index, fill_value=0) # 0 if categ is non-1
)
Output:
categ value desiredResults out
0 0 0 0 0
1 1 1 1 1
2 1 1 1 1
3 1 0 1 1
4 1 0 1 1
5 1 1 2 2
6 1 0 2 2
7 1 1 3 3
8 1 0 3 3
9 1 0 3 3
10 1 0 3 3
11 1 1 4 4
12 1 1 4 4
13 0 0 0 0
14 0 0 0 0
15 1 0 0 0
16 1 0 0 0
17 1 1 1 1
18 1 1 1 1
19 1 0 2 1 # this is however different!
20 1 0 2 1
21 1 1 2 2
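Likewise, this variant can be rerun end to end on the sample data; note rows 19–20, where out differs from the posted desiredResults, as flagged in the table:

```python
import pandas as pd

data = pd.DataFrame({
    "categ": [0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1],
    "value": [0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1],
})

m1 = data['value'].eq(1)  # note: eq instead of ne
m2 = data['categ'].ne(1)

data['out'] = (m1[~m2]
               .groupby(m2.cumsum())
               # no +(1-g.iloc[0]) correction here
               .transform(lambda g: (g & ~g.shift(fill_value=False)).cumsum())
               .reindex(data.index, fill_value=0)
               )

print(data['out'].tolist())
# [0, 1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 0, 0, 0, 0, 1, 1, 1, 1, 2]
```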
Answered By - mozway