Wednesday, July 13, 2022

[FIXED] compare value in two rows in a column pandas


Issue

I have a pandas df something like this:

           color          pct               days               text
  1         red            5                 7                 good
  2         red           10                30                 good
  3         red           11                60                  bad
  4         blue           6                 7                  bad
  5         blue          15                30                 good
  6         blue          21                60                  bad
  7        yellow          2                 7                 good
  8        yellow          5                30                  bad
  9        yellow          7                60                  bad

So basically, for each color, I have percentage values for 7 days, 30 days and 60 days. Please note that the rows are not always in the order shown in the example above. My task is to look at the change in percentage for each color between consecutive days values (7 to 30, then 30 to 60) and, if the change is greater than or equal to 5 percentage points, write "NA" in the "text" column. The text for the 7-day rows is the default and must not be overwritten.

Desired result:

           color          pct               days               text
  1         red            5                 7                 good
  2         red           10                30                  NA
  3         red           11                60                  bad
  4         blue           6                 7                  bad
  5         blue          15                30                  NA
  6         blue          21                60                  NA
  7        yellow          2                 7                 good
  8        yellow          5                30                  bad
  9        yellow          7                60                  bad
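For anyone who wants to reproduce the example, the frame above could be built roughly as follows. This is only a sketch: the 1-based index and the integer dtypes are assumptions read off the tables, and `expected_text` is a hypothetical helper name holding the desired "text" column.

```python
import pandas as pd

# Sample data copied from the question. The 1-based index is an assumption
# made to match the row labels shown in the tables.
df = pd.DataFrame(
    {
        "color": ["red"] * 3 + ["blue"] * 3 + ["yellow"] * 3,
        "pct": [5, 10, 11, 6, 15, 21, 2, 5, 7],
        "days": [7, 30, 60] * 3,
        "text": ["good", "good", "bad", "bad", "good", "bad", "good", "bad", "bad"],
    },
    index=range(1, 10),
)

# Desired "text" column after the update, copied from the desired result table.
expected_text = ["good", "NA", "bad", "bad", "NA", "NA", "good", "bad", "bad"]
```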

I can achieve this with a very long process that I am sure is not efficient. There must be a much better way of doing this, but I am new to Python, so I am struggling. Can someone please help me with this? Many thanks in advance.

Solution

A variation on a (now-deleted) answer that was suggested in a comment:

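import pandas as pd

# ensure numeric data (values that cannot be parsed become NaN)
df['pct'] = pd.to_numeric(df['pct'], errors='coerce')
df['days'] = pd.to_numeric(df['days'], errors='coerce')

# update in place: sort by color and days, take the per-color change in pct,
# and flag rows where it is >= 5; the first row of each color has no previous
# value (NaN), so the 7-day text is never overwritten
df.loc[df.sort_values(['color','days'])
         .groupby('color')['pct']
         .diff().ge(5), 'text'] = 'NA'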

Output:

    color  pct  days  text
1     red    5     7  good
2     red   10    30    NA
3     red   11    60   bad
4    blue    6     7   bad
5    blue   15    30    NA
6    blue   21    60    NA
7  yellow    2     7  good
8  yellow    5    30   bad
9  yellow    7    60   bad
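The chained expression can be easier to follow when split into steps. The sketch below rebuilds the sample data (the 1-based index is an assumption taken from the tables), shuffles the rows to mimic the "not always in order" case from the question, and applies the same logic one step at a time. The names `shuffled`, `ordered`, `change`, `mask`, `flagged` and `result` are illustrative only.

```python
import pandas as pd

# Sample data from the question; the 1-based index is an assumption.
df = pd.DataFrame(
    {
        "color": ["red"] * 3 + ["blue"] * 3 + ["yellow"] * 3,
        "pct": [5, 10, 11, 6, 15, 21, 2, 5, 7],
        "days": [7, 30, 60] * 3,
        "text": ["good", "good", "bad", "bad", "good", "bad", "good", "bad", "bad"],
    },
    index=range(1, 10),
)

# Shuffle the rows so they are no longer in color/days order.
shuffled = df.sample(frac=1, random_state=0)

# 1. Sort so that, within each color, rows run from the fewest days to the most.
ordered = shuffled.sort_values(["color", "days"])

# 2. Per color, subtract the previous row's pct. The first row of each
#    color has no predecessor, so its change is NaN.
change = ordered.groupby("color")["pct"].diff()

# 3. Flag changes of at least 5. NaN >= 5 evaluates to False, which is
#    why the 7-day rows are never flagged.
mask = change.ge(5)
flagged = sorted(mask[mask].index)

# 4. .loc aligns the boolean mask on index labels, not on position, so the
#    mask built from the sorted frame can be applied to the unsorted one.
shuffled.loc[mask, "text"] = "NA"

# Restore the original row order for display.
result = shuffled.sort_index()
print(result)
```

Because step 4 relies on label alignment, this sketch assumes the index labels are unique; with a duplicated index, calling `reset_index(drop=True)` first would be one way to get unique labels.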

Answered By - mozway

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
