Saturday, January 6, 2024

[FIXED] Creating a new column when values in another column are not duplicated

Labels: dataframe, pandas, python

Issue

This is my DataFrame:

import pandas as pd
import numpy as np

df = pd.DataFrame(
    {
       'a': [98, 97, 100, 101, 103, 110, 108, 109, 130, 135],
       'b': [3, 3, 3, 3, 3, 3, 3, 3, 3, 3],
       'c': [np.nan, np.nan, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 3.0, 3.0],
       'd': [92, 92, 92, 92, 92, 92, 92, 92, 92, 92],
    }
)

And this is the expected output. I want to create column x:

     a  b    c   d    x
0   98  3  NaN  92   92
1   97  3  NaN  92   92
2  100  3  1.0  92   94
3  101  3  1.0  92   94
4  103  3  1.0  92   94
5  110  3  2.0  92  104
6  108  3  2.0  92  104
7  109  3  2.0  92  104
8  130  3  3.0  92  124
9  135  3  3.0  92  124

Steps:

a) When c is not duplicated, df['x'] = df.a - (df.b * 2)

b) If c is NaN, df['x'] = df.d

For example:

The first new value in c is in row 2, so df['x'] = 100 - (3 * 2) = 94, and x stays 94 until a new value appears in c, which happens in row 5. For row 5, df['x'] = 110 - (3 * 2) = 104, and the same logic continues from there.
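To make the two rules concrete, here is a minimal row-by-row sketch of them. It is only a readable reference, not the vectorized approach one would normally use in pandas, and the helper name reference_x is hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        'a': [98, 97, 100, 101, 103, 110, 108, 109, 130, 135],
        'b': [3, 3, 3, 3, 3, 3, 3, 3, 3, 3],
        'c': [np.nan, np.nan, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 3.0, 3.0],
        'd': [92, 92, 92, 92, 92, 92, 92, 92, 92, 92],
    }
)


def reference_x(frame):
    """Hypothetical helper: apply rules a) and b) one row at a time."""
    seen = {}    # value of c -> x computed at its first occurrence
    result = []
    for a, b, c, d in zip(frame['a'], frame['b'], frame['c'], frame['d']):
        if pd.isna(c):
            # rule b): c is NaN, so x is taken from d
            result.append(d)
        else:
            # rule a): compute a - (b * 2) only the first time c is seen
            if c not in seen:
                seen[c] = a - b * 2
            result.append(seen[c])
    return result


expected = reference_x(df)
print(expected)
```

The printed list should match column x of the expected output above.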

This is what I have tried:

df['x'] = df.a - (df.b * 2)
df.loc[df.c.isna(), 'x'] = df.d
df['x'] = df.x.cummax()
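The sketch below reruns this attempt on a copy of the DataFrame and compares it with the expected column, to show where it goes wrong. The names trial, wanted and mismatch are illustrative. cummax carries forward the running maximum, not the value from the first row of each c group, so any row where a rises inside a group ends up different.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        'a': [98, 97, 100, 101, 103, 110, 108, 109, 130, 135],
        'b': [3, 3, 3, 3, 3, 3, 3, 3, 3, 3],
        'c': [np.nan, np.nan, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 3.0, 3.0],
        'd': [92, 92, 92, 92, 92, 92, 92, 92, 92, 92],
    }
)

# Same three steps as the attempt, applied to a copy
trial = df.copy()
trial['x'] = trial.a - (trial.b * 2)
trial.loc[trial.c.isna(), 'x'] = trial.d
trial['x'] = trial.x.cummax()

# Column x from the expected output
wanted = [92, 92, 94, 94, 94, 104, 104, 104, 124, 124]

# Row labels where the attempt differs from the expected values
mismatch = trial.index[trial['x'] != wanted].tolist()
print(mismatch)
```

Rows 3, 4 and 9 differ: within their c groups, a - (b * 2) exceeds the group's first value, and cummax keeps the larger number.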

Solution

You can use duplicated, mask, groupby.ffill and fillna:

# identify duplicated "c"
m = df['c'].duplicated()

# compute a-(2*b)
# mask the duplicated "c"
# ffill per group
# replace NaN with "d"
df['x'] = (df['a'].sub(df['b'] * 2)
           .mask(m)
           .groupby(df['c']).ffill()
           .fillna(df['d'])
          )

Variant to work by groups of successive identical "c":

# label each run of successive identical "c"
g = df['c'].ne(df['c'].shift()).cumsum()
# identify every row after the first of its run
m = g.duplicated()
df['x'] = (df['a'].sub(df['b'] * 2)
           .mask(m)
           .groupby(g).ffill()
           .where(df['c'].notna(), df['d'])
          )

Output:

     a  b    c   d      x
0   98  3  NaN  92   92.0
1   97  3  NaN  92   92.0
2  100  3  1.0  92   94.0
3  101  3  1.0  92   94.0
4  103  3  1.0  92   94.0
5  110  3  2.0  92  104.0
6  108  3  2.0  92  104.0
7  109  3  2.0  92  104.0
8  130  3  3.0  92  124.0
9  135  3  3.0  92  124.0

Answered By - mozway

This answer, collected from Stack Overflow and tested by PythonFixing community admins, is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
