Issue
This is my DataFrame:
import pandas as pd
import numpy as np
df = pd.DataFrame(
    {
        'a': [98, 97, 100, 101, 103, 110, 108, 109, 130, 135],
        'b': [3, 3, 3, 3, 3, 3, 3, 3, 3, 3],
        'c': [np.nan, np.nan, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 3.0, 3.0],
        'd': [92, 92, 92, 92, 92, 92, 92, 92, 92, 92],
    }
)
And this is the expected output. I want to create column x:
     a  b    c   d    x
0   98  3  NaN  92   92
1   97  3  NaN  92   92
2  100  3  1.0  92   94
3  101  3  1.0  92   94
4  103  3  1.0  92   94
5  110  3  2.0  92  104
6  108  3  2.0  92  104
7  109  3  2.0  92  104
8  130  3  3.0  92  124
9  135  3  3.0  92  124
Steps:
a) When a value in c is not duplicated (it is the first occurrence of that value), df['x'] = df.a - (df.b * 2)
b) If df.c is NaN, df['x'] = df.d
For example:
The first new value in c is in row 2, so df['x'] = 100 - (3 * 2), which is 94, and df['x'] stays 94 until a new value in c appears, which is in row 5. For row 5, df['x'] = 110 - (3 * 2), which is 104. And the logic continues.
This is what I have tried:
df['x'] = df.a - (df.b * 2)
df.loc[df.c.isna(), 'x'] = df.d
df['x'] = df.x.cummax()
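As a rough check of where this attempt departs from the expected output, the sketch below reruns it on a copy of the DataFrame and lists the rows that differ. The names out, expected and mismatch are only illustrative. The running maximum keeps rising inside a group whenever a rises, instead of holding the value from the first row of the group.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        'a': [98, 97, 100, 101, 103, 110, 108, 109, 130, 135],
        'b': [3] * 10,
        'c': [np.nan, np.nan, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 3.0, 3.0],
        'd': [92] * 10,
    }
)

# expected column x, copied from the table in the question
expected = pd.Series([92, 92, 94, 94, 94, 104, 104, 104, 124, 124], index=df.index)

# rerun the attempt on a copy so df itself stays unchanged
out = df.copy()
out['x'] = out.a - (out.b * 2)
out.loc[out.c.isna(), 'x'] = out.d
out['x'] = out.x.cummax()

# index labels of the rows where the attempt differs from the expected output
mismatch = out.index[out['x'].ne(expected)].tolist()
print(out)
print(mismatch)
```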
Solution
You can use duplicated, mask, groupby.ffill and fillna:
# identify duplicated "c"
m = df['c'].duplicated()
df['x'] = (df['a'].sub(df['b'] * 2)    # compute a-(2*b)
           .mask(m)                    # mask the duplicated "c"
           .groupby(df['c']).ffill()   # ffill per group
           .fillna(df['d'])            # replace NaN with "d"
          )
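To see what each step of the chain contributes, here is a minimal sketch that splits it into intermediate Series on the question's data. The names base, masked and filled are only illustrative and not part of the answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        'a': [98, 97, 100, 101, 103, 110, 108, 109, 130, 135],
        'b': [3] * 10,
        'c': [np.nan, np.nan, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 3.0, 3.0],
        'd': [92] * 10,
    }
)

# True for every repeat of a "c" value (repeated NaN counts as a duplicate too)
m = df['c'].duplicated()

# a - 2*b for every row
base = df['a'].sub(df['b'] * 2)

# keep only the value on the first row of each "c" value, NaN elsewhere
masked = base.mask(m)

# carry that first value forward within each "c" group
filled = masked.groupby(df['c']).ffill()

# whatever is still NaN (the rows where "c" is NaN) falls back to "d"
df['x'] = filled.fillna(df['d'])

print(pd.DataFrame({'base': base, 'masked': masked, 'filled': filled, 'x': df['x']}))
```

Because mask introduces NaN, the result is a float column even though a, b and d are integers.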
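Variant to work by groups of successive identical "c":
# label each run of successive identical "c" (every NaN row starts its own run)
g = df['c'].ne(df['c'].shift()).cumsum()
# flag every row after the first one of each run
m = g.duplicated()
df['x'] = (df['a'].sub(df['b'] * 2)
           .mask(m)
           .groupby(g).ffill()
           .where(df['c'].notna(), df['d'])
          )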
Output:
     a  b    c   d      x
0   98  3  NaN  92   92.0
1   97  3  NaN  92   92.0
2  100  3  1.0  92   94.0
3  101  3  1.0  92   94.0
4  103  3  1.0  92   94.0
5  110  3  2.0  92  104.0
6  108  3  2.0  92  104.0
7  109  3  2.0  92  104.0
8  130  3  3.0  92  124.0
9  135  3  3.0  92  124.0
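On the question's data both approaches give the same result. They only differ when a value of "c" comes back after a different value. The sketch below uses a small made-up DataFrame (toy, with hypothetical values that are not from the question) to show the difference: grouping on "c" itself reuses the value from the first run, while grouping on runs of successive identical "c" restarts at each run.

```python
import pandas as pd

# hypothetical data: the value 1.0 in "c" appears in two separate runs
toy = pd.DataFrame(
    {
        'a': [10, 11, 20, 21, 30, 31],
        'b': [1] * 6,
        'c': [1.0, 1.0, 2.0, 2.0, 1.0, 1.0],
        'd': [0] * 6,
    }
)

base = toy['a'].sub(toy['b'] * 2)

# approach 1: group on the "c" values themselves
m_values = toy['c'].duplicated()
by_value = (base.mask(m_values)
            .groupby(toy['c']).ffill()
            .fillna(toy['d'])
           )

# approach 2: group on runs of successive identical "c"
g = toy['c'].ne(toy['c'].shift()).cumsum()
m_runs = g.duplicated()
by_run = (base.mask(m_runs)
          .groupby(g).ffill()
          .where(toy['c'].notna(), toy['d'])
         )

print(pd.DataFrame({'c': toy['c'], 'by_value': by_value, 'by_run': by_run}))
```

Which of the two is appropriate depends on whether a returning "c" value should count as a new group.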
Answered By - mozway