Issue
My DataFrame is:
import pandas as pd
df = pd.DataFrame(
{
'a': [20, 9, 31, 40],
'b': [1, 10, 17, 30],
}
)
Expected output: Creating column c
a b c
0 20 1 20
1 9 10 20
2 31 17 17
3 40 30 31
Steps:
c
is the maximum value between df.b
and df.a.shift(1).bfill()
.
My attempt:
df['temp'] = df.a.shift(1).bfill()
df['c'] = df[['temp', 'b']].max(axis=1)
Is it the cleanest way / best approach?
Solution
If you don't want the temporary column, then you can replace values on the shifted column using where()
in a one-liner.
df['c'] = df['a'].shift(1).bfill().where(lambda x: x>df['b'], df['b'])
This is similar to the combine()
method posted in the other answer, but this one does a vectorized comparison while, combine()
does it element-wise so this should be much faster as the length of the dataframe increase.
Answered By - cottontail
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.