Issue
This is my DataFrame:
import pandas as pd
df = pd.DataFrame(
{
'a': [98, 97, 100, 135, 103, 100, 105, 109, 130],
'b': [100, 103, 101, 105, 110, 120, 101, 150, 160]
}
)
And this is the desired output. I want to create column c
:
a b c
0 98 100 100
1 97 103 100
2 100 101 100
3 135 105 100
4 103 110 110
5 100 120 110
6 105 101 101
7 109 150 150
8 130 160 150
It is not so easy for me to describe the issue in pure English, since it is a little bit complicated.c
is df.b.cummmin()
but under certain conditions it changes. I describe it row by row:
The process starts with:
df['c'] = df.b.cummin()
The condition that changes c
is :
cond = df.a.shift(1) > df.c.shift(1)
Now the rows that matter are the ones that cond == True
. For these rows df.c = df.b
And the cummin()
of b
RESETS.
For example, the first instance of cond
is row 4
. So c
changes to 110 (in other words, whatever b
is). And for row 5
it is the cummin()
of b
from row 4
. The logic is the same to the end.
This is one of my attempts. But it does not work where the cond
kicks in:
df['c'] = df.b.cummin()
df.loc[df.a.shift(1) > df.c.shift(1), 'c'] = df.b
PS:
The accepted answer works for this example. But for my real data which is way bigger than this, it didn't work as expected. I still haven't found the problem with it.
However I really like this answer and it works for me. It is simple and very effective. I didn't change the accepted answer because it has a vectorized solution.
Solution
IIUC, you can try :
m1 = df["b"].le(df["a"].shift())
cm = df["b"].groupby(m1.cumsum()).cummin()
m2 = (df["b"].le(cm) | df["a"].shift().le(cm.shift()))
df["c"] = cm.where(m2, df["b"].mask(m2).cummin())
Output (including intermediates) :
a b c a_s m1 cm m2 c
0 98 100 100 NaN False 100 True 100
1 97 103 100 98.00 False 100 True 100
2 100 101 100 97.00 False 100 True 100
3 135 105 100 100.00 False 100 True 100
4 103 110 110 135.00 True 110 True 110
5 100 120 110 103.00 False 110 True 110
6 105 101 101 100.00 False 101 True 101
7 109 150 150 105.00 False 101 False 150
8 130 160 150 109.00 False 101 False 150
[9 rows x 8 columns]
Answered By - Timeless
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.