Issue
This is an extension to this post.
My dataframe is:
import pandas as pd
df = pd.DataFrame(
    {
        'a': [100, 1123, 123, 100, 1, 0, 1],
        'b': [1000, 11123, 1123, 0, 55, 0, 1],
        'c': [100, 1123, 123, 999, 11, 50, 1],
        'd': [100, 1123, 123, 190, 1, 105, 1],
        'e': ['a', 'b', 'c', 'd', 'e', 'f', 'g'],
    }
)
And this is the output that I want. I need to create column x:
      a      b     c     d  e    x
0   100   1000   100   100  a  NaN
1  1123  11123  1123  1123  b  NaN
2   123   1123   123   123  c  NaN
3   100      0   999   190  d  NaN
4     1     55    11     1  e  NaN
5     0      0    50   105  f    f
6     1      1     1     1  g  NaN
My mask is:
mask = (df.a > df.b)
And these are the steps needed:
a) Find the first row that meets the condition of the mask.
b) Get the value of column a in that row.
c) Find the first row in which that value is between columns c and d. Being equal to one of them is also OK.
d) Get the value in column e of that row and put it in column x.
For example, for the above dataframe:
a) The first row of the mask is row 3.
b) The value of column a in that row is 100.
c) Among the rows after the mask row (4, 5, ...), the first row where 100 is between columns c and d is row 5.
d) So 'f' is chosen for column x.
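The four steps can be spelled out one at a time, which may help when checking a vectorized solution against them. This is only a sketch under two assumptions that the question does not fully settle: the search in step c) starts strictly after the mask row (as in the worked example), and "between" means c <= value <= d in that order. The names first_pos, value, later, hits and result are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': [100, 1123, 123, 100, 1, 0, 1],
    'b': [1000, 11123, 1123, 0, 55, 0, 1],
    'c': [100, 1123, 123, 999, 11, 50, 1],
    'd': [100, 1123, 123, 190, 1, 105, 1],
    'e': ['a', 'b', 'c', 'd', 'e', 'f', 'g'],
})

mask = df.a > df.b

# Start with an all-NaN column; it stays that way if nothing matches.
x = pd.Series(np.nan, index=df.index, dtype=object)

if mask.any():
    # a) position of the first row where the mask is True
    first_pos = int(mask.to_numpy().argmax())
    # b) value of column a in that row
    value = df['a'].iloc[first_pos]
    # c) first later row where c <= value <= d (assumption: strictly after)
    later = df.iloc[first_pos + 1:]
    hits = later[(later['c'] <= value) & (value <= later['d'])]
    # d) copy column e of that row into x
    if not hits.empty:
        x.loc[hits.index[0]] = hits['e'].iloc[0]

result = df.assign(x=x)
print(result)
```

On this frame the only non-NaN entry of x is 'f' at index 5, which matches the desired output.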
This is what I have tried:
mask = (df.a > df.b)
val = df.loc[mask.cumsum().eq(1) & mask, 'a']
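For reference, here is a small sketch of what this attempt produces on the first dataframe. It covers steps a) and b) only, and val comes back as a one-element Series rather than a scalar. The names first_hit and scalar are illustrative additions.

```python
import pandas as pd

df = pd.DataFrame({
    'a': [100, 1123, 123, 100, 1, 0, 1],
    'b': [1000, 11123, 1123, 0, 55, 0, 1],
    'c': [100, 1123, 123, 999, 11, 50, 1],
    'd': [100, 1123, 123, 190, 1, 105, 1],
    'e': ['a', 'b', 'c', 'd', 'e', 'f', 'g'],
})

mask = df.a > df.b

# cumsum() counts the True values seen so far; "== 1 and mask" keeps only
# the first row where the mask is True.
first_hit = mask.cumsum().eq(1) & mask
val = df.loc[first_hit, 'a']
print(val)

# val is a Series (empty when no row satisfies the mask), so a scalar has
# to be pulled out explicitly before comparing it with c and d.
scalar = val.iloc[0] if not val.empty else None
print(scalar)
```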
I would prefer the solution to be as generic as possible, like this answer, if possible.
I have provided some additional dataframes in case you need to test the code against subtly different conditions. For instance, what if no rows meet the condition of the mask? In that case column x is all NaNs. Column names are the same as in the above df.
df = pd.DataFrame({'a': [100, 1123, 123, -1, 1, 0, 1], 'b': [1000, 11123, 1123, 0, 55, 0, 1],'c': [100, 1123, 123, 999, 11, 50, 1], 'd': [100, 1123, 123, 190, 1, 105, 1], 'e': ['a', 'b', 'c', 'd', 'e', 'f', 'g']})
df = pd.DataFrame({'a': [100, 1123, 123, 100, 1, 0, 1], 'b': [1000, 11123, 1123, 0, 55, 0, 1], 'c': [100, 1123, 123, 999, 11, -1, 1], 'd': [100, 1123, 123, 190, 1, 10, 1], 'e': ['a', 'b', 'c', 'd', 'e', 'f', 'g']})
df = pd.DataFrame({'a': [100, 1123, 123, 1, 1, 0, 100], 'b': [1000, 11123, 1123, 0, 55, 0, 1], 'c': [100, 1123, 123, 999, 11, -1, 50], 'd': [100, 1123, 123, 190, 1, 10, 101], 'e': ['a', 'b', 'c', 'd', 'e', 'f', 'g']})
df = pd.DataFrame({'a': [100, 1123, 123, 100, 1, 1000, 1],'b': [1000, 11123, 1123, 0, 55, 0, 1],'c': [100, 1123, 123, 999, 11, 50, 500], 'd': [100, 1123, 123, 190, 1, 105, 2000], 'e': ['a', 'b', 'c', 'd', 'e', 'f', 'g']})
Solution
Code
import numpy as np
target = df.loc[(df.a > df.b).cummax().cumsum().eq(1), 'a']
df['x'] = target
df['x'] = df['x'].ffill()
cond = df['x'].between(df['c'], df['d']) & df['x'].notna()
df['x'] = np.where(cond.cummax().cumsum().eq(1), df['e'], float('nan'))
df
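Since the question asks for something generic, the same logic could be wrapped in a function that takes the mask and the column names as arguments and leaves the input frame untouched. This is a sketch, not a different algorithm: add_x and its parameter names are made up for illustration, and Series.where is used in place of np.where, which should give the same result here.

```python
import pandas as pd


def add_x(df, mask, value_col='a', low_col='c', high_col='d',
          label_col='e', out_col='x'):
    """Return a copy of df with out_col filled as in the code above."""
    # True only on the first row where the mask holds.
    first_hit = mask.cummax().cumsum().eq(1)
    # Keep the value on that row and carry it forward to the rows below.
    target = df[value_col].where(first_hit).ffill()
    # Rows where the carried value lies in [low_col, high_col], inclusive.
    cond = target.between(df[low_col], df[high_col]) & target.notna()
    # True only on the first such row.
    first_match = cond.cummax().cumsum().eq(1)
    return df.assign(**{out_col: df[label_col].where(first_match)})


df = pd.DataFrame({
    'a': [100, 1123, 123, 100, 1, 0, 1],
    'b': [1000, 11123, 1123, 0, 55, 0, 1],
    'c': [100, 1123, 123, 999, 11, 50, 1],
    'd': [100, 1123, 123, 190, 1, 105, 1],
    'e': ['a', 'b', 'c', 'd', 'e', 'f', 'g'],
})

out = add_x(df, df.a > df.b)
print(out)
```

Like the original code, this version also tests the mask row itself against c and d, and it assumes c is the lower bound and d the upper bound.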
The target value is 100, and in the first row both c and d are 100, so that row also matches the target value.
Since the desired output does not put the first row's e value into column x, I assumed the matching row must be searched for from the 4th row (index 3) downward, which is the row where the target is found.
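A quick way to see why the starting row matters is to compare the first match over all rows with the first match from index 3 onward. The sketch below hard-codes the target value 100 and the row label 3 from the example; in_range, naive_first and restricted_first are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'a': [100, 1123, 123, 100, 1, 0, 1],
    'b': [1000, 11123, 1123, 0, 55, 0, 1],
    'c': [100, 1123, 123, 999, 11, 50, 1],
    'd': [100, 1123, 123, 190, 1, 105, 1],
    'e': ['a', 'b', 'c', 'd', 'e', 'f', 'g'],
})

target = 100  # value of column a on the first masked row (index 3)

# Rows where c <= target <= d, checked over the whole frame.
in_range = (df['c'] <= target) & (target <= df['d'])

# Searching every row picks up row 0, which is above the mask row.
naive_first = in_range.idxmax()

# Searching only from the mask row downward gives the wanted row.
restricted_first = in_range.loc[3:].idxmax()

print(naive_first, df.loc[naive_first, 'e'])
print(restricted_first, df.loc[restricted_first, 'e'])
```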
Checking with one of your test dataframes:
df = pd.DataFrame({'a': [100, 1123, 123, -1, 1, 0, 1], 'b': [1000, 11123, 1123, 0, 55, 0, 1],'c': [100, 1123, 123, 999, 11, 50, 1], 'd': [100, 1123, 123, 190, 1, 105, 1], 'e': ['a', 'b', 'c', 'd', 'e', 'f', 'g']})
Applying the code to this frame leaves column x all NaN, because no row satisfies the mask.
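To reproduce this check, the code from above can be run unchanged on that test frame. In it a is never greater than b, so target is empty and every entry of x should end up NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': [100, 1123, 123, -1, 1, 0, 1],
    'b': [1000, 11123, 1123, 0, 55, 0, 1],
    'c': [100, 1123, 123, 999, 11, 50, 1],
    'd': [100, 1123, 123, 190, 1, 105, 1],
    'e': ['a', 'b', 'c', 'd', 'e', 'f', 'g'],
})

# No row satisfies a > b, so target is an empty Series.
target = df.loc[(df.a > df.b).cummax().cumsum().eq(1), 'a']
df['x'] = target              # index alignment fills every row with NaN
df['x'] = df['x'].ffill()     # nothing to carry forward
cond = df['x'].between(df['c'], df['d']) & df['x'].notna()
df['x'] = np.where(cond.cummax().cumsum().eq(1), df['e'], float('nan'))
print(df)
```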
Answered By - Panda Kim