Thursday, December 28, 2023

[FIXED] Finding the first row that meets conditions of a mask and selecting one row after it that meets a condition

Labels: dataframe, pandas, python

Issue

This is an extension to this post.

My dataframe is:

import pandas as pd

df = pd.DataFrame(
    {
        'a': [100, 1123, 123, 100, 1, 0, 1],
        'b': [1000, 11123, 1123, 0, 55, 0, 1],
        'c': [100, 1123, 123, 999, 11, 50, 1],
        'd': [100, 1123, 123, 190, 1, 105, 1],
        'e': ['a', 'b', 'c', 'd', 'e', 'f', 'g'],
    }
)

And this is the output that I want. I need to create column x:

      a      b     c     d  e   x
0   100   1000   100   100  a   NaN
1  1123  11123  1123  1123  b   NaN
2   123   1123   123   123  c   NaN
3   100      0   999   190  d   NaN
4     1     55    11     1  e   NaN
5     0      0    50   105  f   f
6     1      1     1     1  g   NaN

My mask is:

mask = (df.a > df.b)

And these are the steps needed:

a) Find the first row that satisfies the mask.

b) Get the value of column a in that row.

c) Among the rows after it, find the first row in which that value lies between columns c and d. Being equal to one of them is also OK.

d) Get the value of column e in that row and put it in column x on the same row.

For example for the above dataframe:

a) The first row that satisfies the mask is row 3.

b) The value of column a in that row is 100.

c) Among the rows after it (4, 5, ...), the first row in which 100 lies between columns c and d is row 5.

d) So 'f', the value of column e in row 5, is chosen for column x.
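The four steps can also be traced one at a time on the example dataframe. This is only a minimal sketch: it assumes at least one row satisfies the mask and at least one later row matches, and the names first_idx, later, hits and chosen are illustrative rather than part of the question.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'a': [100, 1123, 123, 100, 1, 0, 1],
        'b': [1000, 11123, 1123, 0, 55, 0, 1],
        'c': [100, 1123, 123, 999, 11, 50, 1],
        'd': [100, 1123, 123, 190, 1, 105, 1],
        'e': ['a', 'b', 'c', 'd', 'e', 'f', 'g'],
    }
)

mask = df.a > df.b

# a) label of the first row where the mask is True
#    (assumption: the mask has at least one True value)
first_idx = mask.idxmax()

# b) value of column a on that row
val = df.loc[first_idx, 'a']

# c) rows strictly after first_idx where c <= val <= d
later = df.loc[first_idx:].iloc[1:]
hits = later[(later.c <= val) & (val <= later.d)]

# d) value of column e on the first such row
#    (assumption: hits is not empty)
chosen = hits['e'].iloc[0]

print(first_idx, val, chosen)
```

For the dataframe above this prints 3 100 f, which agrees with the walk-through.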


This is what I have tried:

mask = (df.a > df.b)
val = df.loc[mask.cumsum().eq(1) & mask, 'a']

I would prefer the solution to be as generic as possible, like this answer, if possible.

I have provided some additional dataframes in case you need to test the code under slightly different conditions. For instance, what if no row satisfies the mask? In that case column x is all NaN. The column names are the same as in the df above.

df = pd.DataFrame({'a': [100, 1123, 123, -1, 1, 0, 1], 'b': [1000, 11123, 1123, 0, 55, 0, 1],'c': [100, 1123, 123, 999, 11, 50, 1], 'd': [100, 1123, 123, 190, 1, 105, 1], 'e': ['a', 'b', 'c', 'd', 'e', 'f', 'g']})
df = pd.DataFrame({'a': [100, 1123, 123, 100, 1, 0, 1], 'b': [1000, 11123, 1123, 0, 55, 0, 1], 'c': [100, 1123, 123, 999, 11, -1, 1], 'd': [100, 1123, 123, 190, 1, 10, 1], 'e': ['a', 'b', 'c', 'd', 'e', 'f', 'g']})
df = pd.DataFrame({'a': [100, 1123, 123, 1, 1, 0, 100], 'b': [1000, 11123, 1123, 0, 55, 0, 1], 'c': [100, 1123, 123, 999, 11, -1, 50], 'd': [100, 1123, 123, 190, 1, 10, 101], 'e': ['a', 'b', 'c', 'd', 'e', 'f', 'g']})
df = pd.DataFrame({'a': [100, 1123, 123, 100, 1, 1000, 1],'b': [1000, 11123, 1123, 0, 55, 0, 1],'c': [100, 1123, 123, 999, 11, 50, 500], 'd': [100, 1123, 123, 190, 1, 105, 2000], 'e': ['a', 'b', 'c', 'd', 'e', 'f', 'g']})

Solution

Code

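import numpy as np

# value of a on the first row where a > b (empty if no row matches)
target = df.loc[(df.a > df.b).cummax().cumsum().eq(1), 'a']
# aligned assignment: NaN everywhere except that row, then fill downward
df['x'] = target
df['x'] = df['x'].ffill()
# inclusive check c <= x <= d, only from the target row downward
cond = df['x'].between(df['c'], df['d']) & df['x'].notna()
# keep e only on the first row where the check holds
df['x'] = np.where(cond.cummax().cumsum().eq(1), df['e'], float('nan'))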

The target value is 100. In the first row, c and d are both 100, so the target value also lies between c and d there.

Since the expected output does not put the first row's e value into column x, I assumed that the match has to be searched from the 4th row (index 3), the row that produced the target, downward.
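The code above starts its search on the target row itself, while the question's example only looks at the rows after it (4, 5, ...). If the strict reading is the one wanted, a variant could look like the sketch below. The helper name add_x is hypothetical, and the sketch assumes a unique index and that c <= d on any row that should count as a match.

```python
import numpy as np
import pandas as pd


def add_x(df, mask):
    """Hypothetical helper: return a copy of df with column x filled in.

    Only rows strictly after the first mask row are searched.
    """
    out = df.copy()
    # start with an all-NaN object column so a string can be stored later
    out['x'] = pd.Series(np.nan, index=out.index, dtype=object)

    if not mask.any():
        # no row satisfies the mask: x stays all NaN
        return out

    # position of the first True in the mask
    pos = int(np.argmax(mask.to_numpy()))
    val = out['a'].iloc[pos]

    # rows strictly after the mask row, inclusive check c <= val <= d
    tail = out.iloc[pos + 1:]
    hit = (tail['c'] <= val) & (val <= tail['d'])

    if hit.any():
        label = hit.idxmax()  # label of the first matching row
        out.loc[label, 'x'] = out.loc[label, 'e']
    return out


df = pd.DataFrame(
    {
        'a': [100, 1123, 123, 100, 1, 0, 1],
        'b': [1000, 11123, 1123, 0, 55, 0, 1],
        'c': [100, 1123, 123, 999, 11, 50, 1],
        'd': [100, 1123, 123, 190, 1, 105, 1],
        'e': ['a', 'b', 'c', 'd', 'e', 'f', 'g'],
    }
)
print(add_x(df, df.a > df.b))
```

Passing the mask as an argument keeps the helper usable with other conditions. On the question's dataframes both readings should give the same result, because the mask row never satisfies the c/d check there.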

Checking with your first test dataframe:

df = pd.DataFrame({'a': [100, 1123, 123, -1, 1, 0, 1], 'b': [1000, 11123, 1123, 0, 55, 0, 1],'c': [100, 1123, 123, 999, 11, 50, 1], 'd': [100, 1123, 123, 190, 1, 105, 1], 'e': ['a', 'b', 'c', 'd', 'e', 'f', 'g']})

Applying the code to it returns a column x that is all NaN, because no row satisfies the mask.
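This check can be reproduced with a small wrapper around the same steps. The function name solution is hypothetical; it works on a copy so that the test dataframe is left unchanged.

```python
import numpy as np
import pandas as pd


def solution(df):
    # Hypothetical wrapper around the steps of the answer; works on a copy
    out = df.copy()
    # True only on the first row where a > b
    first = (out.a > out.b).cummax().cumsum().eq(1)
    # NaN everywhere except that row, then filled downward
    out['x'] = out.loc[first, 'a']
    out['x'] = out['x'].ffill()
    # inclusive check c <= x <= d from the target row downward
    cond = out['x'].between(out['c'], out['d']) & out['x'].notna()
    # keep e only on the first row where the check holds
    out['x'] = np.where(cond.cummax().cumsum().eq(1), out['e'], np.nan)
    return out


# first test dataframe from the question: no row has a > b
df = pd.DataFrame({'a': [100, 1123, 123, -1, 1, 0, 1], 'b': [1000, 11123, 1123, 0, 55, 0, 1], 'c': [100, 1123, 123, 999, 11, 50, 1], 'd': [100, 1123, 123, 190, 1, 105, 1], 'e': ['a', 'b', 'c', 'd', 'e', 'f', 'g']})

result = solution(df)
print(result['x'].isna().all())
```

The same wrapper can be called on the other test dataframes to compare the results.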

Answered By - Panda Kim

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
