Tuesday, October 19, 2021

[FIXED] Dynamic comparison of values of n multiple Pandas columns

October 19, 2021 dynamic, multiple-columns, multipleselection, pandas, python No comments

Issue

Let's say a user can input the columns and values to compare for a DF, so we can have:

column_list = ['col1', 'col2', 'col3']
value_list = [val1, val2, val3]

So to select the rows that satisfy where col1 >= val1 AND col2 >= val2 AND col3 >= val3 we would write:

selection = (df['col1'] >= val1) & (df['col2'] >= val2) & (df['col3'] >= val3))

or it can be in the form:

selection  = df.loc[(df['col1'] >= val1) & (df['col2'] >= val2) & (df['col3'] >= val3)]

The number of columns is not known in advance, so we can have n columns. We can try this approach:

if n=1:
   selection = (df['col1'] >= val1))
elif n=2:
   selection = (df['col1'] >= val1) & (df['col2'] >= val2))
elif n=3:
   selection = (df['col1'] >= val1) & (df['col2'] >= val2) & (df['col3'] >= val3))

But this is neither scalable nor efficient. I tried by generating strings "df['col<>'] >= val<>)" with a foor loop given the input lists but it didn't work for Pandas because of the str format.

What would be the best pythonic approach for this? To avoid having all the options with if and else statements.

Thank you in advance!

Solution

To perform a comparison with the same operator for all columns, create a Series with the values and columns ids and use it to perform an aligned comparison with the dataframe:

df[df.gt(pd.Series(value_list, index=column_list)).all(1)]

Example input:

>>> value_list
[3, 7, 11]
>>> df
   col1  col2  col3
0     0     1     2
1     3     4     5
2     6     7     8
3     9    10    11
4    12    13    14

output:

   col1  col2  col3
4    12    13    14

intermediates:

>>> pd.Series(value_list, index=column_list)
col1     3
col2     7
col3    11

>>> df.gt(pd.Series(value_list, index=column_list))
    col1   col2   col3
0  False  False  False
1  False  False  False
2   True  False  False
3   True   True  False
4   True   True   True

>>> df.gt(pd.Series(value_list, index=column_list)).all(1)
0    False
1    False
2    False
3    False
4     True

Answered By - mozway

This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0

Tuesday, October 19, 2021

[FIXED] Dynamic comparison of values of n multiple Pandas columns

Issue

Solution

0 comments:

Post a Comment

Popular Posts

Labels