Saturday, January 20, 2024

[FIXED] Find max row value using .shift and .apply in pandas


Issue

I have the following data frame:

import pandas as pd

df = pd.DataFrame({'A': [2.001, 4.001, 8.001, 0.001],
                   'B': [2.001, 0.001, 0.001, 0.001],
                   'C': [11.001, 12.001, 11.001, 8.001],
                   'D': [12.001, 23.001, 12.001, 8.021],
                   'E': [11.001, 24.001, 18.001, 8.0031]})

I can find the maximum value in each row across columns A, B, E and E shifted by -1 (that is, the next row's E) using the following method:

df["e_shifted"] = df["E"].shift(-1)
df.apply(lambda x: max(x['A'], x['B'], x['E'], x['e_shifted']), axis=1)

But this creates a temporary column (i.e. e_shifted) in the dataframe.

How can .apply() and .shift(-1) be used together without creating a temporary column?

For example, using the below code:

df.apply(lambda x: max(x['A'], x['B'], x['E'], x['E'].shift(-1)),axis = 1)

raises the following error:

AttributeError: 'numpy.float64' object has no attribute 'shift'
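
For context, the error follows from how apply works with axis=1: each row is passed to the function as a Series, so x['E'] is a single number, not a column. The sketch below illustrates this on a reduced copy of the frame above; the variable name first_row is only for illustration.

```python
import pandas as pd

# Reduced copy of the frame from the question (columns A, B, E only).
df = pd.DataFrame({'A': [2.001, 4.001, 8.001, 0.001],
                   'B': [2.001, 0.001, 0.001, 0.001],
                   'E': [11.001, 24.001, 18.001, 8.0031]})

# With axis=1, apply hands each row to the lambda as a Series,
# which is what df.iloc[0] looks like for the first row.
first_row = df.iloc[0]

# Indexing that row by label gives a scalar, which has no .shift method.
print(type(first_row['E']))
print(hasattr(first_row['E'], 'shift'))   # scalar: no shift
print(hasattr(df['E'], 'shift'))          # whole column: has shift
```

Shifting therefore has to happen on the whole column (df['E']) before the row-wise comparison, not inside the per-row function.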

The solution provided by @Corralien (shown below) handles shifting a single column:

out = df.assign(E_shift=df['E'].shift(-1))[['A', 'B', 'E', 'E_shift']].max(axis=1)

But can that solution be modified to handle shifting multiple columns?

For example:

out = df.assign(E_shift=df['E'].shift(-1), A_shift=df['A'].shift(-1))[['A', 'B', 'E', 'E_shift', 'A_Shift']].max(axis=1)

Doing so gives the following error:

KeyError: "['A_Shift'] not in index"

Solution (after @mozway pointed out the typo: 'A_Shift' should be 'A_shift'):

out = df.assign(E_shift=df['E'].shift(-1), A_shift=df['A'].shift(-1))[['A', 'B', 'E', 'E_shift', 'A_shift']].max(axis=1)
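
When several columns need shifting, typing each name twice invites the kind of case mismatch seen above. One possible way around it, offered here as a sketch rather than as part of the original answers, is to shift the columns together and let pandas build the names. The lists base_cols and shift_cols are hypothetical names introduced for this example.

```python
import pandas as pd

df = pd.DataFrame({'A': [2.001, 4.001, 8.001, 0.001],
                   'B': [2.001, 0.001, 0.001, 0.001],
                   'C': [11.001, 12.001, 11.001, 8.001],
                   'D': [12.001, 23.001, 12.001, 8.021],
                   'E': [11.001, 24.001, 18.001, 8.0031]})

base_cols = ['A', 'B', 'E']   # columns compared as-is (hypothetical name)
shift_cols = ['A', 'E']       # columns also compared one row ahead (hypothetical name)

# Shift the selected columns in one call; add_suffix names them A_shift, E_shift.
shifted = df[shift_cols].shift(-1).add_suffix('_shift')

# join aligns on the index and returns a new frame, so df is left unchanged.
out = df[base_cols].join(shifted).max(axis=1)
print(out)
```

Because the suffixed names are generated rather than typed, there is no second spelling to get wrong.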

Solution

Rather than trying to avoid an extra column, you should avoid using apply on each row of your dataframe: it calls a Python function once per row, which is much slower than a vectorized operation. In your case, you can use assign, which returns a new dataframe containing the temporary column and leaves your original dataframe unchanged, then use a vectorized max:

out = df.assign(E_shift=df['E'].shift(-1))[['A', 'B', 'E', 'E_shift']].max(axis=1)
print(out)

# Output
0    24.0010
1    24.0010
2    18.0010
3     8.0031
dtype: float64
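
To check that the original dataframe really is left alone, the sketch below repeats the answer's expression and inspects df afterwards. It also shows a pd.concat variant that never names a helper column; that variant is an alternative suggested here, not something taken from the answer, and alt is a name made up for the example.

```python
import pandas as pd

df = pd.DataFrame({'A': [2.001, 4.001, 8.001, 0.001],
                   'B': [2.001, 0.001, 0.001, 0.001],
                   'C': [11.001, 12.001, 11.001, 8.001],
                   'D': [12.001, 23.001, 12.001, 8.021],
                   'E': [11.001, 24.001, 18.001, 8.0031]})

# The expression from the answer: assign builds a new frame with E_shift.
out = df.assign(E_shift=df['E'].shift(-1))[['A', 'B', 'E', 'E_shift']].max(axis=1)

# The original frame still has only its five columns.
print(list(df.columns))

# Alternative sketch: concatenate the shifted column side by side.
# The result briefly has two columns named 'E', which max(axis=1) tolerates.
alt = pd.concat([df[['A', 'B', 'E']], df['E'].shift(-1)], axis=1).max(axis=1)
print(alt.equals(out))
```

In the last row the shifted value is NaN; max skips missing values by default, so that row falls back to the largest of A, B and E.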

Answered By - Corralien

This answer was collected from Stack Overflow and tested by PythonFixing community admins; it is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
