Issue
I have the following data frame:
import pandas as pd

df = pd.DataFrame({'A': [2.001, 4.001, 8.001, 0.001],
                   'B': [2.001, 0.001, 0.001, 0.001],
                   'C': [11.001, 12.001, 11.001, 8.001],
                   'D': [12.001, 23.001, 12.001, 8.021],
                   'E': [11.001, 24.001, 18.001, 8.0031]})
I can find the row-wise maximum across columns A, B, E, and E shifted by -1 with the following:
df["e_shifted"] = df["E"].shift(-1)
df.apply(lambda x: max(x['A'], x['B'], x['E'], x['e_shifted']), axis=1)
But this creates a temporary column (i.e. e_shifted) in the dataframe.
How can .apply() and .shift(-1) be used together without creating a temporary column?
For example, using the below code:
df.apply(lambda x: max(x['A'], x['B'], x['E'], x['E'].shift(-1)),axis = 1)
gives an error as below:
AttributeError: 'numpy.float64' object has no attribute 'shift'
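The error arises because, with axis=1, apply hands the lambda one row at a time as a Series, so x['E'] is a single scalar rather than a column, and scalars have no shift method. A minimal sketch of the difference, using a cut-down frame (the names small and first_row are illustrative only):

```python
import pandas as pd

# Cut-down frame, for illustration only
small = pd.DataFrame({'A': [2.001, 4.001], 'E': [11.001, 24.001]})

# With axis=1, apply passes each row as a Series;
# indexing that row by label returns a scalar value.
first_row = small.iloc[0]
print(type(first_row['E']))              # a NumPy float type
print(hasattr(first_row['E'], 'shift'))  # False: scalars cannot be shifted

# shift is defined on the whole column (a Series), not on its elements
print(hasattr(small['E'], 'shift'))      # True
```

So any shifting has to happen on the column before the row-wise step, not inside the lambda.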
The solution provided by @Corralien (shown below) handles shifting a single column:
out = df.assign(E_shift=df['E'].shift(-1))[['A', 'B', 'E', 'E_shift']].max(axis=1)
But can that solution be modified to shift multiple columns?
For example:
out = df.assign(E_shift=df['E'].shift(-1), A_shift=df['A'].shift(-1))[['A', 'B', 'E', 'E_shift', 'A_Shift']].max(axis=1)
Doing so gives the following error:
KeyError: "['A_Shift'] not in index"
Solution (after @mozway pointed out the typo):
out = df.assign(E_shift=df['E'].shift(-1), A_shift=df['A'].shift(-1))[['A', 'B', 'E', 'E_shift', 'A_shift']].max(axis=1)
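If the list of columns to shift grows, spelling out one keyword per column in assign gets error-prone, as the typo above shows. One possible alternative, offered as a sketch rather than as part of the original answer, is to shift several columns at once and let add_suffix generate the names. The variable names shift_cols and shifted are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({'A': [2.001, 4.001, 8.001, 0.001],
                   'B': [2.001, 0.001, 0.001, 0.001],
                   'E': [11.001, 24.001, 18.001, 8.0031]})

# Hypothetical list of columns whose next-row values should be included
shift_cols = ['E', 'A']

# Shift all of them in one call; add_suffix names them E_shift, A_shift
shifted = df[shift_cols].shift(-1).add_suffix('_shift')

# Combine with the unshifted columns and take the row-wise maximum
out = pd.concat([df[['A', 'B', 'E']], shifted], axis=1).max(axis=1)
print(out)
```

Because the column names are generated rather than typed twice, a mismatch such as A_shift versus A_Shift cannot occur.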
Solution
Rather than trying to avoid an extra column, avoid calling apply on each row of the dataframe, which is very slow. In your case, you can use assign to create a temporary column that does not affect the original dataframe, then take a vectorized max:
out = df.assign(E_shift=df['E'].shift(-1))[['A', 'B', 'E', 'E_shift']].max(axis=1)
print(out)
# Output
0    24.0010
1    24.0010
2    18.0010
3     8.0031
dtype: float64
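Two details of this approach can be checked directly. This is a small sketch, not part of the original answer: assign returns a new dataframe, so df keeps its original columns, and the NaN that shift(-1) leaves in the last row is skipped by max, which uses skipna=True by default.

```python
import pandas as pd

df = pd.DataFrame({'A': [2.001, 4.001, 8.001, 0.001],
                   'B': [2.001, 0.001, 0.001, 0.001],
                   'C': [11.001, 12.001, 11.001, 8.001],
                   'D': [12.001, 23.001, 12.001, 8.021],
                   'E': [11.001, 24.001, 18.001, 8.0031]})

out = df.assign(E_shift=df['E'].shift(-1))[['A', 'B', 'E', 'E_shift']].max(axis=1)

# assign returned a new frame, so the original has no E_shift column
print(list(df.columns))  # ['A', 'B', 'C', 'D', 'E']

# The last row's E_shift is NaN; max ignores it by default
print(out.iloc[-1])      # 8.0031
```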
Answered By - Corralien