Issue
The forum has a lot of examples showing how to find the max value of each row together with the corresponding column name. Some examples are here or here
What I want to do is a specific modification of those examples. My dataframe looks like this, where all columns are numbered from left to right (this order is very important):
x_1 x_2 x_3 x_4 x_5 x_6 x_7 x_8 x_9 x_10
0 0 1 2 2 0 0 0 0 0
4 4 0 4 4 1 0 0 0 0
0 0 1 2 3 0 0 0 0 0
Now I want to add 6 new columns at the end of every row, holding the column names and the values of the three biggest entries in that row.
x_1 x_2 x_3 x_4 x_5 x_6 x_7 x_8 x_9 x_10 Max1 ValMax1 Max2 ValMax2 Max3 ValMax3
0 0 1 2 2 0 0 0 0 0
4 4 0 4 4 1 0 0 0 0
0 0 1 2 3 0 0 0 0 0
If a row contains its max value more than once (for example the value 2 in the first row), I want to save in the column Max1 only the one column name with the smallest index. In this case the second biggest value is also 2, but the corresponding column has a greater index. In other words, each "Max(y)" column must hold exactly one column name. This is the main condition. If a row has more than 3 max values, only the 3 column names with the smallest indices should be saved. So the final output should look like this DF:
x_1 x_2 x_3 x_4 x_5 x_6 x_7 x_8 x_9 x_10 Max1 ValMax1 Max2 ValMax2 Max3 ValMax3
0 0 1 2 2 0 0 0 0 0 x_4 2 x_5 2 x_3 1
4 4 0 4 4 1 0 0 0 0 x_1 4 x_2 4 x_4 4
0 0 1 2 3 0 0 0 0 0 x_5 3 x_4 2 x_3 1
To summarize the result: in the first row 4 < 5, so x_4 comes first (the second 2, from x_5, follows immediately in the next column). In the second row 1 < 2 < 4 < 5 and there is room for only 3 columns, so x_5 is missing from the final result. In the third row the indices don't play any role, because the top values in the row are strictly different. This is also the main condition.
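The tie-breaking rule described above can be sketched for a single row in plain Python. This is only an illustration of the rule, not part of the original post: the helper name top_k_columns is hypothetical, and it relies on the fact that Python's sorted() is stable, so equal values keep their left-to-right order.

```python
def top_k_columns(names, values, k=3):
    """Return the k largest (column name, value) pairs of one row.

    Hypothetical helper for illustration. sorted() is stable, so among
    equal values the column with the smaller position comes first.
    """
    order = sorted(range(len(values)), key=lambda i: -values[i])[:k]
    return [(names[i], values[i]) for i in order]


names = [f"x_{i}" for i in range(1, 11)]
first_row = top_k_columns(names, [0, 0, 1, 2, 2, 0, 0, 0, 0, 0])
second_row = top_k_columns(names, [4, 4, 0, 4, 4, 1, 0, 0, 0, 0])

print(first_row)   # [('x_4', 2), ('x_5', 2), ('x_3', 1)]
print(second_row)  # [('x_1', 4), ('x_2', 4), ('x_4', 4)]
```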
Solution
The following code chunk first creates a copy of the dataframe, df_copy, in which the column names are replaced with their numeric positions (the order is important, as you mentioned). It then applies a function to each row to get the positions of the top 3 max values. Series.nlargest keeps the first occurrence among equal values by default (keep='first'), so ties are resolved in favour of the leftmost column. These positions are then mapped back to the original column names. Finally, the code looks up the values for these columns and reorders the columns as intended.
import pandas as pd
import numpy as np
df = pd.DataFrame({
'x_1': [0, 4, 0],
'x_2': [0, 4, 0],
'x_3': [1, 0, 1],
'x_4': [2, 4, 2],
'x_5': [2, 4, 3],
'x_6': [0, 1, 0],
'x_7': [0, 0, 0],
'x_8': [0, 0, 0],
'x_9': [0, 0, 0],
'x_10': [0, 0, 0]
})
# Create a copy of the dataframe and replace column names with their corresponding numeric index
df_copy = df.copy()
df_copy.columns = np.arange(len(df.columns))
# Apply a function to each row (axis=1) to get the indices of the top 3 max values
df[['Max1', 'Max2', 'Max3']] = df_copy.apply(lambda row: row.nlargest(3).index, axis=1, result_type='expand')
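# Map the numeric indices back to column names
# (Series.map per column, because DataFrame.applymap is deprecated since pandas 2.1)
df[['Max1', 'Max2', 'Max3']] = df[['Max1', 'Max2', 'Max3']].apply(lambda col: col.map(lambda x: df.columns[int(x)]))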
# Get the values
df[['ValMax1', 'ValMax2', 'ValMax3']] = df.apply(lambda row: [row[row['Max1']], row[row['Max2']], row[row['Max3']]], axis=1, result_type='expand')
# Reorder the columns
column_order = ['x_1', 'x_2', 'x_3', 'x_4', 'x_5', 'x_6', 'x_7', 'x_8', 'x_9', 'x_10', 'Max1', 'ValMax1', 'Max2', 'ValMax2', 'Max3', 'ValMax3']
df = df[column_order]
df
Result (as intended):
x_1 x_2 x_3 x_4 x_5 x_6 x_7 x_8 x_9 x_10 Max1 ValMax1 Max2 ValMax2 Max3 ValMax3
0 0 1 2 2 0 0 0 0 0 x_4 2 x_5 2 x_3 1
4 4 0 4 4 1 0 0 0 0 x_1 4 x_2 4 x_4 4
0 0 1 2 3 0 0 0 0 0 x_5 3 x_4 2 x_3 1
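For larger frames, the same result can probably be obtained without a row-wise apply. The sketch below is an alternative, not the method from the answer above: it assumes all x_ columns are numeric and uses a stable NumPy argsort on the negated values, so that equal values keep their left-to-right order. The variable names (order, top_names, top_values, result) are chosen here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [
        [0, 0, 1, 2, 2, 0, 0, 0, 0, 0],
        [4, 4, 0, 4, 4, 1, 0, 0, 0, 0],
        [0, 0, 1, 2, 3, 0, 0, 0, 0, 0],
    ],
    columns=[f"x_{i}" for i in range(1, 11)],
)

values = df.to_numpy()
names = df.columns.to_numpy()

# Sort each row in descending order; kind="stable" keeps the leftmost
# column first among equal values. Keep the first 3 positions per row.
order = np.argsort(-values, axis=1, kind="stable")[:, :3]

top_names = names[order]                                # column names, shape (rows, 3)
top_values = np.take_along_axis(values, order, axis=1)  # matching values

# Append the new columns already interleaved as Max1, ValMax1, Max2, ...
result = df.copy()
for k in range(3):
    result[f"Max{k + 1}"] = top_names[:, k]
    result[f"ValMax{k + 1}"] = top_values[:, k]

print(result)
```

Because the new columns are appended pairwise, no separate reordering step is needed in this variant.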
Answered By - canaytore