Wednesday, January 26, 2022

[FIXED] How to add Top3 predictions and Probability scores in different columns to a pandas dataframe

pandas, python, scikit-learn

Issue

I have a dataframe and a fitted classifier. I used predict_proba() to get the top 3 probabilities for each row and classes_ to get the corresponding class labels.

I want to add these probabilities and predicted classes as new columns, but I am not sure how to go about it.

Initial data

>>> test
  Student_Id  Math  Physical  Arts
0       id_1     6         7     9
1       id_2     9         7     1
2       id_3     3         5     5

Expected data

  Student_Id  Math  Physical  Arts Predicted_id_1 Predicted_id_2  \
0       id_1     6         7     9           id_1           id_2
1       id_2     9         7     1           id_2           id_1
2       id_3     3         5     5           id_3           id_2

  Predicted_id_3  Probability_Score_1  Probability_Score_2  \
0           id_3                  0.7                  0.3
1           id_3                  0.7                  0.3
2           id_1                  0.7                  0.3

   Probability_Score_3
0                  0.1
1                  0.1
2                  0.1

Sample data and code

import pandas as pd
import numpy as np

# Create the dataframe
data = [
    ["id_1", 6, 7, 9],
    ["id_2", 9, 7, 1],
    ["id_3", 3, 5, 5],
]

test = pd.DataFrame(data, columns=['Student_Id', 'Math', 'Physical', 'Arts'])

# Classes from clf.classes_
classes = np.array(['id_1', 'id_2', 'id_3'])

# Probabilities from predict_proba()
top_test_probabilities = np.array([[0.70, 0.30, 0.10], [0.30, 0.70, 0.10], [0.10, 0.30, 0.70]])

# Indices of the top 3 probabilities in each row, highest first
best_3n = np.array([[0, 1, 2], [1, 0, 2], [2, 1, 0]])

# Not sure if the below helps

# Find the associated id for each prediction
top_id = classes[best_3n]

# Cast to a new dataframe
top_id_df = pd.DataFrame(data=top_id, columns=["Predicted_id_1", "Predicted_id_2", "Predicted_id_3"])

# Find the associated probability for each prediction, row by row
top_prob_score = np.take_along_axis(top_test_probabilities, best_3n, axis=1)

# Cast to a new dataframe
top_prob_df = pd.DataFrame(data=top_prob_score, columns=["Probability_Score_1", "Probability_Score_2", "Probability_Score_3"])

# Next I have to add top_id_df and top_prob_df to the test data
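
As a side note, the index array does not have to be typed by hand; it can be derived from the probability matrix itself. The following is only a minimal sketch, assuming the predict_proba() output is a 2-D NumPy array with one column per entry of clf.classes_. The names proba, classes and top_k_indices are illustrative and do not come from the original post.

```python
import numpy as np

# Stand-ins for clf.classes_ and clf.predict_proba(X), using the sample values
classes = np.array(["id_1", "id_2", "id_3"])
proba = np.array([[0.70, 0.30, 0.10],
                  [0.30, 0.70, 0.10],
                  [0.10, 0.30, 0.70]])


def top_k_indices(p, k=3):
    """Return, for each row, the column indices of the k largest values, highest first."""
    # argsort sorts ascending, so negate to rank the largest probability first
    return np.argsort(-p, axis=1)[:, :k]


best_3n = top_k_indices(proba, k=3)

# Fancy indexing maps every index to its class label
top_id = classes[best_3n]

# take_along_axis picks, row by row, the probability stored at each index
top_prob_score = np.take_along_axis(proba, best_3n, axis=1)
```

With the sample values, best_3n comes out as [[0, 1, 2], [1, 0, 2], [2, 1, 0]], so every row of top_prob_score reads 0.7, 0.3, 0.1.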

Solution

You can concatenate the three dataframes column-wise (with axis=1). The indexes of the dataframes must match, so consider calling reset_index(drop=True) on each of them first:

pd.concat([test, top_id_df, top_prob_df], axis=1)
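
To illustrate the index caveat, here is a minimal sketch. It assumes that test carries a non-default index (for example after filtering or a train/test split) while the prediction frames use the default 0..n-1 index. The index labels 10, 11, 12 and the names misaligned and result are made up for the demonstration.

```python
import numpy as np
import pandas as pd

# Hypothetical: the test frame kept its original row labels (10, 11, 12)
test = pd.DataFrame(
    [["id_1", 6, 7, 9], ["id_2", 9, 7, 1], ["id_3", 3, 5, 5]],
    columns=["Student_Id", "Math", "Physical", "Arts"],
    index=[10, 11, 12],
)

# Frames built from plain lists or NumPy arrays get the default 0..n-1 index
top_id_df = pd.DataFrame(
    [["id_1", "id_2", "id_3"], ["id_2", "id_1", "id_3"], ["id_3", "id_2", "id_1"]],
    columns=["Predicted_id_1", "Predicted_id_2", "Predicted_id_3"],
)
top_prob_df = pd.DataFrame(
    np.array([[0.7, 0.3, 0.1]] * 3),
    columns=["Probability_Score_1", "Probability_Score_2", "Probability_Score_3"],
)

# Mismatched indexes: concat aligns on labels, giving 6 rows padded with NaN
misaligned = pd.concat([test, top_id_df, top_prob_df], axis=1)

# Resetting the index first lines the rows up by position
result = pd.concat(
    [test.reset_index(drop=True), top_id_df, top_prob_df],
    axis=1,
)
```

If the original row labels need to be preserved, an alternative is to build the prediction frames with index=test.index instead of resetting the index of test.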

Answered By - Niv Dudovitch

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
