Issue
I have a dataframe and have used predict_proba() to get the top 3 probabilities, and classes_ to get the corresponding class labels.
I want to add these probabilities and predicted classes as new columns, but I'm not sure how to go about it.
Initial data
>>> test
Student_Id Math Physical Arts
0 id_1 6 7 9
1 id_2 9 7 1
2 id_3 3 5 5
Expected data
Student_Id Math Physical Arts Predicted_id_1 Predicted_id_2 Predicted_id_3 Probability_Score_1 Probability_Score_2 Probability_Score_3
0 id_1 6 7 9 id_1 id_2 id_3 0.7 0.3 0.1
1 id_2 9 7 1 id_2 id_1 id_3 0.3 0.7 0.1
2 id_3 3 5 5 id_3 id_2 id_1 0.1 0.3 0.7
Sample data and code
import pandas as pd
import numpy as np
# Create the sample data
data = [
["id_1",6,7,9],
["id_2",9,7,1],
["id_3",3,5,5],
]
#dataframe
test = pd.DataFrame(data, columns = ['Student_Id', 'Math', 'Physical','Arts'])
# Classes from clf.classes_
Classes = np.array(['id_1', 'id_2', 'id_3'])
#Probabilities from predict_proba()
top_test_probabilities = np.array([[0.70, 0.30, 0.10], [0.30, 0.70, 0.10], [0.10, 0.30, 0.70]])
#Indices of top 3 values sorted
best_3n = np.array([[0, 1, 2], [2, 0, 1], [2, 1, 0]])
#Not sure if the below helps
#find the associated id for each prediction
top_id = Classes[best_3n]
#cast to a new dataframe
top_id_df = pd.DataFrame(data=top_id, columns = ["Predicted_id_1", "Predicted_id_2", "Predicted_id_3"])
#find the associated probability for each prediction
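# take_along_axis picks each row's own probabilities (indexing [0] would reuse row 0 for every student)
top_prob_score = np.take_along_axis(top_test_probabilities, best_3n, axis=1)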
#cast to a new dataframe
top_prob_df = pd.DataFrame(data=top_prob_score, columns = ["Probability_Score_1", "Probability_Score_2", "Probability_Score_3"])
# Next I have to add top_id_df and top_prob_df to the test data
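The best_3n array above is hard-coded. A minimal sketch of deriving it from the probability matrix, assuming each row of probs is one sample and each column is one class in the same order as clf.classes_: np.argsort on the negated probabilities gives the column indices from highest to lowest, and np.take_along_axis then pulls each row's matching scores. The names probs, classes and k are illustrative. For these sample probabilities the second row comes out as [1, 0, 2] rather than the hard-coded [2, 0, 1].

```python
import numpy as np
import pandas as pd

# Assumed inputs: labels in clf.classes_ order and a predict_proba()-style matrix
classes = np.array(['id_1', 'id_2', 'id_3'])
probs = np.array([[0.70, 0.30, 0.10],
                  [0.30, 0.70, 0.10],
                  [0.10, 0.30, 0.70]])

k = 3
# Column indices of the k largest probabilities in each row, highest first
top_k_idx = np.argsort(-probs, axis=1)[:, :k]

# Fancy indexing maps each index to its class label and keeps the 2-D shape
top_k_labels = classes[top_k_idx]

# Select each row's own probabilities in the same order as the labels
top_k_probs = np.take_along_axis(probs, top_k_idx, axis=1)

top_id_df = pd.DataFrame(top_k_labels,
                         columns=[f"Predicted_id_{i}" for i in range(1, k + 1)])
top_prob_df = pd.DataFrame(top_k_probs,
                           columns=[f"Probability_Score_{i}" for i in range(1, k + 1)])
print(top_id_df)
print(top_prob_df)
```

With many classes and a small k, np.argpartition can avoid a full sort, but its output would then need sorting within the top k to keep the highest-first order.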
Solution
You can concatenate the three dataframes column-wise with pd.concat(..., axis=1). The indexes of all three dataframes must match, so consider calling reset_index(drop=True) on each of them first.
pd.concat([test, top_id_df, top_prob_df], axis=1)
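A self-contained sketch of the concatenation step, using the sample data from the question but computing the top-3 order with argsort instead of hard-coding it. Each frame gets reset_index(drop=True) so rows line up by position even if test came from a filtered or shuffled split; drop=True discards the old index instead of inserting it as a column.

```python
import numpy as np
import pandas as pd

test = pd.DataFrame(
    [["id_1", 6, 7, 9], ["id_2", 9, 7, 1], ["id_3", 3, 5, 5]],
    columns=["Student_Id", "Math", "Physical", "Arts"],
)
classes = np.array(["id_1", "id_2", "id_3"])
probs = np.array([[0.70, 0.30, 0.10], [0.30, 0.70, 0.10], [0.10, 0.30, 0.70]])

# Top-3 class indices per row, highest probability first
best_3n = np.argsort(-probs, axis=1)[:, :3]

top_id_df = pd.DataFrame(
    classes[best_3n],
    columns=["Predicted_id_1", "Predicted_id_2", "Predicted_id_3"],
)
top_prob_df = pd.DataFrame(
    np.take_along_axis(probs, best_3n, axis=1),
    columns=["Probability_Score_1", "Probability_Score_2", "Probability_Score_3"],
)

# Give all three frames a fresh 0..n-1 index, then join them side by side
result = pd.concat(
    [df.reset_index(drop=True) for df in (test, top_id_df, top_prob_df)],
    axis=1,
)
print(result)
```

Note that pd.concat only accepts Series and DataFrame objects, so a raw NumPy array such as top_prob_score has to be wrapped in a DataFrame first.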
Answered By - Niv Dudovitch