Issue
finals_preds = pd.concat([clf_preds, clf_pred_probs, ISFOR_clus_preds, SVM_clus_preds, KMEANS_clus_preds, LOCOUT_clus_preds, DBSC_clus_preds], axis=1)
finals_preds.columns = ['clf_class', 'clf_score', 'ISOFOR', 'SVM-1C', 'KMEANS', 'LOCOUT', 'DBSCAN']
finals_preds
This is the output.
The real problem appears when I try to add another column that summarizes the row-wise mode of these series: the error says I am trying to fit 2 columns into 1.
# add a column that combines all the scores
finals_preds['ENSEMB']= finals_preds[['ISOFOR','SVM-1C','KMEANS','LOCOUT']].mode(axis=1)
finals_preds
Error message:
ValueError: Wrong number of items passed 2, placement implies 1
Then I checked the right-hand side of the assignment on its own, and the result confused me.
I also printed the mode of each series individually, and they all look normal.
So why is there an extra column when I take the mode across them together?
Solution
mode returns the value or values that appear most often. Your table is binary, so each row falls into one of the three cases below:
     0    1
0  0.0  NaN  # More 0s than 1s in the first row
1  1.0  NaN  # More 1s than 0s in the second row
2  0.0  1.0  # As many 0s as 1s in the third row
As long as at least one row in the dataframe has as many 0s as 1s, the output has 2 columns. It has a single column only when no row is tied.
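A minimal sketch that reproduces this behaviour, assuming a small made-up binary table (the values below are illustrative and are not the asker's data):

```python
import pandas as pd

# Hypothetical binary predictions from four detectors (illustrative values only)
preds = pd.DataFrame(
    {
        "ISOFOR": [0, 1, 0],
        "SVM-1C": [0, 1, 1],
        "KMEANS": [0, 1, 0],
        "LOCOUT": [1, 0, 1],
    }
)

# Row-wise mode: the last row is a 2-2 tie, so a second column is needed
row_modes = preds.mode(axis=1)
print(row_modes)
print(row_modes.shape)

# Without the tied row, the result has a single column
no_tie_modes = preds.iloc[:2].mode(axis=1)
print(no_tie_modes.shape)
```

Because the result is a DataFrame with as many columns as the largest number of tied modes in any row, assigning it to a single new column fails whenever a tie exists.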
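If you want a single value for each row, select the first column of the result (for a tied row this is the smaller value, because the modes are returned in sorted order):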
finals_preds['ENSEMB']= \
finals_preds[['ISOFOR','SVM-1C','KMEANS','LOCOUT']].mode(axis=1)[0]
# HERE ---^^^
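As a rough end-to-end sketch of this fix, again on made-up data (the table contents and the `vote_cols` name are assumptions for illustration), selecting column 0 gives exactly one value per row:

```python
import pandas as pd

# Hypothetical stand-in for finals_preds (illustrative values only)
finals_preds = pd.DataFrame(
    {
        "ISOFOR": [0, 1, 0],
        "SVM-1C": [0, 1, 1],
        "KMEANS": [0, 1, 0],
        "LOCOUT": [1, 0, 1],
    }
)

vote_cols = ["ISOFOR", "SVM-1C", "KMEANS", "LOCOUT"]

# Column 0 of the row-wise mode always holds a value, so it fits in one column.
# Tied rows resolve to the smaller label because modes come back sorted.
finals_preds["ENSEMB"] = finals_preds[vote_cols].mode(axis=1)[0].astype(int)
print(finals_preds)
```

Note that this quietly breaks ties in favour of 0. If a tie should resolve differently, for example in favour of 1, that rule would need to be written explicitly rather than relying on the sort order.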
Answered By - Corralien