Issue
I have a trained sklearn random forest multi-label classifier. In the training set, one class is always present, so you would expect the classifier to always return 1 for this class. It does, but for that class the classifier returns [1] instead of [0, 1]. See the output below:
[array([[0.05, 0.95]]), array([[0.97, 0.03]]),
array([[0.95, 0.05]]), array([[1., 0.]]), array([[1., 0.]]),
array([[1., 0.]]), array([[0.65, 0.35]]), array([[1.]])]
Why is this the case, and how do I prevent it? The example shows the result for a single input, but in my case I pass a full data frame as input and transform the output into class predictions. That is not possible if one of the arrays has only a single column ([1]) instead of two columns ([0, 1]) like the predictions for the other classes.
Can this be changed with a setting in sklearn?
Extra clarification on why I have a training set with only positive class samples: this is part of a recommender system, and sometimes a product is bought every time by every type of customer.
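The behaviour described above can be reproduced with a small sketch. It assumes the output shown comes from predict_proba on a RandomForestClassifier fitted on a two-dimensional label matrix; the data, shapes, and variable names here are made up for illustration. For each label, predict_proba returns one column per class seen during training, so a label that is always 1 gets a single column.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical toy data: 20 samples, 3 features.
rng = np.random.default_rng(0)
X = rng.random((20, 3))

# Two labels: the first varies, the second is positive in every training row.
y = np.column_stack([np.arange(20) % 2, np.ones(20, dtype=int)])

clf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)
proba = clf.predict_proba(X[:1])  # list with one array per label

shapes = [p.shape for p in proba]
classes = [c.tolist() for c in clf.classes_]
print(shapes)   # the second array has a single column
print(classes)  # the second label only knows class 1
```

Inspecting clf.classes_ shows which class each column of each array refers to, which is what makes a reliable fix possible.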
Solution
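I solved it with a simple list comprehension that adds the missing column to any inconsistent output array. The code is below, where rfc_output is the random forest output containing the inconsistent arrays. It assumes the only class seen in training is the positive one, so the zero column is placed first and each padded row reads [0, 1]. The column count is checked with x.shape[1], since x[1] would index a row, and the row count is taken from x.shape[0].
import numpy as np
rfc_output = [np.c_[np.zeros(x.shape[0]), x] if x.shape[1] < 2 else x for x in rfc_output]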
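A somewhat more general variant is sketched below. Instead of assuming which class is missing, it uses the per-label class arrays (as exposed by the fitted forest's classes_ attribute) to place each probability column where it belongs. The helper name pad_proba and the all_classes parameter are hypothetical, not part of sklearn, and the sample arrays are hand-written stand-ins for real model output.

```python
import numpy as np

def pad_proba(proba_list, classes_list, all_classes=(0, 1)):
    """Return one (n_samples, len(all_classes)) array per label.

    proba_list:   per-label probability arrays, as returned by predict_proba
    classes_list: per-label arrays of classes seen in training (clf.classes_)
    """
    padded = []
    for proba, classes in zip(proba_list, classes_list):
        # Start from zeros so unseen classes get probability 0.
        full = np.zeros((proba.shape[0], len(all_classes)))
        for col, cls in enumerate(classes):
            full[:, all_classes.index(cls)] = proba[:, col]
        padded.append(full)
    return padded

# Hand-written stand-ins: one normal label, one label that was always 1.
proba_list = [np.array([[0.05, 0.95]]), np.array([[1.0]])]
classes_list = [np.array([0, 1]), np.array([1])]

fixed = pad_proba(proba_list, classes_list)
print(fixed[1])
```

With a fitted classifier clf, the call would be pad_proba(clf.predict_proba(X), clf.classes_). This also covers the opposite case, where a label is never present in training and only class 0 is known.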
Answered By - Miles Bennet Dyson