Issue
I used sklearn.preprocessing.KBinsDiscretizer(n_bins=10, encode='ordinal') to discretize my continuous feature. The strategy is 'quantile' by default, but my data is far from uniformly distributed: about 70% of the rows are 0. As a result, I got KBinsDiscretizer.bin_edges_ = [0., 0., 0., 0., 0., 0., 0., 256., 602., 1306., 18464.].
There are many duplicate bin edges. Is there a way to drop the duplicates in KBinsDiscretizer's bins?
KBinsDiscretizer computes the quantiles of the input. If most of the input samples are zero, several of the 10-quantiles will be zero as well. The result I expected is a discretizer with unique bin edges; for the example above, that would be [0., 256., 602., 1306., 18464.].
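To make the expected result concrete, here is a minimal sketch that computes decile edges with NumPy, collapses repeated edges, and assigns ordinal bin codes by hand. It does not use KBinsDiscretizer at all; the helper names unique_quantile_edges and ordinal_encode are hypothetical, and the data is synthetic (70% zeros), not the data from the question.

```python
import numpy as np

def unique_quantile_edges(x, n_bins=10):
    """Quantile bin edges with repeated values collapsed (hypothetical helper)."""
    percentiles = np.linspace(0, 100, n_bins + 1)
    edges = np.percentile(x, percentiles)
    # np.unique sorts the edges and removes duplicates
    return np.unique(edges)

def ordinal_encode(x, edges):
    """Map each value to the index of its bin, similar in spirit to encode='ordinal'."""
    # Only interior edges are passed, so values at the minimum fall in bin 0
    # and values at the maximum fall in the last bin.
    return np.digitize(x, edges[1:-1], right=False)

# Synthetic feature: 700 zeros followed by 300 distinct positive values
x = np.concatenate([np.zeros(700), np.arange(1, 301) * 10.0])

edges = unique_quantile_edges(x)
codes = ordinal_encode(x, edges)

print(edges)               # unique, strictly increasing edges
print(np.bincount(codes))  # number of rows per bin
```

With this data, the seven zero-valued deciles collapse into a single edge, leaving five edges and four bins, which mirrors the shape of the result the question asks for.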
Solution
That will not be possible. Set strategy='uniform' to achieve your goal.
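A short sketch of the suggested setting, assuming scikit-learn is installed and using synthetic data (70% zeros) rather than the data from the question:

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

# Synthetic feature: 700 zeros followed by 300 distinct positive values.
# KBinsDiscretizer expects a 2D array of shape (n_samples, n_features).
x = np.concatenate([np.zeros(700), np.arange(1, 301) * 10.0]).reshape(-1, 1)

# 'uniform' builds equal-width bins between the feature's min and max,
# so the edges are distinct as long as the feature is not constant.
disc = KBinsDiscretizer(n_bins=10, encode="ordinal", strategy="uniform")
codes = disc.fit_transform(x)

edges = disc.bin_edges_[0]  # edges for the first (and only) feature
print(edges)
print(np.unique(codes, return_counts=True))
```

Note that equal-width bins are not the same as the deduplicated quantile edges described in the question: with heavily skewed data, most rows still end up in the first bin, so whether this meets the goal depends on how the bins are used downstream.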
Answered By - Franco Piccolo