Issue
I need equal-width discretization of the attributes in my dataset, and then I want to plot the discretized values against the continuous ones. For that I need a complete discrete-valued dataset, not a sparse matrix.
import pandas as pd
from sklearn.preprocessing import KBinsDiscretizer

X, y = datasets[0]
enc = KBinsDiscretizer(n_bins=5)
X_binned = enc.fit_transform(X)
print(pd.DataFrame.sparse.from_spmatrix(X_binned).shape)
print(X.shape)
Output:
(100, 10)
(100, 2)
Thank you
Solution
The issue is that KBinsDiscretizer's default encoding method is onehot, meaning that the transformed result consists of the one-hot encoded columns obtained from each feature. With 2 features and 5 bins each, that gives the 10 columns you see.
You can set encode to ordinal so that each bin is encoded as an integer value, and hence the shape is preserved:
enc = KBinsDiscretizer(n_bins=5, encode='ordinal')
X_binned = enc.fit_transform(X)
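Since the question asks for equal-width bins, it may also be worth noting that KBinsDiscretizer's default strategy is quantile (equal-frequency); equal-width binning corresponds to strategy='uniform'. Here is a minimal sketch comparing the two encodings. The random array X_demo is only a stand-in for the asker's data, which is not shown in the question:

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

# Synthetic stand-in for the asker's data (assumption): 100 samples, 2 features.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 2))

# strategy='uniform' gives equal-width bins; the default is 'quantile'.
onehot = KBinsDiscretizer(n_bins=5, encode='onehot', strategy='uniform')
ordinal = KBinsDiscretizer(n_bins=5, encode='ordinal', strategy='uniform')

X_onehot = onehot.fit_transform(X_demo)    # sparse matrix, one column per bin per feature
X_ordinal = ordinal.fit_transform(X_demo)  # dense array, one column per feature

print(X_onehot.shape)   # (100, 10)
print(X_ordinal.shape)  # (100, 2)
```

The ordinal result is a regular NumPy array holding bin indices from 0 to 4, so it can be plotted directly against the original continuous values.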
Answered By - yatu