Issue
I have a data set like this:
Entity Year Mean
0 Afghanistan 2016 0.99
1 Africa 2016 0.99
2 Albania 2016 0.99
3 Algeria 2016 0.99
4 Americas 2016 0.99
... ... ... ...
11346 World 1961 0.05
11347 Yemen 1961 0.05
11348 Yugoslavia 1961 0.05
11349 Zambia 1961 0.05
11350 Zimbabwe 1961 0.05
And I need to encode the Entity column in this data set. I used OneHotEncoder from sklearn. Here is my code:
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [0])], remainder='passthrough')
x_yam = np.array(ct.fit_transform(x_yam))
But after encoding, it gives me something like this:
(0, 0) 1.0
(0, 229) 2016.0
(0, 230) 0.99
(1, 1) 1.0
(1, 229) 2016.0
(1, 230) 0.99
(2, 2) 1.0
(2, 229) 2016.0
(2, 230) 0.99
(3, 3) 1.0
(3, 229) 2016.0
(3, 230) 0.99
(4, 4) 1.0
(4, 229) 2016.0
(4, 230) 0.99
(5, 5) 1.0
(5, 229) 2016.0
(5, 230) 0.99
(6, 6) 1.0
(6, 229) 2016.0
(6, 230) 0.99
(7, 7) 1.0
(7, 229) 2016.0
(7, 230) 0.99
(8, 8) 1.0
: :
I can't use this data for my ML model, so how can I use OneHotEncoder correctly to encode my data?
Solution
The ColumnTransformer has opted to return a scipy sparse matrix, because the OneHotEncoder produces one and its output has sufficiently many columns compared to the passthrough columns.
You can force dense output throughout by specifying sparse_threshold=0.0 in the ColumnTransformer, or sparse=False in the OneHotEncoder. Alternatively, you can cast the sparse output to dense after transforming; the np.array(...) call you've tried won't do that, but using .todense() instead will work (see https://stackoverflow.com/a/55639087/10495893).
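For concreteness, here is a minimal sketch of both approaches. The x_yam array below is a tiny stand-in for the data in the question (Entity, Year, Mean), not the original variable, and the note about sparse_output only applies to scikit-learn 1.2 and later:

import numpy as np
import scipy.sparse as sp
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Tiny stand-in for the question's data: Entity at column 0, then Year and Mean.
x_yam = np.array([['Afghanistan', 2016, 0.99],
                  ['Albania', 2016, 0.99],
                  ['Zimbabwe', 1961, 0.05]], dtype=object)

# Option 1: force dense output with sparse_threshold=0.0.
# (Equivalently, pass sparse=False to the OneHotEncoder itself;
#  that flag is named sparse_output in scikit-learn >= 1.2.)
ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [0])],
                       remainder='passthrough', sparse_threshold=0.0)
x_dense = ct.fit_transform(x_yam)
print(x_dense)

# Option 2: keep the default settings and convert afterwards.
# With the full data set the result is sparse, so .toarray() / .todense()
# converts it; the issparse check just makes this tiny example safe too.
ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [0])],
                       remainder='passthrough')
out = ct.fit_transform(x_yam)
x_dense = out.toarray() if sp.issparse(out) else out
print(x_dense)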
Answered By - Ben Reiniger