Issue
Dataset: https://docs.google.com/spreadsheets/d/1jlKp7JR9Ewujv445QgT1kZpH5868fhXFFrA3ovWxS_0/edit?usp=sharing
I've been trying to apply an ensemble method from sklearn to the small dataset linked above, but I keep getting this error:
ValueError: y should be a 1d array, got an array of shape (9, 56) instead.
This is the code:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
from numpy import array
from sklearn import datasets, metrics
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import OneHotEncoder
from sklearn.preprocessing import LabelEncoder
cbdata = pd.read_excel("C:/Users/Andrew/cbupdated2.xlsx")
print(cbdata)
print(cbdata.describe())
df = cbdata.columns
print(df)
x = cbdata
y = cbdata.fundingstatus
xshape = x.shape
yshape = y.shape
shapes = xshape, yshape
print(shapes)
size = x.size, y.size
print(size)
###Problem ENCODING DATA
##Label encoder
label_encoder = LabelEncoder()
integer_encoded = label_encoder.fit_transform(x)
print(integer_encoded)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(x)
print(X_scaled)
###Problem block
ec = OneHotEncoder()
X_encoded = cbdata.apply(lambda col: ec.fit_transform(col.astype(str)), axis=0, result_type='expand')
X_encoded2 = X_encoded.shape
print(X_encoded2)
Any help or suggestions on getting the encoder to work, so that I can use the ensemble method?
Solution
LabelEncoder is meant for encoding target variables, not features. It expects a 1d array, which is why passing the whole DataFrame x raises the ValueError above. See also this post.
You should use OrdinalEncoder on the categorical columns you want to transform, because I see that your columns are a mix of floats and strings. For example, to transform company and industry:
from sklearn.preprocessing import OrdinalEncoder
Cols = ["company","industry"]
integer_encoded = OrdinalEncoder().fit_transform(x[Cols])
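To show how this fits with the rest of the pipeline, here is a minimal sketch rather than a drop-in fix. The DataFrame is a made-up stand-in for the spreadsheet: only the column names company, industry and fundingstatus come from the question, while the employees column, the values, and the choice of RandomForestClassifier as the ensemble are assumptions. Note that the target column is dropped from the features, which the original x = cbdata did not do.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

# Hypothetical stand-in for the spreadsheet; only the column names
# company, industry and fundingstatus are taken from the question.
cbdata = pd.DataFrame({
    "company": ["a", "b", "c", "d", "e", "f"],
    "industry": ["tech", "bio", "tech", "retail", "bio", "tech"],
    "employees": [10.0, 250.0, 40.0, 1200.0, 75.0, 15.0],  # assumed numeric column
    "fundingstatus": ["seed", "series a", "seed", "ipo", "series a", "seed"],
})

cat_cols = ["company", "industry"]

# Features: everything except the target column.
X = cbdata.drop(columns=["fundingstatus"]).copy()

# OrdinalEncoder works on 2d input, so it suits the categorical feature columns.
X[cat_cols] = OrdinalEncoder().fit_transform(X[cat_cols])

# LabelEncoder expects a 1d array, so it is applied to the target only.
y = LabelEncoder().fit_transform(cbdata["fundingstatus"])

# Any sklearn ensemble could go here; a random forest is just one choice.
model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X, y)
print(X.shape, y.shape)
```

OrdinalEncoder assigns arbitrary integer codes, which tree-based ensembles usually tolerate; for linear or distance-based models, OneHotEncoder on the same columns may be a better fit.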
Answered By - StupidWolf