Thursday, February 10, 2022

[FIXED] Having trouble encoding dataset array

Labels: arrays, encoding, machine-learning, python, scikit-learn

Issue

Dataset: https://docs.google.com/spreadsheets/d/1jlKp7JR9Ewujv445QgT1kZpH5868fhXFFrA3ovWxS_0/edit?usp=sharing

I've been trying to apply an ensemble method from sklearn to the small dataset linked above. For some reason I keep getting this error:

ValueError: y should be a 1d array, got an array of shape (9, 56) instead.

This is the code:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
from numpy import array

from sklearn import datasets, metrics
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import OneHotEncoder
from sklearn.preprocessing import LabelEncoder

cbdata = pd.read_excel("C:/Users/Andrew/cbupdated2.xlsx")

print(cbdata)
print(cbdata.describe())
df = cbdata.columns

print(df)

x = cbdata
y = cbdata.fundingstatus

xshape = x.shape
yshape = y.shape

shapes = xshape, yshape
print(shapes)

size = x.size, y.size
print(size)


###Problem ENCODING DATA
      
##Label encoder
label_encoder = LabelEncoder()
integer_encoded = label_encoder.fit_transform(x)
print(integer_encoded)


scaler = StandardScaler()
X_scaled = scaler.fit_transform(x)
print(X_scaled)

###Problem block
ec = OneHotEncoder()


X_encoded = cbdata.apply(lambda col: ec.fit_transform(col.astype(str)), axis=0, result_type='expand')
X_encoded2 = X_encoded.shape

print(X_encoded2)

Any help or suggestions on getting the encoder to work, so that I can use the ensemble method?

Solution

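LabelEncoder is meant for encoding the target variable, not features. It expects a 1-D array, which is why passing the whole DataFrame to it raises the "y should be a 1d array" error. See also this post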
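To make the 1-D requirement concrete, here is a minimal sketch. The toy DataFrame is a hypothetical stand-in for the spreadsheet (the values are made up; only the column names company and fundingstatus come from the question). It encodes the single target column and then shows that passing the whole 2-D frame fails.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-in for the spreadsheet; the values are invented.
toy = pd.DataFrame({
    "company": ["A", "B", "C", "D"],
    "fundingstatus": ["Seed", "Series A", "Seed", "IPO"],
})

# A single column is 1-D, which is what LabelEncoder expects.
label_encoder = LabelEncoder()
y_encoded = label_encoder.fit_transform(toy["fundingstatus"])

# Passing the whole 2-D frame fails in the same way as in the question.
try:
    LabelEncoder().fit_transform(toy)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(y_encoded)
print(error_message)
```

The integer codes follow the sorted order of the class names, and label_encoder.inverse_transform maps them back to the original strings.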

Use OrdinalEncoder on the categorical columns you want to transform, since your data mixes float columns with string columns. For example, to transform company and industry:

from sklearn.preprocessing import OrdinalEncoder

Cols = ["company","industry"]

integer_encoded = OrdinalEncoder().fit_transform(x[Cols])
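The pieces can be put together as a rough end-to-end sketch. It makes several assumptions: the toy data is invented, the employees column is a hypothetical numeric feature, and RandomForestClassifier is chosen only as one example of an sklearn ensemble, since the question does not say which one is intended. Note that the target column is kept out of the feature matrix.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

# Hypothetical toy data; "employees" and all values are invented.
toy = pd.DataFrame({
    "company": ["A", "B", "C", "D", "E", "F"],
    "industry": ["tech", "health", "tech", "finance", "health", "tech"],
    "employees": [10.0, 250.0, 35.0, 1200.0, 80.0, 15.0],
    "fundingstatus": ["Seed", "Series A", "Seed", "IPO", "Series A", "Seed"],
})

cat_cols = ["company", "industry"]

# Encode only the string-valued feature columns (2-D input is fine here).
encoded = OrdinalEncoder().fit_transform(toy[cat_cols])
X = pd.DataFrame(encoded, columns=cat_cols)

# Numeric columns can be carried over unchanged.
X["employees"] = toy["employees"]

# The target is a single 1-D column, so LabelEncoder is appropriate.
y = LabelEncoder().fit_transform(toy["fundingstatus"])

# One possible ensemble; swap in whichever estimator you actually need.
model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X, y)
predictions = model.predict(X)
print(predictions)
```

OrdinalEncoder assigns integers in an arbitrary (sorted) order. Tree-based ensembles usually tolerate that, but for linear or distance-based models OneHotEncoder on the same columns may be the better choice.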

Answered By - StupidWolf

This answer, collected from Stack Overflow and tested by PythonFixing community admins, is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
