Tuesday, February 6, 2024

[FIXED] I keep getting ValueError: array length 2643 does not match index length 3281

Labels: machine-learning, pandas, python, scikit-learn

Issue

Here is my code:

import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Drop rows with NaN values in the target column or anywhere else in my dataset
training_data.dropna(inplace=True, axis=0)
testing_data.dropna(inplace=True, axis=0)

# Perform one-hot encoding on HomePlanet, Destination, CryoSleep and VIP

features = ['HomePlanet', 'Destination', 'CryoSleep', 'VIP']
X = pd.get_dummies(training_data[features]).astype(int)
y = pd.get_dummies(training_data.Transported).astype(int)
x_test = testing_data[features]

# Creating my model

X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.6, test_size=0.4, random_state=42)
rt_model = RandomForestRegressor()
rt_model.fit(X_train, y_train)
predictions = rt_model.predict(X_test)

# Save the CSV

output = pd.DataFrame({'PassengerId': testing_data.PassengerId, 'Transported': predictions})
output.to_csv('submission.csv', index=False)
print("Your submission was successfully saved!")

When I print the lengths of X and y, of X_train and y_train, and of X_test and y_test after the train-test split, I get:

6606 6606
3963 3963
2643 2643

I tried reshaping X and y.

I tried performing one-hot encoding on my x_test dataframe.

I tried using iloc on my array.

The error only occurs in the last part, when I try to save the results as a CSV.

Solution

From your first two lines, I assume that you already have testing data provided. There is no need to split the training data into additional testing data.

Therefore, your predictions should run on the provided test data x_test, not on the X_test that you split off from the training data. Note that Python is case sensitive, and variable names that differ only in case are confusing and easy to mix up.

Because you use X_test, predictions is an array with 2643 elements, while testing_data has 3281 rows. That is the length mismatch reported in the error: it is raised when you build the DataFrame from testing_data.PassengerId and predictions, before anything is written to the CSV.
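The mismatch can be reproduced without the model at all. The sketch below is only an illustration: passenger_ids and the zero-filled predictions array are made-up stand-ins with the two lengths from the error message, not the real data.

```python
import numpy as np
import pandas as pd

# Stand-in for testing_data.PassengerId: a Series with 3281 rows (made-up values).
passenger_ids = pd.Series(range(3281), name="PassengerId")

# Stand-in for rt_model.predict(X_test): one value per row of the 40% hold-out split.
predictions = np.zeros(2643)

error_message = None
try:
    # The Series defines the index (3281 rows); the plain array must match its length.
    pd.DataFrame({"PassengerId": passenger_ids, "Transported": predictions})
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

The DataFrame constructor takes its index from the Series, so any plain array passed alongside it has to have exactly as many elements as the Series has rows.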

So using

predictions = rt_model.predict(x_test) # lowercase x

should work, provided x_test has been one-hot encoded into the same columns as X. I would go further, though, and remove the additional split altogether, because it throws away training data.
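Putting the suggestions together, a minimal sketch could look like the following. It rests on several assumptions that are not in the question: the two small DataFrames are invented stand-ins for the real files, the target is kept as a single 0/1 column instead of being passed through get_dummies (so that predictions is one-dimensional), and reindex is used to give x_test exactly the columns seen during training.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Tiny made-up stand-ins for the real training and test files (assumption).
training_data = pd.DataFrame({
    "PassengerId": ["0001_01", "0002_01", "0003_01", "0004_01", "0005_01", "0006_01"],
    "HomePlanet": ["Earth", "Europa", "Mars", "Earth", "Europa", "Mars"],
    "Destination": ["A", "B", "A", "B", "A", "B"],
    "CryoSleep": [True, False, True, False, True, False],
    "VIP": [False, False, True, False, True, False],
    "Transported": [True, False, True, False, True, False],
})
testing_data = pd.DataFrame({
    "PassengerId": ["0007_01", "0008_01", "0009_01"],
    "HomePlanet": ["Earth", "Earth", "Europa"],
    "Destination": ["A", "B", "A"],
    "CryoSleep": [False, True, True],
    "VIP": [False, False, True],
})

features = ["HomePlanet", "Destination", "CryoSleep", "VIP"]

# Encode the training features; keep the target as one 0/1 column (assumption).
X = pd.get_dummies(training_data[features]).astype(int)
y = training_data["Transported"].astype(int)

# Encode the provided test set the same way, then align its columns with X.
# Categories missing from the test set (here "Mars") become all-zero columns.
x_test = pd.get_dummies(testing_data[features]).astype(int)
x_test = x_test.reindex(columns=X.columns, fill_value=0)

# Train on all of the training data, with no extra train/test split.
rt_model = RandomForestRegressor(random_state=42)
rt_model.fit(X, y)
predictions = rt_model.predict(x_test)  # lowercase x: one value per test row

# Both columns now have len(testing_data) entries, so this no longer raises.
output = pd.DataFrame({"PassengerId": testing_data["PassengerId"], "Transported": predictions})
print(len(predictions), len(testing_data))
```

From here, output.to_csv('submission.csv', index=False) can be called as in the question. Because a regressor is kept, the Transported column holds numeric scores, not True/False values; a classifier may be a more natural fit for a boolean target.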

Answered By - Oskar Hofmann

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
