Issue
I am trying to train a classifier and then use it to make a prediction from a combination of numeric and categorical (labeled) data.
I want to predict the price of a vehicle from these prediction values.
prediction_values = [2, 164, 'audi', 'gas', 'std', 'four', 'sedan', 'fwd', 'front', 99.8, 176.6, 66.2, 54.3, 2337, 'ohc', 'four', 109, 'mpfi', 3.19, 3.4, 10, 102, 5500, 30]
Here is my code.
# Load Library
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier, GradientBoostingClassifier
# Step1: Create data set
# Define the headers since the data does not have any
headers = ["symboling", "normalized_losses", "make", "fuel_type", "aspiration",
"num_doors", "body_style", "drive_wheels", "engine_location",
"wheel_base", "length", "width", "height", "curb_weight",
"engine_type", "num_cylinders", "engine_size", "fuel_system",
"bore", "stroke", "compression_ratio", "horsepower", "peak_rpm",
"city_mpg", "highway_mpg", "price"]
# Read in the CSV file and convert "?" to NaN
df = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data",
header=None, names=headers, na_values="?" )
df.head()
df.columns
df_fin = pd.DataFrame({col: df[col].astype('category').cat.codes for col in df}, index=df.index)
df_fin
X = df_fin[["symboling", "normalized_losses", "make", "fuel_type", "aspiration",
"num_doors", "body_style", "drive_wheels", "engine_location",
"wheel_base", "length", "width", "height", "curb_weight",
"engine_type", "num_cylinders", "engine_size", "fuel_system",
"bore", "stroke", "compression_ratio", "horsepower", "peak_rpm",
"city_mpg", "highway_mpg"]]
y = df_fin["price"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Fit a Decision Tree model
clf = DecisionTreeClassifier()
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
accuracy_score(y_test, y_pred)
# create a map of your columns values with the corresponding categorical values
col_dictionary = {}
for col in df:
    dictionary = dict(enumerate(df[col].astype('category').cat.categories))
    col_dictionary[col] = {v: k for k, v in dictionary.items()}
# then use this map to convert the array you want to predict
prediction_values = [2, 164, 'audi', 'gas', 'std', 'four', 'sedan', 'fwd', 'front', 99.8, 176.6, 66.2, 54.3, 2337, 'ohc', 'four', 109, 'mpfi', 3.19, 3.4, 10, 102, 5500, 30]
to_predict = []
for (column, value) in zip(X.columns, prediction_values):
    to_predict.append(col_dictionary[column][value])
to_predict_df = pd.DataFrame([to_predict], columns=X.columns)
clf.predict([to_predict_df.iloc[0].values])
When I run the code, I get this error.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
Input In [101] in <cell line: 5>
to_predict_df = pd.DataFrame([to_predict], columns=X.columns)
File ~\AppData\Roaming\Python\Python39\site-packages\pandas\core\frame.py:570 in __init__
arrays, columns = to_arrays(data, columns, dtype=dtype)
File ~\AppData\Roaming\Python\Python39\site-packages\pandas\core\internals\construction.py:528 in to_arrays
return _list_to_arrays(data, columns, coerce_float=coerce_float, dtype=dtype)
File ~\AppData\Roaming\Python\Python39\site-packages\pandas\core\internals\construction.py:571 in _list_to_arrays
raise ValueError(e) from e
ValueError: 25 columns passed, passed data had 24 columns
Solution
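There is nothing wrong with the classifier. A quick check shows that the problem is the prediction_values list: it is missing a value.
Its length is 24, while X.columns has a length of 25, and this mismatch is what raises the ValueError. Because zip stops at the shorter of its two inputs, the loop builds to_predict with only 24 entries without complaining, so the failure only surfaces when pd.DataFrame([to_predict], columns=X.columns) is called.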
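The length check can be reproduced on a small scale. The sketch below is only an illustration: the three column names and two values are stand-ins chosen for brevity, not the full 25-column data from the question.

```python
import pandas as pd

# Stand-ins for X.columns and prediction_values: 3 columns but only 2 values
columns = ["peak_rpm", "city_mpg", "highway_mpg"]
values = [5500, 30]

# The quick check: compare the two lengths
lengths = (len(columns), len(values))

# zip() stops at the shorter input, so no error is raised here
paired = list(zip(columns, values))

# The mismatch only surfaces when the DataFrame is built
error_message = None
try:
    pd.DataFrame([values], columns=columns)
except ValueError as exc:
    error_message = str(exc)

print(lengths)
print(paired)
print(error_message)
```

On Python 3.10 or newer, zip(columns, values, strict=True) would raise a ValueError at the loop itself. The traceback in the question shows Python 3.9, where that argument is not available.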
Once prediction_values has 25 entries, one for each column in X.columns, you are good to go.
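One way to make a missing value easier to spot is to key the inputs by column name instead of relying on position. This is a minimal sketch, not part of the original answer: build_row is a hypothetical helper, and the column names and numbers are illustrative stand-ins.

```python
def build_row(columns, values_by_name):
    """Return the values ordered like `columns`, or raise naming the missing columns."""
    missing = [c for c in columns if c not in values_by_name]
    if missing:
        raise KeyError(f"missing values for columns: {missing}")
    return [values_by_name[c] for c in columns]


# Stand-in for X.columns
columns = ["peak_rpm", "city_mpg", "highway_mpg"]

# One value was forgotten; the helper reports which column it belongs to
incomplete = {"peak_rpm": 5500, "highway_mpg": 30}
problem = None
try:
    build_row(columns, incomplete)
except KeyError as exc:
    problem = str(exc)

# After supplying the missing entry (24 is an illustrative value), the row is complete
complete = {**incomplete, "city_mpg": 24}
row = build_row(columns, complete)

print(problem)
print(row)
```

With a positional list, a forgotten value shifts every later value into the wrong column. With a dictionary, the error names the column that needs a value.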
Answered By - L_Jay