Issue
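I'm doing one-hot encoding and using $\hat{\theta} = (\mathbb{X}^T \mathbb{X})^{-1} \mathbb{X}^T \mathbb{Y}$ to estimate theta. I was getting an error because of redundancies (redundant one-hot columns make $\mathbb{X}^T \mathbb{X}$ singular, so it cannot be inverted), so I decided to drop the redundant columns.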
This is prior to dropping columns:
This is my code, where I try to drop the redundant columns:
import numpy as np
import pandas as pd
from sklearn.feature_extraction import DictVectorizer

def one_hot_encode_revised(data):
    all_columns = data.columns
    records = data[all_columns].to_dict(orient='records')
    encoder = DictVectorizer(sparse=False)
    encoded_X = encoder.fit_transform(records)
    df = pd.DataFrame(data=encoded_X, columns=encoder.feature_names_)
    # Drop one level from each categorical feature to remove the collinear (redundant) columns
    return df.drop(['day=Fri', 'sex=Male', 'smoker=No', 'time=Dinner'], axis=1)

one_hot_X_revised = one_hot_encode_revised(X)
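One way to check whether dropping columns really removed the redundancy is to compare the matrix rank with the number of columns: if the rank is smaller, some column is a linear combination of the others and X^T X is singular. The minimal sketch below assumes a small made-up sample (hypothetical values, not the real tips data) and swaps in pd.get_dummies(..., drop_first=True) for DictVectorizer just to keep it dependency-light; is_full_rank is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with the same kind of columns as the tips data.
data = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59, 25.29],
    "size": [2, 3, 3, 2, 4, 4],
    "sex": ["Female", "Male", "Male", "Male", "Female", "Male"],
    "smoker": ["No", "No", "Yes", "No", "Yes", "No"],
})

def is_full_rank(X):
    """True if no column of X is a linear combination of the other columns."""
    return np.linalg.matrix_rank(X.to_numpy()) == X.shape[1]

# Keep every level: sex_Female + sex_Male and smoker_No + smoker_Yes both equal 1 on every row.
full = pd.get_dummies(data, columns=["sex", "smoker"], dtype=float)
# Drop the first level of each categorical column (here sex_Female and smoker_No).
reduced = pd.get_dummies(data, columns=["sex", "smoker"], drop_first=True, dtype=float)

print(is_full_rank(full))     # False
print(is_full_rank(reduced))  # True
```

Which level gets dropped does not matter for removing the collinearity; it only changes which category serves as the baseline.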
Then, I use this function to estimate theta from the above equation:
def get_analytical_sol(X, y):
    """
    Computes the analytical solution to our least squares problem

    Parameters
    -----------
    X: a 2D dataframe of numeric features (one-hot encoded)
    y: a 1D vector of tip amounts

    Returns
    -----------
    The estimate for theta
    """
    return np.linalg.inv(X.T * X) * (X.T * y)
To run it:
revised_analytical_thetas = get_analytical_sol(one_hot_X_revised, tips)
The error I get is: ValueError: Unable to coerce to DataFrame, shape must be (8, 244): given (252, 252)
For reference, tips is this:
Did I get rid of the redundancies correctly? If so, why do I still get the error?
Thanks!
Solution
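You have an error in this line:
return np.linalg.inv(X.T * X) * (X.T * y)
What you want to do is matrix multiplication. For pandas DataFrames, the * operator is element-wise multiplication that first aligns both operands by their row and column labels; it is not matrix multiplication. You need to use the @ operator or the dot() method of the DataFrame instead.

That label alignment is also where the strange shapes come from: X.T is indexed by the 8 feature names and has the 244 row labels as columns, so X.T * X produces a 252 × 252 frame (8 + 244 labels on each axis, all NaN), which np.linalg.inv turns into a 252 × 252 array. X.T * y, on the other hand, is 8 × 244. Multiplying those two is what raises the "Unable to coerce to DataFrame" error.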
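The following sketch reproduces the shape problem on hypothetical toy data (n = 4 rows, p = 2 features, with y built exactly as 2*a + 3*b) and then rewrites get_analytical_sol with matrix products. Converting to NumPy for the inverse and wrapping the result in a Series indexed by feature name are choices made here for readability, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: y is exactly 2*a + 3*b, so theta_hat should be (2, 3).
X = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0], "b": [1.0, 0.0, 1.0, 3.0]})
y = 2 * X["a"] + 3 * X["b"]

# `*` is element-wise and aligns on labels: X.T has the feature names as its index and
# the row labels as its columns, so the result has the union of both label sets (all NaN).
elementwise = X.T * X
print(elementwise.shape)  # (6, 6), i.e. (n + p, n + p)

def get_analytical_sol(X, y):
    """Least squares estimate theta_hat = (X^T X)^{-1} X^T y using matrix products."""
    XtX = X.T @ X     # p x p; pandas matches X.T's columns to X's row labels
    Xty = X.T.dot(y)  # length-p Series; .dot() is equivalent to @
    theta = np.linalg.inv(XtX.to_numpy()) @ Xty.to_numpy()
    return pd.Series(theta, index=X.columns)

print(get_analytical_sol(X, y))  # a close to 2, b close to 3
```

If you prefer not to form the inverse explicitly, np.linalg.solve(XtX, Xty) returns the same estimate and is generally the more numerically stable choice.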
Answered By - Pierre-Loic