Issue
I need to predict whether a driver who signed up will actually start driving, using a basic classifier.
city_name  signup_os  signup_channel  signup_date  bgc_date  first_completed_date  did_drive
Strark     ios web    Paid            1/2/16       NaN       NaN                   no
Strark     windows    Paid            1/21/16      NaN       NaN                   no
The dataset has some date columns. Which scikit-learn classifier should I use to train a basic model?
Fitting fails on the datetime values. All the features are categorical or date values.
from sklearn.model_selection import train_test_split
X = refined_df[['city_name','signup_os','signup_channel','signup_date','bgc_date']]
y = refined_df['did_drive']
X_train, X_test, y_train, y_test = train_test_split(X, y , test_size=0.25, random_state=0)
models = {}
# Logistic Regression
from sklearn.linear_model import LogisticRegression
models['Logistic Regression'] = LogisticRegression()
# Support Vector Machines
from sklearn.svm import LinearSVC
models['Support Vector Machines'] = LinearSVC()
# Decision Trees
from sklearn.tree import DecisionTreeClassifier
models['Decision Trees'] = DecisionTreeClassifier()
# Random Forest
from sklearn.ensemble import RandomForestClassifier
models['Random Forest'] = RandomForestClassifier()
# Naive Bayes
from sklearn.naive_bayes import GaussianNB
models['Naive Bayes'] = GaussianNB()
# K-Nearest Neighbors
from sklearn.neighbors import KNeighborsClassifier
models['K-Nearest Neighbor'] = KNeighborsClassifier()
from sklearn.metrics import accuracy_score, precision_score, recall_score
accuracy, precision, recall = {}, {}, {}
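for key in models.keys():
    # Fit the classifier
    models[key].fit(X_train, y_train)
    # Make predictions
    predictions = models[key].predict(X_test)
    # Calculate metrics (scikit-learn expects y_true first, then y_pred)
    accuracy[key] = accuracy_score(y_test, predictions)
    # pos_label is required for string labels; 'yes' is assumed to be the positive class
    precision[key] = precision_score(y_test, predictions, pos_label='yes')
    recall[key] = recall_score(y_test, predictions, pos_label='yes')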
ValueError: could not convert string to float: 'Berton'
It can't convert the city name to a float. How can I handle this?
Is there a decision tree that accepts datetime values without any additional conversion?
Solution
You can apply one-hot encoding to convert categorical features into numerical ones. Scikit-learn provides OneHotEncoder for this:
from sklearn.preprocessing import OneHotEncoder
encoder = OneHotEncoder(sparse_output=False)  # this parameter was named `sparse` before scikit-learn 1.2
X_categorical = encoder.fit_transform(X[['city_name', 'signup_os', 'signup_channel']])
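For the date columns, you can extract components such as year and month from each date, or convert the date to a Unix timestamp.
X = X.copy()  # work on a copy rather than a slice of refined_df
X['signup_date'] = pd.to_datetime(X['signup_date'], format='%m/%d/%y')  # requires `import pandas as pd`; .dt needs a datetime column
X['signup_year'] = X['signup_date'].dt.year
X['signup_month'] = X['signup_date'].dt.month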
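The Unix timestamp alternative could look roughly like the sketch below. The toy values (including the `1/25/16` background-check date) are invented for illustration. The `has_bgc` flag and the `-1` fill value are only one possible way to handle the missing `bgc_date` entries, not something the original answer prescribes.

```python
import pandas as pd

# Toy data in the question's M/D/YY format (values are illustrative only).
dates = pd.DataFrame({
    "signup_date": ["1/2/16", "1/21/16"],
    "bgc_date": [None, "1/25/16"],
})

signup = pd.to_datetime(dates["signup_date"], format="%m/%d/%y")
bgc = pd.to_datetime(dates["bgc_date"], format="%m/%d/%y")  # None becomes NaT

# Unix timestamp: whole seconds elapsed since 1970-01-01.
epoch = pd.Timestamp("1970-01-01")
dates["signup_ts"] = (signup - epoch) // pd.Timedelta(seconds=1)

# Missing dates carry information too: flag them, then fill the gap
# with a sentinel value (an assumption; other strategies are possible).
dates["has_bgc"] = bgc.notna().astype(int)
dates["days_to_bgc"] = (bgc - signup).dt.days.fillna(-1)

print(dates[["signup_ts", "has_bgc", "days_to_bgc"]])
```

A difference between two dates, such as `days_to_bgc`, is often easier for a model to use than a raw timestamp, because it does not depend on when the data was collected.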
Finally, rebuild the input matrix from the encoded and date-derived features, then split it into training and test sets.
import numpy as np
X = np.concatenate((X_categorical, X[['signup_year', 'signup_month']]), axis=1)
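As an alternative to concatenating arrays by hand, the encoding and the model can be combined in a single scikit-learn pipeline. This is a sketch under assumptions, not the answerer's method: the toy rows, the "Organic" and "Referral" values, and the `signup_dayofweek` feature are all made up for illustration. On the decision tree question: scikit-learn estimators expect numeric input, so the dates are converted to numeric features before fitting here as well.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Toy data modelled on the question's columns (values are illustrative only).
df = pd.DataFrame({
    "city_name": ["Strark", "Strark", "Berton", "Berton"],
    "signup_channel": ["Paid", "Paid", "Organic", "Referral"],
    "signup_date": ["1/2/16", "1/21/16", "1/5/16", "1/9/16"],
    "did_drive": ["no", "no", "yes", "yes"],
})

# Derive numeric features from the date column before modelling.
signup = pd.to_datetime(df["signup_date"], format="%m/%d/%y")
df["signup_month"] = signup.dt.month
df["signup_dayofweek"] = signup.dt.dayofweek

categorical = ["city_name", "signup_channel"]
numeric = ["signup_month", "signup_dayofweek"]

# One-hot encode the categorical columns; pass the numeric ones through unchanged.
preprocess = ColumnTransformer(
    [("onehot", OneHotEncoder(handle_unknown="ignore"), categorical)],
    remainder="passthrough",
)

model = Pipeline([
    ("prep", preprocess),
    ("tree", DecisionTreeClassifier(random_state=0)),
])

model.fit(df[categorical + numeric], df["did_drive"])
predictions = model.predict(df[categorical + numeric])
print(predictions)
```

Because the encoder is fitted inside the pipeline, calling `fit` on the training split and `predict` on the test split applies the same encoding to both without leaking test data into the fit.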
Answered By - Marco Parola