Wednesday, November 29, 2023

[FIXED] sklearn binary classifier for dataset with datetime, categorical values without preprocessing?

Labels: classification, python, scikit-learn

Issue

I need to predict whether a driver who has signed up will actually start driving, using a basic classifier.

city_name   signup_os   signup_channel  signup_date  bgc_date  first_completed_date  did_drive
Strark      ios web     Paid            1/2/16       NaN       NaN                   no
Strark      windows     Paid            1/21/16      NaN       NaN                   no

The dataset has several date columns. Which sklearn classifier can I use to train a basic model?

Training fails on the datetime values. All of the features are either categorical or date values.

from sklearn.model_selection import train_test_split

X = refined_df[['city_name', 'signup_os', 'signup_channel', 'signup_date', 'bgc_date']]
y = refined_df['did_drive']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

models = {}

# Logistic Regression
from sklearn.linear_model import LogisticRegression
models['Logistic Regression'] = LogisticRegression()

# Support Vector Machines
from sklearn.svm import LinearSVC
models['Support Vector Machines'] = LinearSVC()

# Decision Trees
from sklearn.tree import DecisionTreeClassifier
models['Decision Trees'] = DecisionTreeClassifier()

# Random Forest
from sklearn.ensemble import RandomForestClassifier
models['Random Forest'] = RandomForestClassifier()

# Naive Bayes
from sklearn.naive_bayes import GaussianNB
models['Naive Bayes'] = GaussianNB()

# K-Nearest Neighbors
from sklearn.neighbors import KNeighborsClassifier
models['K-Nearest Neighbor'] = KNeighborsClassifier()



from sklearn.metrics import accuracy_score, precision_score, recall_score

accuracy, precision, recall = {}, {}, {}

for key in models.keys():
    
    # Fit the classifier
    models[key].fit(X_train, y_train)
    
    # Make predictions
    predictions = models[key].predict(X_test)
    
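    # Calculate metrics (scikit-learn expects y_true first, then y_pred)
    accuracy[key] = accuracy_score(y_test, predictions)
    # pos_label='yes' assumes did_drive holds the strings 'yes' and 'no'
    precision[key] = precision_score(y_test, predictions, pos_label='yes')
    recall[key] = recall_score(y_test, predictions, pos_label='yes')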

Fitting fails with:

ValueError: could not convert string to float: 'Berton'

It cannot convert the city name to a float. How should I handle this?

Is there a decision tree that accepts datetime values without any additional conversion?

Solution

You can apply one-hot encoding to convert the categorical features into numerical ones. Scikit-learn provides OneHotEncoder for this:

from sklearn.preprocessing import OneHotEncoder

encoder = OneHotEncoder(sparse_output=False)  # scikit-learn >= 1.2; older versions use sparse=False
X_categorical = encoder.fit_transform(X[['city_name', 'signup_os', 'signup_channel']])
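To make the effect of the encoding concrete, here is a minimal sketch on a toy frame. The rows are made up for illustration (only the column names and a few values come from the question), and `handle_unknown="ignore"` is an optional setting, not something the answer above uses. The encoder is left at its default sparse output and densified with `.toarray()` so the sketch does not depend on the `sparse`/`sparse_output` parameter name.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Toy rows for illustration only; not taken from the real dataset.
toy = pd.DataFrame({
    "city_name": ["Strark", "Berton", "Strark"],
    "signup_os": ["ios web", "windows", "windows"],
})

# handle_unknown="ignore" encodes categories unseen during fit as all zeros
# instead of raising an error at transform time.
encoder = OneHotEncoder(handle_unknown="ignore")

# The default output is a sparse matrix; convert it to a dense array for display.
encoded = encoder.fit_transform(toy).toarray()

# One output column per category, with categories sorted within each input column.
for column, categories in zip(toy.columns, encoder.categories_):
    print(column, list(categories))
print(encoded)
```

Each input row becomes a vector with exactly one 1 per original column, which is the numeric form the classifiers in the question can consume.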

For the dates, scikit-learn estimators expect numeric input, so you can either extract components (year, month, and so on) from each date or convert it to a Unix timestamp.

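import pandas as pd

X = X.copy()  # work on a copy, since X is a slice of refined_df
signup_date = pd.to_datetime(X['signup_date'])  # parse strings such as '1/2/16' before using .dt
X['signup_year'] = signup_date.dt.year
X['signup_month'] = signup_date.dt.month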
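The Unix timestamp option and the missing `bgc_date` values can be handled along the following lines. This is only a sketch: the second `bgc_date` value is invented for illustration, the `%m/%d/%y` format is an assumption based on the sample rows, and the derived column names (`signup_unix`, `has_bgc`, `days_to_bgc`) as well as the `-1` placeholder for missing dates are hypothetical choices.

```python
import pandas as pd

# Toy rows; the second bgc_date is invented for illustration.
dates = pd.DataFrame({
    "signup_date": ["1/2/16", "1/21/16"],
    "bgc_date": [None, "1/25/16"],
})

# Assumed format: month/day/two-digit year. Missing values become NaT.
signup = pd.to_datetime(dates["signup_date"], format="%m/%d/%y")
bgc = pd.to_datetime(dates["bgc_date"], format="%m/%d/%y")

features = pd.DataFrame({
    "signup_month": signup.dt.month,
    # Whole seconds elapsed since 1970-01-01 (Unix timestamp).
    "signup_unix": (signup - pd.Timestamp("1970-01-01")) // pd.Timedelta(seconds=1),
    # 1 if a background check date exists, 0 otherwise.
    "has_bgc": bgc.notna().astype(int),
    # Days between signup and background check; -1 marks a missing date.
    "days_to_bgc": (bgc - signup).dt.days.fillna(-1),
})
print(features)
```

Keeping an explicit `has_bgc` flag lets a model distinguish "no background check yet" from a real day count, which a bare placeholder value would blur.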

Finally, combine the encoded categorical features and the date features into the final input, then split it into training and test sets.

import numpy as np

X = np.concatenate((X_categorical, X[['signup_year', 'signup_month']]), axis=1)
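As an alternative to concatenating arrays by hand, the same steps can be bundled with scikit-learn's `ColumnTransformer` and `Pipeline`. This is a different arrangement from the one in the answer above, shown as a sketch: the data rows are made up, `signup_day` is a hypothetical derived feature, and the choice of `DecisionTreeClassifier` is arbitrary.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Toy data, made up for illustration; only the column names follow the question.
df = pd.DataFrame({
    "city_name": ["Strark", "Strark", "Berton", "Berton", "Strark", "Berton"],
    "signup_os": ["ios web", "windows", "windows", "ios web", "windows", "ios web"],
    "signup_date": ["1/2/16", "1/21/16", "1/5/16", "1/9/16", "1/12/16", "1/30/16"],
    "did_drive": ["no", "no", "yes", "yes", "no", "yes"],
})

# Turn the date string into a numeric feature before modelling.
df["signup_day"] = pd.to_datetime(df["signup_date"], format="%m/%d/%y").dt.day

X = df[["city_name", "signup_os", "signup_day"]]
y = df["did_drive"]

# One-hot encode the categorical columns; pass the numeric column through unchanged.
preprocess = ColumnTransformer(
    transformers=[
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["city_name", "signup_os"]),
    ],
    remainder="passthrough",
)

model = Pipeline([
    ("preprocess", preprocess),
    ("tree", DecisionTreeClassifier(random_state=0)),
])

model.fit(X, y)
predictions = model.predict(X)
print(list(predictions))
```

Because the encoder is fitted inside the pipeline, calling `model.fit` on the training split and `model.predict` on the test split applies the same encoding to both without refitting it on test data.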

Answered By - Marco Parola

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
