Issue
I'm trying to build a category predictor. My dataset has three columns, 'First Name', 'Last Name' and 'Gender', and my goal is to predict which category the input variable 'test_x' belongs to. In the code below I pass 'Male' as the input and expect 'GENDER' as the output, but I get this error instead: AttributeError: 'Series' object has no attribute 'lower'
import pandas as pd
import nltk
class Employee_Category:
    FIRST_NAME = "FIRST_NAME"
    LAST_NAME = "LAST_NAME"
    GENDER = "GENDER"
data = pd.read_excel("C:\\users\\HP\\Documents\\Datascience task\\Employee.xlsx")
data = data.drop(['Age','Experience (Years)','Salary'],axis='columns')
train_x = [data['First Name'],data['Last Name'],data['Gender']]
train_y = [Employee_Category.FIRST_NAME,Employee_Category.LAST_NAME,Employee_Category.GENDER]
from sklearn import svm
from sklearn.feature_extraction.text import CountVectorizer
vectorizer = CountVectorizer(binary=True)
vector = vectorizer.fit_transform(train_x)
# Train the model
clf_svm = svm.SVC(kernel='linear')
clf_svm.fit(vector,train_y)
# Predict
test_x = vectorizer.transform(['Male']) # Expected output: "GENDER"
clf_svm.predict(test_x)
This is the head of the dataset: [screenshot not included]
I have googled this several times, but I couldn't fix the error, and I don't really understand it in the first place. Please help, and explain why it occurs.
Solution
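CountVectorizer expects an iterable of strings, one string per sample, and lowercases each of them by default. In your code train_x is a list of three pandas Series, so the vectorizer ends up calling .lower() on a whole Series, which raises the AttributeError. You have to flatten your input matrix so that every word is a separate sample with its own label. The code below works for me: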
import pandas as pd
import numpy as np
from sklearn.svm import SVC
from sklearn.feature_extraction.text import CountVectorizer
class Employee_Category:
    FIRST_NAME = "FIRST_NAME"
    LAST_NAME = "LAST_NAME"
    GENDER = "GENDER"
data = pd.DataFrame(columns=['First Name','Last Name','Gender'])
data.loc[0,:] = ['Arnold','Carter','Male']
data.loc[1,:] = ['Arthur','Farrell','Male']
data.loc[2,:] = ['Richard','Perry','Male']
data.loc[3,:] = ['Ellia','Thomas','Female']
train_x = data.to_numpy().flatten()
train_y = len(data)*[Employee_Category.FIRST_NAME,Employee_Category.LAST_NAME,Employee_Category.GENDER]
vectorizer = CountVectorizer(binary=True)
vector = vectorizer.fit_transform(train_x)
# Train the model
clf_svm = SVC(kernel='linear')
clf_svm.fit(vector,train_y)
# Predict
test_x = vectorizer.transform(['Male']) # Expected output: "GENDER"
print(clf_svm.predict(test_x))
returns: ['GENDER']
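To see why flattening helps, here is a minimal sketch that feeds CountVectorizer both shapes of input. It assumes a two-row toy frame built from the example rows above, not the asker's real Excel file, and the names columns_as_docs, cells and labels are only illustrative. It also relies on flatten() walking the array row by row, which is why repeating the three labels len(data) times lines up with the cells.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Toy frame mirroring two of the example rows (assumption: not the real dataset).
data = pd.DataFrame(
    {
        "First Name": ["Arnold", "Arthur"],
        "Last Name": ["Carter", "Farrell"],
        "Gender": ["Male", "Male"],
    }
)

# 1) A list of Series: each "document" is a whole column, not a string,
#    so the vectorizer's lowercasing step fails on it.
columns_as_docs = [data["First Name"], data["Last Name"], data["Gender"]]
error_name = None
try:
    CountVectorizer(binary=True).fit_transform(columns_as_docs)
except AttributeError as exc:
    error_name = type(exc).__name__
    print("Fails as in the question:", exc)

# 2) Flattened cells: each document is a single string, in row-major order,
#    so the repeated label list matches cell by cell.
cells = data.to_numpy().flatten().tolist()
labels = len(data) * ["FIRST_NAME", "LAST_NAME", "GENDER"]
for cell, label in zip(cells, labels):
    print(f"{cell} -> {label}")

# One row per cell, one column per distinct token.
vector = CountVectorizer(binary=True).fit_transform(cells)
print(vector.shape)
```

If the column order of the frame ever changes, the hard-coded label list has to change with it, so it is worth printing the cell/label pairs once as a sanity check before training.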
Answered By - Gaspar Avit Ferrero