Tuesday, April 12, 2022

[FIXED] AttributeError: 'Series' object has no attribute 'lower' Pandas

pandas, python, scikit-learn

Issue

I'm trying to do category prediction. My dataset has three columns, 'First Name', 'Last Name' and 'Gender', and my goal is to predict which of these categories the input value 'test_x' belongs to. In the code below I pass 'Male' as the input and expect "GENDER" as the output, but I get this error instead: AttributeError: 'Series' object has no attribute 'lower'.

import pandas as pd 
import nltk

class Employee_Category:
    FIRST_NAME = "FIRST_NAME"
    LAST_NAME = "LAST_NAME"
    GENDER = "GENDER"

data = pd.read_excel("C:\\users\\HP\\Documents\\Datascience task\\Employee.xlsx")
data = data.drop(['Age','Experience (Years)','Salary'],axis='columns')

train_x = [data['First Name'],data['Last Name'],data['Gender']]
train_y = [Employee_Category.FIRST_NAME,Employee_Category.LAST_NAME,Employee_Category.GENDER]

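from sklearn import svm
from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer(binary=True)
# The AttributeError is raised here: train_x is a list of Series, not a list of strings
vector = vectorizer.fit_transform(train_x)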

# Train the model
clf_svm = svm.SVC(kernel='linear')
clf_svm.fit(vector,train_y)

# Predict
test_x = vectorizer.transform(['Male']) # Expected output: "GENDER"
clf_svm.predict(test_x)

This is the head of the dataset: (image not included)

I have searched around quite a bit, but I couldn't fix the error, and I don't really understand it in the first place. Please help, and explain why it occurs.

Solution

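The problem is that CountVectorizer expects an iterable of strings, one per document, but train_x is a list of three Series. By default the vectorizer lowercases each document by calling .lower() on it, and a Series has no such method, hence the error. You have to flatten your input matrix so that every word is a separate string with its own label. The code below works for me: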

import pandas as pd 
import numpy as np
from sklearn.svm import SVC
from sklearn.feature_extraction.text import CountVectorizer

class Employee_Category:
    FIRST_NAME = "FIRST_NAME"
    LAST_NAME = "LAST_NAME"
    GENDER = "GENDER"

data = pd.DataFrame(columns=['First Name','Last Name','Gender'])
data.loc[0,:] = ['Arnold','Carter','Male']
data.loc[1,:] = ['Arthur','Farrell','Male']
data.loc[2,:] = ['Richard','Perry','Male']
data.loc[3,:] = ['Ellia','Thomas','Female']

train_x = data.to_numpy().flatten()
train_y = len(data)*[Employee_Category.FIRST_NAME,Employee_Category.LAST_NAME,Employee_Category.GENDER]

vectorizer = CountVectorizer(binary=True)
vector = vectorizer.fit_transform(train_x)

# Train the model
clf_svm = SVC(kernel='linear')
clf_svm.fit(vector,train_y)

# Predict
test_x = vectorizer.transform(['Male']) # Expected output: "GENDER"
print(clf_svm.predict(test_x))

returns: ['GENDER']
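
To see where the error message itself comes from, here is a minimal sketch that uses only pandas. It does not call CountVectorizer; it just mimics the lowercasing step the vectorizer applies to each document by default. The Series contents are made-up sample values, not the real Employee sheet.

```python
import pandas as pd

# Hypothetical stand-in for one column of the Employee sheet
first_names = pd.Series(["Arnold", "Arthur", "Richard", "Ellia"])

# A plain string has .lower(); this is what the vectorizer expects per document
print("Male".lower())

# A Series does not, which reproduces the error from the question
try:
    first_names.lower()
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)

# Element-wise lowercasing goes through the .str accessor instead
lowered = first_names.str.lower().tolist()
print(lowered)
```

So each item handed to fit_transform has to be a single string. Flattening the DataFrame, as in the answer above, is one way to get there.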

Answered By - Gaspar Avit Ferrero

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
