Issue
I have a dataset with the following columns:
['symboling', 'Company', 'fueltype', 'aspiration', 'doornumber',
'carbody', 'drivewheel', 'enginelocation', 'carlength', 'carwidth',
'curbweight', 'enginetype', 'cylindernumber', 'enginesize',
'fuelsystem', 'horsepower', 'price', 'total_mpg']
where the goal is to predict the price of a car. The price data is continuous. I was wondering how I can convert it so that I can use a classification model.
Upon searching, I found that I can do it by defining ranges, but I am unable to understand how. Kindly help me.
Solution
Let's suppose that we have a dataframe with 2 continuous columns, named x1 and x2:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
x1 = np.random.rand(100)
x2 = np.random.rand(100)
df = pd.DataFrame({"x1":x1,"x2":x2})
df.head()
#          x1        x2
# 0  0.049202  0.131046
# 1  0.606525  0.756687
# 2  0.910932  0.944692
# 3  0.904655  0.439637
# 4  0.565204  0.418432
# Plot values
sns.scatterplot(x=range(100),y=df["x1"])
sns.scatterplot(x=range(100),y=df["x2"])
Then we can make some buckets like this:
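# Intervals are right-closed, e.g. (0.2, 0.4]; include_lowest=True keeps an exact 0.0 in bucket 0 instead of NaN
x1_cat = pd.cut(df['x1'], bins=[0.,0.2,0.4,0.6,0.8,np.inf], labels=[0,1,2,3,4], include_lowest=True)
x2_cat = pd.cut(df['x2'], bins=[0.,0.2,0.4,0.6,0.8,np.inf], labels=[0,1,2,3,4], include_lowest=True)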
df_cat = pd.concat([x1_cat,x2_cat],axis=1)
df_cat.head()
#   x1 x2
# 0  0  0
# 1  3  3
# 2  4  4
# 3  4  2
# 4  2  2
# Plot values
sns.scatterplot(x=range(100),y=df_cat["x1"])
sns.scatterplot(x=range(100),y=df_cat["x2"])
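The same idea can be applied to the price column from the question. The sketch below is only illustrative: the prices, the bin edges (10000 and 30000) and the class names are assumptions made up for the example, not values taken from the real dataset. It shows two options: pd.cut with hand-picked ranges, and pd.qcut, which picks the edges from quantiles so that the classes come out roughly balanced.

```python
import numpy as np
import pandas as pd

# Hypothetical prices, invented for illustration; use the real df["price"] instead.
df = pd.DataFrame(
    {"price": [5118, 7295, 9980, 13495, 16500, 18920, 24565, 31600, 45400]}
)

# Class names are an assumption; integer labels such as [0, 1, 2] work the same way.
labels = ["low", "medium", "high"]

# Option 1: fixed ranges chosen by hand. The edges here are assumed values,
# giving the right-closed intervals (0, 10000], (10000, 30000], (30000, inf].
df["price_range"] = pd.cut(df["price"], bins=[0, 10000, 30000, np.inf], labels=labels)

# Option 2: quantile-based buckets. pandas derives the edges from the data,
# so each class holds about the same number of rows.
df["price_quantile"] = pd.qcut(df["price"], q=3, labels=labels)

print(df)
print(df["price_range"].value_counts(sort=False))
print(df["price_quantile"].value_counts(sort=False))
```

Either new column can then serve as the target for a classifier in place of the continuous price. Fixed ranges are easier to interpret, while quantile buckets avoid classes with very few cars; which one fits better depends on how the predictions will be used.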
Answered By - Kaveh