Issue
I was building a model that predicts power demand from temperature, using polynomial regression in sklearn. After fitting it, I plotted the result with matplotlib.pyplot, and the fitted line came out in a strange shape.
I want the model to be drawn as a single curve. What is wrong, and what should I do? Here is the full code.
import pandas as pd
dt = pd.read_csv("complete_dataset.csv")
dt.isnull().sum()
dt = dt.dropna()
dt.head()
dt = dt[["demand", "solar_exposure", "max_temperature","rainfall"]]
dt.head()
### Correlation between sun exposure and electricity demand --> weak
x = dt.iloc[:, 0].values
y = dt.iloc[:, 1].values
import matplotlib.pyplot as plt
plt.scatter(x, y, s = 2, color = "black")
plt.xlabel("demand")
plt.ylabel("solar exposure")
### Correlation between maximum temperature and electricity demand --> Demand tends to increase as the temperature gets either lower or higher.
y = dt.iloc[:, 2]
plt.scatter(x, y, s = 1, color = "black")
plt.xlabel("demand")
plt.ylabel("max temperature")
### There appears to be no correlation between rainfall and electricity demand.
y = dt.iloc[:, 3].values
plt.scatter(x, y, s = 2, color = "black")
plt.xlabel("demand")
plt.ylabel("rainfall")
dt = dt[["demand", "max_temperature"]]
dt.rename(columns={'max_temperature': 'temp'}, inplace=True)
## model
x = dt["demand"].values.reshape(-1, 1)
y = dt["temp"].values
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
poly_reg = PolynomialFeatures(degree = 2)
X_poly = poly_reg.fit_transform(x)
X_poly[:5]
poly_reg.get_feature_names_out()
lin_reg = LinearRegression()
lin_reg.fit(X_poly,y)
plt.scatter(x, y, color = "black", s = 2)
plt.plot(x, lin_reg.predict(poly_reg.fit_transform(x)), color = 'red')
plt.xlabel("demand")
plt.ylabel("max temperature")
plt.show()
### Problem: The lines come out strangely because they are split sideways.
### Solution: Should we change the x-axis and y-axis to make a V-shape?
x = dt["demand"].values.reshape(-1, 1)
y = dt["temp"].values.reshape(-1, 1)
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
poly_reg = PolynomialFeatures(degree = 2)
y_poly = poly_reg.fit_transform(y)
y_poly[:5]
poly_reg.get_feature_names_out()
lin_reg = LinearRegression()
lin_reg.fit(y_poly,x)
plt.scatter(y, x, color = "black", s = 2)
plt.plot(y, lin_reg.predict(poly_reg.fit_transform(y)), color = 'red')
plt.ylabel("demand")
plt.xlabel("max temperature")
plt.show()
Solution
You see this because plot() connects the points in the order they are given; it does not sort them. The curve is really a series of connected dots, and because your data points are not ordered by their x value, matplotlib joins them back and forth, which produces the chaos you are seeing.
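To make the effect concrete, here is a minimal sketch using made-up data (the `temp` and `curve` arrays and the `path_length` helper are illustrative assumptions, not part of the original code). It measures the length of the line that would be drawn through the same points in their original order and in sorted order:

```python
import numpy as np

# Made-up data: 200 unsorted "temperatures" and a smooth quadratic evaluated at them
rng = np.random.default_rng(0)
temp = rng.uniform(0, 40, size=200)
curve = 0.5 * (temp - 20) ** 2 + 100


def path_length(xs, ys):
    """Total length of the polyline that joins the points in the given order."""
    return float(np.sum(np.hypot(np.diff(xs), np.diff(ys))))


order = np.argsort(temp)  # indices that sort temp in ascending order

unsorted_length = path_length(temp, curve)              # points joined in data order
sorted_length = path_length(temp[order], curve[order])  # points joined left to right

print(f"unsorted: {unsorted_length:.0f}, sorted: {sorted_length:.0f}")
```

The points are identical in both cases. Only the order in which they are joined differs, and the unsorted path is far longer because the line keeps jumping from one side of the plot to the other.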
You just need to sort the data points after fitting the model:
# After fitting the model
sorted_indices = y.argsort(axis=0).ravel()  # indices that sort the temperatures in ascending order
sorted_y = y[sorted_indices]  # sorted temperatures, still with shape (n, 1)
# poly_reg is already fitted, so transform() is enough here
sorted_predictions = lin_reg.predict(poly_reg.transform(sorted_y))
# Now, plot using these sorted values
plt.scatter(y, x, color="black", s=2)
plt.plot(sorted_y, sorted_predictions, color='red')
plt.ylabel("demand")
plt.xlabel("max temperature")
plt.show()
This will display a single smooth curve as expected.
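An alternative to sorting, sketched here on made-up data rather than the original dataset, is to evaluate the fitted model on an evenly spaced grid of temperatures. The grid is sorted by construction, so the line is always drawn left to right. The `temp`, `demand`, `grid` and `model` names and the synthetic values are assumptions for illustration; `make_pipeline` is used only to keep the polynomial transform and the regression together.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Made-up data standing in for max temperature (feature) and demand (target)
rng = np.random.default_rng(0)
temp = rng.uniform(5, 40, size=300).reshape(-1, 1)
demand = 50 * (temp.ravel() - 20) ** 2 + 100_000 + rng.normal(0, 2_000, size=300)

# Degree-2 polynomial regression, as in the question
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(temp, demand)

# Evenly spaced temperatures covering the observed range, already in ascending order
grid = np.linspace(temp.min(), temp.max(), 200).reshape(-1, 1)
grid_predictions = model.predict(grid)
```

Plotting `grid` against `grid_predictions` with `plt.plot(grid, grid_predictions, color="red")` should then give one smooth curve, however the original rows are ordered.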
Answered By - DataJanitor