Sunday, December 10, 2023

[FIXED] Use LogisticRegression to predict precipitation in sklearn

Labels: classification, logistic-regression, regression, scikit-learn

Issue

I have a dataset with some weather parameters. All features are numerical, and the target is continuous. I want to predict the amount of precipitation. My features look like this (Year, Month, and Day are just the MultiIndex of my DataFrame):

_, _, X, y = read_daily_data()
print(X)

                   MEANT         RH         WS          WD        CCT         MSLP       MAXT      MINT
Year Month Day                                                                                         
2014 1     1    4.494412  90.203694  16.615975  166.495278  59.916667  1014.029167   8.720245  0.310245
           2    5.978995  92.044333  20.621631  184.099628  63.875000  1008.670833   9.240245  3.530245
           3    6.586079  88.778159  22.263927  183.268500  50.108334  1013.070833  10.400246  2.340245
           4    6.358579  94.172092  15.272616  158.277724  66.666667  1007.625000   8.480246  4.600245
           5    4.995662  86.622807  16.897822  225.090521  59.383333  1010.754167   7.480245  0.440245
...                  ...        ...        ...         ...        ...          ...        ...       ...
2023 11    8    7.268995  82.063136  17.965620  202.643657  33.016667  1019.379167  12.380245  3.760245
           9    7.729829  82.235617  25.143419  196.132513  69.020834  1010.795833  10.380245  3.690246
           10   9.101078  76.940065  27.342357  228.518643  61.875000  1005.745833  10.670245  7.960245
           11   7.350245  82.186650  22.030794  242.243293  49.875000  1010.391667   8.660245  4.260245
           12   5.818162  93.582846  18.648649  181.010854  85.333333  1010.112500  11.230246  2.140245

[3603 rows x 8 columns]

And this is my target:

print(y)

Year  Month  Day
2014  1      1       1.4
             2       6.8
             3       0.8
             4      16.5
             5       5.5
                    ... 
2023  11     8       0.0
             9       4.2
             10      9.3
             11      3.2
             12     14.0
Name: PT, Length: 3603, dtype: float64

I apply linear regression to my dataset:

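from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# 30% of the rows go to training, the remaining 70% to testing
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.3, random_state=42)
# Standardize the features (note: the scaler is fitted on all of X here)
std = StandardScaler()
std.fit(X)
X_train_std = std.transform(X_train)
sgd_reg = SGDRegressor(random_state=42)
sgd_reg.fit(X_train_std, y_train)
X_test_std = std.transform(X_test)
# For a regressor, score() returns R^2 on the given data
sgd_score = sgd_reg.score(X_test_std, y_test)
print(f"{sgd_score:.3f}")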

Which gives me this score:

0.385

Now, when I try to apply logistic regression:

from sklearn.linear_model import LogisticRegression

lgs_reg = LogisticRegression(random_state=42)
lgs_reg.fit(X_train_std, y_train)

I get this error:

ValueError: Unknown label type: continuous. Maybe you are trying to fit a classifier, which expects discrete classes on a regression target with continuous values.

I know that classification models like LogisticRegression predict by returning a continuous value, which is then thresholded to produce a class. If I implemented logistic regression myself (for example, with a sigmoid function), I could definitely pass a continuous number as the target. My question is: why doesn't scikit-learn accept this?

Please also suggest a way to use classification models like logistic regression for my specific problem. One option I understand is to discretize the continuous target into intervals. But would the score from such a model be comparable with the score from linear regression?

I appreciate your help.

Solution

Your problem is really a regression problem, not a classification problem. Despite its name, LogisticRegression is a classification algorithm, not a regression algorithm (I know, confusing).
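
To illustrate the point, here is a minimal sketch of what LogisticRegression does accept: discrete labels. It assumes a synthetic stand-in for the weather data (the arrays `X` and `y` below are made up, not the asker's dataset) and one possible discretization, a binary "any precipitation or none" label named `y_wet`, which is a hypothetical choice for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the weather data (hypothetical, not the asker's dataset).
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 8))
# Non-negative continuous target, loosely mimicking precipitation amounts.
y = np.clip(X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=500), 0, None)

# Discretize the continuous target into two classes: dry (0) or wet (1).
y_wet = (y > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y_wet, test_size=0.3, random_state=42
)

# The pipeline fits the scaler on the training rows only, then the classifier.
clf = make_pipeline(StandardScaler(), LogisticRegression(random_state=42))
clf.fit(X_train, y_train)

# For a classifier, score() returns mean accuracy, not R^2.
accuracy = clf.score(X_test, y_test)
print(f"accuracy: {accuracy:.3f}")
```

Note that `score()` means different things here: it is mean accuracy for a classifier and R^2 for a regressor, so a number from a discretized LogisticRegression model is not directly comparable with the score from a linear regression model.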

You should therefore use one of the regression algorithms available in scikit-learn. You can find a list of all models in the scikit-learn documentation and choose any that is suitable for regression. Linear models are a good place to start, and HistGradientBoostingRegressor is usually a good contender.

Answered By - adrin

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
