Tuesday, February 6, 2024

[FIXED] Why is XGBoost giving constant predictions?

February 06, 2024 · dataframe, pandas, python, tensorflow

Issue

I am working with time series data to analyze prices from 2018 through the end of 2023. However, regardless of how I split the data into training and test sets, at some point the predictions become constant.

What could be causing this? Are there any methods or parameters I can adjust to improve the model's performance? I also tried a sliding-window approach but ran into the same problem.

I am importing the data like this:

import pandas as pd

df['Data'] = pd.to_datetime(df['Data'], format='%d.%m.%Y')  # parse day.month.year strings
df.set_index('Data', inplace=True)
df = df.sort_index()  # sort chronologically by the date index
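A self-contained version of this loading step, run on a tiny hypothetical CSV, shows that the explicit format string parses day-first dates and that the resulting index is chronological. Only the `Data` column name and the date format come from the question; the `price` column and its values are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical sample in the question's day.month.year format (made-up values, unsorted on purpose)
raw = io.StringIO(
    "Data,price\n"
    "03.01.2018,1500\n"
    "02.01.2018,1490\n"
    "04.01.2018,1510\n"
)
df = pd.read_csv(raw)

# An explicit format stops pandas from guessing month-first for strings like 03.01.2018
df["Data"] = pd.to_datetime(df["Data"], format="%d.%m.%Y")

# Index by date and sort chronologically so later date-based slicing is valid
df = df.set_index("Data").sort_index()

print(df.index.is_monotonic_increasing)  # True
print(df["price"].tolist())  # [1490, 1500, 1510]
```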

And separating them like this:

train = df.loc[df.index < '2023-01-01']   # 2018 to 2022
test = df.loc[df.index >= '2023-01-01']   # 2023
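A minimal sketch of the same chronological split on a placeholder daily series adds two sanity checks: the two periods do not overlap and no row is lost. The `price` column and its values are assumptions for illustration, not the question's data.

```python
import pandas as pd

# Placeholder daily series covering the question's period (values are dummies)
idx = pd.date_range("2018-01-01", "2023-12-31", freq="D", name="Data")
df = pd.DataFrame({"price": range(len(idx))}, index=idx)

cutoff = pd.Timestamp("2023-01-01")  # ISO date: no day/month ambiguity
train = df.loc[df.index < cutoff]
test = df.loc[df.index >= cutoff]

# A chronological split must not overlap and must keep every row
assert train.index.max() < test.index.min()
assert len(train) + len(test) == len(df)
print(len(train), len(test))  # 1826 365
```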

The definition of XGBoost is as follows:

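import xgboost as xgb

model_XGB = xgb.XGBRegressor(n_estimators=300)

# Fit the model; X_train, y_train, X_test and y_test are built from train/test (not shown)
model_XGB.fit(X_train, y_train,
              eval_set=[(X_train, y_train), (X_test, y_test)],
              verbose=100)  # print the evaluation metric every 100 boosting rounds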

Solution

Why do you think there is a problem with your model? What you get is entirely predictable.

Let's compute some summary statistics on your numeric data:

>>> X_train.iloc[:, :3].describe()
       Valor_Londres     Valor_NY     ICCO_EUR
count    1287.000000  1287.000000  1287.000000
mean     1747.985478  2437.389534  2082.404064
std       114.495576   175.460090   168.307449
min      1379.000000  1893.670000  1574.000000
25%      1673.670000  2331.330000  1966.770000
50%      1750.670000  2451.000000  2089.550000
75%      1820.000000  2545.335000  2206.455000
max      2048.000000  2929.330000  2561.000000

>>> X_test.iloc[:, :3].describe()
       Valor_Londres     Valor_NY     ICCO_EUR
count     254.000000   254.000000   254.000000
mean     2585.255669  3282.550236  3009.645906
std       483.913608   490.309141   513.643522
min      1952.670000  2572.670000  2313.290000
25%      2139.835000  2874.670000  2543.147500
50%      2491.165000  3312.170000  2945.900000
75%      2952.165000  3617.417500  3393.330000
max      3475.000000  4263.670000  4051.170000

Maybe you can already see what's wrong? In the training data, the maximum values are (2048, 2929, 2561), but in the test data these values sit near the minimum! The standard deviations are also roughly three to four times larger in the test set. The same holds for the targets. Finally, the curves do not have the same shape: the seasonality and the trend differ between the two periods.
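The comparison done by eye on the `describe()` tables can be automated. The sketch below uses a hypothetical helper, `range_report`, which reports, per shared column, how much of the test period lies outside the training range and how the spreads compare. The toy numbers are made up.

```python
import pandas as pd


def range_report(train: pd.DataFrame, test: pd.DataFrame) -> pd.DataFrame:
    """Per-column comparison of the test period against the training range."""
    cols = train.columns.intersection(test.columns)
    train_min, train_max = train[cols].min(), train[cols].max()
    return pd.DataFrame({
        "train_max": train_max,
        "test_min": test[cols].min(),
        # Values well above 1 mean the test period is far more spread out
        "std_ratio": test[cols].std() / train[cols].std(),
        # Fraction of test rows outside anything seen during training
        "share_above_train_max": (test[cols] > train_max).mean(),
        "share_below_train_min": (test[cols] < train_min).mean(),
    })


# Toy data (made-up numbers): the test period trends above the training range
train = pd.DataFrame({"price": [10, 12, 11, 13]})
test = pd.DataFrame({"price": [12, 14, 16, 18]})
report = range_report(train, test)
print(report)
```

On the question's data this would be called as `range_report(X_train, X_test)`. It is only a diagnostic: a large `share_above_train_max` warns that a tree-based model will be asked to predict outside the range it has seen.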

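However, the start of 2023 (January and February) appears to be predicted correctly, because those values are still within the range the model saw during training. After that, the predictions are worthless. XGBoost is an ensemble of regression trees, and each leaf outputs a constant learned from the training targets. Once the inputs move beyond the training range, they keep landing in the same leaves and the prediction stays flat.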
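The constant predictions can be reproduced without XGBoost itself. The sketch below is a toy gradient-boosting loop over depth-1 regression trees (stumps) written in plain NumPy. This substitution is an assumption made so the example needs no extra dependency. It is not XGBoost's implementation, but it shares the property that matters here: every tree returns a constant per leaf. The helper names `fit_stump`, `fit_boosted_stumps` and `predict` are hypothetical, and the trending series is made up.

```python
import numpy as np


def fit_stump(x, r):
    """Fit a depth-1 regression tree to residuals r on a single feature x.

    Returns (threshold, left_value, right_value); x <= threshold goes left.
    """
    order = np.argsort(x)
    xs, rs = x[order], r[order]
    best = None
    # Try every split between consecutive distinct x values; keep the lowest squared error
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue
        left, right = rs[:i], rs[i:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, (xs[i - 1] + xs[i]) / 2, left.mean(), right.mean())
    _, threshold, left_value, right_value = best
    # Each leaf predicts a constant: the mean residual of the training rows in it
    return threshold, left_value, right_value


def fit_boosted_stumps(x, y, n_rounds=100, learning_rate=0.3):
    """Toy gradient boosting for squared error: each stump fits the current residuals."""
    base = y.mean()
    pred = np.full(len(y), base)
    stumps = []
    for _ in range(n_rounds):
        threshold, lv, rv = fit_stump(x, y - pred)
        pred += learning_rate * np.where(x <= threshold, lv, rv)
        stumps.append((threshold, lv, rv))
    return base, learning_rate, stumps


def predict(model, x):
    base, learning_rate, stumps = model
    out = np.full(len(x), base)
    for threshold, lv, rv in stumps:
        out += learning_rate * np.where(x <= threshold, lv, rv)
    return out


# Made-up upward trend: train on t = 0..99, forecast t = 100..149
t_train = np.arange(100, dtype=float)
t_test = np.arange(100, 150, dtype=float)
y_train = 1000.0 + 10.0 * t_train

model = fit_boosted_stumps(t_train, y_train)
pred_test = predict(model, t_test)
last_fit = predict(model, t_train[-1:])[0]

# Every t beyond the training range falls in the right leaf of every stump,
# so the forecast is one flat value equal to the fit at the last training point
print(np.unique(pred_test).size)  # 1
print(np.all(pred_test == last_fit))  # True
```

Swapping in `xgb.XGBRegressor` on the same arrays should show the same flat forecast, since every split threshold a tree learns lies within the training range of the feature.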

Answered By - Corralien

This answer, collected from Stack Overflow and tested by PythonFixing community admins, is licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
