Saturday, November 12, 2022

[FIXED] The consistency ratio of the upward, downward, and equal directions of y_true and y_pred

Labels: dataframe, numpy, pandas, python-3.x

Issue

Assuming we have a dataframe df:

          date  y_true  y_pred1  y_pred2
0    2017-1-31    6.42    -2.35    15.57
1    2017-2-28   -2.35    15.57     6.64
2    2017-3-31   15.57     6.64     7.61
3    2017-4-30    6.64     7.61    10.28
4    2017-5-31    7.61     7.61     6.34
5    2017-6-30   10.28     6.34     4.88
6    2017-7-31    6.34     4.88     7.91
7    2017-8-31    6.34     7.91     6.26
8    2017-9-30    7.91     6.26    11.51
9   2017-10-31    6.26    11.51    10.73
10  2017-11-30   11.51    10.73    10.65
11  2017-12-31   10.73    10.65    32.05

I want to measure how often two columns move in the same direction (up, down, or equal) from one month to the next, and use that ratio as an evaluation metric for time series forecast results. The direction is the sign of the current month's value minus the previous month's value: up if the difference is positive, down if it is negative, and equal if it is 0.

I calculated the results for the sample data using the following function and code. Note that rows where y_true_diff is null or 0 (here the first row and 2017-8-31) are not included in the calculation of the final ratio:

import numpy as np

def cal_arrays_direction(value):
    # Map a month-over-month difference to its direction: 1 up, -1 down, 0 equal
    if value > 0:
        return 1
    elif value < 0:
        return -1
    elif value == 0:
        return 0
    else:
        return np.nan  # the first row has no previous month

df['y_true_diff'] = df['y_true'].diff(1).map(cal_arrays_direction)
df['y_pred1_diff'] = df['y_pred1'].diff(1).map(cal_arrays_direction)
df['y_pred2_diff'] = df['y_pred2'].diff(1).map(cal_arrays_direction)

# 1 where the prediction moves in the same direction as y_true, else 0
df['y_true_y_pred1'] = np.where((df['y_true_diff'] == df['y_pred1_diff']), 1, 0)
df['y_true_y_pred2'] = np.where((df['y_true_diff'] == df['y_pred2_diff']), 1, 0)

# Matches divided by the number of months in which y_true moved up or down
dir_acc_y_true_pred1 = df['y_true_y_pred1'].value_counts()[1] / (df['y_true_diff'].value_counts()[-1]
    + df['y_true_diff'].value_counts()[1])
print(dir_acc_y_true_pred1)

dir_acc_y_true_pred2 = df['y_true_y_pred2'].value_counts()[1] / (df['y_true_diff'].value_counts()[-1]
    + df['y_true_diff'].value_counts()[1])
print(dir_acc_y_true_pred2)

Out:

0.2
0.4

How could I convert this into a function (similar to MSE, RMSE, etc. in sklearn) to make it easier to use? Thanks!

def direction_consistency_acc(y_true, y_pred):
     ...
     return dir_acc_ratio

Update 1:

Traceback (most recent call last):
  File "C:\Users\LSTM\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\indexes\base.py", line 3803, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas\_libs\index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 165, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 1577, in pandas._libs.hashtable.Float64HashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 1587, in pandas._libs.hashtable.Float64HashTable.get_item
KeyError: 1.0

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "..\code\stacked model_2022-11-08.py", line 353, in <module>
    run_model(df)
  File "..\code\stacked model_2022-11-08.py", line 258, in run_model
    out1 = direction_consistency_acc(preds['y_true'], preds[['y_pred1','y_pred2',
  File "..\code\stacked model_2022-11-08.py", line 245, in direction_consistency_acc
    dir_acc_y_true_pred = preds[f'y_true_{col}'].eq(1).sum() / (s[-1] + s[1])
  File "C:\Users\LSTM\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\series.py", line 981, in __getitem__
    return self._get_value(key)
  File "C:\Users\LSTM\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\series.py", line 1089, in _get_value
    loc = self.index.get_loc(label)
  File "C:\Users\LSTM\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\indexes\base.py", line 3805, in get_loc
    raise KeyError(key) from err
KeyError: 1

Process finished with exit code 1

Update 2:

I added print(df['y_true_diff'].value_counts()) while running direction_consistency_acc(df['y_true'], df[['y_pred1','y_pred2']]):

...
2021-05-31
-1.0    4
 1.0    2
Name: y_true_diff, dtype: int64
2021-06-30
-1.0    5
 1.0    1
Name: y_true_diff, dtype: int64
2021-07-31
-1.0    6
Name: y_true_diff, dtype: int64
Traceback (most recent call last):
  File "C:\Users\LSTM\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\indexes\base.py", line 3803, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas\_libs\index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 165, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 1577, in pandas._libs.hashtable.Float64HashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 1587, in pandas._libs.hashtable.Float64HashTable.get_item
KeyError: 1.0

The above exception was the direct cause of the following exception:

    Traceback (most recent call last):
      File "..\code\stacked model_2022-11-08.py", line 353, in <module>
        run_model(df)
      File "..\code\stacked model_2022-11-08.py", line 258, in run_model
        out1 = direction_consistency_acc(preds['y_true'], preds[['y_pred1','y_pred2',
      File "..\code\stacked model_2022-11-08.py", line 245, in direction_consistency_acc
        dir_acc_y_true_pred = preds[f'y_true_{col}'].eq(1).sum() / (s[-1] + s[1])
      File "C:\Users\LSTM\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\series.py", line 981, in __getitem__
        return self._get_value(key)
      File "C:\Users\LSTM\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\series.py", line 1089, in _get_value
        loc = self.index.get_loc(label)
      File "C:\Users\LSTM\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\indexes\base.py", line 3805, in get_loc
        raise KeyError(key) from err
    KeyError: 1
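The KeyError above can be reproduced in isolation. The following is a minimal sketch, assuming a short made-up series (`falling`, not taken from the data in this post) that decreases every month. value_counts() lists only the directions that actually occur, so the label 1 is missing from its index, which matches the 2021-07-31 window where the printed counts contain only -1.0. Reading the counts with Series.get and a default of 0 is one way to handle a direction that never occurs.

```python
import numpy as np
import pandas as pd

# Hypothetical series that falls every month, so no upward move ever occurs
falling = pd.Series([7.0, 6.0, 5.0, 4.0])

s = np.sign(falling.diff(1)).value_counts()
print(s.index.tolist())  # [-1.0]: only the directions that occur are listed

try:
    s[1]  # label 1 is absent from the index
    key_error_raised = False
except KeyError:
    key_error_raised = True

# Series.get returns the default instead of raising for a missing label
n_down = int(s.get(-1, 0))
n_up = int(s.get(1, 0))
print(key_error_raised, n_down, n_up)  # True 3 0
```

With both counts available, the denominator n_down + n_up is well defined even when one of the two directions is absent.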

Solution

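You can wrap the logic in a custom function. Use numpy.sign instead of the custom mapping function, and instead of .value_counts()[1], compare with 1 and count the True values with sum. Read the direction counts with .get(..., 0), because value_counts() only lists values that occur, so s[1] raises the KeyError from the updates when y_true never moves up:

# y_true - Series, y_pred - DataFrame
def direction_consistency_acc(y_true, y_pred):
    # Note: this version writes helper columns to the global df
    df['y_true_diff'] = np.sign(y_true.diff(1))
    s = df['y_true_diff'].value_counts()
    # Number of months in which y_true moved up or down
    n_moves = s.get(-1, 0) + s.get(1, 0)

    out = []
    for col in y_pred.columns:
        df[f'{col}_diff'] = np.sign(y_pred[col].diff(1))
        df[f'y_true_{col}'] = np.where((df['y_true_diff'] == df[f'{col}_diff']), 1, 0)
        dir_acc_y_true_pred = df[f'y_true_{col}'].eq(1).sum() / n_moves
        out.append(dir_acc_y_true_pred)

    return out

out = direction_consistency_acc(df['y_true'], df[['y_pred1','y_pred2']])
print(out)
[0.2, 0.4]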

Alternative without new columns:

# y_true - Series, y_pred - DataFrame
def direction_consistency_acc(y_true, y_pred):
    y_true_diff = np.sign(y_true.diff(1))
    s = y_true_diff.value_counts()
    # .get() avoids a KeyError when one direction never occurs in y_true
    n_moves = s.get(-1, 0) + s.get(1, 0)

    out = []
    for col in y_pred.columns:
        # True where y_true and the prediction move in the same direction
        same_dir = y_true_diff == np.sign(y_pred[col].diff(1))
        dir_acc_y_true_pred = same_dir.sum() / n_moves
        out.append(dir_acc_y_true_pred)

    return out

out = direction_consistency_acc(df['y_true'], df[['y_pred1','y_pred2']])
print(out)
[0.2, 0.4]
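A variant that guards the edge cases raised in the question may also be useful. The sketch below is an illustration under stated assumptions, not part of the original answer. The name direction_consistency_ratio is hypothetical. It counts a match only in months where y_true actually moved up or down, so a month in which both series are unchanged cannot inflate the numerator, and it returns NaN instead of raising when y_true never moves.

```python
import numpy as np
import pandas as pd

def direction_consistency_ratio(y_true, y_pred):
    """Hypothetical helper: share of months where y_pred moves the same way as y_true.

    y_true is a Series, y_pred a DataFrame with one column per forecast.
    Months where y_true is unchanged or has no previous value are ignored.
    """
    true_dir = np.sign(y_true.diff(1))
    moved = true_dir.isin([-1, 1])  # False for NaN and for 0
    n_moves = int(moved.sum())

    out = []
    for col in y_pred.columns:
        pred_dir = np.sign(y_pred[col].diff(1))
        n_match = int(((true_dir == pred_dir) & moved).sum())
        # Assumption: report NaN when there is nothing to compare against
        out.append(n_match / n_moves if n_moves else float("nan"))
    return out

# Sample data from the question
df = pd.DataFrame({
    "y_true":  [6.42, -2.35, 15.57, 6.64, 7.61, 10.28, 6.34, 6.34, 7.91, 6.26, 11.51, 10.73],
    "y_pred1": [-2.35, 15.57, 6.64, 7.61, 7.61, 6.34, 4.88, 7.91, 6.26, 11.51, 10.73, 10.65],
    "y_pred2": [15.57, 6.64, 7.61, 10.28, 6.34, 4.88, 7.91, 6.26, 11.51, 10.73, 10.65, 32.05],
})

ratios = direction_consistency_ratio(df["y_true"], df[["y_pred1", "y_pred2"]])
print(ratios)  # [0.2, 0.4]
```

On the sample data, y_true moves in 10 of the 12 months (the first row has no previous value and 2017-8-31 is unchanged), and y_pred1 and y_pred2 agree with it in 2 and 4 of those months respectively.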

Answered By - jezrael

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
