Tuesday, February 1, 2022

[FIXED] Z-score by time and group

Labels: pandas, pandas-groupby, python, scikit-learn

Issue

I have a dataframe with the structure shown below.

I am trying to calculate a z-score (standardize) for 3 factors (F1, F2, F3), month by month, for every company within each value of the Style column. For example, for 8/31/2014 I want the z-score of F1, F2 and F3 (each separately) for every company in the Construction Materials style, computed relative to its style peers for that month. Likewise, for 8/31/2014 I want the z-score for every company in "Electronic Equipment, Instruments & Components", computed within that style for that month. The same process is then repeated for every month.

To recap: start with a date, calculate the z-score within each style, then repeat for every month.

I first defined a z-score function, zscr = lambda x: (x - x.mean()) / x.std(), and then grouped by date and style, but did not get the desired results.

Thank you in advance

         Date  Name                                        Style   ID  \
0   8/31/2014   XYZ                          Construction Materials  ABC   
1   9/30/2014   XYZ                          Construction Materials  ABC   
2  10/31/2014   XYZ                          Construction Materials  ABC   
3  11/30/2014   XYZ                          Construction Materials  ABC   
4   8/31/2014  Acme  Electronic Equipment, Instruments & Components  KYZ   
5   9/30/2014  Acme  Electronic Equipment, Instruments & Components  KYZ   
6  10/31/2014  Acme  Electronic Equipment, Instruments & Components  KYZ   

         F1        F2        F3  
0  0.032111  0.063330  0.027733  
1  0.068824  0.158614  0.032489  
2  0.076838  0.034735  0.020062  
3  0.020903  0.154653  0.056860  
4  0.032807  1.099790  0.233216  
5 -0.014995  0.814866  0.498432  
6 -0.002233  1.954578  0.727823

Detailed example for the Construction Materials style, with 3 names, for 8/31/2014:

Date       Name  Style                   ID        F1      F2      F3  Avg F1  Avg F2  Avg F3  Std F1  Std F2  Std F3     Zscore F1     Zscore F2     Zscore F3
8/31/2014  XYZ   Construction Materials  ABC   0.0321  0.0633  0.0277  0.0292  0.5066  0.3623  0.0219  0.5091  0.3078   0.131514468  -0.870730766  -1.087062133
8/31/2014  ABC   Construction Materials  XKSD  0.0495  0.3939  0.4258  0.0292  0.5066  0.3623  0.0219  0.5091  0.3078   0.927735574  -0.221422977   0.206304231
8/31/2014  HCAG  Construction Materials  TETR  0.0061  1.0626  0.6334  0.0292  0.5066  0.3623  0.0219  0.5091  0.3078  -1.059250041   1.092153743   0.880757903
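The F1 figures in the table above can be checked with the standard library alone. This is a minimal sketch, assuming the Std columns are sample standard deviations (n - 1 denominator); the variable names are illustrative only.

```python
from statistics import mean, stdev

# F1 values for the three Construction Materials names on 8/31/2014,
# copied from the detailed example (rounded to 4 decimals there)
f1 = {"XYZ": 0.0321, "ABC": 0.0495, "HCAG": 0.0061}

values = list(f1.values())
avg = mean(values)    # group mean for this (date, style)
std = stdev(values)   # sample standard deviation (n - 1 denominator)

# z-score of each company relative to its style peers for that month
z_f1 = {name: (value - avg) / std for name, value in f1.items()}

for name, z in z_f1.items():
    print(f"{name}: {z:.4f}")
```

The results should land within about 0.001 of the Zscore F1 column; the small gap is presumably because the table was computed from unrounded inputs.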

Solution

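I believe you're looking for groupby + transform. groupby splits the rows by (Date, Style), and transform applies the function to each group while keeping the original index, so the result can be assigned straight back.

names = ['F1', 'F2', 'F3']
# z-score within a group; Series.std() uses the sample standard deviation (ddof=1) by default
zscore = lambda x: (x - x.mean()) / x.std()
# one z-score per row, computed against that row's (Date, Style) peers
df[names] = df.groupby(['Date', 'Style'])[names].transform(zscore)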

Answered By - cs95
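As an illustration of the groupby + transform approach, here is a minimal sketch that applies it to the three Construction Materials rows from the question's detailed example, plus the single Acme row for the same date. The column prefix Zscore_ and the variable names are assumptions made for this example, and the z-scores are written to new columns rather than overwriting F1, F2 and F3.

```python
import pandas as pd

# Rows taken from the question: three Construction Materials names and
# one Electronic Equipment name, all for 8/31/2014
df = pd.DataFrame({
    "Date": ["8/31/2014"] * 4,
    "Name": ["XYZ", "ABC", "HCAG", "Acme"],
    "Style": ["Construction Materials"] * 3
             + ["Electronic Equipment, Instruments & Components"],
    "F1": [0.0321, 0.0495, 0.0061, 0.032807],
    "F2": [0.0633, 0.3939, 1.0626, 1.099790],
    "F3": [0.0277, 0.4258, 0.6334, 0.233216],
})

factors = ["F1", "F2", "F3"]


def zscore(col):
    # Standardize one factor within one (Date, Style) group;
    # Series.std() defaults to the sample standard deviation (ddof=1)
    return (col - col.mean()) / col.std()


# transform keeps the original index, so the output lines up row by row
z = df.groupby(["Date", "Style"])[factors].transform(zscore)

# Keep the raw factors and add the z-scores as new columns
result = df.join(z.add_prefix("Zscore_"))
print(result[["Name", "Zscore_F1", "Zscore_F2", "Zscore_F3"]])
```

The Construction Materials rows should come out close to the Zscore columns in the question's table. The Acme row is NaN because its group has a single member, and the sample standard deviation of one value is undefined. The same happens for every row of the 7-row sample dataframe in the question, where each (Date, Style) pair contains only one company.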

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
