Issue
I have a dataframe with data in the format shown below.
I am trying to calculate a z-score (standardize) for 3 factors (F1, F2, F3), by month, for every company within each value of the Style column. For example, for 8/31/2014, I want to calculate the z-score (F1, F2, F3 separately) for every company in Construction Materials relative to its style peers for that month. Again for 8/31/2014, I want to calculate the z-score for every company in "Electronic Equipment, Instruments & Components" for that month, and then repeat the process for every month. To recap: start with the date, calculate the z-score within each style, and repeat for every month.
I first defined the z-score as zscr = lambda x: (x - x.mean()) / x.std() and then grouped by date and style, but did not get the desired results.
Thank you in advance
         Date  Name                                           Style   ID  \
0   8/31/2014   XYZ                          Construction Materials  ABC
1   9/30/2014   XYZ                          Construction Materials  ABC
2  10/31/2014   XYZ                          Construction Materials  ABC
3  11/30/2014   XYZ                          Construction Materials  ABC
4   8/31/2014  Acme  Electronic Equipment, Instruments & Components  KYZ
5   9/30/2014  Acme  Electronic Equipment, Instruments & Components  KYZ
6  10/31/2014  Acme  Electronic Equipment, Instruments & Components  KYZ

         F1        F2        F3
0  0.032111  0.063330  0.027733
1  0.068824  0.158614  0.032489
2  0.076838  0.034735  0.020062
3  0.020903  0.154653  0.056860
4  0.032807  1.099790  0.233216
5 -0.014995  0.814866  0.498432
6 -0.002233  1.954578  0.727823
Detailed example for the Construction Materials style with 3 names for 8/31/2014:
Date       Name  Style                   ID    F1      F2      F3      Avg F1  Avg F2  Avg F3  Std F1  Std F2  Std F3  Zscore F1     Zscore F2     Zscore F3
8/31/2014  XYZ   Construction Materials  ABC   0.0321  0.0633  0.0277  0.0292  0.5066  0.3623  0.0219  0.5091  0.3078  0.131514468   -0.870730766  -1.087062133
8/31/2014  ABC   Construction Materials  XKSD  0.0495  0.3939  0.4258  0.0292  0.5066  0.3623  0.0219  0.5091  0.3078  0.927735574   -0.221422977  0.206304231
8/31/2014  HCAG  Construction Materials  TETR  0.0061  1.0626  0.6334  0.0292  0.5066  0.3623  0.0219  0.5091  0.3078  -1.059250041  1.092153743   0.880757903
Solution
I believe you're looking for groupby + transform.
names = ['F1', 'F2', 'F3']
zscore = lambda x: (x - x.mean()) / x.std()
df[names] = df.groupby([df.Date, df.Style])[names].transform(zscore)
Answered By - cs95
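As a rough, self-contained sketch of the groupby + transform approach, the snippet below rebuilds the three Construction Materials rows from the detailed example in the question and writes the z-scores to new columns instead of overwriting F1, F2 and F3. The "Zscore " column prefix and the use of a named function rather than a lambda are choices made here for illustration, not part of the original answer.

```python
import pandas as pd

# Sample rows taken from the detailed example in the question
# (three Construction Materials names on a single date).
df = pd.DataFrame({
    "Date": ["8/31/2014"] * 3,
    "Name": ["XYZ", "ABC", "HCAG"],
    "Style": ["Construction Materials"] * 3,
    "F1": [0.0321, 0.0495, 0.0061],
    "F2": [0.0633, 0.3939, 1.0626],
    "F3": [0.0277, 0.4258, 0.6334],
})

names = ["F1", "F2", "F3"]


def zscore(x):
    # Series.std() uses the sample standard deviation (ddof=1) by default,
    # which is what the Std columns in the question's example appear to use.
    return (x - x.mean()) / x.std()


# transform returns a frame aligned to df's index, one z-score per row,
# computed separately within each (Date, Style) group.
z = df.groupby(["Date", "Style"])[names].transform(zscore)

# Keep the raw factors and add the z-scores under illustrative column names.
df = df.join(z.add_prefix("Zscore "))
print(df)
```

One thing to keep in mind: a group that contains a single company has an undefined sample standard deviation, so its z-scores come out as NaN. In the seven-row sample at the top of the question, every (Date, Style) pair holds only one company, so that sample on its own would produce NaN throughout.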