Issue
I'm trying to find the highest correlations between different columns with pandas. I know I can get the correlation matrix with
df.corr()
I know I can get the highest correlations after that with
corr = df.corr().stack().sort_values()
corr[-5:]
The problem is that these correlations also include each column's correlation with itself (1). How do I remove these self-correlation entries? I know I could drop every value equal to 1, but I don't want to do that, since there might be genuine correlations of 1 too.
Solution
I recently found an even cleaner answer to my question: you can compare the MultiIndex levels by value.
This is what I ended up using.
corr = df.corr().stack()
corr = corr[corr.index.get_level_values(0) != corr.index.get_level_values(1)]
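To make the filter concrete, here is a minimal sketch on made-up data. The DataFrame, its column names, and the variables level0, level1, and top are assumptions for illustration only. Column "b" is an exact multiple of "a", so that pair has a genuine correlation of 1 that should survive the filter.

```python
import pandas as pd

# Made-up data for illustration: "b" is exactly 2 * "a", so the pair
# (a, b) has a real correlation of 1 that must not be dropped.
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 4.0, 6.0, 8.0, 10.0],
    "c": [5.0, 3.0, 4.0, 1.0, 2.0],
})

# Stack the square correlation matrix into a Series indexed by
# (row label, column label) pairs.
corr = df.corr().stack()

# Keep only the entries whose two index levels differ, which removes
# each column's correlation with itself and nothing else.
level0 = corr.index.get_level_values(0)
level1 = corr.index.get_level_values(1)
corr = corr[level0 != level1]

# Highest remaining correlations first.
top = corr.sort_values(ascending=False).head(5)
print(top)
```

Because the correlation matrix is symmetric, every pair still appears twice in the result, once as (a, b) and once as (b, a).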
Answered By - mikkom