Issue
I have a dataframe that looks like the following, where each value corresponds to one of 5 distances (1000m, 800m, 600m, 400m, 200m):
'key1': array([1.21, 0.99, 6.66, 5.22, 3.33]),
'key2': array([2.21, 2.99, 5.66, 6.22, 2.33]),
'key3': array([4.21, 1.59, 6.66, 9.12, 0.23]), ...
I want to calculate a Spearman rank correlation between the values and the distances for each of the keys.
I have a lot of keys, so I would like to do this in pandas if possible, and then plot a graph of Spearman rank correlation and distance, averaging across all keys.
Solution
This is one way, via a dictionary comprehension and scipy.stats.spearmanr.
import numpy as np
from scipy.stats import spearmanr

# Distances, in the same order as the values in each array
d = np.array([1000, 800, 600, 400, 200])

v = {'key1': np.array([1.21, 0.99, 6.66, 5.22, 3.33]),
     'key2': np.array([2.21, 2.99, 5.66, 6.22, 2.33]),
     'key3': np.array([4.21, 1.59, 6.66, 9.12, 0.23])}

# spearmanr returns (correlation, p-value); [0] keeps only the correlation
res = {k: spearmanr(v[k], d)[0] for k in sorted(v)}
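For intuition about what spearmanr computes here: the Spearman coefficient is the Pearson correlation of the ranks of the two inputs. The following is only a minimal sketch of that idea using NumPy alone. The helper name spearman_no_ties is hypothetical, and the double-argsort ranking is valid only when there are no tied values (true for the sample data above); scipy.stats.spearmanr handles ties properly and should be preferred in practice.

```python
import numpy as np

def spearman_no_ties(x, y):
    """Pearson correlation of ranks; assumes no tied values in x or y."""
    # argsort of argsort gives the 0-based rank of each element
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    return np.corrcoef(rank_x, rank_y)[0, 1]

d = np.array([1000, 800, 600, 400, 200])
key1 = np.array([1.21, 0.99, 6.66, 5.22, 3.33])

rho_key1 = spearman_no_ties(key1, d)
```

For key1 this should agree with the value in the result table, up to floating-point rounding.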
If you want to use pandas, my advice is to perform your calculations as above and create a dataframe from your results. This will almost certainly be more efficient than performing your calculations after putting the data in pandas.
import pandas as pd

df = pd.DataFrame.from_dict(res, orient='index')
Result:
0
key1 -0.5
key2 -0.4
key3 0.1
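If you would still rather keep everything in pandas, as the question asks, one possible sketch is below. It is not the approach recommended in this answer, just an alternative under stated assumptions: the distances are used as the row index, the variable names (values, distances, rho, mean_by_distance) are illustrative, and Spearman is obtained by ranking first and then taking the default Pearson correlation with DataFrame.corrwith. The last line reflects one reading of "averaging across all keys", namely the mean value at each distance.

```python
import pandas as pd

distances = [1000, 800, 600, 400, 200]
values = {'key1': [1.21, 0.99, 6.66, 5.22, 3.33],
          'key2': [2.21, 2.99, 5.66, 6.22, 2.33],
          'key3': [4.21, 1.59, 6.66, 9.12, 0.23]}

# One column per key, one row per distance
df = pd.DataFrame(values, index=distances)
dist = pd.Series(distances, index=df.index, dtype=float)

# Spearman = Pearson correlation of ranks, computed per column
rho = df.rank().corrwith(dist.rank())

# Mean value at each distance, averaged across all keys
mean_by_distance = df.mean(axis=1)
```

Here rho is a Series indexed by key, and mean_by_distance is a Series indexed by distance. If matplotlib is installed, mean_by_distance.plot() would be one way to draw the averaged curve.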
Answered By - jpp