Issue
I have an array that I want to convert to percentiles. For example, say I have a normally distributed array:
import numpy as np
import matplotlib.pyplot as plt
arr = np.random.normal(0, 1, 1000)
plt.hist(arr)
For each value in that array, I want to calculate the percentile of that value (e.g. 0 is the 50th percentile of the above distribution, so 0 -> 0.5). The result should be uniformly distributed, since each percentile should have equal weight.
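For comparison: when the underlying distribution is known (here the sample really is drawn from N(0, 1)), its cumulative distribution function gives each value's theoretical percentile directly. This is the probability integral transform, and it is why the result should come out uniform. The sketch below assumes a standard normal and uses `scipy.stats.norm.cdf`; the seed is arbitrary.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
arr = rng.normal(0, 1, 1000)

# Theoretical percentile (as a fraction in (0, 1)) of each value under N(0, 1)
theoretical = norm.cdf(arr)

print(norm.cdf(0.0))  # 0.5, i.e. 0 -> 0.5
# Counts per decile bin; they should be roughly equal (about 100 each)
counts, _ = np.histogram(theoretical, bins=10, range=(0, 1))
print(counts)
```

This only works when the distribution is known in advance; the question is about the empirical case, where the percentiles must come from the data itself.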
I found np.percentile, but this function returns a value given an array and a quantile, whereas what I need is to return a quantile given an array and a value.
Is there a relatively efficient way to do this?
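One relatively efficient option, sketched here under the assumption that "percentile" means the fraction of values less than or equal to x (the empirical CDF), is to sort once and binary-search every element with `np.searchsorted`. The helper name `empirical_percentiles` is hypothetical. This definition corresponds to `percentileofscore(..., kind='weak') / 100` rather than the default `kind='rank'`, so tied values are treated differently.

```python
import numpy as np

def empirical_percentiles(arr):
    """Hypothetical helper: fraction of values in arr that are <= each element.

    np.sort is O(n log n) and each lookup is a binary search, so the whole
    transform is O(n log n) instead of O(n^2) for all pairwise comparisons.
    """
    arr = np.asarray(arr)
    sorted_arr = np.sort(arr)
    # side="right" returns the number of sorted elements <= x, ties included
    counts = np.searchsorted(sorted_arr, arr, side="right")
    return counts / arr.size

pct = empirical_percentiles([1, 2, 2, 3])  # values: 0.25, 0.75, 0.75, 1.0
```

The largest value always maps to 1.0, and tied values share the same (higher) fraction.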
Solution
import numpy as np
import pandas as pd
from scipy.stats import percentileofscore

# generate example data
arr = np.random.normal(0, 1, 10)
# sort a copy of the array (optional: percentileofscore also accepts unsorted input)
arr_sorted = sorted(arr)
# calculate percentiles using scipy func percentileofscore on each array element
s = pd.Series(arr)
percentiles = s.apply(lambda x: percentileofscore(arr_sorted, x))
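Calling `percentileofscore` once per element compares every value against the whole array, so the `apply` above is O(n^2). With the default `kind='rank'`, the result equals each element's average rank scaled to 0-100, so a vectorized version can be sketched with `scipy.stats.rankdata` or pandas' `Series.rank(pct=True)`. The small array below, which contains a tie, is purely illustrative.

```python
import numpy as np
import pandas as pd
from scipy.stats import percentileofscore, rankdata

arr = np.array([0.3, -1.2, 0.3, 2.0, -0.5])

# Reference: one percentileofscore call per element (O(n^2) overall)
slow = np.array([percentileofscore(arr, x) for x in arr])

# Vectorized: average rank (tied values share the mean rank), scaled to 0-100
fast_scipy = rankdata(arr, method="average") * 100 / arr.size

# Same idea in pandas: rank(pct=True) returns rank / count
fast_pandas = pd.Series(arr).rank(method="average", pct=True) * 100
```

Both vectorized forms sort internally, so they scale as O(n log n).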
Checking that the results are correct:
df = pd.DataFrame({'data': s, 'percentiles': percentiles})
df.sort_values(by='data')
data percentiles
3 -1.692881 10.0
8 -1.395427 20.0
7 -1.162031 30.0
6 -0.568550 40.0
9 0.047298 50.0
5 0.296661 60.0
0 0.534816 70.0
4 0.542267 80.0
1 0.584766 90.0
2 1.185000 100.0
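The evenly spaced 10, 20, ..., 100 in the table is expected: with n distinct values and the default `kind='rank'`, the element with rank k gets percentile 100*k/n, so the smallest value maps to 100/n rather than 0. A quick check, assuming no ties (continuous random draws, arbitrary seed):

```python
import numpy as np
from scipy.stats import percentileofscore

n = 10
rng = np.random.default_rng(42)
arr = rng.normal(0, 1, n)  # continuous draws, so ties are practically impossible

# Percentile of each element, computed as in the answer above
pct = np.array([percentileofscore(arr, x) for x in arr])

# Ordered by the data, the percentiles should be 100/n, 200/n, ..., 100
expected = np.arange(1, n + 1) * 100 / n
print(np.allclose(pct[np.argsort(arr)], expected))  # True
```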
Answered By - Max Power