Tuesday, September 13, 2022

[FIXED] How can I get percentile of column in dataframe considering only previous values? (Python)

September 13, 2022 dataframe, pandas, percentile, python No comments

Issue

I have a dataframe with a numeric column and I would link to calculate the percentile of the values in each row for that column considering only previous rows of the column. Here is an example:

+-------+
| col_1 |
+-------+
|  5    |
+-------+
|  4    |
+-------+
|  10   |
+-------+
|  1    |
+-------+
|  7    |
+-------+

I would like to obtain a dataframe like this:

+-------+------------+
| col_1 | percentile |
+-------+------------+
|  5    |     100    |
+-------+------------+
|  4    |     50     |
+-------+------------+
|  10   |     100    |
+-------+------------+
|  1    |     25     |
+-------+------------+
|  7    |     80     |
+-------+------------+

How can I calculate it?

Solution

Try as follows.

Use df.expanding with min_periods=1 to allow expanding window calculations.
For each window, we apply Expanding.rank with pct=True (and we multiply by 100).

We can assign the result directly to the new column percentile:

import pandas as pd

data = {'col_1':[5,4,10,1,7]}
df = pd.DataFrame(data)

df['percentile'] = df['col_1'].expanding(min_periods=1).rank(pct=True).mul(100)

print(df)

   col_1  percentile
0      5       100.0
1      4        50.0
2     10       100.0
3      1        25.0
4      7        80.0

Update: Expanding.rank was added to pandas in version 1.4.0. For earlier versions, you could for instance try:

temp = df['col_1'].expanding(min_periods=1).agg(['rank','count'])
df['percentile'] = (temp['rank']/temp['count']).mul(100)

print(df)

   col_1  percentile
0      5       100.0
1      4        50.0
2     10       100.0
3      1        25.0
4      7        80.0

Or, as a one-liner:

df['percentile'] = df['col_1'].expanding(min_periods=1)\
    .apply(lambda x: (x.rank()/x.count()).to_numpy()[-1]*100)

Answered By - ouroboros1

This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0

Tuesday, September 13, 2022

[FIXED] How can I get percentile of column in dataframe considering only previous values? (Python)

Issue

Solution

0 comments:

Post a Comment

Popular Posts

Labels