Issue
I am trying to rank a pandas data frame based on two columns. I can rank it based on one column, but how can to rank it based on two columns? 'SaleCount', then 'TotalRevenue'?
import pandas as pd
df = pd.DataFrame({'TotalRevenue':[300,9000,1000,750,500,2000,0,600,50,500],
'Date':['2016-12-02' for i in range(10)],
'SaleCount':[10,100,30,35,20,100,0,30,2,20],
'shops':['S3','S2','S1','S5','S4','S8','S6','S7','S9','S10']})
df['Rank'] = df.SaleCount.rank(method='dense',ascending = False).astype(int)
#df['Rank'] = df.TotalRevenue.rank(method='dense',ascending = False).astype(int)
df.sort_values(['Rank'], inplace=True)
print(df)
current output:
Date SaleCount TotalRevenue shops Rank
1 2016-12-02 100 9000 S2 1
5 2016-12-06 100 2000 S8 1
3 2016-12-04 35 750 S5 2
2 2016-12-03 30 1000 S1 3
7 2016-12-08 30 600 S7 3
9 2016-12-10 20 500 S10 4
4 2016-12-05 20 500 S4 4
0 2016-12-01 10 300 S3 5
8 2016-12-09 2 50 S9 6
6 2016-12-07 0 0 S6 7
I'm trying to generate an output like this:
Date SaleCount TotalRevenue shops Rank
1 2016-12-02 100 9000 S2 1
5 2016-12-02 100 2000 S8 2
3 2016-12-02 35 750 S5 3
2 2016-12-02 30 1000 S1 4
7 2016-12-02 30 600 S7 5
9 2016-12-02 20 500 S10 6
4 2016-12-02 20 500 S4 6
0 2016-12-02 10 300 S3 7
8 2016-12-02 2 50 S9 8
6 2016-12-02 0 0 S6 9
Solution
Another way would be to type-cast both the columns of interest to str
and combine them by concatenating them. Convert these back to numerical values so that they could be differentiated based on their magnitude.
In method=dense
, ranks of duplicated values would remain unchanged. (Here: 6)
Since you want to rank these in their descending order, specifying ascending=False
in Series.rank()
would let you achieve the desired result.
col1 = df["SaleCount"].astype(str)
col2 = df["TotalRevenue"].astype(str)
df['Rank'] = (col1+col2).astype(int).rank(method='dense', ascending=False).astype(int)
df.sort_values('Rank')
Answered By - Nickil Maveli
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.