Issue
Is there any condition under which the pandas DataFrame.rank method returns a mixture of integers and floats,
or are the outputs always guaranteed to be the integers 1...N?
Solution
As @TomAugspurger indicates, if there are duplicates the ranks can be non-integer. (They are of float64 dtype in any event.)
In [7]: DataFrame({'A' : Series([1,2,3,4]), 'B' : Series([1,1,1,1]) }).rank()
Out[7]:
   A    B
0  1  2.5
1  2  2.5
2  3  2.5
3  4  2.5
[4 rows x 2 columns]
In [8]: DataFrame({'A' : Series([1,2,3,4]), 'B' : Series([1,1,1,1]) }).rank().dtypes
Out[8]:
A float64
B float64
dtype: object
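For readers outside an IPython session where DataFrame and Series are already imported, the same check can be written as a standalone script. This is a minimal sketch; the variable names df and ranks are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [1, 1, 1, 1]})

# Default tie handling is method="average"
ranks = df.rank()

# The four tied values in B occupy positions 1..4, so each gets
# the mean of those positions: (1 + 2 + 3 + 4) / 4 = 2.5
print(ranks)

# Both columns come back as float64, including column A,
# whose ranks happen to be whole numbers
print(ranks.dtypes)
```

So the result is never a mix of integer and float dtypes: the values may or may not be whole numbers, but the dtype is float64 either way.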
The method argument controls how ties are ranked:
In [12]: DataFrame({'A' : Series([1,2,3,4]), 'B' : Series([1,1,1,1]) }).rank(method='min')
Out[12]:
   A  B
0  1  1
1  2  1
2  3  1
3  4  1
[4 rows x 2 columns]
In [13]: DataFrame({'A' : Series([1,2,3,4]), 'B' : Series([1,1,1,1]) }).rank(method='max')
Out[13]:
   A  B
0  1  4
1  2  4
2  3  4
3  4  4
[4 rows x 2 columns]
In [14]: DataFrame({'A' : Series([1,2,3,4]), 'B' : Series([1,1,1,1]) }).rank(method='first')
Out[14]:
   A  B
0  1  1
1  2  2
2  3  3
3  4  4
[4 rows x 2 columns]
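If integer ranks are what you actually need, one option is to pick a tie method that only produces whole numbers and cast the result yourself. The sketch below assumes the data contains no missing values (with the default settings, a NaN input keeps a NaN rank, and the cast to an integer dtype would then fail); the names all_float and int_ranks are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [1, 1, 1, 1]})

# Each tie method still returns float64, even those that
# can only produce whole-number ranks
all_float = all(
    (df.rank(method=method).dtypes == "float64").all()
    for method in ("average", "min", "max", "first")
)
print(all_float)

# method="first" breaks ties by order of appearance, so every rank
# is a whole number and the cast loses nothing.
# Assumption: no NaN values in df, otherwise astype would raise.
int_ranks = df.rank(method="first").astype("int64")
print(int_ranks)
print(int_ranks.dtypes)
```

The cast is only safe for "min", "max", and "first"; with the default "average" method, tied values can produce ranks such as 2.5, which astype would silently truncate.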
Answered By - Jeff