Issue
We have a MultiIndex DataFrame where the top-level index uses integer values. Slicing for a specific value returns all index values up to the requested value, not just the requested value. Is this a bug, or are we doing it wrong?
Example:
import numpy as np
import pandas as pd
midx = pd.MultiIndex.from_product([[1,2], ['A', 'B']])
df = pd.DataFrame(np.arange(4).reshape((len(midx), 1)), index=midx, columns=['Values'])
df.loc[(slice(1), slice(None)), :] # Slice for only top index value=1
This first slice returns only the rows whose top-level index value is 1, as expected:
Values
1 A 0
1 B 1
But:
df.loc[(slice(2), slice(None)), :] # Slice for only top index value=2
returns index value 1 as well as value 2, like this:
Values
1 A 0
1 B 1
2 A 2
2 B 3
where we expect this:
Values
2 A 2
2 B 3
Solution
When you call slice(x), x is the stop value (see the manual), so it will return everything up to and including that value. In your case you can simply supply the desired index directly:
df.loc[(2, slice(None)), :]
Output:
Values
2 A 2
B 3
Note that in calls to .loc, slice end values are inclusive; see the manual and this Q&A.
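To see the difference side by side, here is a small sketch that reuses the DataFrame from the question. The variable names (up_to_two, only_two_scalar, and so on) are made up for illustration. It compares slice(2), which has no start value, with three ways of asking for top-level value 2 only: a scalar label, a slice with both bounds set to 2, and the same slice written with pd.IndexSlice.

```python
import numpy as np
import pandas as pd

# Same frame as in the question: top level is 1/2, second level is 'A'/'B'.
midx = pd.MultiIndex.from_product([[1, 2], ["A", "B"]])
df = pd.DataFrame(
    np.arange(4).reshape((len(midx), 1)), index=midx, columns=["Values"]
)

# slice(2) is shorthand for slice(None, 2): no start, stop=2.
# With .loc the stop label is inclusive, so rows for 1 and 2 both match.
up_to_two = df.loc[(slice(2), slice(None)), :]

# Option 1: pass the label itself instead of a slice.
only_two_scalar = df.loc[(2, slice(None)), :]

# Option 2: give the slice an explicit start as well as a stop.
only_two_slice = df.loc[(slice(2, 2), slice(None)), :]

# Option 3: the same slice written with pd.IndexSlice for readability.
idx = pd.IndexSlice
only_two_idx = df.loc[idx[2:2, :], :]

print(up_to_two)
print(only_two_scalar)
print(only_two_slice)
print(only_two_idx)
```

The scalar form is the shortest when you want a single value. The two-bound slice is mainly useful when you later want to widen it to a range, for example slice(2, 5). Label slices like these rely on the MultiIndex being sorted, which is the case for an index built with from_product from sorted inputs.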
Answered By - Nick