Issue
When I work with MultiIndex DataFrames, I keep running into an annoying issue: I drop some rows and then want to loop through the level-0 index, but the dropped labels are still stored in df.index.levels[0].
Here's a reproducible example:
import numpy as np
import pandas as pd

# Make the MultiIndex DataFrame
arrays = [
    ["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"],
    ["one", "two", "one", "two", "one", "two", "one", "two"],
]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=["first", "second"])
df = pd.DataFrame(np.random.randn(8, 2), columns=['first', 'second'], index=index)
# Drop every row whose level-0 label is 'foo' (two rows)
row_dropped = df.drop('foo')
print(row_dropped.index.levels[0])
This results in: Index(['bar', 'baz', 'foo', 'qux'], dtype='object', name='first')
But I want: Index(['bar', 'baz', 'qux'], dtype='object', name='first')
Because I'm trying to do something like the following:
for zero_level_index in row_dropped.index.levels[0]:
    print(row_dropped.loc[zero_level_index])
Is there a way to access the index without the dropped values? Or a way to iterate through just the remaining index values at level zero?
Solution
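This is done for efficiency: the MultiIndex keeps every level value it was built with, even after rows are dropped. To get only the values still in use, call remove_unused_levels: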
row_dropped.index.remove_unused_levels().levels[0]
Or use get_level_values and keep the unique values:
row_dropped.index.get_level_values(0).unique()
Output (of either approach): Index(['bar', 'baz', 'qux'], dtype='object', name='first')
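Here is a minimal sketch applying both fixes to the loop from the question. It rebuilds the question's DataFrame; the names clean_levels, used_labels and label are illustrative, not from the original answer.

```python
import numpy as np
import pandas as pd

# Rebuild the example DataFrame from the question
arrays = [
    ["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"],
    ["one", "two", "one", "two", "one", "two", "one", "two"],
]
index = pd.MultiIndex.from_tuples(list(zip(*arrays)), names=["first", "second"])
df = pd.DataFrame(np.random.randn(8, 2), columns=["first", "second"], index=index)
row_dropped = df.drop("foo")

# Option 1: rebuild the index so its levels only hold values still in use
clean_levels = row_dropped.index.remove_unused_levels().levels[0]

# Option 2: take the level-0 label of every remaining row, then de-duplicate
used_labels = row_dropped.index.get_level_values(0).unique()

# Either one can drive the loop; "foo" is never passed to .loc
for label in used_labels:
    print(label)
    print(row_dropped.loc[label])
```

One difference worth knowing: levels of a MultiIndex are stored sorted, so Option 1 yields labels in sorted order, while Index.unique() in Option 2 keeps the order in which labels first appear in the rows. For this example both orders are the same.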
You can find more information on this behavior in the pandas advanced indexing documentation:
The MultiIndex keeps all the defined levels of an index, even if they are not actually used. When slicing an index, you may notice this. […] This is done to avoid a recomputation of the levels in order to make slicing highly performant. If you want to see only the used levels, you can use the get_level_values() method. […] To reconstruct the MultiIndex with only the used levels, the remove_unused_levels() method may be used.
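To see what the quoted documentation describes, this sketch inspects a MultiIndex's levels (the distinct values) and codes (the integer positions each row points to) before and after remove_unused_levels(). It builds the same index with pd.MultiIndex.from_product for brevity, which is a substitution for the question's from_tuples call.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product(
    [["bar", "baz", "foo", "qux"], ["one", "two"]], names=["first", "second"]
)
df = pd.DataFrame(np.zeros((8, 2)), index=index)
sliced = df.drop("foo").index

# Dropping rows leaves the level values untouched...
print(list(sliced.levels[0]))                # ['bar', 'baz', 'foo', 'qux']
# ...only the codes shrink: code 2 ('foo') is no longer referenced
print(np.unique(sliced.codes[0]).tolist())   # [0, 1, 3]

# remove_unused_levels() returns a NEW MultiIndex with levels and codes rebuilt
trimmed = sliced.remove_unused_levels()
print(list(trimmed.levels[0]))               # ['bar', 'baz', 'qux']
print(np.unique(trimmed.codes[0]).tolist())  # [0, 1, 2]
```

Because remove_unused_levels() does not modify the index in place, assign its result (for example back to df.index) if you want the trimmed levels to persist.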
Answered By - mozway