Issue
My toy data df has three index levels: name, name and year. The two name levels are duplicated in both their names and their contents, so I only need to keep one of them.
import pandas as pd
# create MultiIndex
index = pd.MultiIndex.from_tuples([
('name1', 'name1', '2020'),
('name1', 'name1', '2021'),
('name2', 'name2', '2020'),
('name2', 'name2', '2021'),
('name3', 'name3', '2020'),
('name3', 'name3', '2021')
], names=['name', 'name', 'year'])
df = pd.DataFrame({
'quantity': [10, 15, 20, 25, 30, 35],
'price': [100, 150, 200, 250, 300, 350]
}, index=index)
print(df)
Out:
                  quantity  price
name  name  year
name1 name1 2020        10    100
            2021        15    150
name2 name2 2020        20    200
            2021        25    250
name3 name3 2020        30    300
            2021        35    350
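A quick way to confirm that the two name levels really are identical, in both name and values. This is a sketch that rebuilds the toy frame so it runs on its own:

```python
import pandas as pd

# Rebuild the toy frame from the question (same tuples, same order)
rows = [(f'name{i}', f'name{i}', year) for i in (1, 2, 3) for year in ('2020', '2021')]
index = pd.MultiIndex.from_tuples(rows, names=['name', 'name', 'year'])
df = pd.DataFrame({'quantity': [10, 15, 20, 25, 30, 35],
                   'price': [100, 150, 200, 250, 300, 350]}, index=index)

# Level names: the first two are the same
print(list(df.index.names))  # ['name', 'name', 'year']

# Level values: get_level_values(i) returns the values of level i as an Index
same_values = df.index.get_level_values(0).equals(df.index.get_level_values(1))
print(same_values)  # True
```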
I tried the following code, but it did not work:
# Create a boolean mask where True marks a repeated index entry
duplicates = df.index.duplicated(keep='first')
# Use the boolean mask to keep only the rows that are not repeated
df = df[~duplicates]
df
Out:
                  quantity  price
name  name  year
name1 name1 2020        10    100
            2021        15    150
name2 name2 2020        20    200
            2021        25    250
name3 name3 2020        30    300
            2021        35    350
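Why nothing was dropped: on a MultiIndex, duplicated() compares whole index entries (full tuples such as ('name1', 'name1', '2020')), not repeated levels. Every tuple in the toy frame is unique, so the mask is all False and every row survives. A small sketch to confirm:

```python
import pandas as pd

# Rebuild the toy frame from the question
rows = [(f'name{i}', f'name{i}', year) for i in (1, 2, 3) for year in ('2020', '2021')]
index = pd.MultiIndex.from_tuples(rows, names=['name', 'name', 'year'])
df = pd.DataFrame({'quantity': [10, 15, 20, 25, 30, 35],
                   'price': [100, 150, 200, 250, 300, 350]}, index=index)

# duplicated() looks for repeated rows of the index, i.e. repeated tuples
mask = df.index.duplicated(keep='first')
print(mask.any())  # False: no tuple appears twice

# The repetition is inside each tuple, across levels
print(df.index[0])  # ('name1', 'name1', '2020')

kept = df[~mask]
print(len(kept))  # 6: every row is kept
```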
If I call reset_index() so that I can drop the duplicated column afterwards, it fails with ValueError: cannot insert name, already exists.
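The error comes from reset_index() itself, before any column can be dropped: it inserts one column per index level, and the second name level collides with the column created for the first. As a sketch, assuming pandas 1.5 or later (where reset_index accepts allow_duplicates), the reset-then-drop route can still be made to work:

```python
import pandas as pd

# Rebuild the toy frame from the question
rows = [(f'name{i}', f'name{i}', year) for i in (1, 2, 3) for year in ('2020', '2021')]
index = pd.MultiIndex.from_tuples(rows, names=['name', 'name', 'year'])
df = pd.DataFrame({'quantity': [10, 15, 20, 25, 30, 35],
                   'price': [100, 150, 200, 250, 300, 350]}, index=index)

# Plain reset_index() raises: the second 'name' column already exists
try:
    df.reset_index()
except ValueError as err:
    message = str(err)
    print(message)  # cannot insert name, already exists

# Assumes pandas >= 1.5: allow the duplicate columns during the reset,
# keep only the first column with each label, then rebuild the index
flat = df.reset_index(allow_duplicates=True)
flat = flat.loc[:, ~flat.columns.duplicated()]
result = flat.set_index(['name', 'year'])
print(result)
```

Compared with droplevel, this round-trips the data through ordinary columns, so it is mainly useful if you want to clean the data as columns anyway.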
How can I get the following result? Thanks.
            quantity  price
name  year
name1 2020        10    100
      2021        15    150
name2 2020        20    200
      2021        25    250
name3 2020        30    300
      2021        35    350
Solution
Just use droplevel:
df.droplevel(0)
Output:
            quantity  price
name  year
name1 2020        10    100
      2021        15    150
name2 2020        20    200
      2021        25    250
name3 2020        30    300
      2021        35    350
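Passing the level by name, as in df.droplevel('name'), does not work here: pandas refuses an ambiguous level name and asks for a level number, which is why positions are used. Because both name levels hold identical values, dropping position 0 or position 1 gives the same frame. A sketch showing both points:

```python
import pandas as pd

# Rebuild the toy frame from the question
rows = [(f'name{i}', f'name{i}', year) for i in (1, 2, 3) for year in ('2020', '2021')]
index = pd.MultiIndex.from_tuples(rows, names=['name', 'name', 'year'])
df = pd.DataFrame({'quantity': [10, 15, 20, 25, 30, 35],
                   'price': [100, 150, 200, 250, 300, 350]}, index=index)

# Dropping by name is ambiguous when two levels share that name
try:
    df.droplevel('name')
except ValueError as err:
    message = str(err)
    print(message)  # The name name occurs multiple times, use a level number

# Dropping either 'name' level by position gives the same result
same = df.droplevel(0).equals(df.droplevel(1))
print(same)  # True
```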
If you don't know the order of the levels in the index, you can look up the position of the first level called name:
level = df.index.names.index('name')
df.droplevel(level)
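If a frame may contain several repeated levels, the same idea can be generalised. The helper below, drop_repeated_levels, is hypothetical (it is not a pandas function): it drops any level whose name and values both match an earlier level, keeping the first occurrence, and it drops by position to avoid the ambiguous-name problem. Note that it also treats two unnamed (None) levels with equal values as repeats.

```python
import pandas as pd


def drop_repeated_levels(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: drop every index level whose name and values
    both repeat an earlier level, keeping the first occurrence."""
    idx = frame.index
    to_drop = []
    for i in range(1, idx.nlevels):
        for j in range(i):
            same_name = idx.names[i] == idx.names[j]
            same_values = idx.get_level_values(i).equals(idx.get_level_values(j))
            if same_name and same_values:
                to_drop.append(i)  # store positions, since names may be ambiguous
                break
    return frame.droplevel(to_drop) if to_drop else frame


# Usage on the toy frame from the question
rows = [(f'name{i}', f'name{i}', year) for i in (1, 2, 3) for year in ('2020', '2021')]
index = pd.MultiIndex.from_tuples(rows, names=['name', 'name', 'year'])
df = pd.DataFrame({'quantity': [10, 15, 20, 25, 30, 35],
                   'price': [100, 150, 200, 250, 300, 350]}, index=index)

result = drop_repeated_levels(df)
print(list(result.index.names))  # ['name', 'year']
```

This keeps the first name level and drops the second, which gives the same frame as df.droplevel(0) here because the two levels are identical.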
Answered By - Nick