Issue
I have an excel file with products like this below. Is it possible to align the same kind of attributes to same column using python?
I have this
category | name | 1 | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|---|---|
board games | board game 1 | kind | family | box color | red | weight | 0.7 |
board games | board game 2 | kind | card game | box color | blue | lenght | 25 |
board games | board game 3 | box color | green | weight | 0.5 | lenght | 32 |
Desired output
category | name | kind | box color | weight | length |
---|---|---|---|---|---|
board games | board game1 | family games | red | 0.7 | |
board games | board game2 | card games | blue | 25 | |
board games | board game3 | green | 0.5 | 32 |
i have an other case with duplicate values.
category | name | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
---|---|---|---|---|---|---|---|---|---|---|---|
smartphones | samsung1 | sim type | dual | color | green | ram | 4GB | Storage | 128GB | year | 2021 |
smartphones | samsung2 | sim type | dual | color | green | ram | 4GB | storage | 256GB | year | 2021 |
smartphones | xiaomi3 | sim type | dual | color | blue | ram | 6GB | length | 32mm | Storage | 128GB |
desired output:
category | name | sim type | color | ram | Storage | length | year |
---|---|---|---|---|---|---|---|
smartphones | samsung1 | dual | green | 4GB | 128GB | nan | 2021 |
smartphones | samsung2 | dual | green | 4GB | 256GB | nan | 2021 |
smartphones | xiaomi3 | dual | blue | 6GB | 128GB | 32mm | nan |
error message:
ValueError: Index contains duplicate entries, cannot reshape
UPDATE:
Index contains duplicate entries, cannot reshape
category | name | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
smartphones | samsung1 | sim type | dual | color | green | ram | 4GB | Storage | 128GB | year | 2021 | camera | yes | fingerprint | yes |
smartphones | samsung2 | sim type | dual | color | green | ram | 4GB | storage | 256GB | year | 2021 | camera | yes | fingerprint | yes |
smartphones | xiaomi3 | sim type | dual | color | blue | ram | 6GB | length | 32mm | Storage | 128GB | camera | yes | fingerprint | yes |
desired output:
category | name | sim type | color | ram | Storage | length | year | camera | fingerprint |
---|---|---|---|---|---|---|---|---|---|
smartphones | samsung1 | dual | green | 4GB | 128GB | nan | 2021 | yes | yes |
smartphones | samsung2 | dual | green | 4GB | 256GB | nan | 2021 | yes | yes |
smartphones | xiaomi3 | dual | blue | 6GB | 128GB | 32mm | nan | yes | yes |
https://wetransfer.com/downloads/75be244c7037c07e0cd3e378046424e520220426124941/a2f011
Solution
Update
To solve duplicate issue, you can try:
out = df.set_index(['category', 'name'], append=True).melt(ignore_index=False)
m = out['variable'].astype(int) % 2 == 1
out = (pd.concat([out.loc[m, 'value'].rename('variable'),
out.loc[~m, 'value']], axis=1)
.set_index('variable', append=True)['value']
.unstack('variable').reset_index(['category', 'name'])
.rename_axis(columns=None))
Output:
category | name | Storage | color | length | ram | sim type | year |
---|---|---|---|---|---|---|---|
smartphones | samsung1 | 128GB | green | nan | 4GB | dual | 2021 |
smartphones | samsung2 | 256GB | green | nan | 4GB | dual | 2021 |
smartphones | xiaomi3 | 128GB | blue | 32mm | 6GB | dual | nan |
Old answer
You can mainly use stack
and unstack
to rearrange your dataframe:
# Separate odd (names) and even (values) columns
out = (pd.concat([df.iloc[:, 2::2].stack().droplevel(1),
df.iloc[:, 3::2].stack().droplevel(1)], axis=1)
.set_index(0, append=True)[1].unstack(0))
out = pd.concat([df.iloc[:, :2], out], axis=1)
Output:
category | name | box color | kind | lenght | weight |
---|---|---|---|---|---|
board games | board game 1 | red | family | nan | 0.7 |
board games | board game 2 | blue | card game | 25 | nan |
board games | board game 3 | green | nan | 32 | 0.5 |
Answered By - Corralien
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.