Issue
I have a huge Excel table that I work with in Python. A sample:
Date | id_m00 | mprice | id_m01 | mprice |
---|---|---|---|---|
01.01.2023 | aa-bb-cc | 12,05 | dd-ee-fr | 8,80 |
02.01.2023 | aa-dd-ee | 09,55 | ff-gg-gg | 7,50 |
This column pattern repeats 46 more times, e.g. id_m02 and mprice, id_q1 and mprice, ...
What I want to have is:
Date | id | mprice |
---|---|---|
01.01.2023 | aa-bb-cc | 12,05 |
02.01.2023 | aa-dd-ee | 09,55 |
01.01.2023 | dd-ee-fr | 8,80 |
02.01.2023 | ff-gg-gg | 7,50 |
Any idea how to do that in Python? I tried the melt function (for the first time) but couldn't get it right: I ended up with some extra columns and lots of null values.
Solution
A possible solution with pd.lreshape:
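import pandas as pd

# Pull out every "mprice" column (they share one name) and relabel them 0..n-1
prices = df.pop("mprice").pipe(lambda x: x.set_axis(range(len(x.columns)), axis=1))

# Pair the k-th id column (id_m00, ..., id_q1, ...) with the k-th price column and stack them
out = pd.lreshape(
    pd.concat([df, prices], axis=1),
    {"id": df.filter(regex=r"^id_").columns, "mprice": prices.columns},
)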
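To try this first snippet on the two-pair sample from the question, the frame can be rebuilt in memory with genuinely duplicated mprice headers. This is a hypothetical reconstruction for testing only; the real data would come from the spreadsheet.

```python
import pandas as pd

# Hypothetical in-memory copy of the sample, with two columns both named "mprice"
df = pd.DataFrame(
    [["01.01.2023", "aa-bb-cc", 12.05, "dd-ee-fr", 8.80],
     ["02.01.2023", "aa-dd-ee", 9.55, "ff-gg-gg", 7.50]],
    columns=["Date", "id_m00", "mprice", "id_m01", "mprice"],
)

# pop() removes both "mprice" columns at once; set_axis gives them distinct labels 0, 1
prices = df.pop("mprice").pipe(lambda x: x.set_axis(range(len(x.columns)), axis=1))

# lreshape stacks id_m00 with price 0, then id_m01 with price 1; Date is repeated
out = pd.lreshape(
    pd.concat([df, prices], axis=1),
    {"id": df.filter(regex=r"^id_").columns, "mprice": prices.columns},
)
print(out)
```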
NB: The code above can be simplified if the example you shared corresponds to the actual table in the spreadsheet. In that case, when building the initial DataFrame, pandas de-duplicates the repeated headers and gives mprice, mprice.1, mprice.2, ..., mprice.46:
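import pandas as pd

# read_excel de-duplicates the repeated headers into mprice, mprice.1, ..., mprice.46
df = pd.read_excel("file.xlsx")  # << feel free to adjust
out = pd.lreshape(
    df,
    {"id": df.filter(regex=r"^id_").columns,
     "mprice": df.filter(regex=r"^mprice").columns},
)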
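The header de-duplication can be checked without an Excel file. The sketch below feeds the sample to pd.read_csv from an in-memory string, assuming read_excel renames duplicate headers the same way read_csv does; sep=";" and decimal="," are chosen here so the comma decimals from the question (12,05) parse as floats.

```python
import io

import pandas as pd

# Hypothetical CSV version of the sample: ";" separates fields, "," is the decimal mark
raw = io.StringIO(
    "Date;id_m00;mprice;id_m01;mprice\n"
    "01.01.2023;aa-bb-cc;12,05;dd-ee-fr;8,80\n"
    "02.01.2023;aa-dd-ee;09,55;ff-gg-gg;7,50\n"
)
df = pd.read_csv(raw, sep=";", decimal=",")
print(df.columns.tolist())  # ['Date', 'id_m00', 'mprice', 'id_m01', 'mprice.1']

# With unique names, both column groups can be selected directly by pattern
out = pd.lreshape(
    df,
    {"id": df.filter(regex=r"^id_").columns,
     "mprice": df.filter(regex=r"^mprice").columns},
)
print(out)
```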
Output:
print(out)
Date id mprice
0 01.01.2023 aa-bb-cc 12.05
1 02.01.2023 aa-dd-ee 9.55
2 01.01.2023 dd-ee-fr 8.80
3 02.01.2023 ff-gg-gg 7.50
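If you would rather not depend on pd.lreshape, the same reshape can be sketched with positional slicing and pd.concat. This is a different technique from the answer above, and it assumes that after Date the columns strictly alternate id, price, id, price, ...; because it selects by position, it works even when the price headers are still duplicated.

```python
import pandas as pd

# Hypothetical sample with duplicated "mprice" headers, as in the question
df = pd.DataFrame(
    [["01.01.2023", "aa-bb-cc", 12.05, "dd-ee-fr", 8.80],
     ["02.01.2023", "aa-dd-ee", 9.55, "ff-gg-gg", 7.50]],
    columns=["Date", "id_m00", "mprice", "id_m01", "mprice"],
)

# Assumption: column 0 is Date, then (id, price) pairs in order
n_pairs = (df.shape[1] - 1) // 2
parts = [
    pd.DataFrame({
        "Date": df["Date"],
        "id": df.iloc[:, 1 + 2 * k],      # k-th id column
        "mprice": df.iloc[:, 2 + 2 * k],  # price column right after it
    })
    for k in range(n_pairs)
]

# Stack the pairs; drop rows with a missing id or price, like lreshape's dropna=True
out = pd.concat(parts, ignore_index=True).dropna(subset=["id", "mprice"])
print(out)
```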
Answered By - Timeless