Issue
I have a got below mentioned Dataframe extracted from an unstructured txt. file. I need it sorted as shown below mentioned example format in the exact order. Please also note that when capturing the date only the date next to(or following) TDY should be captured.
Please note this is a simplified version of the Data frame and the data continues for 3 months.
Dummy initial txt file https://drive.google.com/drive/u/0/folders/1CRzCMRo_TV2B-6ZDPA0kb7S7DGuDg7IY
Dataframe
0 | 5 | 36 | 43 | 79 | |
---|---|---|---|---|---|
1 | TDY | 01JUN2022 | 1.2 | TOM | 1.2 |
2 | SPT | 05JUN2022 | 1.4 | 1WK | 1.4 |
3 | 2WK | 01JUN2022 | 1.5 | 1MO | 1.6 |
4 | 2MO | 06JUN2022 | 1.7 | 3MO | 1.7 |
5 | 6MO | 01JUN2022 | 1.4 | 9MO | 1.8 |
6 | 1YR | 01JUN2022 | 1.7 | 2YR | 1.3 |
7 | 3YR | 01JUN2022 | 1.2 | 4YR | 1.2 |
8 | 5YR | 01JUN2022 | 1.4 | 6YR | 1.4 |
9 | TDY | 02JUN2022 | 1.2 | TOM | 1.2 |
10 | SPT | 02JUN2022 | 1.4 | 1WK | 1.4 |
11 | 2WK | 07JUN2022 | 1.5 | 1MO | 1.6 |
12 | 2MO | 02JUN2022 | 1.7 | 3MO | 1.7 |
13 | 6MO | 02JUN2022 | 1.4 | 9MO | 1.8 |
14 | 1YR | 02JUN2022 | 1.7 | 2YR | 1.3 |
15 | 3YR | 02JUN2022 | 1.2 | 4YR | 1.2 |
16 | 5YR | 07JUN2022 | 1.4 | 6YR | 1.4 |
17 | TDY | 03JUN2022 | 1.2 | TOM | 1.2 |
18 | SPT | 03JUN2022 | 1.4 | 1WK | 1.4 |
19 | 2WK | 03JUN2022 | 1.5 | 1MO | 1.6 |
20 | 2MO | 09JUN2022 | 1.7 | 3MO | 1.7 |
21 | 6MO | 03JUN2022 | 1.4 | 9MO | 1.8 |
22 | 1YR | 02JUN2022 | 1.7 | 2YR | 1.3 |
23 | 3YR | 03JUN2022 | 1.2 | 4YR | 1.2 |
24 | 5YR | 03JUN2022 | 1.4 | 6YR | 1.4 |
Output required
Date | TDY | TOM | SPT | 1WK | 2WK | 1MO | 2MO | 3MO | 6MO | 9MO | 1YR | 2YR | 3YR | 4YR | 5YR | 6YR |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
01JUN2022 | 1.2 | 1.2 | 1.4 | 1.4 | 1.5 | 1.6 | 1.7 | 1.7 | 1.4 | 1.8 | 1.7 | 1.3 | 1.2 | 1.2 | 1.4 | 1.4 |
02JUN2022 | 1.2 | 1.2 | 1.4 | 1.4 | 1.5 | 1.6 | 1.7 | 1.7 | 1.4 | 1.8 | 1.7 | 1.3 | 1.2 | 1.2 | 1.4 | 1.4 |
03JUN2022 | 1.2 | 1.2 | 1.4 | 1.4 | 1.5 | 1.6 | 1.7 | 1.7 | 1.4 | 1.8 | 1.7 | 1.3 | 1.2 | 1.2 | 1.4 | 1.4 |
Solution
Not sure if you wanted to keep that date format or not?
dates_df = pd.to_datetime(df.loc[df["0"] == "TDY"]["5"]).reset_index(drop=True)
column_order = ["TDY", "TOM", "SPT", "1WK", "2WK", "1MO", "2MO", "3MO",
"6MO", "9MO", "1YR", "2YR", "3YR", "4YR", "5YR", "6YR"]
header_df = df[["0", "43"]].melt(value_name="headers")
value_df = df[["36", "79"]].melt(value_name="values")
melted_df = pd.concat([header_df, value_df], axis=1)[["headers", "values"]]
grouped_mapping = melted_df.groupby("headers")["values"].apply(list).to_dict()
grouped_df = pd.DataFrame(grouped_mapping).reindex(columns=column_order)
final_df = pd.merge(dates_df, grouped_df, left_index=True, right_index=True).rename(columns={"5": "Dates"})
print(final_df)
Dates TDY TOM SPT 1WK 2WK 1MO ... 9MO 1YR 2YR 3YR 4YR 5YR 6YR
0 2022-06-01 1.2 1.2 1.4 1.4 1.5 1.6 ... 1.8 1.7 1.3 1.2 1.2 1.4 1.4
1 2022-06-02 1.2 1.2 1.4 1.4 1.5 1.6 ... 1.8 1.7 1.3 1.2 1.2 1.4 1.4
2 2022-06-03 1.2 1.2 1.4 1.4 1.5 1.6 ... 1.8 1.7 1.3 1.2 1.2 1.4 1.4
Answered By - Jason Baker
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.