Issue
The following code produces a dataframe
import pandas as pd
import tabula
page_number = "1"
pdf_url = "https://usviber.org/wp-content/uploads/2023/12/A23-OCT.pdf"
# Reads the PDF
tables = tabula.read_pdf(pdf_url, pages=page_number)
df = tables[1]
# Selects relvant columns and rows
numeric_columns = df.select_dtypes(include=["number"])
df = df.drop(numeric_columns.columns[(numeric_columns < 0).any()], axis=1)
df = df.loc[2:13, :].iloc[:, :5]
# Set the column index to the island names
df.set_index(df.columns[0], inplace=True)
# Rename columns based on year
df.columns = pd.MultiIndex.from_product(
[["St Thomas", "St. Croix"], ["2022", "2023"]], names=["Island", "Year"]
)
# Map the index to uppercase and extract the first 3 characters
df.index = df.index.map(lambda x: str(x).upper()[:3])
df.index.set_names("Month", inplace=True)
Here is the dataframe it makes
print(df)
Island St Thomas St. Croix
Year 2022 2023 2022 2023
Month
JAN 55,086 60,470 11,550 12,755
FEB 57,929 56,826 12,441 13,289
MAR 72,103 64,249 14,094 15,880
APR 67,469 56,321 12,196 13,092
MAY 60,092 49,534 13,385 16,497
JUN 67,026 56,950 14,009 15,728
JUL 66,353 61,110 13,768 16,879
AUG 50,660 42,745 10,673 12,102
SEP 24,507 25,047 6,826 6,298
OCT 34,025 34,462 10,351 9,398
NOV 44,500 NaN 9,635 NaN
DEC 58,735 NaN 12,661 NaN
What I want, is to have the island names be the row indices, and the concatenation of the month and year to be the column names, resulting in a 2 row 24 column dataset. So, row one would be St. Thomas. The first column would be JAN2022, and the value of interest would be 56086. The next would be FEB2022, and the value 57929, and so on until December of 2023. The second row would be St. Croix, with the corresponding values and time periods as above. How might I do this?
Solution
If I understood correctly, stack
and transpose
(T
), then flatten the MultiIndex columns with Index.map
:
out = df.stack().T
out.columns = out.columns.map(lambda x: f'{x[0]}{x[1]}')
# or
# out.columns = map(''.join, out.columns)
Output:
JAN2022 JAN2023 FEB2022 FEB2023 MAR2022 MAR2023 APR2022 APR2023 MAY2022 MAY2023 JUN2022 JUN2023 JUL2022 JUL2023 AUG2022 AUG2023 SEP2022 SEP2023 OCT2022 OCT2023 NOV2022 DEC2022
Island
St Thomas 55,086 60,470 57,929 56,826 72,103 64,249 67,469 56,321 60,092 49,534 67,026 56,950 66,353 61,110 50,660 42,745 24,507 25,047 34,025 34,462 44,500 58,735
St. Croix 11,550 12,755 12,441 13,289 14,094 15,880 12,196 13,092 13,385 16,497 14,009 15,728 13,768 16,879 10,673 12,102 6,826 6,298 10,351 9,398 9,635 12,661
Answered By - mozway
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.