Issue
I'm working with some xlsx files and need to import them into program. I've written a script that handles everything I need it to do.
However, I need to apply filters to the table in excel before importing them in.
When I apply filters and prep the table to import into python, python reads the entire table ignoring all the filters.
My work around has been filtering what I need then copying that to a new sheet. Then when reading into excel I specify the new sheet of filtered data that I'm looking for.
Is there a way to read the filtered table into excel directly?
Or Should I just import the entire table and apply those same filters using pandas in my script instead?
Solution
IIUC, you can't read only visible rows and/or columns of an Excel spreadsheet with pandas.
To do that, you need some help from openpyxl (!pip install openpyxl
) :
from openpyxl import load_workbook
import pandas as pd
wb = load_workbook("file.xlsx")
ws = wb.active # or wb["SheetName"] # <- change the name here
rows = [[c.value for c in r] for r in ws.iter_rows()
if not ws.row_dimensions[r[0].row].hidden]
df = pd.DataFrame(data= rows[1:], columns=rows[0])
Output :
print(df)
col col2
0 foo 1.0
2 baz 3.0
Input used (spreadsheet) :
Answered By - Timeless
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.