Issue
As a learning project for Python, I am attempting to read all Excel files in a directory and extract the names of all the sheets.
I have been trying several available Python modules to do this (pandas in this example), but am running into an issue with most of them depending on openpyxl.
This is my current code:
import os
import pandas

directory_root = 'D:\\testFiles'

# Dict to hold all files, stats
all_files = {}
for _current_path, _dirs_in_path, _files_in_path in os.walk(directory_root):
    # Add all files to this `all_files`
    for _file in _files_in_path:
        # Extract filesystem stats from the file
        _stats = os.stat(os.path.join(_current_path, _file))
        # Add the full file path and its stats to the `all_files` dict.
        all_files[os.path.join(_current_path, _file)] = _stats

# Loop through all found files to extract the sheet names
for _file in all_files:
    # Open the workbook
    xls = pandas.ExcelFile(_file)
    # Loop through all sheets in the workbook
    # (`sheet_names` is a property, not a method, so it is not called)
    for _sheet in xls.sheet_names:
        print(_sheet)
This raises an error from openpyxl when calling pandas.ExcelFile(): ValueError: Max value is 14.
From what I can find online, this is because the file contains a font family value above 14. How do I read from an Excel (xlsx) file while disregarding any existing formatting?
The only potential solution I could find suggests modifying the original file and removing the formatting, but this is not an option as I do not want to modify the files in any way.
Is there another way to do this that doesn't have this formatting limitation?
Solution
The issue is that your file does not conform to the Office Open XML (OOXML) specification, which permits font family values only from 0 to 14. Once openpyxl encounters a font family value outside that range, it raises this error, because openpyxl accepts only spec-conforming Excel files.
Some Excel readers are more lenient with nonconforming files, but openpyxl applies the specification's limits strictly.
The xml being parsed will contain information about the font like this:
<font>
<b/>
<sz val="11"/>
<color rgb="FF000000"/>
<name val="Century Gothic"/>
<family val="34"/>
</font>
If the family value is over 14, openpyxl throws this ValueError. An underlying descriptor in openpyxl enforces this limit.
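To illustrate how a descriptor can enforce such a limit, here is a minimal sketch. `BoundedInt`, this `Font` class, and `try_font` are hypothetical stand-ins written for illustration, not openpyxl's actual code; the sketch only shows why a value of 34 is rejected while the maximum is 14, and why raising the descriptor's `max` attribute lets it through.

```python
class BoundedInt:
    """Simplified stand-in for a min/max-validated descriptor (not openpyxl's code)."""

    def __init__(self, min, max):
        self.min = min
        self.max = max

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # class-level access returns the descriptor itself
        return instance.__dict__.get(self.name)

    def __set__(self, instance, value):
        value = int(value)
        if value > self.max:
            raise ValueError(f"Max value is {self.max}")
        if value < self.min:
            raise ValueError(f"Min value is {self.min}")
        instance.__dict__[self.name] = value


class Font:
    # Hypothetical class mirroring the 0..14 limit discussed above
    family = BoundedInt(min=0, max=14)

    def __init__(self, family):
        self.family = family


def try_font(family):
    """Return the stored family value, or the error message if validation fails."""
    try:
        return Font(family).family
    except ValueError as exc:
        return str(exc)


before = try_font(34)   # rejected while max is 14
Font.family.max = 100   # raise the limit on the descriptor itself
after = try_font(34)    # accepted once the limit is higher
print(before, after)
```

Because the limit lives on the descriptor object attached to the class, changing it once affects every instance created afterwards.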
When other readers, such as Microsoft Office 365 Excel, encounter this, they change the font family to a compliant font (the default, Calibri) when loading the file.
As a workaround, if you don't want to change the value (as Microsoft Excel does), you can monkeypatch the descriptor to allow a larger max font family.
# IMPORTANT, you must do this before importing openpyxl
from unittest import mock
# Set max font family value to 100
p = mock.patch('openpyxl.styles.fonts.Font.family.max', new=100)
p.start()
import openpyxl
openpyxl.open('my-bugged-worksheet.xlsx') # this works now!
This can be reproduced using this Excel workbook. Before the patch, it fails to load. After the patch, it loads without error.
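If only the sheet names are needed, as in the question, another option that is separate from the monkeypatch is to skip openpyxl and read the workbook part of the .xlsx archive directly, so the font definitions are never parsed. The sketch below is a rough illustration under some assumptions: the workbook part is at the conventional path `xl/workbook.xml` (strictly, its location is given by the package relationships), the file uses the transitional SpreadsheetML namespace, and the file is an .xlsx/.xlsm archive rather than a legacy .xls file. `sheet_names` is a hypothetical helper name.

```python
import zipfile
import xml.etree.ElementTree as ET

# Transitional SpreadsheetML namespace (assumption: the file is not saved as Strict OOXML)
MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"


def sheet_names(xlsx):
    """Return sheet names from an .xlsx path or file-like object without parsing styles."""
    with zipfile.ZipFile(xlsx) as archive:
        # Assumption: the workbook part sits at the conventional location
        with archive.open("xl/workbook.xml") as workbook_xml:
            root = ET.parse(workbook_xml).getroot()
    # Each <sheet> element carries the sheet's display name in its "name" attribute
    return [sheet.get("name") for sheet in root.iter(f"{{{MAIN_NS}}}sheet")]
```

For example, `sheet_names('my-bugged-worksheet.xlsx')` would return the names in workbook order. This reads nothing but the sheet list, so it does not replace openpyxl or pandas when cell contents are needed.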
Answered By - sytech