Issue
Using the Openpyxl engine for Pandas via pd.ExcelWriter, I'd like to know if there is a way to specify a (custom) Excel duration format for elapsed time.
The format I would like to use is: [hh]:mm:ss which should give a time like: 01:01:01 for 1 hour, 1 minute, 1 second.
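As a rough illustration of what this format is expected to display, the following sketch renders a datetime.timedelta as zero-padded hours, minutes, and seconds, with the hour count allowed to run past 24 as the bracketed [hh] does in Excel. The helper name format_duration is hypothetical and is not part of pandas or openpyxl; it only mimics the display, writes nothing to a file, and Excel's own handling of fractional seconds may differ.

```python
from datetime import timedelta


def format_duration(delta):
    """Mimic an [hh]:mm:ss display for a non-negative timedelta (hypothetical helper)."""
    total_seconds = int(delta.total_seconds())  # drops fractional seconds
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    # Hours are not wrapped at 24, so durations longer than a day keep counting up
    return '{:02d}:{:02d}:{:02d}'.format(hours, minutes, seconds)


print(format_duration(timedelta(hours=1, minutes=1, seconds=1)))  # 01:01:01
print(format_duration(timedelta(days=1, hours=1)))  # 25:00:00
```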
I want to write from a DataFrame into this format so that Excel can recognize it when I open the spreadsheet file in the Excel application, after writing the file.
Here is my current demo code, taking the duration between two datetime.now() timestamps:
import pandas as pd
from time import sleep
from datetime import datetime
start_time = datetime.now()
sleep(1)
end_time = datetime.now()
elapsed_time = end_time - start_time
df = pd.DataFrame([[elapsed_time]], columns=['Elapsed'])
with pd.ExcelWriter('./sheet.xlsx') as writer:
    df.to_excel(writer, engine='openpyxl', index=False)
Note that in this implementation, type(elapsed_time) is <type 'datetime.timedelta'>.
The code will create an Excel file with approximately the value 0.0000116263657407407 in the "Elapsed" column. In Excel's time/date format, the value 1.0 equals 1 full day, so this is roughly 1 second of that 1 day.
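The arithmetic behind that stored value can be sketched in plain Python: dividing a duration by the number of seconds in a day gives the fraction of a day described above. The function name to_excel_days is an assumption made for illustration and is not a pandas or openpyxl function.

```python
from datetime import timedelta

SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def to_excel_days(delta):
    """Convert a timedelta to a fraction of a day, where 1.0 is one full day (hypothetical helper)."""
    return delta.total_seconds() / float(SECONDS_PER_DAY)


print(to_excel_days(timedelta(seconds=1)))  # a little over 0.0000115
print(to_excel_days(timedelta(hours=12)))   # 0.5
```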
If I open Format > Cells > Number (CMD + 1), select the Custom category, and specify the custom format [hh]:mm:ss for the cell, the cell displays the elapsed time as a duration.
This is the format I want to see every time I open the file in Excel after writing it.
However, I have looked around for solutions, and I cannot find a way to inherently tell pd.ExcelWriter, df.to_excel, or Openpyxl how to format the datetime.timedelta object in this way.
The Openpyxl documentation gives some very sparse indications:
Handling timedelta values
Excel users can use number formats resembling [h]:mm:ss or [mm]:ss to display time interval durations, which openpyxl considers to be equivalent to timedeltas in Python. openpyxl recognizes these number formats when reading XLSX files and returns datetime.timedelta values for the corresponding cells. When writing timedelta values from worksheet cells to file, openpyxl uses the [h]:mm:ss number format for these cells.
How can I accomplish my goal of writing Excel-interpretable time (durations) in the format [hh]:mm:ss?
To achieve this, I do not need to keep the current method of creating a datetime.timedelta object via datetime.now(). If it's possible to achieve this objective by using or converting to a datetime object or similar and formatting it, I would like to know how.
NB: I am using Python 2 with its latest pandas version 0.24.2 (and the openpyxl version installed with pip is the latest, 2.6.4). I hope that is not a problem as I cannot upgrade to Python 3 and later versions of pandas right now.
Solution
It was some time ago that I worked on this, but the solution below worked for me in Python 2.7.18 using Pandas 0.24.2 and Openpyxl 2.6.4 from PyPI.
As stated in the question comments, later versions may solve this more elegantly (and there may also be a more elegant way to do it in the old versions I use):
If writing to a new Excel file:
# The path is passed positionally; pd.ExcelWriter names this parameter path, not file
writer = pd.ExcelWriter('./sheet.xlsx', engine='openpyxl')
# Writes dataFrame to Writer Sheet, including column header
df.to_excel(writer, sheet_name='Sheet1', index=False)
# Selects which Sheet in Writer to manipulate
sheet = writer.sheets['Sheet1']
# Formats specific cell with desired duration format
cell = 'A2'
sheet[cell].number_format = '[hh]:mm:ss'
# Writes to file on disk
writer.save()
writer.close()
If writing to an existing Excel file:
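from openpyxl import load_workbook
file = './sheet.xlsx'
# The path is passed positionally; pd.ExcelWriter names this parameter path, not file
writer = pd.ExcelWriter(file, engine='openpyxl')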
# Loads content from existing Sheet in file
workbook = load_workbook(file)
writer.book = workbook #writer.book potentially needs to be explicitly stated like this
writer.sheets = {sheet.title: sheet for sheet in workbook.worksheets}
sheet = writer.sheets['Sheet1']
# Writes dataFrame to Writer Sheet, below the last existing row, excluding column header
df.to_excel(writer, sheet_name='Sheet1', startrow=sheet.max_row, index=False, header=False)
# Updates the row count again, and formats specific cell with desired duration format
# (the last cell in column A)
cell = 'A' + str(sheet.max_row)
sheet[cell].number_format = '[hh]:mm:ss'
# Writes to file on disk
writer.save()
writer.close()
The above code can of course easily be abstracted into one function handling writing to both new files and existing files, and extended to managing any number of different sheets or columns, as needed.
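As one possible sketch of that abstraction, the helper below applies the duration format to every data cell in a column rather than to a single cell. The name format_duration_column and its parameters are hypothetical; it relies only on the sheet[cell] lookup and sheet.max_row already used above, and it assumes that row 1 holds the column header.

```python
def format_duration_column(sheet, column_letter, first_row=2, number_format='[hh]:mm:ss'):
    """Set number_format on each cell of one column, from first_row to the last used row.

    Hypothetical helper: sheet is expected to behave like an openpyxl worksheet,
    i.e. to expose max_row and to return a cell for sheet['A2'].
    """
    for row in range(first_row, sheet.max_row + 1):
        cell = column_letter + str(row)
        sheet[cell].number_format = number_format
```

It would be called between df.to_excel(...) and writer.save(), for example as format_duration_column(writer.sheets['Sheet1'], 'A').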
Answered By - P A N