Friday, July 29, 2022

[FIXED] Pandas: How to write custom time duration format to Excel file with pd.ExcelWriter via Openpyxl

Tags: excel, openpyxl, pandas, python, timedelta

Issue

Using the Openpyxl engine for Pandas via pd.ExcelWriter, I'd like to know if there is a way to specify a (custom) Excel duration format for elapsed time.

The format I would like to use is: [hh]:mm:ss which should give a time like: 01:01:01 for 1 hour, 1 minute, 1 second.
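To make the target format concrete, here is a minimal sketch in plain Python of what a cell formatted as [hh]:mm:ss is expected to show for a given duration. The helper name format_duration is hypothetical and not part of pandas or openpyxl. It assumes non-negative durations and simply drops fractional seconds, which may not match how Excel itself treats them.

```python
from datetime import timedelta


def format_duration(delta):
    """Render a non-negative timedelta as hh:mm:ss with uncapped hours."""
    total_seconds = int(delta.total_seconds())  # drops fractional seconds
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return '{:02d}:{:02d}:{:02d}'.format(hours, minutes, seconds)


print(format_duration(timedelta(hours=1, minutes=1, seconds=1)))          # 01:01:01
print(format_duration(timedelta(days=1, hours=1, minutes=1, seconds=1)))  # 25:01:01
```

The second call shows why the brackets matter: with [hh] the hours keep counting past 24 instead of wrapping around to the next day.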

I want to write from a DataFrame into this format so that Excel can recognize it when I open the spreadsheet file in the Excel application, after writing the file.

Here is my current demo code, which measures the duration between two datetime.now() timestamps:

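import pandas as pd
from time import sleep
from datetime import datetime

# Measure roughly one second of elapsed time
start_time = datetime.now()
sleep(1)
end_time = datetime.now()

elapsed_time = end_time - start_time  # a datetime.timedelta

df = pd.DataFrame([[elapsed_time]], columns=['Elapsed'])

# The engine is chosen on the writer; to_excel ignores its own
# engine argument when it is handed an existing ExcelWriter
with pd.ExcelWriter('./sheet.xlsx', engine='openpyxl') as writer:
    df.to_excel(writer, index=False)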

Note that in this implementation, type(elapsed_time) is <type 'datetime.timedelta'>.

The code creates an Excel file with a value of approximately 0.0000116263657407407 in the "Elapsed" column. In Excel's date/time format, the value 1.0 equals one full day, so this is roughly one second expressed as a fraction of a day.
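The day-fraction arithmetic can be checked with a few lines of standard-library Python. This is only an illustrative sketch; the helper name to_excel_days is hypothetical and is not something pandas or openpyxl expose under that name.

```python
from datetime import timedelta

SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def to_excel_days(delta):
    """Convert a timedelta to a fraction of a day, as Excel stores durations."""
    return delta.total_seconds() / float(SECONDS_PER_DAY)


one_second = to_excel_days(timedelta(seconds=1))
one_day = to_excel_days(timedelta(days=1))
print(one_second)
print(one_day)
```

Going the other way, multiplying the cell value quoted above by 86400 gives about 1.0045 seconds, which matches a sleep(1) plus a little overhead.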

If I open Format > Cells > Number (CMD + 1), select the Custom category, and enter the custom format [hh]:mm:ss for the cell, the duration is displayed the way I want.

This is the format I want to see every time I open the file in Excel after writing it.

However, I have looked around for solutions, and I cannot find a way to inherently tell pd.ExcelWriter, df.to_excel, or Openpyxl how to format the datetime.timedelta object in this way.

The Openpyxl documentation gives some very sparse indications:

Handling timedelta values

Excel users can use number formats resembling [h]:mm:ss or [mm]:ss to display time interval durations, which openpyxl considers to be equivalent to timedeltas in Python. openpyxl recognizes these number formats when reading XLSX files and returns datetime.timedelta values for the corresponding cells.

When writing timedelta values from worksheet cells to file, openpyxl uses the [h]:mm:ss number format for these cells.

How can I accomplish my goal of writing Excel-interpretable time (durations) in the format [hh]:mm:ss?

To achieve this, I do not need to keep the current method of creating a datetime.timedelta object via datetime.now(). If the goal can be reached by using or converting to a datetime object (or similar) and formatting it, I would like to know how.

NB: I am using Python 2 with its latest pandas version 0.24.2 (and the openpyxl version installed with pip is the latest, 2.6.4). I hope that is not a problem as I cannot upgrade to Python 3 and later versions of pandas right now.

Solution

It has been some time since I worked on this, but the solution below worked for me in Python 2.7.18 using Pandas 0.24.2 and Openpyxl 2.6.4 from PyPI.

As stated in the question comments, later versions may solve this more elegantly (and there might furthermore be a more elegant way to do it in the old versions I use):

If writing to a new Excel file:

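import pandas as pd

# df is the DataFrame from the question.
# The path is passed positionally; ExcelWriter has no 'file' keyword.
writer = pd.ExcelWriter('./sheet.xlsx', engine='openpyxl')

# Writes the DataFrame to the writer's sheet, including the column header
df.to_excel(writer, sheet_name='Sheet1', index=False)

# Selects which sheet in the writer to manipulate
sheet = writer.sheets['Sheet1']

# Formats a specific cell with the desired duration format
cell = 'A2'
sheet[cell].number_format = '[hh]:mm:ss'

# Writes to file on disk. In this pandas version close() is an alias
# for save(), so one call is enough.
writer.save()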

If writing to an existing Excel file:

import pandas as pd
from openpyxl import load_workbook

file = './sheet.xlsx'

# The path is passed positionally; ExcelWriter has no 'file' keyword.
writer = pd.ExcelWriter(file, engine='openpyxl')

# Loads content from the existing file
workbook = load_workbook(file)
writer.book = workbook  # writer.book potentially needs to be set explicitly like this
writer.sheets = {sheet.title: sheet for sheet in workbook.worksheets}

sheet = writer.sheets['Sheet1']

# Writes the DataFrame below the last existing row, excluding the column header
df.to_excel(writer, sheet_name='Sheet1', startrow=sheet.max_row, index=False, header=False)

# sheet.max_row now points at the newly written row, so this formats
# the last cell in column A with the desired duration format
cell = 'A' + str(sheet.max_row)
sheet[cell].number_format = '[hh]:mm:ss'

# Writes to file on disk. In this pandas version close() is an alias
# for save(), so one call is enough.
writer.save()

The above code can of course easily be abstracted into one function handling writing to both new files and existing files, and extended to managing any number of different sheets or columns, as needed.
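As one possible starting point for that abstraction, here is a small sketch that formats a whole column rather than a single cell. The function name format_column and its parameters are hypothetical. It only assumes that sheet behaves like an openpyxl worksheet, meaning it has a max_row attribute and supports lookups such as sheet['A2'] that return cells with a writable number_format.

```python
def format_column(sheet, column_letter, number_format, first_row=2):
    """Apply number_format to every cell in one column.

    Starts at first_row (2 by default, to skip a header in row 1) and
    stops at sheet.max_row, the last row that currently holds data.
    """
    for row in range(first_row, sheet.max_row + 1):
        cell_ref = '{}{}'.format(column_letter, row)
        sheet[cell_ref].number_format = number_format
```

In either snippet above it would be called after df.to_excel(...) and before writer.save(), for example format_column(writer.sheets['Sheet1'], 'A', '[hh]:mm:ss').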

Answered By - P A N

This answer, collected from Stack Overflow and tested by PythonFixing community admins, is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
