Issue
I'm doing my first steps in python programing language. I want to create a script that aims to open an excel file and add an extra column that will be the hourly difference between the two dates and save it to another file. To count hours I'm using businessDuration from https://pypi.org/project/business-duration/
My DF looks like that:
Id | Date1 | Date2 | Date3
1 01/01/2022 08:10:51 | 02/01/2022 08:15:51 | 15/01/2022 23:15:51
2 01/01/2022 08:15:51 (and so on) for every column with dates
3
4
My code look's like that
import pandas as pd
import os
import warnings
from datetime import time, datetime
from business_duration import businessDuration
from itertools import repeat
def WeekDays_hour(start, end, un):
start_date = start
end_date = end
unit = un
return list(map(businessDuration, start_date, end_date, repeat(unit)))
os.chdir("C:\\Users\\M\\PycharmProjects\\Python\\source")
car_path = os.getcwd()
for filename in os.listdir(car_path):
if filename.endswith(".xlsx"):
with warnings.catch_warnings(record=True):
warnings.simplefilter("always")
df = pd.read_excel(car_path+'\\'+filename, engine="openpyxl")
if df['Date1'] is not None:
df['Time difference'] = WeekDays_hour(df['Date1'], df['Date2'], 'hour')
df.to_excel("C:\\Users\\M\\PycharmProjects\\Python\\rep\\"+filename, index=False)
I want to be able to call function "WeekDays_hour" for every pair of dates I need and as of result of this function I want to have another column in df.
I'm not entirely sure what I'm doing wrong, program is finishing without errors, but added column is empty. No calculations done. If any of you have any ideas I will be grateful.
EDIT: I tried @BeRT2me code.
When i put it like that:
from business_duration import businessDuration
import pandas as pd
df = pd.read_excel(r'C:\Users\M\PycharmProjects\CAR\test.xlsx')
def weekdays_hour(x: pd.Series, start_date: str, end_date: str, un: str) -> int:
if x[start_date] and x[end_date]:
return businessDuration(x[start_date], x[end_date], un)
df['difference'] = df.apply(weekdays_hour, args=('Date1', 'Date2', 'hour'))
df.to_excel(r"C:\Users\M\PycharmProjects\redy\Closed.xlsx", index=False)
When I run it I received error "KeyError: 'Date1'"
when I changed
df['difference'] = df.apply(weekdays_hour, axis=1, args=('Date1', 'Date2', 'hour'))
code run without errors, column was added but it was empty.
ANSWER by BeRT2me:
df['new'] = df.apply(lambda x: func(x['input'], x['input2']), axis=1)
and it works like a charm.
Solution
Dug deeper into the code of businessDuration and found that the following is already explicitly defined, so neither if df['Date1'] is not None:
nor if x[start_date] and x[end_date]:
are needed.
if pd.isnull(startdate) or pd.isnull(enddate) or startdate>enddate:
return np.nan
It also appears that unit
must be a named input, or it returns np.nan
.
Also, keep in mind that in your test data, Data1 > Date2, so it'll return np.nan
. If those are given as startdate, enddate respectively.
Given this, the following should work just fine:
df['difference'] = df.apply(lambda x: businessDuration(x['Date2'], x['Date1'], unit='hour'), axis=1)
My Output:
Date1 Date2 Date3 difference
0 2022-04-13 08:10:51 2022-02-01 08:15:51 2022-01-15 23:15:51 1223.916667
No need for any of my custom code, unless you want to rewrite how buisinessDuration
takes inputs~
Answered By - BeRT2me
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.