Issue
Above is a link to an example of a CSV file I am modifying using python, I need to add a time column that increases by 1 if the date from the previous row matches.
If the date Changes the time would start back over at 8:00:00
additionally if the 'PL Seq' changes from G* to H* the time would also start back over at 8.
I think I have the logic down, just having a hard time writing it.
add a column to the df 'Time' set the first 'Time' value to 8:00:00
read each row in the df
If date value = date value of previous row and pl seq value first character = first character set time value to time +1
else set time value to time
*Just a note I already have the code to change the format of the order #'s and the Dates of the Goal state
Current
MODELCHASS,Prod Date,PL Seq
M742-021167,20200917,G0005
M359-020535,20200917,G0010
M742-022095,20200917,G0015
M220-001083,20200918,G0400
M742-022390,20200918,G0405
M907-004747,20200918,H0090
M934-005904,20200918,H0095
Expected
MODELCHASS,Prod Date,PL Seq,Time
M742 021167,2020-09-17T,G0005,8:00:00
M359 020535,2020-09-17T,G0010,8:00:01
M742 022095,2020-09-17T,G0015,8:00:02
M220 001083,2020-09-18T,G0400,8:00:00
M742 022390,2020-09-18T,G0405,8:00:01
M907 004747,2020-09-18T,H0090,8:00:00
M934 005904,2020-09-18T,H0095,8:00:01
@Trenton Can we modify this If H orders have the same date as G orders for example
Current Edit in Line 6
MODELCHASS,Prod Date,PL Seq
M742-021167,20200917,G0005
M359-020535,20200917,G0010
M742-022095,20200917,G0015
M220-001083,20200918,G0400
M742-022390,20200918,G0405
M907-004747,20200917,H0090
M934-005904,20200917,H0095
Expected Edit
MODELCHASS,Prod Date,PL Seq,Time
M742 021167,2020-09-17T,G0005,8:00:00
M359 020535,2020-09-17T,G0010,8:00:01
M742 022095,2020-09-17T,G0015,8:00:02
M220 001083,2020-09-18T,G0400,8:00:00
M742 022390,2020-09-18T,G0405,8:00:01
M907 004747,2020-09-17T,H0090,8:00:00
M934 005904,2020-09-17T,H0095,8:00:01
Solution
- Convert the
'Prod Date'
column to a datetime - Sort the dataframe by
'Prod Date'
and'PL Seq'
so'df'
will be in the same order astime_seq
for joining. - The main part of the answer is to create a
DateRange
list with the.groupby
and.apply
.groupby
theProd Date
and the first element of'PL Seq'
df.groupby(['Prod Date', df['PL Seq'].str[0]])
.apply(lambda x: (pd.date_range(start=x.values[0] + pd.Timedelta(hours=8), periods=len(x), freq='s')).time)
- For each group, use the first value in x as
start
:x.values[0]
- To this date, add an 8 hour Timedelta, to get
08:00:00
- To this date, add an 8 hour Timedelta, to get
- The number of
periods
islen[x]
- the
freq
is's'
, for seconds. - This creates a
DateRange
, from which the time is extracted with.time
- For each group, use the first value in x as
- Tested in
python 3.10
,pandas 1.4.3
import pandas as pd
# setup test dataframe
data = {'MODELCHASS': ['M742-021167', 'M359-020535', 'M742-022095', 'M220-001083', 'M742-022390', 'M907-004747', 'M934-005904'],
'Prod Date': [20200917, 20200917, 20200917, 20200918, 20200918, 20200918, 20200918],
'PL Seq': ['G0005', 'G0010', 'G0015', 'G0400', 'G0405', 'H0090', 'H0095']}
df = pd.DataFrame(data)
# convert Prod Date to a datetime column
df['Prod Date'] = pd.to_datetime(df['Prod Date'], format='%Y%m%d')
# sort the dataframe by values so the order will correspond to the groupby order
df = df.sort_values(['Prod Date', 'PL Seq']).reset_index(drop=True)
# groupby Prod Date and the first character of PL Seq
# create a DateRange sequence for each group
# reshape the dataframe
time_seq = (df.groupby(['Prod Date', df['PL Seq'].str[0]])['Prod Date']
.apply(lambda x: (pd.date_range(start=x.values[0] + pd.Timedelta(hours=8), periods=len(x), freq='s')).time)
.reset_index(name='time_seq')
.explode('time_seq', ignore_index=True))
# join the time_seq column to df
df_new = df.join(time_seq.time_seq)
# display(df_new)
MODELCHASS Prod Date PL Seq time_seq
0 M742-021167 2020-09-17 G0005 08:00:00
1 M359-020535 2020-09-17 G0010 08:00:01
2 M742-022095 2020-09-17 G0015 08:00:02
3 M220-001083 2020-09-18 G0400 08:00:00
4 M742-022390 2020-09-18 G0405 08:00:01
5 M907-004747 2020-09-18 H0090 08:00:00
6 M934-005904 2020-09-18 H0095 08:00:01
Answered By - Trenton McKinney
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.