Issue
The following is an example of the .csv file of thousands of lines I have for the different bus lines.
Table:
trip_id | arrival_time | departure_time | stop_id | stop_sequence | stop_headsign |
---|---|---|---|---|---|
107_1_D_1 | 6:40:00 | 6:40:00 | AREI2 | 1 | |
107_1_D_1 | 6:40:32 | 6:40:32 | JD4 | 2 | |
107_1_D_1 | 6:41:27 | 6:41:27 | PNG4 | 3 |
Raw Data:
trip_id,arrival_time,departure_time,stop_id,stop_sequence,stop_headsign
107_1_D_1,6:40:00,6:40:00,AREI2,1,
107_1_D_1,6:40:32,6:40:32,JD4,2,
107_1_D_1,6:41:27,6:41:27,PNG4,3,
I want to create a table or dataframe that creates a line for each road segment and calculates the time between each arrival_time.
Expected result:
some other trip_id may share the same RoadSegment
Solution
I think in such cases you can use shift. Here is an example:
df = pd.read_csv('...')
result = pd.DataFrame()
for _, trip_df in df.groupby('trip_id', sort=False): # type: str, pd.DataFrame
trip_df = trip_df.sort_values('stop_sequence')
trip_df['arrival_time'] = pd.to_timedelta(trip_df['arrival_time'])
trip_df['departure_time'] = pd.to_timedelta(trip_df['departure_time'])
trip_df['prev_arrival_time'] = trip_df['arrival_time'].shift()
trip_df['prev_stop_id'] = trip_df['stop_id'].shift()
trip_df['RoadSegment'] = trip_df['prev_stop_id'].str.cat(trip_df['stop_id'], sep='-')
trip_df['planned_duration'] = trip_df['departure_time'] - trip_df['prev_arrival_time']
trip_df = trip_df.dropna(subset=['planned_duration'])
trip_df['planned_duration'] = (
trip_df['planned_duration']
.apply(lambda x: x.total_seconds())
.astype(int)
)
result = pd.concat(
[result, trip_df[['RoadSegment', 'trip_id', 'planned_duration']]],
sort=False,
ignore_index=True,
)
print(result)
Answered By - Danila Ganchar
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.