Monday, June 13, 2022

[FIXED] How do I use python to create new rows to fill in time gap based on a specified number of rows to be added?


Issue

I'm using Python to join Google Fit data to another data series that lists activities by minute. The code below reproduces an example of how the data is currently formatted.

import pandas as pd

Dffit = pd.DataFrame({"Time": ['2022-05-28 08:52:00', '2022-05-28 09:00:00', '2022-05-28 09:09:00'],
                      "fitnessActivity": ['running', 'biking', 'swimming'],
                      "minutes": [3, 5, 4]})
print(Dffit)

This produces starting data like this:

                   Time fitnessActivity minutes
0  2022-05-28 08:52:00         running        3
1  2022-05-28 09:00:00          biking        5
2  2022-05-28 09:09:00        swimming        4

I want to create new rows that increment the time column by 1 minute each, and duplicate the value in the fitnessActivity column. The minutes column specifies the number of rows needed.

I want my data to look like this:

               Time  fitnessActivity  minutes                                     
2022-05-28 08:52:00         running      3.0
2022-05-28 08:53:00         running      NaN
2022-05-28 08:54:00         running      NaN
2022-05-28 09:00:00          biking      5.0
2022-05-28 09:01:00          biking      NaN
2022-05-28 09:02:00          biking      NaN
2022-05-28 09:03:00          biking      NaN
2022-05-28 09:04:00          biking      NaN
2022-05-28 09:09:00        swimming      4.0
2022-05-28 09:10:00        swimming      NaN
2022-05-28 09:11:00        swimming      NaN
2022-05-28 09:12:00        swimming      NaN

I found several examples showing how to fill in missing time series, including this one, which I used as a model for writing my code. The problem is that it fills in based on other rows below in the dataset. Any time gaps are filled in with the previous activity, when what I actually want is to preserve time gaps, once the specified number of minutes have been added. Also, I want to add rows to the last activity. Currently, none are being added since there are no time rows below it.

# Convert Time to a datetime object (the sample strings have no fractional seconds)
Dffit['Time'] = pd.to_datetime(Dffit['Time'], format='%Y-%m-%d %H:%M:%S')
# Set Time column as index
Dffit.set_index(['Time'], inplace=True)
Dffit = Dffit.sort_index()
# Resample to one row per minute
out = Dffit[["fitnessActivity", "minutes"]].asfreq('1min')
out["fitnessActivity"] = Dffit["fitnessActivity"].asfreq('1min', method="ffill").asfreq('1min')
print(out)

My current output looks like this:

               Time fitnessActivity  minutes
2022-05-28 08:52:00         running      3.0
2022-05-28 08:53:00         running      NaN
2022-05-28 08:54:00         running      NaN
2022-05-28 08:55:00         running      NaN
2022-05-28 08:56:00         running      NaN
2022-05-28 08:57:00         running      NaN
2022-05-28 08:58:00         running      NaN
2022-05-28 08:59:00         running      NaN
2022-05-28 09:00:00          biking      5.0
2022-05-28 09:01:00          biking      NaN
2022-05-28 09:02:00          biking      NaN
2022-05-28 09:03:00          biking      NaN
2022-05-28 09:04:00          biking      NaN
2022-05-28 09:05:00          biking      NaN
2022-05-28 09:06:00          biking      NaN
2022-05-28 09:07:00          biking      NaN
2022-05-28 09:08:00          biking      NaN
2022-05-28 09:09:00        swimming      4.0
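
The extra rows follow from how asfreq builds its new index: it covers every minute from the first timestamp to the last one, so the gaps are filled and nothing is added after the final row. A minimal sketch of that behavior, reusing the sample data (the names `s` and `full` are only for illustration):

```python
import pandas as pd

# Same three start times as in the question, used as a DatetimeIndex.
idx = pd.to_datetime(['2022-05-28 08:52:00',
                      '2022-05-28 09:00:00',
                      '2022-05-28 09:09:00'])
s = pd.Series(['running', 'biking', 'swimming'], index=idx)

# asfreq reindexes onto every minute between the first and last timestamp.
full = s.asfreq('1min', method='ffill')

print(len(full))       # 18 rows: 08:52 through 09:09 inclusive
print(full.index[-1])  # 2022-05-28 09:09:00, so no rows after the last start
```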

Solution

import pandas as pd

Dffit = pd.DataFrame({"Time": ['2022-05-28 08:52:00', '2022-05-28 09:00:00', '2022-05-28 09:09:00'],
                      "fitnessActivity": ['running', 'biking', 'swimming'],
                      "minutes": [3, 5, 4]})
# Parse the strings as datetimes (the sample data has no fractional seconds)
Dffit['Time'] = pd.to_datetime(Dffit['Time'], format='%Y-%m-%d %H:%M:%S')

Dffit.set_index(['Time'], inplace=True)

# One minute-by-minute index per row, as long as that row's "minutes" value
aaa = [pd.date_range(i, periods=Dffit.loc[i, "minutes"], freq='1min') for i in Dffit.index]
# Combine the three per-row indexes into one
aaa = aaa[0].union(aaa[1]).union(aaa[2])

# Add the new timestamps as rows, then carry each activity forward
Dffit = Dffit.reindex(aaa)
Dffit['fitnessActivity'] = Dffit['fitnessActivity'].ffill()
print(Dffit)

Output

                    fitnessActivity  minutes
2022-05-28 08:52:00         running      3.0
2022-05-28 08:53:00         running      NaN
2022-05-28 08:54:00         running      NaN
2022-05-28 09:00:00          biking      5.0
2022-05-28 09:01:00          biking      NaN
2022-05-28 09:02:00          biking      NaN
2022-05-28 09:03:00          biking      NaN
2022-05-28 09:04:00          biking      NaN
2022-05-28 09:09:00        swimming      4.0
2022-05-28 09:10:00        swimming      NaN
2022-05-28 09:11:00        swimming      NaN
2022-05-28 09:12:00        swimming      NaN

The 'aaa' list comprehension creates one index per row, and these are then combined into a single index. The DataFrame is reindexed with that combined index, which adds the new rows. Finally, the empty values in the 'fitnessActivity' column are filled with the previous value.
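
To make the first step concrete, here is roughly what two elements of 'aaa' look like and what happens when they are combined. This is only an illustrative sketch with the start times and counts typed in by hand; `'1min'` is used as the frequency alias, which is the same one-minute step as `'60S'`.

```python
import pandas as pd

# A start time plus a row count gives that activity's own minute-by-minute index.
running = pd.date_range('2022-05-28 08:52:00', periods=3, freq='1min')
biking = pd.date_range('2022-05-28 09:00:00', periods=5, freq='1min')

print(list(running.strftime('%H:%M')))  # ['08:52', '08:53', '08:54']

# union merges the per-row indexes without filling the gap between them.
combined = running.union(biking)
print(len(combined))                                    # 8
print(pd.Timestamp('2022-05-28 08:55:00') in combined)  # False
```

Because 08:55 through 08:59 never enter the combined index, the later reindex cannot create rows for them, which is what preserves the time gaps.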

Update

If there are more than three rows, replace the line that chains the union calls with np.hstack, which combines any number of indexes.

import pandas as pd
import numpy as np

Dffit = pd.DataFrame({"Time": ['2022-05-28 08:52:00', '2022-05-28 09:00:00', '2022-05-28 09:09:00'],
                      "fitnessActivity": ['running', 'biking', 'swimming'],
                      "minutes": [3, 5, 4]})
# Parse the strings as datetimes (the sample data has no fractional seconds)
Dffit['Time'] = pd.to_datetime(Dffit['Time'], format='%Y-%m-%d %H:%M:%S')

Dffit.set_index(['Time'], inplace=True)

# One minute-by-minute index per row, as long as that row's "minutes" value
aaa = [pd.date_range(i, periods=Dffit.loc[i, "minutes"], freq='1min') for i in Dffit.index]

# Stack all per-row indexes into one, however many rows there are
aaa = pd.DatetimeIndex(np.hstack(aaa))
Dffit = Dffit.reindex(aaa)
Dffit['fitnessActivity'] = Dffit['fitnessActivity'].ffill()
print(Dffit)
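
As a side note that is not part of the original answer, the same expansion can be sketched without a Python-level loop by repeating each row and adding a per-copy offset. It assumes 'Time' stays an ordinary column and the frame has a default unique index. The names `df`, `repeated`, `offset`, and `expanded` are illustrative. Unlike the output above, this variant keeps the 'minutes' value on every copy instead of NaN.

```python
import pandas as pd

df = pd.DataFrame({"Time": pd.to_datetime(['2022-05-28 08:52:00',
                                           '2022-05-28 09:00:00',
                                           '2022-05-28 09:09:00']),
                   "fitnessActivity": ['running', 'biking', 'swimming'],
                   "minutes": [3, 5, 4]})

# Repeat each row "minutes" times; copies keep their original row label.
repeated = df.loc[df.index.repeat(df["minutes"])]

# Position of each copy within its original row: 0, 1, 2, ...
offset = repeated.groupby(level=0).cumcount()

# Shift each copy forward by its position, in minutes, then renumber the rows.
expanded = repeated.assign(
    Time=repeated["Time"] + pd.to_timedelta(offset, unit="min")
).reset_index(drop=True)
print(expanded)
```

One possible reason to prefer this form is that it does not look rows up by timestamp, so it should also cope with two activities that start at the same minute.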

Answered By - inquirer

This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0
