Issue
I have an energy data with a start time and end time interval. I want to distribute the total energy to the continuous time series (dividing total energy to total hours). As I searched the results, I have found staircase and daterange functions. However, with these results I couldn't distribute energy and sum same intervals. Sample df;
df = pd.DataFrame({
"Start":["2019-01-01 00:00:00", "2019-01-01 01:00:00", "2019-01-01 02:00:00"],
"Stop":["2019-01-01 03:00:00", "2019-01-01 02:00:00", "2019-01-01 04:00:00"],
"TotalEnergy":[10, 5, 2],
})
I want to distribute each row and sum same intervals with 15Min resolution. Note that I need TotalEnergy/TotalTime.
My trials;
Staircase Code:
import pandas as pd
times = pd.date_range(df["Start"].min(), df["Stop"].max(), freq="15min")
import staircase as sc
stepfunction = sc.Stairs(df, start="Start", end="Stop", value="TotalEnergy")
result = stepfunction(times, include_index=True)
result = pd.DataFrame(result)
Result 1:
2019-01-01 00:00:00 10
2019-01-01 00:15:00 10
2019-01-01 00:30:00 10
2019-01-01 00:45:00 10
2019-01-01 01:00:00 15
2019-01-01 01:15:00 15
2019-01-01 01:30:00 15
2019-01-01 01:45:00 15
2019-01-01 02:00:00 12
2019-01-01 02:15:00 12
2019-01-01 02:30:00 12
2019-01-01 02:45:00 12
2019-01-01 03:00:00 2
2019-01-01 03:15:00 2
2019-01-01 03:30:00 2
2019-01-01 03:45:00 2
2019-01-01 04:00:00 0
New code according to Riley, the author of "Staircase"
import pandas as pd
df = pd.DataFrame({
"Start":["2019-01-01 00:00:00", "2019-01-01 01:00:00", "2019-01-01 02:00:00"],
"Stop":["2019-01-01 03:00:00", "2019-01-01 02:00:00", "2019-01-01 04:00:00"],
"TotalEnergy":[10, 5, 2],
})
df["Start"] = pd.to_datetime(df["Start"])
df["Stop"] = pd.to_datetime(df["Stop"]),
import staircase as sc
sf = sc.Stairs(df, start="Start", end="Stop", value="TotalEnergy")
bins = pd.date_range(df["Start"].min(), df["Stop"].max(), freq="15min")
sf.slice(pd.IntervalIndex.from_breaks(times)).mean()
Output:
(2019-01-01, 2019-01-01 00:15:00] 10.0
(2019-01-01 00:15:00, 2019-01-01 00:30:00] 10.0
(2019-01-01 00:30:00, 2019-01-01 00:45:00] 10.0
(2019-01-01 00:45:00, 2019-01-01 01:00:00] 10.0
(2019-01-01 01:00:00, 2019-01-01 01:15:00] 15.0
(2019-01-01 01:15:00, 2019-01-01 01:30:00] 15.0
(2019-01-01 01:30:00, 2019-01-01 01:45:00] 15.0
(2019-01-01 01:45:00, 2019-01-01 02:00:00] 15.0
(2019-01-01 02:00:00, 2019-01-01 02:15:00] 12.0
(2019-01-01 02:15:00, 2019-01-01 02:30:00] 12.0
(2019-01-01 02:30:00, 2019-01-01 02:45:00] 12.0
(2019-01-01 02:45:00, 2019-01-01 03:00:00] 12.0
(2019-01-01 03:00:00, 2019-01-01 03:15:00] 2.0
(2019-01-01 03:15:00, 2019-01-01 03:30:00] 2.0
(2019-01-01 03:30:00, 2019-01-01 03:45:00] 2.0
(2019-01-01 03:45:00, 2019-01-01 04:00:00] 2.0
However, this is not what I want. First energy is 10kWh, time interval is 3 hours, that means 12*15 minutes. So each bin should contain 10kWh/12bins = 0.83kWh. The stepfunction should look like maybe;
stepfunction = sc.Stairs(df, start="Start", end="Stop", value="TotalEnergy"/("Stop"-"Start")
The desired output (calculated by hand):
(2019-01-01, 2019-01-01 00:15:00] 0.83
(2019-01-01 00:15:00, 2019-01-01 00:30:00] 0.83
(2019-01-01 00:30:00, 2019-01-01 00:45:00] 0.83
(2019-01-01 00:45:00, 2019-01-01 01:00:00] 0.83
(2019-01-01 01:00:00, 2019-01-01 01:15:00] 2.08
(2019-01-01 01:15:00, 2019-01-01 01:30:00] 2.08
(2019-01-01 01:30:00, 2019-01-01 01:45:00] 2.08
(2019-01-01 01:45:00, 2019-01-01 02:00:00] 2.08
(2019-01-01 02:00:00, 2019-01-01 02:15:00] 1.08
(2019-01-01 02:15:00, 2019-01-01 02:30:00] 1.08
(2019-01-01 02:30:00, 2019-01-01 02:45:00] 1.08
(2019-01-01 02:45:00, 2019-01-01 03:00:00] 1.08
(2019-01-01 03:00:00, 2019-01-01 03:15:00] 0.25
(2019-01-01 03:15:00, 2019-01-01 03:30:00] 0.25
(2019-01-01 03:30:00, 2019-01-01 03:45:00] 0.25
(2019-01-01 03:45:00, 2019-01-01 04:00:00] 0.25
Solution
I'm not 100% clear on what is needed, but if it is to essentially take the average (i.e. "distribute" as you say) the energy within 15 minute intervals then the below approach with staircase
can be used.
setup
import pandas as pd
df = pd.DataFrame({
"Start":["2019-08-27 14:52:00", "2019-03-01 10:14:05", "2019-07-31 12:54:10", "2019-06-16 10:55:57", "2019-03-07 21:21:04"],
"Stop":["2019-08-27 17:58:19", "2019-03-01 13:13:54", "2019-07-31 13:21:45", "2019-06-16 14:35:14", "2019-03-08 07:22:59"],
"TotalEnergy":[9.86, 9.38, 1.45, 38.77, 26.14],
})
df["Start"] = pd.to_datetime(df["Start"])
df["Stop"] = pd.to_datetime(df["Stop"])
df looks like this
Start Stop TotalEnergy
0 2019-08-27 14:52:00 2019-08-27 17:58:19 9.86
1 2019-03-01 10:14:05 2019-03-01 13:13:54 9.38
2 2019-07-31 12:54:10 2019-07-31 13:21:45 1.45
3 2019-06-16 10:55:57 2019-06-16 14:35:14 38.77
4 2019-03-07 21:21:04 2019-03-08 07:22:59 26.14
solution
create step function
import staircase as sc
sf = sc.Stairs(df, start="Start", end="Stop", value="TotalEnergy")
create 15 minute bins (a pandas.IntervalIndex
)
times = pd.date_range(df["Start"].min(), df["Stop"].max(), freq="15min")
bins = pd.IntervalIndex.from_breaks(times)
slice the step function into bins and take the mean to get a pandas.Series
indexed by the bins
sf.slice(bins).mean()
The result looks like this
(2019-03-01 10:14:05, 2019-03-01 10:29:05] 9.38
(2019-03-01 10:29:05, 2019-03-01 10:44:05] 9.38
(2019-03-01 10:44:05, 2019-03-01 10:59:05] 9.38
(2019-03-01 10:59:05, 2019-03-01 11:14:05] 9.38
(2019-03-01 11:14:05, 2019-03-01 11:29:05] 9.38
...
(2019-08-27 16:29:05, 2019-08-27 16:44:05] 9.86
(2019-08-27 16:44:05, 2019-08-27 16:59:05] 9.86
(2019-08-27 16:59:05, 2019-08-27 17:14:05] 9.86
(2019-08-27 17:14:05, 2019-08-27 17:29:05] 9.86
(2019-08-27 17:29:05, 2019-08-27 17:44:05] 9.86
Length: 17214, dtype: float64
Note this will take a bit of computation time. Averaging across 17214 is not a vectorised calculation unfortunately.
updated solution
The solution above doesn't solve the problem @ast was intending. The solution below will.
After setup of dataframe, and conversion to timestamp etc create a column in your dataframe which represents the energy per hour rate
interval_hrs = (df["Stop"] - df["Start"])/pd.Timedelta("1hr")
df["TotalEnergyRate"] = df["TotalEnergy"]/interval_hrs
This time we'll use the value of TotalEnergyRate to create the step function
sf = sc.Stairs(df, start="Start", end="Stop", value="TotalEnergyRate")
What you have is a step function, whose value at any time will be the energy rate. If you want to know how much energy is consumed in a particular time period then the result is an integration. We'll slice as before, but this time call integral
. This gives you a Series of pandas.Timedelta
which you can then divide through by a 1 hour timedelta to convert back from Timedeltas to achieve the answer you are after.
times = pd.date_range(df["Start"].min(), df["Stop"].max(), freq="15min")
bins = pd.IntervalIndex.from_breaks(times)
sf.slice(bins).integral()/pd.Timedelta("1hr")
Disclaimer (required by stackoverflow): I am the creator of the staircase package.
Answered By - Riley
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.