Issue
I counted the number of occurrences of an event per hour of the day by grouping a column of a pandas DataFrame like so:
df_sep.hour.groupby(df_sep.time.dt.hour).size()
This gives the following result:
time
2 31
3 6
4 7
5 4
6 38
7 9
8 5
9 31
10 8
11 2
12 5
13 30
14 1
15 1
16 28
18 1
20 4
21 29
Name: hour, dtype: int64
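For experimenting, here is a minimal reproducible sketch. The `df_sep` DataFrame below is hypothetical: it assumes a datetime column `time` and an `hour` column, filled with made-up timestamps. It shows that `groupby(...).size()` only returns the hours that actually occur in the data, which is why some hours are missing from the output above.

```python
import pandas as pd

# Hypothetical sample data: four events with made-up timestamps
df_sep = pd.DataFrame({
    "time": pd.to_datetime([
        "2021-09-01 02:15", "2021-09-01 02:40",
        "2021-09-01 06:05", "2021-09-02 21:30",
    ]),
})
# Hypothetical 'hour' column; only the number of rows per group matters here
df_sep["hour"] = df_sep["time"].dt.hour

# Count events per hour of the day; hours with no events are simply absent
counts = df_sep.hour.groupby(df_sep.time.dt.hour).size()
print(counts)
```

Only hours 2, 6 and 21 appear in `counts`; hour 0 and every other empty hour are not in the index at all.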
For plotting, I would like the series to cover every hour of the day. For example, there are no occurrences at midnight (0), so hour 0 is missing. For every missing hour, I would like to add that index with a value of zero.
To solve this I created two lists (x and y) using the following loop, but it feels a bit hacky. Is there a better way to solve this?
counts = df_sep.hour.groupby(df_sep.time.dt.hour).size()  # group once, outside the loop
x = []
y = []
for i in range(24):
    if i not in counts.index:
        # hour with no events: fill with zero
        x.append(i)
        y.append(0)
    else:
        x.append(i)
        y.append(counts.loc[i])
Result:
for i, j in zip(x, y):
    print(i, j)
0 0
1 0
2 31
3 6
4 7
5 4
6 38
7 9
8 5
9 31
10 8
11 2
12 5
13 30
14 1
15 1
16 28
17 0
18 1
19 0
20 4
21 29
22 0
23 0
Solution
Use Series.reindex with range(24) and fill_value=0, so that every hour missing from the grouped index is added with a count of 0:
df_sep.hour.groupby(df_sep.time.dt.hour).size().reindex(range(24), fill_value=0)
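Below is a self-contained sketch of the reindex approach on hypothetical sample data (made-up timestamps in a `time` column, events only at hours 2, 6 and 21). `reindex(range(24), fill_value=0)` aligns the grouped counts to every hour from 0 through 23. The `np.bincount(..., minlength=24)` line is an alternative NumPy technique, included here only as a cross-check that gives the same counts.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with events at hours 2, 6 and 21 only
df_sep = pd.DataFrame({
    "time": pd.to_datetime([
        "2021-09-01 02:15", "2021-09-01 02:40",
        "2021-09-01 06:05", "2021-09-02 21:30",
    ]),
})
df_sep["hour"] = df_sep["time"].dt.hour

# Count per hour that actually occurs, then align to all 24 hours;
# labels missing from the grouped index get fill_value=0
full = (
    df_sep.hour.groupby(df_sep.time.dt.hour)
    .size()
    .reindex(range(24), fill_value=0)
)

# Cross-check: bincount counts each integer hour and pads
# absent hours up to minlength=24 with zeros
check = np.bincount(df_sep.time.dt.hour, minlength=24)
assert (full.to_numpy() == check).all()

# The x and y lists from the question fall out directly
x = full.index.tolist()
y = full.tolist()
```

Because `fill_value=0` is an integer, the reindexed series keeps an integer dtype instead of being upcast to float with NaN, so it can be passed straight to a bar plot.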
Answered By - jezrael