Issue
I have a list of tickets with data on: ticket name, created date, status, closed date.
A new column will be calculated based on created / closed date. If a new ticket with the status = open is created that month, the new column value will increase by one. Value will decrease by one on the month the ticket is moved to a closed status.
How would I set up a Df with the index described above, and how would I go about doing the cumulative calculation in pandas? I'm specifially struggling with setting index as a time series of dates and and having the issues show up on the correct row Starting Data:
ID Created Date Closed Date
0 FND-1974 2021-10-18 00:00:00 2022-03-31
1 FND-10310 2021-10-18 00:00:00 2022-03-31
2 FND-10310 2021-10-18 00:00:00 2022-03-31
3 FND-10310 2021-07-21 00:00:00 NaT
4 FND-9862 2021-07-20 00:00:00 2022-02-28
.. ... ... ...
100 41 2020-04-13 13:34:39 NaT
101 40 2020-04-13 13:32:14 NaT
102 35 2020-04-01 17:48:23 NaT
103 18 2020-01-21 16:08:54 NaT
104 4 2020-02-25 14:56:37 NaT
Current approach:
df = pd.DataFrame(index= pd.Series(pd.date_range('2021-7-1', dt.date.today(),freq="D")))
df['ID'] = df_agg['Exception_ID']
df['Created Date'] = df_agg['Created_On_Date']
df['Closed Date'] = df_agg['Closed_Date']
df['count'] = 0
for index, row in df.iterrows():
if index >= row['Created Date']:
row['count'] += 1
if index >= row['Closed Date']:
row['count'] -= 1
print(df.head)
Output:
ID Created Date Closed Date count
2021-07-01 NaN NaT NaT 0
2021-07-02 NaN NaT NaT 0
2021-07-03 NaN NaT NaT 0
2021-07-04 NaN NaT NaT 0
2021-07-05 NaN NaT NaT 0
... ... ... ...
2022-03-20 NaN NaT NaT 0
2022-03-21 NaN NaT NaT 0
2022-03-22 NaN NaT NaT 0
2022-03-23 NaN NaT NaT 0
2022-03-24 NaN NaT NaT 0
Obviously, I would like to populate the row when the ID, Created Date, Closed Date and add 1 to our count when an issue has been opened on a specific day. im losing it trying to figure this out
Solution
Count number of events by month as described here:
Then calculate a cumulative sum using cumsum.
import numpy as np
import pandas as pd
# Prepare some data
dates = np.random.choice(pd.date_range('2020-01-01', '2021-10-31'), size=100)
data = {'ID': [f"FND-{i}" for i in range(100)],
'Created Date': dates,
'Closed Date': dates + pd.to_timedelta(np.random.poisson(60, size=100), unit='D')}
df_agg = pd.DataFrame(data)
# Add some NaT values
df_agg.loc[df_agg['Closed Date'] > '2021-10-31', 'Closed Date'] = None
# Make a dataframe of monthly stats
index = pd.period_range('2021-01', '2021-10', freq='M', name='Month')
monthly_summary = pd.DataFrame(index=index)
monthly_summary['Opened'] = df_agg['ID'].groupby(df_agg['Created Date'].dt.to_period('M')).count()
monthly_summary['Closed'] = df_agg['ID'].groupby(df_agg['Closed Date'].dt.to_period('M')).count()
monthly_summary = monthly_summary.fillna(0).astype(int)
monthly_summary['Net Change'] = monthly_summary['Opened'] - monthly_summary['Closed']
# Calculate cumulative sum of open issues
start_count = 50
monthly_summary['Month-end Count'] = start_count + monthly_summary['Net Change'].cumsum()
print(monthly_summary)
Opened Closed Net Change Month-end Count
Month
2021-01 2 7 -5 45
2021-02 2 6 -4 41
2021-03 11 2 9 50
2021-04 11 3 8 58
2021-05 6 11 -5 53
2021-06 3 10 -7 46
2021-07 5 5 0 46
2021-08 1 4 -3 43
2021-09 6 5 1 44
2021-10 4 1 3 47
Answered By - Bill
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.