Issue
I have database for the changes made for given issue Id at any given date . The values in changed_parameter are the parameters for which the changes were made .
Its old value and new value for the changed parameter are updated in old_value new_value columns respectively
Issue_Id | Due_Date | status | estimation_hour | changed_date | changed_parameter | old_value | new_value |
---|---|---|---|---|---|---|---|
101 | 1/31/2023 | closed | 40 | 1/10/2023 | status | Defined | Accepted |
101 | 1/31/2023 | closed | 40 | 1/15/2023 | estimation_hour | 0 | 20 |
101 | 1/31/2023 | closed | 40 | 1/16/2023 | estimation_hour | 20 | 30 |
101 | 1/31/2023 | closed | 40 | 1/16/2023 | Due_Date | 1/20/2023 | 1/31/2023 |
101 | 1/31/2023 | closed | 40 | 1/20/2023 | status | Accepted | InProgress |
101 | 1/31/2023 | closed | 40 | 1/25/2023 | estimation_hour | 30 | 40 |
101 | 1/31/2023 | closed | 40 | 1/30/2023 | status | InProgress | Closed |
102 | 2/28/2023 | closed | 50 | 1/10/2023 | status | Defined | Accepted |
102 | 2/28/2023 | closed | 50 | 1/15/2023 | estimation_hour | 0 | 30 |
102 | 2/28/2023 | closed | 50 | 1/20/2023 | status | Accepted | InProgress |
102 | 2/28/2023 | closed | 50 | 1/25/2023 | estimation_hour | 30 | 50 |
102 | 2/28/2023 | closed | 50 | 1/30/2023 | status | InProgress | Closed |
So now I have to create a snapshot of the issues id at changed dates based on above data
So my final table should look like this
Issue_Id | Due_Date | status | estimation_hour | changed_date | changed_parameter | old_value | new_value |
---|---|---|---|---|---|---|---|
101 | 1/20/2023 | Accepted | 0 | 1/10/2023 | status | Defined | Accepted |
101 | 1/20/2023 | Accepted | 20 | 1/15/2023 | estimation_hour | 0 | 20 |
101 | 1/20/2023 | Accepted | 30 | 1/16/2023 | estimation_hour | 20 | 30 |
101 | 1/31/2023 | Accepted | 30 | 1/16/2023 | Due_Date | 1/20/2023 | 1/31/2023 |
101 | 1/31/2023 | InProgress | 30 | 1/20/2023 | status | Accepted | InProgress |
101 | 1/31/2023 | InProgress | 40 | 1/25/2023 | estimation_hour | 30 | 40 |
101 | 1/31/2023 | closed | 40 | 1/30/2023 | status | InProgress | Closed |
102 | 2/28/2023 | Accepted | 0 | 1/10/2023 | status | Defined | Accepted |
102 | 2/28/2023 | Accepted | 30 | 1/15/2023 | estimation_hour | 0 | 30 |
102 | 2/28/2023 | InProgress | 30 | 1/20/2023 | status | Accepted | InProgress |
102 | 2/28/2023 | InProgress | 50 | 1/25/2023 | estimation_hour | 30 | 50 |
102 | 2/28/2023 | closed | 50 | 1/30/2023 | status | InProgress | Closed |
I had updated the above data in pandas dataframe and iterated over each row.
In each iteration I updated old value of changed parameter to previously available rows of that issue ID and till the same . new value was updated to current row as follows
iterating over each row is time consuming
1st Iteration
Issue_Id | Due_Date | status | estimation_hour | changed_date | changed_parameter | old_value | new_value |
---|---|---|---|---|---|---|---|
101 | 1/31/2023 | Accepted | 40 | 1/10/2023 | status | Defined | Accepted |
101 | 1/31/2023 | closed | 40 | 1/15/2023 | estimation_hour | 0 | 20 |
101 | 1/31/2023 | closed | 40 | 1/16/2023 | estimation_hour | 20 | 30 |
101 | 1/31/2023 | closed | 40 | 1/16/2023 | Due_Date | 1/20/2023 | 1/31/2023 |
101 | 1/31/2023 | closed | 40 | 1/20/2023 | status | Accepted | InProgress |
101 | 1/31/2023 | closed | 40 | 1/25/2023 | estimation_hour | 30 | 40 |
101 | 1/31/2023 | closed | 40 | 1/30/2023 | status | InProgress | Closed |
102 | 2/28/2023 | closed | 50 | 1/10/2023 | status | Defined | Accepted |
102 | 2/28/2023 | closed | 50 | 1/15/2023 | estimation_hour | 0 | 30 |
102 | 2/28/2023 | closed | 50 | 1/20/2023 | status | Accepted | InProgress |
102 | 2/28/2023 | closed | 50 | 1/25/2023 | estimation_hour | 30 | 50 |
102 | 2/28/2023 | closed | 50 | 1/30/2023 | status | InProgress | Closed |
2nd Iteration
Issue_Id | Due_Date | status | estimation_hour | changed_date | changed_parameter | old_value | new_value |
---|---|---|---|---|---|---|---|
101 | 1/31/2023 | Accepted | 0 | 1/10/2023 | status | Defined | Accepted |
101 | 1/31/2023 | closed | 20 | 1/15/2023 | estimation_hour | 0 | 20 |
101 | 1/31/2023 | closed | 40 | 1/16/2023 | estimation_hour | 20 | 30 |
101 | 1/31/2023 | closed | 40 | 1/16/2023 | Due_Date | 1/20/2023 | 1/31/2023 |
101 | 1/31/2023 | closed | 40 | 1/20/2023 | status | Accepted | InProgress |
101 | 1/31/2023 | closed | 40 | 1/25/2023 | estimation_hour | 30 | 40 |
101 | 1/31/2023 | closed | 40 | 1/30/2023 | status | InProgress | Closed |
102 | 2/28/2023 | closed | 50 | 1/10/2023 | status | Defined | Accepted |
102 | 2/28/2023 | closed | 50 | 1/15/2023 | estimation_hour | 0 | 30 |
102 | 2/28/2023 | closed | 50 | 1/20/2023 | status | Accepted | InProgress |
102 | 2/28/2023 | closed | 50 | 1/25/2023 | estimation_hour | 30 | 50 |
102 | 2/28/2023 | closed | 50 | 1/30/2023 | status | InProgress | Closed |
3rd Iteration
Issue_Id | Due_Date | status | estimation_hour | changed_date | changed_parameter | old_value | new_value |
---|---|---|---|---|---|---|---|
101 | 1/31/2023 | Accepted | 0 | 1/10/2023 | status | Defined | Accepted |
101 | 1/31/2023 | closed | 20 | 1/15/2023 | estimation_hour | 0 | 20 |
101 | 1/31/2023 | closed | 30 | 1/16/2023 | estimation_hour | 20 | 30 |
101 | 1/31/2023 | closed | 40 | 1/16/2023 | Due_Date | 1/20/2023 | 1/31/2023 |
101 | 1/31/2023 | closed | 40 | 1/20/2023 | status | Accepted | InProgress |
101 | 1/31/2023 | closed | 40 | 1/25/2023 | estimation_hour | 30 | 40 |
101 | 1/31/2023 | closed | 40 | 1/30/2023 | status | InProgress | Closed |
102 | 2/28/2023 | closed | 50 | 1/10/2023 | status | Defined | Accepted |
102 | 2/28/2023 | closed | 50 | 1/15/2023 | estimation_hour | 0 | 30 |
102 | 2/28/2023 | closed | 50 | 1/20/2023 | status | Accepted | InProgress |
102 | 2/28/2023 | closed | 50 | 1/25/2023 | estimation_hour | 30 | 50 |
102 | 2/28/2023 | closed | 50 | 1/30/2023 | status | InProgress | Closed |
4th Iteration
Issue_Id | Due_Date | status | estimation_hour | changed_date | changed_parameter | old_value | new_value |
---|---|---|---|---|---|---|---|
101 | 1/20/2023 | Accepted | 0 | 1/10/2023 | status | Defined | Accepted |
101 | 1/20/2023 | closed | 20 | 1/15/2023 | estimation_hour | 0 | 20 |
101 | 1/20/2023 | closed | 30 | 1/16/2023 | estimation_hour | 20 | 30 |
101 | 1/31/2023 | closed | 40 | 1/16/2023 | Due_Date | 1/20/2023 | 1/31/2023 |
101 | 1/31/2023 | closed | 40 | 1/20/2023 | status | Accepted | InProgress |
101 | 1/31/2023 | closed | 40 | 1/25/2023 | estimation_hour | 30 | 40 |
101 | 1/31/2023 | closed | 40 | 1/30/2023 | status | InProgress | Closed |
102 | 2/28/2023 | closed | 50 | 1/10/2023 | status | Defined | Accepted |
102 | 2/28/2023 | closed | 50 | 1/15/2023 | estimation_hour | 0 | 30 |
102 | 2/28/2023 | closed | 50 | 1/20/2023 | status | Accepted | InProgress |
102 | 2/28/2023 | closed | 50 | 1/25/2023 | estimation_hour | 30 | 50 |
102 | 2/28/2023 | closed | 50 | 1/30/2023 | status | InProgress | Closed |
5th Iteration
Issue_Id | Due_Date | status | estimation_hour | changed_date | changed_parameter | old_value | new_value |
---|---|---|---|---|---|---|---|
101 | 1/20/2023 | Accepted | 0 | 1/10/2023 | status | Defined | Accepted |
101 | 1/20/2023 | Accepted | 20 | 1/15/2023 | estimation_hour | 0 | 20 |
101 | 1/20/2023 | Accepted | 30 | 1/16/2023 | estimation_hour | 20 | 30 |
101 | 1/31/2023 | Accepted | 40 | 1/16/2023 | Due_Date | 1/20/2023 | 1/31/2023 |
101 | 1/31/2023 | InProgress | 40 | 1/20/2023 | status | Accepted | InProgress |
101 | 1/31/2023 | closed | 40 | 1/25/2023 | estimation_hour | 30 | 40 |
101 | 1/31/2023 | closed | 40 | 1/30/2023 | status | InProgress | Closed |
102 | 2/28/2023 | closed | 50 | 1/10/2023 | status | Defined | Accepted |
102 | 2/28/2023 | closed | 50 | 1/15/2023 | estimation_hour | 0 | 30 |
102 | 2/28/2023 | closed | 50 | 1/20/2023 | status | Accepted | InProgress |
102 | 2/28/2023 | closed | 50 | 1/25/2023 | estimation_hour | 30 | 50 |
102 | 2/28/2023 | closed | 50 | 1/30/2023 | status | InProgress | Closed |
and so on..
Solution
As your case is not so trivial, you need to reshape your dataframe then group by Issue_Id
then update the values. Since your data frame is sorted by "changed_date", the idea is to fill new values forward and fill old values backward. If the parameter is missing, simply fill in with the existing values:
def update_values(df):
return df['new_value'].ffill().fillna(df['old_value'].bfill())
upd_values = (df.pivot_table(index=df.index, columns='changed_parameter',
values=['old_value', 'new_value'], aggfunc='first')
.groupby(df['Issue_Id']).apply(update_values)
.droplevel('Issue_Id').fillna(df))
df[upd_values.columns] = upd_values
Output:
>>> df
Issue_Id Due_Date status estimation_hour changed_date changed_parameter old_value new_value
0 101 1/20/2023 Accepted 0 1/10/2023 status Defined Accepted
1 101 1/20/2023 Accepted 20 1/15/2023 estimation_hour 0 20
2 101 1/20/2023 Accepted 30 1/16/2023 estimation_hour 20 30
3 101 1/31/2023 Accepted 30 1/16/2023 Due_Date 1/20/2023 1/31/2023
4 101 1/31/2023 InProgress 30 1/20/2023 status Accepted InProgress
5 101 1/31/2023 InProgress 40 1/25/2023 estimation_hour 30 40
6 101 1/31/2023 Closed 40 1/30/2023 status InProgress Closed
7 102 2/28/2023 Accepted 0 1/10/2023 status Defined Accepted
8 102 2/28/2023 Accepted 30 1/15/2023 estimation_hour 0 30
9 102 2/28/2023 InProgress 30 1/20/2023 status Accepted InProgress
10 102 2/28/2023 InProgress 50 1/25/2023 estimation_hour 30 50
11 102 2/28/2023 Closed 50 1/30/2023 status InProgress Closed
Old answer
old_value = pd.to_numeric(df['old_value'], errors='coerce').shift(-1)
new_value = pd.to_numeric(df['new_value'], errors='coerce')
df['estimation_hour'] = old_value.fillna(new_value).ffill().convert_dtypes()
df['status'] = df['new_value'].mask(new_value.notna()).ffill()
Output:
>>> df
Issue_Id status estimation_hour changed_date changed_parameter old_value new_value
0 101 Accepted 0 1/10/2023 status Defined Accepted
1 101 Accepted 20 1/15/2023 estimation_hour 0 20
2 101 InProgress 20 1/20/2023 status Accepted InProgress
3 101 InProgress 40 1/25/2023 estimation_hour 20 40
4 101 Closed 40 1/30/2023 status InProgress Closed
5 102 Accepted 0 1/10/2023 status Defined Accepted
6 102 Accepted 30 1/15/2023 estimation_hour 0 30
7 102 InProgress 30 1/20/2023 status Accepted InProgress
8 102 InProgress 50 1/25/2023 estimation_hour 30 50
9 102 Closed 50 1/30/2023 status InProgress Closed
Answered By - Corralien
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.