Tuesday, October 25, 2022

[FIXED] How can I combine multiple rows (transactions) if the following transactions are closed within 30 days after the 1st transaction

October 25, 2022 group-by, pandas, python No comments

Issue

this is my first post and I will try my best to describe the problem.

I have a table below to help illustrate. I am trying to combine some transactions for each Account ID. The logic I want is: Account AAAAA had their 1st transaction "Close Date" on 2019-12-16, if their following transactions are "closed" within 1 month after the 1st (i.e. window 2019-12-16 ~ 2020-01-16), then I want to combine those transactions to be just one single transaction, but use the "Created Date" from the 1st transaction (i.e 2019-12-15) and "Close Date" from the last transaction within the window (i.e 2020-01-02). Then after that window, the next transaction for Account AAAAA is "closed" on 2020-01-22, the window will be 2020-01-22 ~ 2020-02-22, but they only have one transaction in that window, so it just stays as a single transaction. I want to implement this logic for each Account ID.

Initial Table:

Account ID	Created Date	Close Date	Product Code
AAAAAA	2019-12-15	2019-12-16	Apple; Banana
AAAAAA	2019-12-20	2019-12-26	Strawberry
AAAAAA	2020-01-02	2020-01-02	Apple; Banana
AAAAAA	2020-01-15	2020-01-22	Apple; Banana
BBBBBB	2020-02-04	2020-02-05	Watermelon
BBBBBB	2020-03-09	2020-03-10	Strawberry
BBBBBB	2020-03-10	2020-03-15	Apple; Banana

import pandas as pd

data = [['AAAAA', '2019-12-15', '2019-12-16', 'Apple; Banana'],
['AAAAA', '2019-12-20', '2019-12-26', 'Strawberry'],
['AAAAA', '2020-01-02', '2020-01-02', 'Apple; Banana'],
['AAAAA', '2020-01-15', '2020-01-22', 'Apple; Banana'],
['BBBBB', '2020-02-04', '2020-02-05', 'Watermelon'],
['BBBBB', '2020-03-09', '2020-03-10', 'Strawberry'],
['BBBBB', '2020-03-10', '2020-03-15', 'Apple; Banana']]

initial_df = pd.DataFrame(data, columns=['Account ID', 'Created Date', 'Close Date', 'Products'])

initial_df

After:

Account ID	Created Date	Close Date	Product Code
AAAAAA	2019-12-15	2020-01-02	Apple; Banana; Strawberry; Apple; Banana
AAAAAA	2020-01-15	2020-01-22	Apple; Banana
BBBBB	2020-02-04	2020-02-05	Watermelon
BBBBB	2020-03-09	2020-03-15	Strawberry; Apple; Banana

Is there a way I can achieve this? I will greatly appreciate you help!!

Solution

I was able to figure it out myself using below codes, but if there is a better way, I would love to know as well.

from dateutil.relativedelta import relativedelta

grouped = initial_df.groupby(by=['Account ID'])
window_list = []
for a, b in grouped:
  window_end_date = list(b['Close Date'])[0] + dateutil.relativedelta.relativedelta(months=1)
  for i in b['Close Date']:
    if pd.to_datetime(i) <= window_end_date:
      window_list.append(window_end_date)
    elif pd.to_datetime(i) > window_end_date:
      window_end_date = i + dateutil.relativedelta.relativedelta(months=1)
      window_list.append(window_end_date)

initial_df['window'] = window_list
created_start_list = []
close_end_list = []
grouped = initial_df.groupby(['Account ID', 'window'])
for a, b in grouped:
  created_start = min(list(b['Created Date']))
  created_start_list.append(created_start)
  close_end = max(list(b['Close Date']))
  close_end_list.append(close_end)

df = grouped['Products'].apply('; '.join).reset_index()
df['created_start'] = created_start_list
df['closed_start'] = close_end_list
df

Answered By - minxyen

This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0

Tuesday, October 25, 2022

[FIXED] How can I combine multiple rows (transactions) if the following transactions are closed within 30 days after the 1st transaction

Issue

Solution

0 comments:

Post a Comment

Popular Posts

Labels