Issue
I am working with transaction data and would like to get past 12 months rolling Active Customer Base but on monthly level.
Below is an example of transaction data i have - Transaction Data
Cust-ID Date-of-Purchase Quantity Customer-Tag
N01847 01-04-18 10 B2C
N01847 15-04-18 20 B2C
N01847 25-06-19 20 B2C
N01847 12-02-20 100 B2C
N01847 25-03-20 150 B2C
N02341 01-12-19 20 B2B
N02341 02-02-20 150 B2B
N02011 25-01-19 10 B2B
N01033 02-06-18 30 B2C
N01033 02-04-19 40 B2C
N01033 15-04-19 50 B2C
N01033 15-02-20 20 B2C
N01010 16-01-18 100 B2C
N01010 25-02-18 250 B2C
N01010 03-09-18 10 B2C
N01010 04-04-19 250 B2C
N01010 15-06-19 100 B2C
N01010 26-07-19 40 B2C
N01010 17-09-19 10 B2C
N01010 17-09-19 10 B2C
N01010 20-03-20 20 B2C
N09100 20-01-18 20 B2B
N09100 20-01-18 20 B2B
N09100 25-01-20 20 B2B
N09100 25-01-20 20 B2B
N09100 25-01-20 20 B2B ֿ
N09100 25-01-20 20 B2B
And below is what i am expecting to get with python - desired output
Month-Year B2C-Rolling-past-12months-Active-Customers Monthly-Active-Customers Monthly-Active-Rate
201801 100230 25058 25.0%
201802 100524 25634 25.5%
201803 100810 25213 25.0%
201804 101253 25495 25.2%
201805 101351 25525 25.2%
201806 103210 25998 25.2%
201807 103678 26122 25.2%
201808 103977 26202 25.2%
201809 104512 26342 25.2%
201810 104624 26376 25.2%
201811 105479 26597 25.2%
201812 111256 28059 26.2%
201901 112247 28314 25.2%
201902 112947 28497 25.2%
201903 113508 28644 25.2%
201904 113857 28737 25.2%
201905 114572 28924 25.2%
201906 115443 29149 25.3%
201907 116056 29310 25.3%
201908 116528 29435 25.3%
201909 116971 29553 25.3%
201910 117647 29729 25.3%
201911 118492 29949 25.3%
201912 124095 31371 26.3%
202001 124895 31580 25.3%
202002 125653 31778 25.3%
202003 126320 31953 25.3%
I would really appreciate any help with the python (spyder) code that will help to get the desired result.
Solution
You may want to use pandas, and then do something like that:
df["Date-of-Purchase"] = pd.to_datetime(df["Date-of-Purchase"], dayfirst=True)
df["Month"] = df["Date-of-Purchase"].dt.month
df["Year"] = df["Date-of-Purchase"].dt.year
res = df.groupby(["Year", "Month"])["Cust-ID"].nunique()
This would give you the number of unique customers per month. Assuming you have data for all the months, you can now use rolling
to get a rolling sum for 12 months (I used 3 months in the code for easier debugging):
monthly_customers = df.groupby(["Year", "Month"])["Cust-ID"].apply(lambda x: frozenset(x.values))
monthly_customers = monthly_customers.reset_index()
monthly_customers = pd.concat([monthly_customers] + [monthly_customers["Cust-ID"].shift(i) for i in range(1, 3)], axis ="columns")
monthly_customers.columns = ["Year", "Month"] + [ f"shift_{i}" for i in range(3) ]
def count_unique(row):
total_set = frozenset()
columns = [ f"shift_{i}" for i in range(3) ]
for col in columns:
if row.get(col) and type(row.get(col)) == frozenset:
total_set = total_set | row.get(col)
return len(total_set)
monthly_customers["N_month_count"] = monthly_customers.apply(count_unique, axis=1)
monthly_customers
If you don't have data for all the months, you'll need to fill in missing months.
Answered By - Roy2012
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.