Issue
I have df with Weeks, Months and Years.
week = ['01/03/2022 - 01/09/2022', '01/10/2022 - 01/16/2022', '01/17/2022 - 01/23/2022', '01/24/2022 - 01/30/2022']
month = ["January", "January", "January", "January"]
year = [2022, 2022, 2022, 2022]
myDict = {}
myDict["Week"] = week
myDict["Month"] = month
myDict["Year"] = year
dates_df = pd.DataFrame(data=myDict)
dates_df
Week Month Year
01/03/2022 - 01/09/2022 January 2022
01/10/2022 - 01/16/2022 January 2022
01/17/2022 - 01/23/2022 January 2022
01/24/2022 - 01/30/2022 January 2022
Data looks like that (Previously Grouped):
test_data = {'CLient Id': [1,1,1,1,2,2,2,3,3],
'Client Name': ['Tom Holland', 'Tom Holland', 'Tom Holland', 'Tom Holland', 'Brad Pitt', 'Brad Pitt', 'Brad Pitt', 'Anna Delvey', 'Anna Delvey'],
'City': ['New York', 'New York', 'New York', 'New York', 'Los Angeles', 'Los Angeles', 'Los Angeles', 'New York', 'New York'],
'Week': ['01/03/2022 - 01/09/2022', '01/10/2022 - 01/16/2022', '01/17/2022 - 01/23/2022', '01/24/2022 - 01/30/2022',
'01/03/2022 - 01/09/2022', '01/10/2022 - 01/16/2022', '01/24/2022 - 01/30/2022', '01/03/2022 - 01/09/2022',
'01/24/2022 - 01/30/2022'],
'Month': ['January', 'January', 'January', 'January', 'January', 'January', 'January', 'January', 'January'],
'Year': [2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022],
'Spent': [1000, 750, 1200, 850, 777, 1500, 1200, 1500, 1200]}
test_df = pd.DataFrame(data=test_data)
test_df
Client Id Client Name City Week Month Year Spent
1 Tom Holland New York 01/03/2022 - 01/09/2022 January 2022 1000
1 Tom Holland New York 01/10/2022 - 01/16/2022 January 2022 750
1 Tom Holland New York 01/17/2022 - 01/23/2022 January 2022 1200
1 Tom Holland New York 01/24/2022 - 01/30/2022 January 2022 850
2 Brad Pitt Los Angeles 01/03/2022 - 01/09/2022 January 2022 777
2 Brad Pitt Los Angeles 01/10/2022 - 01/16/2022 January 2022 1500
2 Brad Pitt Los Angeles 01/24/2022 - 01/30/2022 January 2022 1200
3 Anna Delvey New York 01/03/2022 - 01/09/2022 January 2022 1500
3 Anna Delvey New York 01/24/2022 - 01/30/2022 January 2022 1200
I need to create additional rows of Client Name, City, Week, Month and Year for each Client (Id) which missing Week, from the date df + Spent column in this case should be 0.
Output I need:
Client Id Client Name City Week Month Year Spent
1 Tom Holland New York 01/03/2022 - 01/09/2022 January 2022 1000
1 Tom Holland New York 01/10/2022 - 01/16/2022 January 2022 750
1 Tom Holland New York 01/17/2022 - 01/23/2022 January 2022 1200
1 Tom Holland New York 01/24/2022 - 01/30/2022 January 2022 850
2 Brad Pitt Los Angeles 01/03/2022 - 01/09/2022 January 2022 777
2 Brad Pitt Los Angeles 01/10/2022 - 01/16/2022 January 2022 1500
2 Brad Pitt Los Angeles 01/17/2022 - 01/23/2022 January 2022 0
2 Brad Pitt Los Angeles 01/24/2022 - 01/30/2022 January 2022 1200
3 Anna Delvey New York 01/03/2022 - 01/09/2022 January 2022 1500
3 Anna Delvey New York 01/10/2022 - 01/16/2022 January 2022 0
3 Anna Delvey New York 01/17/2022 - 01/23/2022 January 2022 0
3 Anna Delvey New York 01/24/2022 - 01/30/2022 January 2022 1200
I was trying to code it, but got no luck.
Solution
You could pivot
+ reindex
+ fillna
(to get the missing data) + stack
(to get back to the previous shape):
columns = ['Week','Month','Year']
out = (test_df.pivot(['CLient Id', 'Client Name', 'City'], columns, ['Spent'])
.reindex(pd.MultiIndex.from_arrays(dates_df.assign(Spent='Spent').to_numpy()[:, [-1,0,1,2]].T,
names=[None]+columns), axis=1)
.fillna(0).stack(level=columns).reset_index())
Output:
CLient Id Client Name City Week Month Year Spent
0 1 Tom Holland New York 01/03/2022 - 01/09/2022 January 2022 1000.0
1 1 Tom Holland New York 01/10/2022 - 01/16/2022 January 2022 750.0
2 1 Tom Holland New York 01/17/2022 - 01/23/2022 January 2022 1200.0
3 1 Tom Holland New York 01/24/2022 - 01/30/2022 January 2022 850.0
4 2 Brad Pitt Los Angeles 01/03/2022 - 01/09/2022 January 2022 777.0
5 2 Brad Pitt Los Angeles 01/10/2022 - 01/16/2022 January 2022 1500.0
6 2 Brad Pitt Los Angeles 01/17/2022 - 01/23/2022 January 2022 0.0
7 2 Brad Pitt Los Angeles 01/24/2022 - 01/30/2022 January 2022 1200.0
8 3 Anna Delvey New York 01/03/2022 - 01/09/2022 January 2022 1500.0
9 3 Anna Delvey New York 01/10/2022 - 01/16/2022 January 2022 0.0
10 3 Anna Delvey New York 01/17/2022 - 01/23/2022 January 2022 0.0
11 3 Anna Delvey New York 01/24/2022 - 01/30/2022 January 2022 1200.0
If you have more than one columns to fill with 0, you could use:
columns = ['Week','Month','Year']
value_columns = ['Spent', ...]
new_df = (pd.DataFrame({'key':[1]*len(value_columns),'New':value_columns})
.merge(dates_df.assign(key=1)).drop(columns='key'))
out = (test_df.pivot(['CLient Id', 'Client Name', 'City'], columns, value_columns)
.reindex(pd.MultiIndex.from_arrays(zip(*new_df.to_numpy()),
names=[None] + columns), axis=1)
.fillna(0).stack(level=columns).reset_index())
Answered By - enke
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.