Issue
I have a DataFrame that looks like this:
|id|name|start_date|clicks|conversions|installs|downloads|
|--|----|----------|------|-----------|--------|-----------|
|101|India|2023-06-06|1|4|0|NaN|
|101|India|2023-06-06|2|5|0|NaN|
|101|India|2023-06-06|1|6|0|NaN|
|201|Kenya|2023-09-15|5|8|0|NaN|
|201|Kenya|2023-09-15|2|1|0|NaN|
|201|NaN|NaN|NaN|NaN|NaN|10|
|201|NaN|NaN|NaN|NaN|NaN|5|
|201|NaN|NaN|NaN|NaN|NaN|4|
My code is this:
pivot_df = pd.pivot_table(
df,
index=['id','name','start_date'],
aggfunc={'clicks': 'sum','conversions': 'sum','installs': 'sum', 'downloads': 'sum'})
And the output is:
|id|name|start_date|clicks|conversions|installs|downloads|
|--|----|----------|------|-----------|--------|-----------|
|101|India|2023-06-06|4|15|0|0|
|201|Kenya|2023-09-15|7|9|0|0|
The output is I expected to get
|id|name|start_date|clicks|conversions|installs|downloads|
|--|----|----------|------|-----------|--------|-----------|
|101|India|2023-06-06|4|15|0|0|
|201|Kenya|2023-09-15|7|9|0|19|
The sum of conversions comes out 0 and not 19. What have I done incorrectly?
Solution
You probably don't want a pivot_table
but rather a groupby.agg
:
out = (df.groupby('id', as_index=False)
.agg({'name': 'first',
'start_date': 'first',
'clicks': 'sum',
'conversions': 'sum',
'installs': 'sum',
'downloads': 'sum'})
)
Output:
id name start_date clicks conversions installs downloads
0 101 India 2023-06-06 4.0 15.0 0.0 0.0
1 201 Kenya 2023-09-15 7.0 9.0 0.0 19.0
Answered By - mozway
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.