Issue
I have a table with data for who viewed a page and who clicked on it. The following code gets me the pivot table below:
# users who clicked, by hour and weekday
pct = df.pivot_table(columns=['weekday'],index=['hour'], values=['users_who_clicked','users_who_viewed'], aggfunc= sum, fill_value=0, margins=True)
pct
I've been googling and searching here for a solid hour but I can't seem to find how to just display the rate: i.e. 'users_who_clicked' divided by 'users_who_viewed'. I don't need to see the other two series.
I feel like if I just create a new column across the whole data set dividing the two it won't aggregate correctly, so the aggregation needs to happen within the table itself. It's easy in Excel, just use a calculated field, so I imagine it must be fairly straightforward in pandas as well. Any thoughts?
Thanks!
p.s. Tried using a lambda function but got a "TypeError: unsupported operand type(s) for +: 'int' and 'str'" error. Tried putting the two tables directly in aggfunc and no luck.
Solution
Use DataFrame.xs
for select by MultiIndex
, rename first level and divide, last append to original DataFrame
:
d = {('users_who_clicked','a'): [5, 6, 7, 8],
('users_who_clicked','b'): [9, 10, 11, 12],
('users_who_viewed','a'): [4, 1, 3, 7],
('users_who_viewed','b'): [1, 3, 7, 3]}
df = pd.DataFrame(data=d)
df1 = (df.xs('users_who_clicked', axis=1, level=0, drop_level=False)
.rename(columns={'users_who_clicked':'%'}, level=0))
df2 = (df.xs('users_who_viewed', axis=1, level=0, drop_level=False)
.rename(columns={'users_who_viewed':'%'}, level=0))
out = pd.concat([df, df1.div(df2)], axis=1)
print (out)
users_who_clicked users_who_viewed %
a b a b a b
0 5 9 4 1 1.250000 9.000000
1 6 10 1 3 6.000000 3.333333
2 7 11 3 7 2.333333 1.571429
3 8 12 7 3 1.142857 4.000000
Answered By - jezrael
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.