Issue
There is a column with 4 categories. For each unique value in that column, I want to display the frequency of occurrence of the values in the other columns.
Solution
Starting from this:
import pandas as pd

df = pd.DataFrame(
    {
        "cat1": ["yes", "no", "yes", "no", "yes"],
        "cat2": ["a", "a", "b", "b", "a"],
        "cat3": ["yes", "no", "no", "yes", "no"],
        "quant": [1, 2, 3, 4, 5],
    }
)
Sample DataFrame:
  cat1 cat2 cat3  quant
0  yes    a  yes      1
1   no    a   no      2
2  yes    b   no      3
3   no    b  yes      4
4  yes    a   no      5
You can do:
y = lambda x: x.value_counts(normalize=True).loc["yes"]
n = lambda x: x.value_counts(normalize=True).loc["no"]

df.groupby(["cat2"]).agg(
    {
        "cat1": [("yes", y), ("no", n)],
        "cat3": [("yes", y), ("no", n)],
        "quant": ["min", "max", "mean"],
    }
)
Result:
          cat1                cat3           quant
           yes        no       yes        no   min max      mean
cat2
a     0.666667  0.333333  0.333333  0.666667     1   5  2.666667
b     0.500000  0.500000  0.500000  0.500000     3   4  3.500000
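One caveat with the lambdas above: `.loc["yes"]` raises a `KeyError` when a group contains no "yes" values at all, because `value_counts` only lists values that actually occur. A minimal sketch of that failure, using a made-up two-row group (the data and variable names here are illustrative only):

```python
import pandas as pd

# Hypothetical group that contains only "no" values.
group = pd.Series(["no", "no"])
counts = group.value_counts(normalize=True)

# .loc raises KeyError for a label that never occurred in the group.
try:
    counts.loc["yes"]
    raised = False
except KeyError:
    raised = True

# Series.get returns a default instead of raising.
share_yes = counts.get("yes", 0.0)
```

With real data this tends to surface as soon as one category is rare enough to be absent from some group.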
Here's a slightly more robust version:
from functools import partial

def agg_func(s: pd.Series, name: str):
    # Share of `name` within the group, or 0 if the value never occurs in it.
    try:
        return s.value_counts(normalize=True).loc[name]
    except KeyError:
        return 0

yes_no_agg = [
    ("yes", partial(agg_func, name="yes")),
    ("no", partial(agg_func, name="no")),
]

df.groupby(["cat2"]).agg(
    {
        "cat1": yes_no_agg,
        "cat3": yes_no_agg,
        "quant": ["min", "max", "mean"],
    }
)
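If only the frequencies are needed (without the `quant` statistics), `pd.crosstab` with `normalize="index"` may be a simpler route. This is an alternative sketch, not part of the answer above; it rebuilds the same sample frame, and the names `cat1_freq` and `cat3_freq` are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "cat1": ["yes", "no", "yes", "no", "yes"],
        "cat2": ["a", "a", "b", "b", "a"],
        "cat3": ["yes", "no", "no", "yes", "no"],
        "quant": [1, 2, 3, 4, 5],
    }
)

# Row-normalised contingency tables: each row (one cat2 value) sums to 1.
cat1_freq = pd.crosstab(df["cat2"], df["cat1"], normalize="index")
cat3_freq = pd.crosstab(df["cat2"], df["cat3"], normalize="index")
```

Each table has one row per `cat2` value and one column per category. A category that is absent from a group shows up as 0 rather than raising an error.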
Answered By - 965311532