Issue
I have a large dataframe that looks like this:
Nationality | Sex | Response |
---|---|---|
American | Female | I have no need for this product. |
German | Male | It looks great. |
Finnish | Female | I would definitely buy one. |
etc.
What I want to do is to randomly select a number of responses from each group so that I can analyse them further.
My groupby function has returned something like this:
Nationality Sex
American Male 567
American Female 342
German Male 421
German Female 234
Finnish Male 149
Finnish Female 67
etc.
I want a new dataframe with 20 random responses from each group. Is that possible using a lambda? I tried the following, but it doesn't return what I am looking for:
new_df = df.groupby('Nationality')['Sex'].apply(lambda x: x.sample(20))
Is there a way to do this?
Solution
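Using iterrows from pandas, you can iterate over DataFrame rows as (index, Series) pairs and sample each group in turn:
# One row per (Nationality, Sex) group, with the row count in a 'size' column
group_sizes = df.groupby(['Nationality', 'Sex'], as_index=False).size()
for _, row in group_sizes.iterrows():
    # Select the rows belonging to this group and draw 20 of them at random
    print(df[(df.Nationality == row.Nationality) & (df.Sex == row.Sex)].sample(20))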
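The loop above prints each sample rather than collecting them into one dataframe. As an alternative, here is a minimal sketch that samples all groups in a single call, assuming pandas 1.1 or newer (where groupby objects have a sample method). The toy data, the random_state values and the name capped_df are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical stand-in data: 3 nationalities x 2 sexes, 50 rows per group.
rows = [
    {"Nationality": n, "Sex": s, "Response": f"response {i}"}
    for n in ["American", "German", "Finnish"]
    for s in ["Male", "Female"]
    for i in range(50)
]
df = pd.DataFrame(rows)

# Draw 20 random rows from every (Nationality, Sex) group in one call.
# random_state is only set to make this sketch reproducible.
new_df = df.groupby(["Nationality", "Sex"]).sample(n=20, random_state=0)

# If some groups might hold fewer than 20 rows, sample(n=20) raises an error.
# One workaround: shuffle the whole frame, then keep at most 20 rows per group.
capped_df = (
    df.sample(frac=1, random_state=0)
      .groupby(["Nationality", "Sex"])
      .head(20)
)
```

Both results keep the original columns, so new_df can be analysed like the original dataframe. Note that the attempt in the question groups by Nationality only and selects the Sex column, which is why it does not return whole rows per (Nationality, Sex) group.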
Answered By - Renato Aranha