Issue
I have a DataFrame df and I'd like to sample from it so that the sample follows the distribution of some variable. Let's say df['type'].value_counts(normalize=True) returns:
A 0.3
B 0.5
C 0.2
I'd like to do something like sampledf = df.sample(weights=df['type'].value_counts(normalize=True)) such that sampledf['type'].value_counts(normalize=True) returns almost the same distribution. How do I pass a dict of frequencies here?
Solution
The weights argument has to be a Series of the same length as the original df, with one weight per row aligned on df's index, so the simplest approach is to add it as a column:
df['freq'] = df.groupby('type')['type'].transform('count')
sampledf = df.sample(weights=df['freq'])
Or without adding the column:
sampledf = df.sample(weights=df.groupby('type')['type'].transform('count'))
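For a quick check, here is a minimal sketch on made-up data. The toy DataFrame, n=100, frac=0.1 and random_state=0 are assumptions for illustration, not part of the original question. Two things are worth keeping in mind. First, df.sample() returns a single row unless n or frac is given. Second, weights are per-row selection weights, so giving every row its group's count makes rows from larger groups more likely to be picked than under uniform sampling, and the sampled proportions may drift toward the biggest group. If the goal is to reproduce the proportions exactly, a stratified draw with groupby(...).sample(frac=...) (available in pandas 1.1 and later) is one alternative, shown here next to the weighted version for comparison:

```python
import pandas as pd

# Hypothetical toy data: 300 A rows, 500 B rows, 200 C rows (30% / 50% / 20%).
df = pd.DataFrame({"type": ["A"] * 300 + ["B"] * 500 + ["C"] * 200})

# Per-row weights as in the answer: each row carries the size of its own group.
weights = df.groupby("type")["type"].transform("count")

# n must be passed explicitly; without n or frac, sample() returns one row.
weighted = df.sample(n=100, weights=weights, random_state=0)

# Stratified alternative: draw the same fraction from every group,
# which keeps the group proportions of the original frame.
stratified = df.groupby("type").sample(frac=0.1, random_state=0)

# Compare the resulting distributions with the original 0.3 / 0.5 / 0.2.
print(weighted["type"].value_counts(normalize=True))
print(stratified["type"].value_counts(normalize=True))
```

Plain df.sample(n=100) with no weights also reproduces the original proportions on average, since every row is then equally likely to be drawn.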
Answered By - Josh Friedlander