Issue
I have a pandas dataframe in Python with a pair of columns. I need to count how many times pairs and triplets of values appear together, both with and without considering the order. As an example, let's say that I have a dataframe with two columns, Classification and Individual, and the following sample data:
data = {
'Classification': [1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5],
'Individual': ['A', 'A', 'B', 'B', 'A', 'A', 'B', 'C', 'C', 'C', 'A', 'A', 'A', 'B', 'B', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'C', 'C', 'A', 'A', 'B', 'B', 'B']
}
Now, I want to arrive at the following results:
Classification  ValueSeries  TimesClassification  PercentageClassification
1               AB           5                    1
2               AB           5                    1
3               AC           2                    0.4
3               AB           5                    1
3               ABC          3                    0.6
4               AB           5                    1
4               BC           2                    0.4
4               ABC          3                    0.6
5               AC           2                    0.4
5               AB           5                    1
5               ABC          3                    0.6
That is, for each value of Classification, the unordered pairs and triplets contained within it.
Solution
The exact logic is not fully clear, but you can use itertools to produce the combinations of Individual values within each Classification, then apply value_counts and groupby.transform to compute the counts:
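from itertools import chain, combinations

import pandas as pd

df = pd.DataFrame(data)  # `data` is the dictionary from the question

def powerset(s):
    # all combinations of 2 or more distinct values (pairs, triplets, ...)
    s = set(s)
    return list(chain.from_iterable(combinations(s, r)
                                    for r in range(2, len(s)+1))
                )

# one row per (Classification, combination)
out = df.groupby('Classification')['Individual'].agg(powerset).explode()

out = (out
   .reset_index(name='ValueSeries')
   # number of Classification groups in which each combination appears
   .merge(out.value_counts().rename('TimesClassification'),
          how='left',
          left_on='ValueSeries', right_index=True)
   # ratio to the most frequent combination within the same Classification
   .assign(PercentageClassification=lambda d: d['TimesClassification']
           / d.groupby('Classification')['TimesClassification'].transform('max')
          )
)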
Output:
    Classification ValueSeries  TimesClassification  PercentageClassification
0                1      (A, B)                    5                       1.0
1                2      (A, B)                    5                       1.0
2                3      (C, A)                    3                       0.6
3                3      (C, B)                    3                       0.6
4                3      (A, B)                    5                       1.0
5                3   (C, A, B)                    3                       0.6
6                4      (C, A)                    3                       0.6
7                4      (C, B)                    3                       0.6
8                4      (A, B)                    5                       1.0
9                4   (C, A, B)                    3                       0.6
10               5      (C, A)                    3                       0.6
11               5      (C, B)                    3                       0.6
12               5      (A, B)                    5                       1.0
13               5   (C, A, B)                    3                       0.6
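Note that the tuples above come out in whatever order the set happens to iterate in, such as (C, A) rather than (A, C). If you want labels like AB and ABC as in the question, one possible variant is to sort the distinct values before combining them and join each combination into a string. The following is only a sketch under that assumption: the helper name sorted_combos and the variable names are made up for illustration, and it counts the same thing as the code above (the number of Classification groups containing each combination), which may or may not be the logic intended in the question.

```python
from itertools import combinations

import pandas as pd

data = {
    'Classification': [1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5],
    'Individual': ['A', 'A', 'B', 'B', 'A', 'A', 'B', 'C', 'C', 'C', 'A', 'A', 'A', 'B', 'B', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'C', 'C', 'A', 'A', 'B', 'B', 'B']
}
df = pd.DataFrame(data)


def sorted_combos(values, min_size=2):
    """Hypothetical helper: combinations of at least `min_size` distinct
    values, built from the sorted values so that each unordered
    combination always gets the same label (e.g. 'AC', never 'CA')."""
    items = sorted(set(values))
    return [
        ''.join(combo)
        for r in range(min_size, len(items) + 1)
        for combo in combinations(items, r)
    ]


# one row per (Classification, combination label)
rows = [
    (classification, label)
    for classification, individuals in df.groupby('Classification')['Individual']
    for label in sorted_combos(individuals)
]
combos = pd.DataFrame(rows, columns=['Classification', 'ValueSeries'])

# number of Classification groups in which each combination appears
counts = combos['ValueSeries'].value_counts()
combos['TimesClassification'] = combos['ValueSeries'].map(counts)

# ratio to the most frequent combination within the same Classification
combos['PercentageClassification'] = (
    combos['TimesClassification']
    / combos.groupby('Classification')['TimesClassification'].transform('max')
)

print(combos)
```

With the sample data this gives the same counts as before (5 for AB, 3 for AC, BC and ABC), only with stable, sorted labels.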
Answered By - mozway