Issue
MY DATA
Let's say I have the following dataframe, where each row is a unique shopping cart:
import pandas as pd
df = pd.DataFrame({'0':['banana','apple','orange','milk'],'1':['apple','milk','bread','cheese'],'2':['bread','cheese','banana','eggs']})
0 1 2
0 banana apple bread
1 apple milk cheese
2 orange bread banana
3 milk cheese eggs
WHAT I AM TRYING TO DO
I am trying to build a list of the most common pairings of size n across all of these shopping carts. For example, the most common pairings of size 2 would be banana, bread and milk, cheese:
pairing count
banana, bread 2
milk, cheese 2
apple, bread 1
...
orange, banana 1
To clarify, order does not matter here; in other words, whichever item shows up in the cart first is irrelevant. banana, bread is the same as bread, banana.
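Because order is irrelevant, each pairing needs an order-insensitive key before it can be counted. A minimal sketch of two common choices, a frozenset and a sorted tuple; with either one, banana, bread and bread, banana compare (and hash) equal:

```python
# Two order-insensitive keys for the same pair of items
a = frozenset(("banana", "bread"))
b = frozenset(("bread", "banana"))
print(a == b)  # True: sets ignore element order

# A sorted tuple is an alternative canonical key with a fixed, alphabetical order
key = tuple(sorted(("bread", "banana")))
print(key)  # ('banana', 'bread')
```

One difference: a frozenset collapses duplicates, so a cart containing the same item twice would produce a one-element key, whereas a sorted tuple keeps both copies.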
WHAT I HAVE TRIED
I tried putting all unique values in a list, iterating through each row, and brute-forcing the pairings with itertools, but this seems like a very hacky and unpythonic workaround, and I didn't even get it to work properly.
Solution
You can use itertools.combinations and collections.Counter to efficiently loop over the combinations of values in each row (each stored as a frozenset so that order is ignored), then optionally convert the result back to a Series:
import pandas as pd
from itertools import combinations
from collections import Counter
# Count every unordered pair of items within each cart (row)
out = pd.Series(Counter(frozenset(c) for r in df.to_numpy()
                        for c in combinations(r, 2)))
Output:
(banana, apple) 1
(banana, bread) 2
(apple, bread) 1
(apple, milk) 1
(apple, cheese) 1
(milk, cheese) 2
(bread, orange) 1
(banana, orange) 1
(milk, eggs) 1
(eggs, cheese) 1
dtype: int64
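The code above fixes the size at 2 and does not order the result by frequency, while the question asks for the most common pairings of any size n. A minimal sketch extending the same combinations/Counter idea, assuming each row is one cart with no missing values; `top_combinations` is a hypothetical helper name, and it swaps the frozenset key for a sorted tuple so the items in each pairing print in a stable alphabetical order:

```python
import pandas as pd
from collections import Counter
from itertools import combinations


def top_combinations(df: pd.DataFrame, n: int = 2) -> pd.DataFrame:
    """Count unordered item combinations of size n across all carts.

    Hypothetical helper: each row is treated as one cart, and each
    combination is keyed by a sorted tuple so that item order is ignored.
    """
    counts = Counter(
        tuple(sorted(combo))
        for cart in df.to_numpy()
        for combo in combinations(cart, n)
    )
    # most_common() yields (combination, count) pairs, highest count first;
    # ties keep the order in which combinations were first seen
    rows = [(", ".join(combo), count) for combo, count in counts.most_common()]
    return pd.DataFrame(rows, columns=["pairing", "count"])


df = pd.DataFrame({'0': ['banana', 'apple', 'orange', 'milk'],
                   '1': ['apple', 'milk', 'bread', 'cheese'],
                   '2': ['bread', 'cheese', 'banana', 'eggs']})
result = top_combinations(df, 2)
print(result)
```

With the sample data, the first two rows are banana, bread and cheese, milk, each with a count of 2 (milk, cheese appears as cheese, milk because of the alphabetical key). Passing n=3 instead counts whole three-item combinations.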
Answered By - mozway