Friday, October 21, 2022

[FIXED] Get most frequently used many to many field

October 21, 2022 database, django, django-models, django-queryset, sql

Issue

I have the following models.

from django.db import models

class Tag(models.Model):
    name = models.CharField(max_length=30)
    # and other fields ...

class Book(models.Model):
    name = models.CharField(max_length=140)
    tags = models.ManyToManyField(Tag, blank=True)
    # and other fields 

class Article(models.Model):
    name = models.CharField(max_length=140)
    tags = models.ManyToManyField(Tag, blank=True)

A few other models also have tags as a ManyToManyField. I would like to get a list of the most used Tag objects. I tried finding the most frequently used tags for each model, taking the top ten of each and combining them with the top tens of the other models. I think there should be something I could do from the Tag model itself to find the most used Tag instances.

Is there any way of finding the most used tag instances other than my approach? Any help would be really appreciated.

Solution

Say you want the top 10 tags used by Books; then you can query this as:

from django.db.models import Count

Tag.objects.annotate(
    nused=Count('book')
).order_by('-nused')[:10]

We thus query the database for Tags that we order by the number of related books for each tag.
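
To make the semantics of that query concrete, here is a plain-Python sketch (no Django needed) of what it computes, assuming hypothetical data: each row of the auto-created Book/Tag through table is one link, Count('book') counts those links per tag, and the LEFT OUTER JOIN keeps tags without any book, with a count of 0. The helper name top_tags and the data are made up for illustration.

```python
from collections import Counter

# Hypothetical rows of the Book<->Tag through table: one (book_id, tag) pair per link.
book_tags = [(1, "django"), (1, "sql"), (2, "django"), (3, "django"), (3, "python")]
all_tags = ["django", "sql", "python", "rust"]  # 'rust' is not used by any book


def top_tags(links, tags, n=10):
    """Hypothetical helper mimicking
    Tag.objects.annotate(nused=Count('book')).order_by('-nused')[:n]."""
    nused = Counter(tag for _, tag in links)
    # Like the LEFT OUTER JOIN: every tag appears, unused ones with a count of 0.
    annotated = [(tag, nused[tag]) for tag in tags]
    # Most used first; the database may order ties differently.
    annotated.sort(key=lambda pair: pair[1], reverse=True)
    return annotated[:n]


print(top_tags(book_tags, all_tags, n=2))  # [('django', 3), ('sql', 1)]
```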

We could compute multiple counts in one query, but that will typically generate an expensive query: the query then JOINs all these related models at once, so the number of rows per tag is the product of the number of related objects per model, and thus typically grows exponentially in the number of models. Although some database engines might detect that these are "independent" subqueries, in my experience the popular ones usually do not. So it is better to use multiple queries here: one per related model.
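
A small plain-Python simulation (hypothetical ids, no database) shows where the cost comes from: joining one tag to both its books and its articles produces one row per (book, article) pair, so with k links per relation and m relations there are k**m rows per tag. It also shows a second problem: a plain count over such a join is inflated. This is why the Django documentation warns that combining multiple aggregations in one annotate() yields wrong results, with Count(..., distinct=True) as the workaround for counts.

```python
from itertools import product

# Hypothetical links of a single tag
book_ids = [1, 2, 3]
article_ids = [10, 11, 12, 13]

# Joining the tag to both relations gives one row per (book, article) combination.
joined_rows = list(product(book_ids, article_ids))

# Like COUNT(book.id): every joined row is counted, so books are counted 4 times each.
naive_book_count = sum(1 for _book, _article in joined_rows)
# Like COUNT(DISTINCT book.id): the duplicates caused by the other join disappear.
distinct_book_count = len({book for book, _article in joined_rows})

print(len(joined_rows), naive_book_count, distinct_book_count)  # 12 12 3
```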

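So now we first need to find out what the related models are. Luckily, Django provides an API for that. Every model class has a ._meta object that stores information about the model. Its .get_fields() method returns the fields of the model, including the reverse relations that other models define towards it. The reverse many-to-many relations are those for which both .many_to_many and .auto_created are true, and their .name (for example 'book' or 'article') is the name we can use in Count(...). (The .fields_map attribute also maps relation names to relation objects, but it is not part of the documented ._meta API and can also contain hidden relations, such as those of the auto-created through tables.)

We can use this to enumerate the relations and run one query per relation:

from collections import Counter
from django.db.models import Count

cntr = Counter()
for field in Tag._meta.get_fields():
    # Only the reverse many-to-many relations (e.g. 'book', 'article')
    if not (field.many_to_many and field.auto_created):
        continue
    cntr.update(
        {
            tg: tg.nr
            # '-nr': most used first, then keep this relation's top 10
            for tg in Tag.objects.annotate(nr=Count(field.name)).order_by('-nr')[:10]
        }
    )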

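At the end, we have a Counter that contains, for these tags, the total number of occurrences. Note however that since we limit each query to 10 tags, only tags that made it into the top 10 of at least one relation are counted, and their totals can be incomplete.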

We can then obtain the most popular tags by taking the most common entries of the counter:

from operator import itemgetter

# Keep only the Tag instances of the 10 (tag, count) pairs with the highest counts
my_tags = list(map(itemgetter(0), cntr.most_common(10)))

The .most_common(10) call returns the top 10 tags (ranked by the sum of their per-relation counts) as a list of 2-tuples: every tuple contains the Tag instance and its number of uses. By using map(itemgetter(0), ...), we only obtain the Tags, but you might be interested in the numbers as well.
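
The merging step itself is plain collections.Counter arithmetic. The sketch below uses hypothetical per-relation results, with tag names standing in for Tag instances, to show that Counter.update adds up the counts of a tag that appears in several relations. With real Tag instances this works the same way, because Django model instances compare equal and hash by their primary key, so the same tag fetched by two different queries ends up under one key.

```python
from collections import Counter
from operator import itemgetter

# Hypothetical {tag: count} results of the per-relation queries
per_relation = {
    "book": {"django": 5, "sql": 3},
    "article": {"django": 2, "python": 4},
}

cntr = Counter()
for counts in per_relation.values():
    cntr.update(counts)  # adds to existing counts instead of replacing them

print(cntr.most_common(2))  # [('django', 7), ('python', 4)]
my_tags = list(map(itemgetter(0), cntr.most_common(2)))
print(my_tags)  # ['django', 'python']
```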

Why limiting per relation might be a bad idea...

This does not mean that we necessarily get the most frequent tags. Indeed, say that a tag is the 11th most popular tag for both Books and Articles; it can still be the most popular tag overall, since the top 10 of Books and the top 10 of Articles may be totally different. Or a small example:

Top Books    Top Articles
1. A (10)    1. D (12)
2. B (8)     2. E (8)
3. C (7)     3. C (7)

If we generated a top 2 with the above approach, we would miss C, which actually occurs most often (14 times in total).
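
The example can be checked with a few lines of plain Python. Here merged_top is a hypothetical helper that mirrors the approach above: it optionally truncates each relation to its own top entries before summing the counts.

```python
from collections import Counter

# The counts from the table above
books = {"A": 10, "B": 8, "C": 7}
articles = {"D": 12, "E": 8, "C": 7}


def merged_top(relations, n, per_relation_limit=None):
    """Sum per-relation counts; None as limit keeps every tag of a relation."""
    total = Counter()
    for counts in relations:
        ranked = Counter(counts).most_common(per_relation_limit)
        total.update(dict(ranked))  # dict(...) so the counts are added, not the pairs
    return total.most_common(n)


truncated = merged_top([books, articles], n=2, per_relation_limit=2)
full = merged_top([books, articles], n=2)
print(truncated)  # [('D', 12), ('A', 10)]: C is missed
print(full)       # [('C', 14), ('D', 12)]
```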

We can solve this problem by simply always counting all Tags, and thus removing the [:10] limit (the ordering is then no longer needed either):

from collections import Counter
from django.db.models import Count

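cntr = Counter()
for field in Tag._meta.get_fields():
    # Only the reverse many-to-many relations (e.g. 'book', 'article')
    if not (field.many_to_many and field.auto_created):
        continue
    cntr.update(
        {
            # No slicing (and hence no ordering): count every tag
            tg: tg.nr
            for tg in Tag.objects.annotate(nr=Count(field.name))
        }
    )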

Answered By - Willem Van Onsem

This answer was collected from Stack Overflow and tested by PythonFixing community admins; it is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0.
