Issue
I have the following models.
class Tag(models.Model):
    name = models.CharField(max_length=30)
    # and other fields ...

class Book(models.Model):
    name = models.CharField(max_length=140)
    tags = models.ManyToManyField(Tag, blank=True)
    # and other fields

class Article(models.Model):
    name = models.CharField(max_length=140)
    tags = models.ManyToManyField(Tag, blank=True)
A few other models also have tags as a ManyToMany field. I would like to get a list of the most used tag objects. I tried filtering the most frequently used tags for each model, taking the top ten from each, and combining those lists. I think there should be something I can do from the 'Tag' model itself to find the most used tag instances.
Is there any way of finding the most used tag instances other than my approach? Any help would be really appreciated.
Solution
Say you want the top 10 tags used by Books; you can then query this like:
from django.db.models import Count

Tag.objects.annotate(
    nused=Count('book')
).order_by('-nused')[:10]
We thus query the database for Tags, ordered by the number of related books for each tag.
We can use multiple counts, but combining them in one query will typically generate an expensive query: in that case you JOIN on all these related models, and as a result the cost of the query will typically grow exponentially in the number of models. Although it is possible that some database managers find out that these are "independent" subqueries, in my experience the popular ones usually do not. So we had better use multiple queries here: one per related model.
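To make the cost argument concrete, here is a small plain-Python sketch that mimics what a double JOIN does for a single tag. The ids and the names `books` and `articles` are made up for illustration, and no database is involved; it only models the row multiplication.

```python
from itertools import product

# Hypothetical data: ids of the books and articles attached to ONE tag.
books = [1, 2, 3]
articles = [10, 11, 12, 13]

# Joining the tag against both relations at once yields one row per
# (book, article) combination, not one row per related object.
joined_rows = list(product(books, articles))

n_rows = len(joined_rows)
n_books = len({b for b, _ in joined_rows})
n_articles = len({a for _, a in joined_rows})

print(n_rows)      # 12 rows, although only 3 + 4 objects are related
print(n_books)     # 3 distinct books
print(n_articles)  # 4 distinct articles
```

The number of joined rows is the product of the per-relation counts, so every extra relation multiplies the work. A plain count over those rows would also overcount unless duplicates are removed.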
So now we first need to find out what the related models are. Luckily, Django provides some utilities for that. Every model class has a ._meta object that stores information about the model. One of its attributes is .fields_map, a dictionary that maps the names of relations to the relation objects. (Note that .fields_map is not part of the documented _meta API, so it may change between Django versions.)
We can use this to enumerate over the relations, and thus for every relation use a query:
from collections import Counter
from django.db.models import Count

cntr = Counter()
for relation in Tag._meta.fields_map:
    # One query per relation: the 10 most used tags for that relation.
    cntr.update(
        {
            tg: tg.nr
            for tg in Tag.objects.annotate(nr=Count(relation)).order_by('-nr')[:10]
        }
    )
At the end, we will have a Counter that contains, for these tags, the total number of occurrences. Note, however, that since we limit each query to 10 tags, the totals only cover tags that made the top 10 of at least one relation.
We can then obtain the most popular tags by obtaining the most common from the counter:
from operator import itemgetter
my_tags = map(itemgetter(0), cntr.most_common(10))
The .most_common(10) call produces the top 10 tags (by summing the counts of the most common tags per relation) and returns a list of 2-tuples: every tuple contains the Tag instance and its number of uses. By using map(itemgetter(0), ...), we obtain only the Tags. But you might be interested in the numbers as well.
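The merging step can be tried without a database. The sketch below uses plain dictionaries with made-up tag names and counts in place of the annotated querysets; only the Counter and itemgetter behaviour is the point here.

```python
from collections import Counter
from operator import itemgetter

# Hypothetical per-relation results: tag name -> number of uses.
# In the answer these come from one annotated query per relation.
book_counts = {"python": 5, "django": 3}
article_counts = {"django": 4, "sql": 2}

cntr = Counter()
for counts in (book_counts, article_counts):
    cntr.update(counts)  # update() adds to existing counts, it does not overwrite

top = cntr.most_common(2)               # list of (tag, total) 2-tuples
names = list(map(itemgetter(0), top))   # keep only the tags

print(top)    # [('django', 7), ('python', 5)]
print(names)  # ['django', 'python']
```

Keeping `top` around rather than only `names` preserves the totals, which is handy if the counts should be displayed next to the tags.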
Why limiting per relation might be a bad idea...
This does not mean that we necessarily have the most frequent tag. Indeed, say that a tag is the 11th most popular tag for both Books and Articles; it can still be the most popular tag overall, since the top 10 of Books and the top 10 of Articles may be totally different. Or a small example:
Top Books    Top Articles
1. A (10)    1. D (12)
2. B (8)     2. E (8)
3. C (7)     3. C (7)
If we generated a top 2 with the above approach, we would thus miss C, which actually occurs most often (14 times in total).
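The example table can be checked with a few lines of plain Python. The helper `merged_top` is a hypothetical name introduced only for this sketch; it sums the counts per relation, optionally truncating each relation to its top `limit` entries first.

```python
from collections import Counter

# Counts taken from the table above.
books = {"A": 10, "B": 8, "C": 7}
articles = {"D": 12, "E": 8, "C": 7}

def merged_top(per_relation, limit=None):
    """Sum counts over relations, keeping only the top `limit` of each (all if None)."""
    total = Counter()
    for counts in per_relation:
        total.update(dict(Counter(counts).most_common(limit)))
    return total

truncated = merged_top([books, articles], limit=2)
full = merged_top([books, articles])

print(truncated.most_common(1))  # [('D', 12)]
print("C" in truncated)          # False
print(full.most_common(1))       # [('C', 14)]
```

With the per-relation limit, C never enters the counter at all, so no amount of post-processing can recover it.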
We can solve this problem by simply always counting all Tags, and thus removing the [:10] limit:
from collections import Counter
from django.db.models import Count

cntr = Counter()
for relation in Tag._meta.fields_map:
    # No limit and no ordering needed: the Counter does the ranking afterwards.
    cntr.update(
        {
            tg: tg.nr
            for tg in Tag.objects.annotate(nr=Count(relation))
        }
    )
Answered By - Willem Van Onsem