Issue
Under some circumstances, Python pretty print (pprint.pprint) produces a TypeError, and it caught me a bit by surprise.
We can create a Counter object from (e.g.) a list of integers and pretty-print it:
from collections import Counter
from pprint import pprint
intlist = [1,2,3,4,5,6,5,2,5,9,4,7,2,1,4,6,8,54,6,2,45,6,8,4,21,23,6,7,3,35561,1,6,8,]
intcounter = Counter(intlist)
pprint(intcounter)
Counter({6: 6, 2: 4, 4: 4, 1: 3, 5: 3, 8: 3, 3: 2, 7: 2, 9: 1, 54: 1, 45: 1, 21: 1, 23: 1, 35561: 1})
We can also add a key to it without converting it to a "native" dictionary (because Counter is a subclass of dict):
from collections import Counter
from pprint import pprint
intlist = [1,2,3,4,5,6,5,2,5,9,4,7,2,1,4,6,8,54,6,2,45,6,8,4,21,23,6,7,3,35561,1,6,8,]
intcounter = Counter(intlist)
intcounter["Hello"] = "World"
# and you can print that too
print(intcounter)
Counter({1: 3, 2: 4, 3: 2, 4: 4, 5: 3, 6: 6, 9: 1, 7: 2, 8: 3, 54: 1, 45: 1, 21: 1, 23: 1, 35561: 1, 'Hello': 'World'})
But can we then pretty-print the updated object?
try:
    pprint(intcounter)
except Exception as t:
    print(t)
Nope.
Counter({'<' not supported between instances of 'int' and 'str'
Ok how about we turn pprint's default sorting behaviour off?
try:
    pprint(intcounter, sort_dicts=False)
except TypeError as t:
    print(t)
also nope:
Counter({'<' not supported between instances of 'int' and 'str'
Note also that we can't use update on a Counter() object if a value in the updating dict is of type str (even though, as above, we can add the key:value pair "directly"):
try:
    intcounter.update({"Hello": "World"})
except TypeError as t:
    print(t)
can only concatenate str (not "int") to str
I think (but I'm just a ham-fisted amateur coder, so I'm not sure) that the Python docs for Counter() might cover why we can't use the update method:
Note: Counters were primarily designed to work with positive integers to represent running counts; however, care was taken to not unnecessarily preclude use cases needing other types or negative values. To help with those use cases, this section documents the minimum range and type restrictions.
The Counter class itself is a dictionary subclass with no restrictions on its keys and values. The values are intended to be numbers representing counts, but you could store anything in the value field.
The most_common() method requires only that the values be orderable.
For in-place operations such as c[key] += 1, the value type need only support addition and subtraction. So fractions, floats, and decimals would work and negative values are supported. The same is also true for update() and subtract() which allow negative and zero values for both inputs and outputs.
The multiset methods are designed only for use cases with positive values. The inputs may be negative or zero, but only outputs with positive values are created. There are no type restrictions, but the value type needs to support addition, subtraction, and comparison.
The elements() method requires integer counts. It ignores zero and negative counts.
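As a small illustration of the quoted note (a sketch only; the names weights, "a" and "b" are made up for this example), non-integer numeric values go through in-place addition, update() and most_common() without complaint, because they support addition and can be ordered:

```python
from collections import Counter
from fractions import Fraction

# Hypothetical example data: fractional and float "counts"
weights = Counter()
weights["a"] += Fraction(1, 2)                    # missing key starts at 0
weights.update({"a": Fraction(1, 4), "b": 1.5})   # added to the existing values

print(weights["a"])           # 1/2 + 1/4
print(weights.most_common())  # works because Fraction and float are orderable
```

A str value fails the same two requirements: it cannot be added to 0 and it cannot be ordered against an int.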
Obviously, if we force the Counter object to a "native" dictionary (dict(intcounter)), everything works as expected. Still, I wondered whether pprint should handle this a bit more elegantly, even though I realise this is quite edge-casey and very few people will trip over it the way I did.
(I was passing a Counter() to a Bokeh charting function, and it seemed convenient to pass some extra k:v pairs that the function used by simply updating the Counter() object. pprint was just used to visually check my work.)
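A minimal sketch of that workaround, using a shorter list than the one above (the variable names plain and formatted are only for illustration):

```python
from collections import Counter
from pprint import pformat, pprint

intcounter = Counter([1, 2, 2, 3, 3, 3])
intcounter["Hello"] = "World"  # mixed value types from here on

# A plain dict copy goes through pprint's ordinary dict handling,
# which copes with keys and values of mixed types.
plain = dict(intcounter)
pprint(plain)

# pformat returns the same text as a string instead of printing it.
formatted = pformat(plain)
```

The copy is shallow and independent of the Counter, so it is only suitable for display; later changes to intcounter will not show up in plain.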
Python 3.8 btw.
Solution
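pprint is not really to blame here; the sorting by count is. When you perform your call:
pprint(intcounter)
the entries are put in order with most_common() from Counter, which has to compare the values with each other, and an int cannot be compared with a str. Counter's own __repr__ also calls most_common(), but it catches the TypeError and falls back to plain dict order, which is why print(intcounter) still works while the pprint path does not: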
def __repr__(self):
    if not self:
        return f'{self.__class__.__name__}()'
    try:
        # dict() preserves the ordering returned by most_common()
        d = dict(self.most_common())
    except TypeError:
        # handle case where values are not orderable
        d = dict(self)
    return f'{self.__class__.__name__}({d!r})'
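The difference can be checked directly. This is only a sketch with a made-up short list: repr() survives the mixed values, while a bare most_common() call raises.

```python
from collections import Counter

counter = Counter([1, 2, 2, 3])
counter["Hello"] = "World"  # plain assignment, nothing is validated

# repr() falls back to dict order when the values cannot be sorted
text = repr(counter)

# most_common() sorts by value, and int and str cannot be compared
try:
    counter.most_common()
    most_common_error = None
except TypeError as exc:
    most_common_error = str(exc)

print(text)
print(most_common_error)
```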
Note that when you add your key/value, either by assignment ([key] = value) or by using update, it is not validated.
The Counter class assumes that you pass the value as type int but does no validation of it.
When you use update, the code won't validate it either, but it will crash at this line:
self[elem] = count + self_get(elem, 0)
Here count is the value you passed, of type str, and it cannot be added to the default of 0.
As opposed to using assignment, where the line is basically:
self[key] = value
The update method adds the new value to the previous one. So if the value was 5 and you update it with 1, the result is 6. If you pass a str value instead, it raises an unhandled exception.
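A short sketch of the two code paths side by side, again with made-up data. Note that the str update only fails when the key is new (or currently holds a number), since that is when a str gets added to an int:

```python
from collections import Counter

counter = Counter([1, 2, 2, 3])

# update() adds the new count to the existing one (0 if the key is new)
counter.update({2: 5})
print(counter[2])  # 2 + 5

# A str value for a new key means "World" + 0, which raises
try:
    counter.update({"Hello": "World"})
    update_error = None
except TypeError as exc:
    update_error = str(exc)
print(update_error)

# Plain assignment is ordinary dict behaviour and stores anything
counter["Hello"] = "World"
```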
Now again, this will pass when using assignment, but as soon as any method has to do a computation on the values, it will eventually crash.
Always ensure the values you store in a Counter are of type int.
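Since Counter does not check this for you, one option is to check it yourself before calling anything that sorts or adds. The helper below is hypothetical (non_int_items is not part of collections), just a sketch of the idea:

```python
from collections import Counter


def non_int_items(counter):
    """Hypothetical helper: return the (key, value) pairs whose value is not an int."""
    return [(key, value) for key, value in counter.items()
            if not isinstance(value, int)]


counter = Counter([1, 2, 2, 3])
counter["Hello"] = "World"

bad = non_int_items(counter)
if bad:
    print("non-integer values:", bad)
```

If the extra key/value pairs are really metadata rather than counts, keeping them in a separate dict avoids the problem altogether.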
Answered By - Nic Laforge