Wednesday, August 24, 2022

[FIXED] When would a Python float lose precision when cast to Protobuf/C++ float?

August 24, 2022 protocol-buffers, python No comments

Issue

I'm interested in minimising the size of a protobuf message serialised from Python.

Protobuf has floats (4 bytes) and doubles (8 bytes). Python has a float type that's actually a C double, at least in CPython.

My question is: given an instance of a Python float, is there a "fast" way of checking if the value would lose precision if it was assigned to a protobuf float (or really a C++ float) ?

Solution

You can check convert the float to a hex representation; the sign, exponent and fraction each get a separate section. Provided the fraction uses only the first 6 hex digits (the remaining 7 digits must be zero), and the 6th digit is even (so the last bit is not set) will your 64-bit double float fit in a 32-bit single. The exponent is limited to a value between -126 and 127:

import math
import re

def is_single_precision(
        f,
        _isfinite=math.isfinite,
        _singlepat=re.compile(
            r'-?0x[01]\.[0-9a-f]{5}[02468ace]0{7}p'
            r'(?:\+(?:1[01]\d|12[0-7]|[1-9]\d|\d)|'
            r'-(?:1[01]\d|12[0-6]|[1-9]\d|\d))$').match):
    return not _isfinite(f) or _singlepat(f.hex()) is not None or f == 0.0

The float.hex() method is quite fast, faster than roundtripping via struct or numpy; you can create 1 million hex representations in under half a second:

>>> timeit.Timer('(1.2345678901e+26).hex()').autorange()
(1000000, 0.47934128501219675)

The regex engine is also pretty fast, and with name lookups optimised in the function above we can test 1 million float values in about 1.1 seconds:

>>> import random, sys
>>> testvalues = [0.0, float('inf'), float('-inf'), float('nan')] + [random.uniform(sys.float_info.min, sys.float_info.max) for _ in range(2 * 10 ** 6)]
>>> timeit.Timer('is_single_precision(f())', 'from __main__ import is_single_precision, testvalues; f = iter(testvalues).__next__').autorange()
(1000000, 1.1044921400025487)

The above works because the binary32 format for floats allots 23 bits for the fraction. The exponent is allotted 8 bits (signed). The regex only allows for the first 23 bits to be set, and the exponent to be within the range for a signed 8-bit number.

Also see

This may not be what you want however! Take for example 1/3rd or 1/10th. Both are values which require approximation in floating point values, and both fail the test:

>>> (1/3).hex()
'0x1.5555555555555p-2'
>>> (1/10).hex()
'0x1.999999999999ap-4'

You may have to instead take a heuristic approach; if your hex value has all zeros in the first 6 digits of the fraction, or an exponent outside of the (-126, 127) range, converting to double would lead to too much loss.

Answered By - Martijn Pieters

This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0

Wednesday, August 24, 2022

[FIXED] When would a Python float lose precision when cast to Protobuf/C++ float?

Issue

Solution

0 comments:

Post a Comment

Popular Posts

Labels