Issue
If I have an int that fits into 32 bits, what is the fastest way to split it into four 8-bit values in Python? My simple timing test suggests that bit masking and shifting is moderately faster than divmod(), but I'm pretty sure I haven't thought of everything.
>>> timeit.timeit("x=15774114513484005952; y1, x =divmod(x, 256);y2,x = divmod(x, 256); y3, y4 = divmod(x, 256)")
0.5113952939864248
>>> timeit.timeit("x=15774114513484005952; y1=x&255; x >>= 8;y2=x&255; x>>=8; y3=x&255; y4= x>>8")
0.41230630996869877
Before you ask: this operation will be used a lot. I'm using Python 3.4.
Solution
If you're doing it a lot, the fastest approach is to create a specialized Struct instance and pre-bind its pack method:
import struct

# Done once; little-endian, so y1 will be the least-significant byte (use '>I' for big-endian)
int_to_four_bytes = struct.Struct('<I').pack
# Done many times (you need to mask here, because your number is >32 bits)
y1, y2, y3, y4 = int_to_four_bytes(x & 0xFFFFFFFF)
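As a quick sanity check (a sketch with an illustrative value, not part of the timings): unpacking the resulting bytes object in Python 3 yields plain ints, least-significant byte first with '<I':
x = 0x12345678
y1, y2, y3, y4 = int_to_four_bytes(x & 0xFFFFFFFF)
# y1 == 0x78, y2 == 0x56, y3 == 0x34, y4 == 0x12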
Using struct.pack directly would use a cached Struct object after the first use, but you'd pay the cache-lookup cost of going from format string to cached Struct every time, which is suboptimal. By creating and prebinding the pack method of a Struct object (which is implemented in C in CPython), you bypass all Python bytecode execution beyond the actual function call, and spend no time on cache lookups. On my machine, this runs in about 205 ns, vs. 267 ns for shift and mask (without reassigning x).
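If you want to see the lookup overhead yourself, here is a minimal measurement sketch (the test value is the one from the question; exact numbers will vary by machine and Python version):
import struct
import timeit

x = 15774114513484005952
int_to_four_bytes = struct.Struct('<I').pack

# struct.pack: format-string-to-Struct cache lookup on every call
print(timeit.timeit(lambda: struct.pack('<I', x & 0xFFFFFFFF)))
# Prebound pack: no lookup, straight into the C implementation
print(timeit.timeit(lambda: int_to_four_bytes(x & 0xFFFFFFFF)))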
An alternate approach (for more general, non-struct-compatible sizes) is to use int.to_bytes; for example, in this case:
# Use 'big' for big-endian; on 3.11+, the byteorder argument can be omitted, with 'big' as the default
y1, y2, y3, y4 = (x & 0xFFFFFFFF).to_bytes(4, 'little')
which takes about the same amount of time as the manual shift-and-mask approach (268 ns per loop), but scales better to larger numbers of bytes.
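Since the question's value actually fits in 64 bits, one way this could scale (a sketch only; the byte names are illustrative) is to split the whole thing into eight bytes in a single call, with no masking needed:
b0, b1, b2, b3, b4, b5, b6, b7 = x.to_bytes(8, 'little')  # b0 is the least-significant byte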
Side-note: The timings above are from an older version of CPython. As of 3.11, it appears that int.to_bytes(4, 'little') and int.to_bytes(4) both run about as fast as the pre-built struct.Struct.pack, while int.to_bytes(4, 'big') runs a little slower, but still much faster than:
y1 = x & 0xFF
y2 = x >> 8 & 0xFF
y3 = x >> 16 & 0xFF
y4 = x >> 24 & 0xFF
which takes ~70% longer than the prebound pack method, explicit 'little' to_bytes, or implicit 'big' to_bytes, and ~50% longer than explicit 'big' to_bytes. This stuff varies from version to version (the 3.12 self-modifying bytecode might reduce the overhead of the shift-and-mask approach more, for instance), so it's not worth getting too hung up on in general. Prebound pack methods of a struct.Struct are reliably no worse than tied for fastest in every version of Python I've tested, with to_bytes tied or close behind if you need to handle weirder numbers of bytes, and they're fairly self-documenting, so use one of them.
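If you want to check how these compare on your own interpreter, here is a rough benchmarking sketch (the dict of candidates and its labels are illustrative, not part of the original answer):
import struct
import timeit

x = 15774114513484005952
pack32 = struct.Struct('<I').pack

candidates = {
    "prebound Struct.pack": lambda: pack32(x & 0xFFFFFFFF),
    "to_bytes 'little'": lambda: (x & 0xFFFFFFFF).to_bytes(4, 'little'),
    "shift and mask": lambda: (x & 0xFF, x >> 8 & 0xFF, x >> 16 & 0xFF, x >> 24 & 0xFF),
}
for name, func in candidates.items():
    print(name, timeit.timeit(func))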
Answered By - ShadowRanger