Issue
If I have an int that fits into 32 bits, what is the fastest way to split it into four 8-bit values in Python? My simple timing test suggests that bit masking and shifting is moderately faster than divmod(), but I'm pretty sure I haven't thought of everything.
>>> timeit.timeit("x=15774114513484005952; y1, x =divmod(x, 256);y2,x = divmod(x, 256); y3, y4 = divmod(x, 256)")
0.5113952939864248
>>> timeit.timeit("x=15774114513484005952; y1=x&255; x >>= 8;y2=x&255; x>>=8; y3=x&255; y4= x>>8")
0.41230630996869877
Before you ask: this operation will be used a lot. I'm using Python 3.4.
Solution
If you're doing it a lot, the fastest approach is to create a specialized Struct instance and pre-bind its pack method:
import struct

# Done once; little-endian, so y1 will be the least-significant byte (use '>I' for big-endian)
int_to_four_bytes = struct.Struct('<I').pack
# Done many times (you need to mask here, because your number is >32 bits)
y1, y2, y3, y4 = int_to_four_bytes(x & 0xFFFFFFFF)
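As a quick sanity check (a sketch with an illustrative value, not part of the timings): unpacking the resulting bytes object in Python 3 yields plain ints, least-significant byte first with '<I':
x = 0x12345678
y1, y2, y3, y4 = int_to_four_bytes(x & 0xFFFFFFFF)
# y1 == 0x78, y2 == 0x56, y3 == 0x34, y4 == 0x12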
Using struct.pack directly would use a cached Struct object after the first use, but you'd pay the cache-lookup cost of going from format string to cached Struct every time, which is suboptimal. By creating and prebinding the pack method of a Struct object (which is implemented in C in CPython), you bypass all Python bytecode execution beyond the actual function call, and spend no time on cache lookups. On my machine, this runs in about 205 ns, vs. 267 ns for shift and mask (without reassigning x).
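If you want to see the lookup overhead yourself, here is a minimal measurement sketch (the test value is the one from the question; exact numbers will vary by machine and Python version):
import struct
import timeit

x = 15774114513484005952
int_to_four_bytes = struct.Struct('<I').pack

# struct.pack: format-string-to-Struct cache lookup on every call
print(timeit.timeit(lambda: struct.pack('<I', x & 0xFFFFFFFF)))
# Prebound pack: no lookup, straight into the C implementation
print(timeit.timeit(lambda: int_to_four_bytes(x & 0xFFFFFFFF)))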
An alternate approach (for more general, non-struct-compatible sizes) is to use int.to_bytes; for example, in this case:
# Use 'big' for big-endian; on 3.11+, the byteorder argument can be omitted, with 'big' as the default
y1, y2, y3, y4 = (x & 0xFFFFFFFF).to_bytes(4, 'little')
which takes about the same amount of time as the manual shift-and-mask approach (268 ns per loop), but scales better to larger numbers of bytes.
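Since the question's value actually fits in 64 bits, one way this could scale (a sketch only; the byte names are illustrative) is to split the whole thing into eight bytes in a single call, with no masking needed:
b0, b1, b2, b3, b4, b5, b6, b7 = x.to_bytes(8, 'little')  # b0 is the least-significant byte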
Side-note: The timings above are from an older version of CPython. As of 3.11, it appears that int.to_bytes(4, 'little') and int.to_bytes(4) both run about as fast as the pre-built struct.Struct.pack, while int.to_bytes(4, 'big') runs a little slower, but still much faster than:
y1 = x & 0xFF
y2 = x >> 8 & 0xFF
y3 = x >> 16 & 0xFF
y4 = x >> 24 & 0xFF
which takes ~70% longer than the prebound pack method, explicit 'little' to_bytes, or implicit 'big' to_bytes, and ~50% longer than explicit 'big' to_bytes. This stuff varies from version to version (the 3.12 self-modifying bytecode might reduce the overhead of the shift-and-mask approach more, for instance), so it's not worth getting too hung up on in general. Prebound pack methods of a struct.Struct are reliably no worse than tied for fastest in every version of Python I've tested, with to_bytes tied or close behind if you need to handle weirder numbers of bytes, and they're fairly self-documenting, so use one of them.
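If you want to check how these compare on your own interpreter, here is a rough benchmarking sketch (the dict of candidates and its labels are illustrative, not part of the original answer):
import struct
import timeit

x = 15774114513484005952
pack32 = struct.Struct('<I').pack

candidates = {
    "prebound Struct.pack": lambda: pack32(x & 0xFFFFFFFF),
    "to_bytes 'little'": lambda: (x & 0xFFFFFFFF).to_bytes(4, 'little'),
    "shift and mask": lambda: (x & 0xFF, x >> 8 & 0xFF, x >> 16 & 0xFF, x >> 24 & 0xFF),
}
for name, func in candidates.items():
    print(name, timeit.timeit(func))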
Answered By - ShadowRanger