Thursday, December 21, 2023

[FIXED] A NumPy function to convert a list of tuples of equal-length byte series to a numpy.ndarray

December 21, 2023 · numpy, numpy-ndarray, python

Issue

In a data processing task, I need to convert a huge list of tuples of bytes into a numeric numpy.ndarray. The list has over 10 million entries, and each entry is a tuple of three 450-byte series, like the example below:

[
    (
        b'\n\x0f\n\t\x0c\x00\x00\x01\x07\x06...', # 450-byte series
        b'\x00\x0e\x00\x06\x07\x0c\n\x0e\x07...', # also 450 bytes
        b'\x05\x0e\x07\t\x04\x01\x05\x07\x08...',
    ), # tuple of 3 byte series
    (...), # more tuples like this
    ... # the number of tuples is up to 10M
]

What I want is a numpy.uint8 array of shape (N, 3, 450), where N is the number of tuples (about 10 million), and each uint8 element corresponds to one byte of a series (e.g. b'\n\x0f\n\t' maps to [10, 15, 10, 9]).
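The byte-to-integer mapping in that example is just each byte's value, which Python already exposes when you iterate a bytes object. A small illustration for a single series (this is only the per-series target, not the fast bulk conversion being asked for):

```python
import numpy as np

series = b'\n\x0f\n\t'

# Iterating a bytes object yields each byte's integer value (0..255)
values = list(series)

# The same values as a 1-D uint8 array: one (3, 450)-row slice of the desired result, in miniature
row = np.array(values, dtype=np.uint8)

print(values)     # [10, 15, 10, 9]
print(row.dtype)  # uint8
```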

Put simply, I'm looking for the inverse of an element-wise numpy.ndarray.tobytes.
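To make the "element-wise tobytes" framing concrete, here is a sketch of the forward direction: it builds data with the same structure as the question's list from a small random uint8 array. The sizes 4 × 3 × 5 are arbitrary stand-ins (assumed, not from the question) for 10M × 3 × 450; the requested function would undo this step.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the real data: 4 tuples, 3 series each, 5 bytes per series
arr = rng.integers(0, 256, size=(4, 3, 5), dtype=np.uint8)

# Element-wise tobytes: every innermost 1-D row becomes one bytes object,
# giving a list of tuples of equal-length byte series like the question's input
data = [tuple(series.tobytes() for series in tup) for tup in arr]

print(len(data), len(data[0]), len(data[0][0]))  # 4 3 5
```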

Of course, this can be done with a plain Python for loop that converts each byte series to a 1-D array with numpy.fromiter, one at a time. But given the amount of data, I'd like NumPy to do as much of the work as possible. So I'm looking for a single NumPy function, or a few NumPy calls, with no Python-level for loop.
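A sketch of the plain-Python baseline described above: one np.fromiter call per series, written into a preallocated array. The function name convert_loop is hypothetical; it serves as the slow reference that any vectorized version should match.

```python
import numpy as np

def convert_loop(data):
    """Baseline: convert each byte series with np.fromiter, one at a time."""
    n, k, length = len(data), len(data[0]), len(data[0][0])
    out = np.empty((n, k, length), dtype=np.uint8)
    for i, tup in enumerate(data):
        for j, series in enumerate(tup):
            # Iterating bytes yields ints 0..255, which fit in uint8
            out[i, j] = np.fromiter(series, dtype=np.uint8, count=length)
    return out

sample = [(b'\n\x0f\n\t', b'\x00\x0e\x00\x06'), (b'\x05\x0e\x07\t', b'\x01\x02\x03\x04')]
result = convert_loop(sample)
print(result.shape)  # (2, 2, 4)
```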

P.S. I've also tried wrapping numpy.fromiter with np.frompyfunc and applying it to a NumPy array of bytes created with np.array(..., dtype=object), but it still doesn't seem fast enough.
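For reference, the frompyfunc variant mentioned above probably looks roughly like the sketch below. It is a reconstruction under assumptions, not the asker's actual code, and convert_frompyfunc is a hypothetical name. It still calls a Python function once per series, which would be consistent with it not being much faster than the loop.

```python
import numpy as np

# Wrap np.fromiter as a ufunc-like object: 1 input, 1 output, operating on Python objects
_to_uint8 = np.frompyfunc(lambda s: np.fromiter(s, dtype=np.uint8, count=len(s)), 1, 1)

def convert_frompyfunc(data):
    # dtype=object keeps each bytes object as a single element: shape (n, k)
    obj = np.array(data, dtype=object)
    # Object array of shape (n, k) whose elements are 1-D uint8 arrays
    rows = _to_uint8(obj)
    # Stack the per-series arrays and restore the (n, k, length) shape
    return np.stack(rows.ravel()).reshape(obj.shape + (-1,))

sample = [(b'\n\x0f\n\t', b'\x00\x0e\x00\x06'), (b'\x05\x0e\x07\t', b'\x01\x02\x03\x04')]
print(convert_frompyfunc(sample).shape)  # (2, 2, 4)
```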

Solution

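I think np.frombuffer might be what you are looking for. Once all series are packed into a single fixed-width bytes array (dtype np.bytes_), its underlying buffer is just the series laid end to end, so it can be reinterpreted as uint8 and reshaped: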

import numpy as np

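# Toy data: 2 tuples of 3 series. Every series must have the same length:
# 10 bytes here, 450 in the real data.
data = [
    (
        b'\n\x0f\n\t\x0c\x00\x00\x01\x07\x06',
        b'\x00\x0e\x00\x06\x07\x0c\n\x0e\x07\x00',
        b'\x05\x0e\x07\t\x04\x01\x05\x07\x08\x00',
    ),
    (
        b'\n\x0f\n\t\x0c\x00\x00\x01\x07\x06',
        b'\x00\x0e\x00\x06\x07\x0c\n\x0e\x07\x00',
        b'\x05\x0e\x07\t\x04\x01\x05\x07\x08\x00',
    ), # tuple of 3 byte series
]

n_series = len(data[0])       # 3 series per tuple
series_len = len(data[0][0])  # 10 bytes per series

# Fixed-width bytes array of shape (2, 3), dtype 'S10', flattened to 6 elements
data_flat = np.array(data, dtype=np.bytes_).reshape(-1)
# Reinterpret the raw buffer as uint8 and restore the (tuples, series, bytes) shape
result = np.frombuffer(data_flat, dtype=np.uint8).reshape(len(data), n_series, series_len)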
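The same idea can be wrapped in a small function that checks the equal-length assumption first: with an inferred np.bytes_ dtype NumPy silently pads shorter series with zero bytes, and an explicit 'S<length>' width silently truncates longer ones. The names bytes_tuples_to_uint8 and bytes_tuples_to_uint8_join are hypothetical. The second function is a swapped-in alternative, not part of the answer above: it concatenates every series with b''.join and calls np.frombuffer once, skipping the intermediate bytes array.

```python
import itertools
import numpy as np

def bytes_tuples_to_uint8(data, check=True):
    """Convert a list of tuples of equal-length bytes to a uint8 array of shape (n, k, length)."""
    n, k, length = len(data), len(data[0]), len(data[0][0])
    if check:
        # Python-level O(n) pass; drop it if equal lengths are guaranteed upstream.
        # Without it, 'S<length>' would silently pad shorter and truncate longer series.
        for t in data:
            if len(t) != k or any(len(s) != length for s in t):
                raise ValueError("all tuples must hold k series of the same length")
    # Fixed-width bytes array: its buffer is every series laid end to end
    packed = np.array(data, dtype=f"S{length}").reshape(-1)
    return np.frombuffer(packed, dtype=np.uint8).reshape(n, k, length)

def bytes_tuples_to_uint8_join(data):
    """Alternative: one b''.join over all series, then a single frombuffer call."""
    n, k, length = len(data), len(data[0]), len(data[0][0])
    joined = b"".join(itertools.chain.from_iterable(data))
    # The result views an immutable bytes object, so it is read-only (use .copy() to write).
    # reshape raises ValueError if the total byte count is not n * k * length.
    return np.frombuffer(joined, dtype=np.uint8).reshape(n, k, length)

sample = [(b'\n\x0f\n\t', b'\x00\x0e\x00\x06', b'\x05\x0e\x07\t')] * 2
a = bytes_tuples_to_uint8(sample)
b = bytes_tuples_to_uint8_join(sample)
print(a.shape)               # (2, 3, 4)
print(a[0, 0].tolist())      # [10, 15, 10, 9]
print(np.array_equal(a, b))  # True
```

Trailing zero bytes inside a series are preserved in both versions, because np.frombuffer reads the raw buffer rather than the stripped bytes values.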

I hope this helps!

Answered By - Axel Donath

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
