Monday, January 29, 2024

[FIXED] Slicing NumPy array with array of start and end indices

January 29, 2024 arrays, numpy, python No comments

Issue

The question

How to efficiently slice a multidimensional np.array using a second np.array of similar dimensions, containing start and stop indices?

Minimal working example

I have a 4d array of values I would like to slice e.g.:

shape = np.array([[3,3], [5,5]])
arr = np.arange(np.prod(shape), dtype=np.uint8).reshape(shape.flatten())

Which is in this case all integers between 0 and 225 in a (3,3,5,5) matrix.

Additionally, I have a list of indices:

idxs = np.broadcast_to(np.array([[1,1], [4,3]]), (*shape[0],2,2))
rand = np.random.randint(-1, 2, shape[0])
idxs = idxs + np.broadcast_to(rand[:,:,None,None], (*shape[0],2,2))

This is a 4d array of 2d coordinates:

np.array([[r1, c1], [r2,c2]])

where [r1, r2] and [c1,c2] indicate the rows and columns I want to slice between, respectively. Even though some randomness is applied, I can guarantee that all elements of idxs span a slice of equal dimensions (a 3x2 matrix in this case). Alternatively, the shape can be calculated using:

extent = np.min(idxs[:,:,1] - idxs[:,:,0], axis=(0,1))

Simple Python implementation

This function achieves what I want but does so with a slow, regular Python for loop.

def sliceMultiDim(arr: np.ndarray, idxs: np.ndarray, extent: np.ndarray) -> np.ndarray:
    sliced = np.zeros((*arr.shape[:2], *extent))        
    for row in range(arr.shape[0]):
        for col in range(arr.shape[1]):
            sub = arr[row,col]
            idx = idxs[row,col]
            sliced[row,col] = sub[idx[0,0] : idx[1,0], idx[0,1] : idx[1,1]]
    return sliced

Benchmark

Defining the vectorized solution.

def sliceMultiDimVect(arr: np.ndarray, idxs: np.ndarray) -> np.ndarray:
    # Calculating the minimum extent of the sliding window.
    extent = np.min(idxs[:,:,1] - idxs[:,:,0], axis=(0,1))
    # Create a six dimensional view of (in this case) all shape `(3, 2)` windows.
    windowed = swv(arr, tuple(extent), axis=(2, 3))
    # The top right corners of each slice.
    # These index the required slices directly in `windowed`.
    r1, c1 = idxs[:, :, 0, 0], idxs[:, :, 0, 1]
    # Advanced index grid for the first two dimensions.
    h, w = np.indices(arr.shape[:2], sparse=True)
    return windowed[h, w, r1, c1]

When bumping up the size of the problem, the vectorized solution is about 8x faster* than my naive implementation, according to the following simple benchmarking script.

import timeit
# Setup 100 million values in 100 arrays.
shape = np.array([[10,10], [10000,10000]])
arr = np.arange(np.prod(shape), dtype=np.uint8).reshape(shape.flatten())
# Slicing 1 million values.
idxs = np.broadcast_to(np.array([[0,0], shape[1]//10]), (*shape[0],2,2))
rand = np.random.randint(0, shape[1,0]//10, shape[0])
idxs = idxs + np.broadcast_to(rand[:,:,None,None], (*shape[0],2,2))
def non_vect():
    sliceMultiDim(arr, idxs)
def vect():
    sliceMultiDimVect(arr, idxs)
print("non_vect:", timeit.timeit("non_vect()", setup="from __main__ import non_vect", number=100))
print("vect:", timeit.timeit("vect()", setup="from __main__ import vect", number=100))

Solution	Time [s]	Relative time
vect	8.430499400012195	1
non_vect	64.53972959998646	~7.7

* Run on an Intel Core i7-1185G7 @ 3.00 GHz machine.

Solution

There is no built in function for this, because nothing requires the indices in your idxs array to span an equal shape across all slices of the original array. If there were different shapes, then the function would produce a "ragged" array, which is not allowed.

The fact that you can guarantee that each slice will be of shape extent allows us some trickery! In short: create a view of all extent shape windows of your array arr, and index into that array with arrays of r1 and c.

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view as swv

# the original setup
shape = np.array([[3,3], [5,5]])
arr = np.arange(np.prod(shape), dtype=np.uint8).reshape(shape.flatten())

idxs = np.broadcast_to(np.array([[1,1], [4,3]]), (*shape[0],2,2))
rand = np.random.randint(-1, 2, shape[0])
idxs = idxs + np.broadcast_to(rand[:,:,None,None], (*shape[0],2,2))

extent = np.min(idxs[:,:,1] - idxs[:,:,0], axis=(0,1))

# Create a six dimensional view of (in this case) all shape `(3, 2)` windows.
windowed = swv(arr, tuple(extent), axis=(2, 3))

# The top right corners of each slice.
# These index the required slices directly in `windowed`.
r1, c1 = idxs[:, :, 0, 0], idxs[:, :, 0, 1]

# Advanced index grid for the first two dimensions.
h, w = np.indices(arr.shape[:2], sparse=True)

out = windowed[h, w, r1, c1]

As a one liner:

out = swv(arr, tuple(extent), axis=(2, 3))[*np.indices(arr.shape[:2], sparse=True), idxs[:, :, 0, 0], idxs[:, :, 0, 1]]

Answered By - Chrysophylaxs

This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0

Monday, January 29, 2024

[FIXED] Slicing NumPy array with array of start and end indices

Issue

Solution

0 comments:

Post a Comment

Popular Posts

Labels