Issue
The question
How to efficiently slice a multidimensional np.array
using a second np.array
of similar dimensions, containing start and stop indices?
Minimal working example
I have a 4d array of values I would like to slice e.g.:
shape = np.array([[3,3], [5,5]])
arr = np.arange(np.prod(shape), dtype=np.uint8).reshape(shape.flatten())
Which is in this case all integers between 0 and 225 in a (3,3,5,5) matrix.
Additionally, I have a list of indices:
idxs = np.broadcast_to(np.array([[1,1], [4,3]]), (*shape[0],2,2))
rand = np.random.randint(-1, 2, shape[0])
idxs = idxs + np.broadcast_to(rand[:,:,None,None], (*shape[0],2,2))
This is a 4d array of 2d coordinates:
np.array([[r1, c1], [r2,c2]])
where [r1, r2]
and [c1,c2]
indicate the rows and columns I want to slice between, respectively. Even though some randomness is applied, I can guarantee that all elements of idxs
span a slice of equal dimensions (a 3x2 matrix in this case). Alternatively, the shape can be calculated using:
extent = np.min(idxs[:,:,1] - idxs[:,:,0], axis=(0,1))
Simple Python implementation
This function achieves what I want but does so with a slow, regular Python for
loop.
def sliceMultiDim(arr: np.ndarray, idxs: np.ndarray, extent: np.ndarray) -> np.ndarray:
sliced = np.zeros((*arr.shape[:2], *extent))
for row in range(arr.shape[0]):
for col in range(arr.shape[1]):
sub = arr[row,col]
idx = idxs[row,col]
sliced[row,col] = sub[idx[0,0] : idx[1,0], idx[0,1] : idx[1,1]]
return sliced
Benchmark
Defining the vectorized solution.
def sliceMultiDimVect(arr: np.ndarray, idxs: np.ndarray) -> np.ndarray:
# Calculating the minimum extent of the sliding window.
extent = np.min(idxs[:,:,1] - idxs[:,:,0], axis=(0,1))
# Create a six dimensional view of (in this case) all shape `(3, 2)` windows.
windowed = swv(arr, tuple(extent), axis=(2, 3))
# The top right corners of each slice.
# These index the required slices directly in `windowed`.
r1, c1 = idxs[:, :, 0, 0], idxs[:, :, 0, 1]
# Advanced index grid for the first two dimensions.
h, w = np.indices(arr.shape[:2], sparse=True)
return windowed[h, w, r1, c1]
When bumping up the size of the problem, the vectorized solution is about 8x faster* than my naive implementation, according to the following simple benchmarking script.
import timeit
# Setup 100 million values in 100 arrays.
shape = np.array([[10,10], [10000,10000]])
arr = np.arange(np.prod(shape), dtype=np.uint8).reshape(shape.flatten())
# Slicing 1 million values.
idxs = np.broadcast_to(np.array([[0,0], shape[1]//10]), (*shape[0],2,2))
rand = np.random.randint(0, shape[1,0]//10, shape[0])
idxs = idxs + np.broadcast_to(rand[:,:,None,None], (*shape[0],2,2))
def non_vect():
sliceMultiDim(arr, idxs)
def vect():
sliceMultiDimVect(arr, idxs)
print("non_vect:", timeit.timeit("non_vect()", setup="from __main__ import non_vect", number=100))
print("vect:", timeit.timeit("vect()", setup="from __main__ import vect", number=100))
Solution | Time [s] | Relative time |
---|---|---|
vect | 8.430499400012195 | 1 |
non_vect | 64.53972959998646 | ~7.7 |
* Run on an Intel Core i7-1185G7 @ 3.00 GHz machine.
Solution
There is no built in function for this, because nothing requires the indices in your idxs
array to span an equal shape across all slices of the original array. If there were different shapes, then the function would produce a "ragged" array, which is not allowed.
The fact that you can guarantee that each slice will be of shape extent
allows us some trickery! In short: create a view of all extent
shape windows of your array arr
, and index into that array with arrays of r1
and c
.
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view as swv
# the original setup
shape = np.array([[3,3], [5,5]])
arr = np.arange(np.prod(shape), dtype=np.uint8).reshape(shape.flatten())
idxs = np.broadcast_to(np.array([[1,1], [4,3]]), (*shape[0],2,2))
rand = np.random.randint(-1, 2, shape[0])
idxs = idxs + np.broadcast_to(rand[:,:,None,None], (*shape[0],2,2))
extent = np.min(idxs[:,:,1] - idxs[:,:,0], axis=(0,1))
# Create a six dimensional view of (in this case) all shape `(3, 2)` windows.
windowed = swv(arr, tuple(extent), axis=(2, 3))
# The top right corners of each slice.
# These index the required slices directly in `windowed`.
r1, c1 = idxs[:, :, 0, 0], idxs[:, :, 0, 1]
# Advanced index grid for the first two dimensions.
h, w = np.indices(arr.shape[:2], sparse=True)
out = windowed[h, w, r1, c1]
As a one liner:
out = swv(arr, tuple(extent), axis=(2, 3))[*np.indices(arr.shape[:2], sparse=True), idxs[:, :, 0, 0], idxs[:, :, 0, 1]]
Answered By - Chrysophylaxs
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.