Issue
I am wondering the following:
I have a 10x10x3 and a 1000x1000x3 numpy array. I want to check if the smaller array can be found within the bigger array (and where). This is pretty easy with 2 for loops, but that solution is very slow:
H, W, _ = big.shape
h, w, _ = small.shape
for i in range(H - h + 1):
    for j in range(W - w + 1):
        if np.array_equal(big[i:i+h, j:j+w, :], small):
            print(i, j)
I was wondering if there is a way to vectorize this? Or a way to write this routine faster?
Solution
TL;DR: see "approach #4" below, using a second look at sliding_window_view. That's more than 640x faster than the OP's Python nested loops.
Notes:
- For consistency, in all the approaches below I am using a (1000, 1000, 3) array for big (even though the question has evolved from 1000 to 100).
- All the solutions I give also search in the 3rd dimension (e.g. if big ~ (1000, 1000, 5) and small ~ (10, 10, 3)).
Setup
import numpy as np

def problem(n=1000):
    big = np.ones((n, n, 3))
    small = np.zeros((10, 10, 3))
    # plug small at three locations, two of them overlapping
    s0, s1, s2 = small.shape
    i0, i1, i2 = 2, 3, 0
    big[i0:i0+s0, i1:i1+s1, i2:i2+s2] = small
    i0, i1, i2 = 3, 4, 0
    big[i0:i0+s0, i1:i1+s1, i2:i2+s2] = small
    i0, i1, i2 = 10, 15, 0
    big[i0:i0+s0, i1:i1+s1, i2:i2+s2] = small
    return big, small
# for all below, we use:
big, small = problem(1000)
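The timings below also use an op_find wrapper around the OP's nested loops. Here is a minimal sketch of such a wrapper (my reconstruction; it collects the hits rather than printing them):

def op_find(big, small):
    # the OP's nested loops, returning the matching (i, j) indices
    H, W, _ = big.shape
    h, w, _ = small.shape
    hits = []
    for i in range(H - h + 1):
        for j in range(W - w + 1):
            if np.array_equal(big[i:i+h, j:j+w, :], small):
                hits.append((i, j))
    return np.array(hits)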
Approach #1: sliding_window_view
A way that avoids the problem I flagged in comments to the correlate2d solution is to use numpy.lib.stride_tricks.sliding_window_view:
# finding the locations of the small array
from numpy.lib.stride_tricks import sliding_window_view
>>> np.c_[np.where(((sliding_window_view(big, small.shape) - small)**2).sum((-1, -2, -3)) == 0)]
array([[ 2, 3, 0],
[ 3, 4, 0],
[10, 15, 0]])
In terms of speed, this is only 5x faster than the OP's approach. We can surely do better.
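As a side note, the same check can be written with np.all over the window axes; this is just a sketch of an equivalent formulation, and it is not meaningfully faster, since every window is still compared in full:

# same check as above, comparing windows element-wise instead of summing
# squared differences; every window is still examined in full
hits = np.all(sliding_window_view(big, small.shape) == small, axis=(-1, -2, -3))
np.c_[np.where(hits)]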
Approach #2: convolution against a prime kernel
Edit: this idea is actually wrong; one can always get false positives.
The idea was to use convolution, while attempting to take some care to avoid false positives (e.g. if small is all zeros).
I had thought of using a kernel made of primes, of the same size as small. Then, I thought that the convolution of small with ker would give a value that can be obtained only if all the elements are exactly the same as those of small. But this is false. Even using only integers, we can easily build counter-examples:
[2, 1, 0] ✱ [1, 3, 5] == 13 == [1, 2, 2] ✱ [1, 3, 5]
(where ✱ denotes convolution).
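We can verify that collision directly with np.convolve:

>>> np.convolve([2, 1, 0], [1, 3, 5], 'valid')
array([13])
>>> np.convolve([1, 2, 2], [1, 3, 5], 'valid')
array([13])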
I am leaving the code below, but it should not be used.
from scipy.signal import fftconvolve
def conv_find(big, small):
    '''do not use: can give false hits'''
    n = np.prod(small.shape)
    # https://t5k.org/howmany.html:
    # p(n) <= n (ln n + ln ln n - 1 + 1.8 ln ln n / ln n) if n >= 13
    l_n = np.log(n)
    N = int(np.ceil(n * (l_n + np.log(l_n) - 1 + 1.8 * np.log(l_n) / l_n) if n >= 13 else 53.77))
    ker = np.array(primesfrom2to(N)[:n]).reshape(small.shape)
    indices = np.c_[np.where(np.isclose(
        fftconvolve(big, ker, 'valid'),
        fftconvolve(small, ker, 'valid')))]
    return indices
(primesfrom2to(N) is one of many functions finding primes up to N; I use the one given in this SO answer.)
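If you prefer not to follow the link, here is a minimal (and much slower) stand-in with the same behavior, as a sketch:

def primesfrom2to(N):
    # minimal Sieve of Eratosthenes: all primes p with 2 <= p <= N
    sieve = np.ones(N + 1, dtype=bool)
    sieve[:2] = False
    for p in range(2, int(N**0.5) + 1):
        if sieve[p]:
            sieve[p*p::p] = False
    return np.nonzero(sieve)[0]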
Example:
>>> conv_find(big, small)
array([[ 2, 3, 0],
[ 3, 4, 0],
[10, 15, 0]])
Now, the timing is a bit better:
>>> big.shape, small.shape
((1000, 1000, 3), (10, 10, 3))
%timeit conv_find(big, small)
437 ms ± 2.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit op_find(big, small)
3.29 s ± 2.26 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
So now we have a 7.4x speedup, with the risk of accidental hits (which we could filter out with an actual equality check). I believe we can still do better.
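If one did want to use conv_find anyway, the candidates could be confirmed with an exact window comparison; a sketch (conv_find_checked is just an illustrative name):

def conv_find_checked(big, small):
    # confirm each candidate from conv_find with an exact comparison,
    # discarding any accidental (false-positive) hits
    h, w, l = small.shape
    candidates = conv_find(big, small)
    keep = [np.array_equal(big[i:i+h, j:j+w, k:k+l], small) for i, j, k in candidates]
    return candidates[np.array(keep, dtype=bool)]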
Approach #3: Numba to the rescue!
from numba import njit
@njit
def _nb_find(big, small):
    H, W, L = big.shape
    h, w, l = small.shape
    ix = np.zeros((H - h + 1, W - w + 1, L - l + 1))
    x0 = small[0, 0, 0]
    for i in range(H - h + 1):
        for j in range(W - w + 1):
            for k in range(L - l + 1):
                if big[i, j, k] == x0 and np.array_equal(big[i:i+h, j:j+w, k:k+l], small):
                    ix[i, j, k] = 1
    return ix

def nb_find(big, small):
    return np.c_[np.nonzero(_nb_find(big, small))]
The first part of the test (big[i, j, k] == x0) avoids calling np.array_equal and gives us an additional 50x speedup.
Now, the timing is:
%timeit nb_find(big, small)
8.96 ms ± 72.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
That's a 367x speedup from op_find, and 48x faster than my second solution above (using fftconvolve).
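As a quick sanity check (a sketch, using approach #1 as the reference), the Numba version finds the same locations:

# the Numba result should match the sliding_window_view result from approach #1
ref = np.c_[np.where(((sliding_window_view(big, small.shape) - small)**2).sum((-1, -2, -3)) == 0)]
assert np.array_equal(nb_find(big, small), ref)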
Approach #4: sliding_window_view revisited
Based on the result that just checking the first element (approach #3) gives us a nice speedup, we can revisit the sliding_window_view approach and do this in pure numpy (without the need for numba).
from numpy.lib.stride_tricks import sliding_window_view
def slide_find2(big, small):
    z = sliding_window_view(big, small.shape)
    # cheap pre-filter: only windows whose first element matches small[0, 0, 0]
    ix = np.where(z[:, :, :, 0, 0, 0] == small[0, 0, 0])
    # full comparison on the (few) remaining candidate windows
    return np.c_[ix][np.all(z[ix] == small, axis=(-3, -2, -1))]
Note: as per @JérômeRichard's comment, we can identify which element of small is rarest in big, and use that element instead of the one at (0, 0, 0) in the function above (see the sketch at the end of this answer). For now, I am keeping this simple.
The speed now is:
%timeit slide_find2(big, small)
# 5.14 ms ± 44.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
That is 640x faster than the OP's loop.
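A sketch of the refinement mentioned above (slide_find3 is just an illustrative name, not tuned): pick the position in small whose value occurs least often in big, and pre-filter on that element instead of the corner.

def slide_find3(big, small):
    # count how often each value of big occurs, then find which element of
    # small is rarest in big (values absent from big get a count of 0,
    # in which case no window can match small anyway)
    vals, counts = np.unique(big, return_counts=True)
    pos = np.clip(np.searchsorted(vals, small.ravel()), 0, len(vals) - 1)
    small_counts = np.where(vals[pos] == small.ravel(), counts[pos], 0)
    i0, j0, k0 = np.unravel_index(np.argmin(small_counts), small.shape)

    z = sliding_window_view(big, small.shape)
    # cheap pre-filter on the rarest element, then full comparison as before
    ix = np.where(z[:, :, :, i0, j0, k0] == small[i0, j0, k0])
    return np.c_[ix][np.all(z[ix] == small, axis=(-3, -2, -1))]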
Answered By - Pierre D