Issue
I am wondering the following:
I have a 10x10x3 and a 1000x1000x3 numpy array. I want to check if the smaller array can be found within the bigger array (and where). This is pretty easy with 2 for loops, but that solution is very slow:
H, W, _ = big.shape
h, w, _ = small.shape
for i in range(H - h + 1):
    for j in range(W - w + 1):
        if np.array_equal(big[i:i+h, j:j+w, :], small):
            print(i, j)
I was wondering if there is a way to vectorize this? Or a way to write this routine faster?
Solution
TL;DR: see "approach #4" below, using a second look at sliding_window_view. That's more than 640x faster than the OP's Python nested loops.
Notes:
- For consistency, in all the approaches below I am using a (1000, 1000, 3) array for big (even though the question has evolved from 1000 to 100).
- All the solutions I give also search in the 3rd dimension (e.g. if big ~ (1000, 1000, 5) and small ~ (10, 10, 3)).
Setup
import numpy as np

def problem(n=1000):
    big = np.ones((n, n, 3))
    small = np.zeros((10, 10, 3))
    # plug small at three locations, two of them overlapping
    s0, s1, s2 = small.shape
    i0, i1, i2 = 2, 3, 0
    big[i0:i0+s0, i1:i1+s1, i2:i2+s2] = small
    i0, i1, i2 = 3, 4, 0
    big[i0:i0+s0, i1:i1+s1, i2:i2+s2] = small
    i0, i1, i2 = 10, 15, 0
    big[i0:i0+s0, i1:i1+s1, i2:i2+s2] = small
    return big, small
# for all below, we use:
big, small = problem(1000)
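The timings below also use an op_find wrapper around the OP's nested loops. Here is a minimal sketch of such a wrapper (my reconstruction; it collects the hits rather than printing them):

def op_find(big, small):
    # the OP's nested loops, returning the matching (i, j) indices
    H, W, _ = big.shape
    h, w, _ = small.shape
    hits = []
    for i in range(H - h + 1):
        for j in range(W - w + 1):
            if np.array_equal(big[i:i+h, j:j+w, :], small):
                hits.append((i, j))
    return np.array(hits)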
Approach #1: sliding_window_view
A way that avoids the problem I flagged in comments to the correlate2d solution is to use numpy.lib.stride_tricks.sliding_window_view:
# finding the locations of the small array
from numpy.lib.stride_tricks import sliding_window_view
>>> np.c_[np.where(((sliding_window_view(big, small.shape) - small)**2).sum((-1, -2, -3)) == 0)]
array([[ 2, 3, 0],
[ 3, 4, 0],
[10, 15, 0]])
In terms of speed, this is only 5x faster than the OP's approach. We can surely do better.
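As a side note, the same check can be written with np.all over the window axes; this is just a sketch of an equivalent formulation, and it is not meaningfully faster, since every window is still compared in full:

# same check as above, comparing windows element-wise instead of summing
# squared differences; every window is still examined in full
hits = np.all(sliding_window_view(big, small.shape) == small, axis=(-1, -2, -3))
np.c_[np.where(hits)]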
Approach #2: convolution against a prime kernel
Edit: this idea is actually wrong; one can always get false positives.
The idea was to use convolution, while attempting to take some care to avoid false positives (e.g. if small is all zeros).
I had thought of using a kernel made of primes, of the same size as small. Then, I thought that the convolution of small with ker would give a value that can be obtained only if all the elements are exactly the same as those of small. But this is false. Even using only integers, we can easily build counter-examples:
[2, 1, 0] ✱ [1, 3, 5] == 13 == [1, 2, 2] ✱ [1, 3, 5]
(where ✱ denotes convolution).
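We can verify that collision directly with np.convolve:

>>> np.convolve([2, 1, 0], [1, 3, 5], 'valid')
array([13])
>>> np.convolve([1, 2, 2], [1, 3, 5], 'valid')
array([13])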
I am leaving the code below, but it should not be used.
from scipy.signal import fftconvolve
def conv_find(big, small):
    '''do not use: can give false hits'''
    n = np.prod(small.shape)
    # https://t5k.org/howmany.html:
    # p(n) <= n (ln n + ln ln n - 1 + 1.8 ln ln n / ln n) if n >= 13
    l_n = np.log(n)
    N = int(np.ceil(n * (l_n + np.log(l_n) - 1 + 1.8 * np.log(l_n) / l_n) if n >= 13 else 53.77))
    ker = np.array(primesfrom2to(N)[:n]).reshape(small.shape)
    indices = np.c_[np.where(np.isclose(
        fftconvolve(big, ker, 'valid'),
        fftconvolve(small, ker, 'valid')))]
    return indices
(primesfrom2to(N) is one of many functions finding primes up to N; I use the one given in this SO answer.)
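If you prefer not to follow the link, here is a minimal (and much slower) stand-in with the same behavior, as a sketch:

def primesfrom2to(N):
    # minimal Sieve of Eratosthenes: all primes p with 2 <= p <= N
    sieve = np.ones(N + 1, dtype=bool)
    sieve[:2] = False
    for p in range(2, int(N**0.5) + 1):
        if sieve[p]:
            sieve[p*p::p] = False
    return np.nonzero(sieve)[0]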
Example:
>>> conv_find(big, small)
array([[ 2, 3, 0],
[ 3, 4, 0],
[10, 15, 0]])
Now, the timing is a bit better:
>>> big.shape, small.shape
((1000, 1000, 3), (10, 10, 3))
%timeit conv_find(big, small)
437 ms ± 2.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit op_find(big, small)
3.29 s ± 2.26 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
So now we have a 7.4x speedup, with the risk of accidental hits (which we could filter out with an actual equality check). I believe we can still do better.
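If one did want to use conv_find anyway, the candidates could be confirmed with an exact window comparison; a sketch (conv_find_checked is just an illustrative name):

def conv_find_checked(big, small):
    # confirm each candidate from conv_find with an exact comparison,
    # discarding any accidental (false-positive) hits
    h, w, l = small.shape
    candidates = conv_find(big, small)
    keep = [np.array_equal(big[i:i+h, j:j+w, k:k+l], small) for i, j, k in candidates]
    return candidates[np.array(keep, dtype=bool)]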
Approach #3: Numba to the rescue!
from numba import njit
@njit
def _nb_find(big, small):
    H, W, L = big.shape
    h, w, l = small.shape
    ix = np.zeros((H - h + 1, W - w + 1, L - l + 1))
    x0 = small[0, 0, 0]
    for i in range(H - h + 1):
        for j in range(W - w + 1):
            for k in range(L - l + 1):
                if big[i, j, k] == x0 and np.array_equal(big[i:i+h, j:j+w, k:k+l], small):
                    ix[i, j, k] = 1
    return ix

def nb_find(big, small):
    return np.c_[np.nonzero(_nb_find(big, small))]
The first part of the test (big[i, j, k] == x0) avoids calling np.array_equal and gives us an additional 50x speedup.
Now, the timing is:
%timeit nb_find(big, small)
8.96 ms ± 72.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
That's a 367x speedup from op_find, and 48x faster than my second solution above (using fftconvolve).
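As a quick sanity check (a sketch, using approach #1 as the reference), the Numba version finds the same locations:

# the Numba result should match the sliding_window_view result from approach #1
ref = np.c_[np.where(((sliding_window_view(big, small.shape) - small)**2).sum((-1, -2, -3)) == 0)]
assert np.array_equal(nb_find(big, small), ref)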
Approach #4: sliding_window_view revisited
Based on the result that just checking the first element (approach #3) gives us a nice speedup, we can revisit the sliding_window_view approach and do this in pure numpy (without the need for numba).
from numpy.lib.stride_tricks import sliding_window_view
def slide_find2(big, small):
    z = sliding_window_view(big, small.shape)
    # cheap pre-filter: only windows whose first element matches small[0, 0, 0]
    ix = np.where(z[:, :, :, 0, 0, 0] == small[0, 0, 0])
    # full comparison on the (few) remaining candidate windows
    return np.c_[ix][np.all(z[ix] == small, axis=(-3, -2, -1))]
Note: as per @JérômeRichard's comment, we can identify which element of small is rarest in big, and use that element instead of the one at (0, 0, 0) in the function above (see the sketch at the end of this answer). For now, I am keeping this simple.
The speed now is:
%timeit slide_find2(big, small)
# 5.14 ms ± 44.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
That is 640x faster than the OP's loop.
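A sketch of the refinement mentioned above (slide_find3 is just an illustrative name, not tuned): pick the position in small whose value occurs least often in big, and pre-filter on that element instead of the corner.

def slide_find3(big, small):
    # count how often each value of big occurs, then find which element of
    # small is rarest in big (values absent from big get a count of 0,
    # in which case no window can match small anyway)
    vals, counts = np.unique(big, return_counts=True)
    pos = np.clip(np.searchsorted(vals, small.ravel()), 0, len(vals) - 1)
    small_counts = np.where(vals[pos] == small.ravel(), counts[pos], 0)
    i0, j0, k0 = np.unravel_index(np.argmin(small_counts), small.shape)

    z = sliding_window_view(big, small.shape)
    # cheap pre-filter on the rarest element, then full comparison as before
    ix = np.where(z[:, :, :, i0, j0, k0] == small[i0, j0, k0])
    return np.c_[ix][np.all(z[ix] == small, axis=(-3, -2, -1))]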
Answered By - Pierre D