Issue
Given, two 3-D arrays of dimensions (2,2,2):
A = [[[ 0, 0],
[92, 92]],
[[ 0, 92],
[ 0, 92]]]
B = [[[ 0, 0],
[92, 0]],
[[ 0, 92],
[92, 92]]]
How do you find the Euclidean distance for each vector in A and B efficiently?
I have tried for-loops but these are slow, and I'm working with 3-D arrays in the order of (>>2, >>2, 2).
Ultimately I want a matrix of the form:
C = [[d1, d2],
[d3, d4]]
Edit:
I've tried the following loop, but the biggest issue with it is that loses the dimensions I want to keep. But the distances are correct.
[numpy.sqrt((A[row, col][0] - B[row, col][0])**2 + (B[row, col][1] -A[row, col][1])**2) for row in range(2) for col in range(2)]
Solution
Thinking in a NumPy vectorized way that would be performing element-wise differentiation, squaring and summing along the last axis and finally getting square root. So, the straight-forward implementation would be -
np.sqrt(((A - B)**2).sum(-1))
We could perform the squaring and summing along the last axis in one go with np.einsum
and thus make it more efficient, like so -
subs = A - B
out = np.sqrt(np.einsum('ijk,ijk->ij',subs,subs))
Another alternative with numexpr
module -
import numexpr as ne
np.sqrt(ne.evaluate('sum((A-B)**2,2)'))
Since, we are working with a length of 2
along the last axis, we could just slice those and feed it to evaluate
method. Please note that slicing isn't possible inside the evaluate string. So, the modified implementation would be -
a0 = A[...,0]
a1 = A[...,1]
b0 = B[...,0]
b1 = B[...,1]
out = ne.evaluate('sqrt((a0-b0)**2 + (a1-b1)**2)')
Runtime test
Function definitions -
def sqrt_sum_sq_based(A,B):
return np.sqrt(((A - B)**2).sum(-1))
def einsum_based(A,B):
subs = A - B
return np.sqrt(np.einsum('ijk,ijk->ij',subs,subs))
def numexpr_based(A,B):
return np.sqrt(ne.evaluate('sum((A-B)**2,2)'))
def numexpr_based_with_slicing(A,B):
a0 = A[...,0]
a1 = A[...,1]
b0 = B[...,0]
b1 = B[...,1]
return ne.evaluate('sqrt((a0-b0)**2 + (a1-b1)**2)')
Timings -
In [288]: # Setup input arrays
...: dim = 2
...: N = 1000
...: A = np.random.rand(N,N,dim)
...: B = np.random.rand(N,N,dim)
...:
In [289]: %timeit sqrt_sum_sq_based(A,B)
10 loops, best of 3: 40.9 ms per loop
In [290]: %timeit einsum_based(A,B)
10 loops, best of 3: 22.9 ms per loop
In [291]: %timeit numexpr_based(A,B)
10 loops, best of 3: 18.7 ms per loop
In [292]: %timeit numexpr_based_with_slicing(A,B)
100 loops, best of 3: 8.23 ms per loop
In [293]: %timeit np.linalg.norm(A-B, axis=-1) #@dnalow's soln
10 loops, best of 3: 45 ms per loop
Answered By - Divakar
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.