Issue
I'm looking for an efficient NumPy implementation for the following:
My inputs are:
X = 2d NumPy array of shape (m,3) (m can be anywhere from 10 to 10 Million)
Q = 2d NumPy array with shape (n,p). Both n and p are small (<= 10)
It is easier to explain what I'm looking for using the following example.
Example:
X = np.array([[2, 56, 20], [3, 194, 18], [4, 36, 6], [5, 168, 42]]) with m = 4
Q = np.array([[3, 7, 11], [5, 13, 17]]) with n = 2 and p = 3
For each row j
of X
, I want to find all the rows of Q
that contain an entry q
such that X[j,1]%q == X[j,2]%q
. For the above example, X[0,1]%q == X[0,2]%q
for q = 3
, which is in row 0
of Q
. Similarly, X[2,1]%q == X[2,2]%q
for q = 3
and q = 5
, which are in rows 0
and 1
of Q
. The output should be list of lists/NumPy arrays with matching rows of Q
for each row of X
. For the above example, the output should be: [[0], [0], [0,1], [0]]
I have written the following code, but am wondering if there is a more efficient approach.
result = []
for i in range(len(X)):
result.append(np.where(X[i,1] % Q == X[i,2] % Q)[0])
Solution
This computes all modulo combinations at once:
P = X[:, 1:, np.newaxis, np.newaxis] % Q
The output shape will be [m, 2, n, p]. Note: If you can, change the integer size to i4 or u4 as 32 bit modulo is faster than 64 bit; or go even smaller. It also limits memory consumption which may be significant here.
Now we can do all comparisons at once:
B = P[:, 0] == P[:, 1]
The output shape will be [m, n, p]
To find the rows where a suitable q exists, we use np.any
F = np.any(B, axis=-1)
The shape will be [m, n], for each combination of X and Q, a boolean whether any q in Q matches the row in X.
To get the indices, use np.nonzero
Xs, Qs = np.nonzero(F)
for x, q in zip(Xs, Qs):
print("X index %d matches Q index %d" % (x, q))
Answered By - Homer512
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.