Issue
Say that I have an array of arrays
import numpy as np
z = np.array(
[
[1, 1, 0, 0, 0, 0],
[1, 1, 1, 1, 1, 0],
[1, 1, 1, 0, 0, 0],
[1, 1, 1, 1, 1, 1],
]
)
Where 1s start on the left side of each array, and 0s on the right side if any. For many applications, this is how arrays are padded so that each array is of the same length in an array of arrays.
How would I get the shortest sequence of non-zeros for such an array.
In this case, the shortest sequence is the first array, which has a length of 2.
The obvious answer is to iterate over each array and find the index of the first zero, but I feel that there's probably a method that takes more advantage of numpy's c processing.
Solution
Benchmark with a 5000×5000 array:
74.3 ms Dani
33.8 ms user19077881
2.6 ms Kelly1
1.4 ms Kelly2
My Kelly1
is an O(m+n) saddleback search from top-right to bottom-left:
def Kelly1(z):
m, n = z.shape
j = n - 1
for i in range(m):
while not z[i, j]:
j -= 1
if j < 0:
return 0
return j + 1
(Michael Szczesny said it can trivially be made ~150x faster (if I remember correctly) by using Numba. I'm not equipped to test that myself, though.)
My Kelly2
is an O(m log n) horizontal binary search, using NumPy to check whether a column is full of non-zeros:
def Kelly2(z):
m, n = z.shape
lo, hi = 0, n
while lo < hi:
mid = (lo + hi) // 2
if z[:, mid].all():
lo = mid + 1
else:
hi = mid
return lo
(Could be shorter by using bisect
with a key
, but I don't have Python 3.10 to test right now.)
Note: Dani and user19077881 return different results: The smallest number of non-zeros in any row, or the row with the fewest non-zeros. I followed Dani's lead, as that's the accepted answer. It doesn't really matter, as you can compute one result from the other very quickly (by finding the index of the first zero in the column or row, respectively).
Full benchmark code (Try it online!):
import numpy as np
from timeit import timeit
import random
m, n = 5000, 5000
def genz():
lo = random.randrange(n*5//100, n//3)
return np.array(
[
[1]*ones + [0]*(n-ones)
for ones in random.choices(range(lo, n+1), k=m)
]
)
def Dani(z):
return np.count_nonzero(z, axis=1).min()
def user19077881(z):
z_sums = z.sum(axis = 1)
z_least = np.argmin(z_sums)
return z_least
def Kelly1(z):
m, n = z.shape
j = n - 1
for i in range(m):
while not z[i, j]:
j -= 1
if j < 0:
return 0
return j + 1
def Kelly2(z):
m, n = z.shape
lo, hi = 0, n
while lo < hi:
mid = (lo + hi) // 2
if z[:, mid].all():
lo = mid + 1
else:
hi = mid
return lo
funcs = Dani, user19077881, Kelly1, Kelly2
for _ in range(3):
z = genz()
for f in funcs:
t = timeit(lambda: f(z), number=1)
print('%5.1f ms ' % (t * 1e3), f.__name__)
print()
Answered By - Kelly Bundy
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.