Issue
I have a continuous stream of values coming in, many millions of records. I need to find the minimum and maximum of everything that has arrived so far, in real time, as numbers keep coming in. The whole data array is not available, the data that has arrived is not stored, and the range of the values is not known in advance.
I tried something like this, but it isn't working correctly. Is there a better way to solve this using libraries such as numpy or scipy?
import numpy as np
rng = np.random.default_rng()
test = rng.choice(np.arange(-100,100, dtype=int), 10, replace=False)
testmax = 0
testmin = 0
for i in test:  # simulates a stream
    if i < testmax:
        testmin = i
    if i > testmax:
        testmax = i
    if i < testmin:
        testmin = i
print (test, 'min: ',testmin, 'max: ', testmax)
>>> print (test, 'min: ',testmin, 'max: ', testmax)
[ 39 -32 61 -18 -53 -57 -69 98 -88 -47] min: -47 max: 98 #should be -88 and 98
>>>
>>> print (test, 'min: ',testmin, 'max: ', testmax)
[-65 -53 1 2 26 -62 82 70 39 -44] min: -44 max: 82 #should be -65 and 82
>>>
Solution
The mistake (a typo) was pointed out in the comments. In the code above, the first check compares i against testmax instead of testmin, so testmin is overwritten by any value that is smaller than the current maximum. You only need two comparisons, and they can be written with the ternary operator. You should also initialize the max to negative infinity and the min to positive infinity. This avoids cases where, for example, you set min to 0 but the smallest number actually seen in the stream is greater than 0.
import numpy as np
rng = np.random.default_rng(42)
stream_min = -100
stream_max = 100
test = rng.choice(np.arange(stream_min, stream_max+1, dtype=int),
                  10,
                  replace=False)
testmax = -float("inf")
testmin = float("inf")
# simulates a stream
for i in test:
    testmax = i if i > testmax else testmax
    testmin = i if i < testmin else testmin
print(test, "min: ", testmin, "max: ", testmax)
# [ 97 49 -83 26 -15 -16 38 -82 -60 69] min: -83 max: 97
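Since the stream is never stored, it can help to keep the two running values in a small object that is updated as each value arrives. The following is only a minimal sketch of that idea; the class name RunningMinMax and its update method are hypothetical names chosen for illustration, not part of numpy or any other library. The sample values are the first test array from the question.

```python
import math


class RunningMinMax:
    """Tracks the smallest and largest value seen so far without storing the stream.

    Hypothetical helper for illustration only.
    """

    def __init__(self):
        # Start at +/- infinity so the first value always replaces both.
        self.min = math.inf
        self.max = -math.inf

    def update(self, value):
        # Two independent comparisons per incoming value.
        if value < self.min:
            self.min = value
        if value > self.max:
            self.max = value
        return self.min, self.max


tracker = RunningMinMax()
for value in [39, -32, 61, -18, -53, -57, -69, 98, -88, -47]:  # simulates a stream
    tracker.update(value)

print(tracker.min, tracker.max)  # -88 98
```

Only two numbers are kept in memory regardless of how many records arrive, and the current min and max can be read at any point while the stream is still running.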
Why the ternary operator over using min/max? Well, it's faster.
stream_min = -1000
stream_max = 1000
test = rng.choice(np.arange(stream_min, stream_max+1, dtype=int),
                  500,
                  replace=False)

def ternary():
    testmax = -float("inf")
    testmin = float("inf")
    for i in test:
        testmax = i if i > testmax else testmax
        testmin = i if i < testmin else testmin
    return testmin, testmax

def plainif():
    testmax = -float("inf")
    testmin = float("inf")
    for i in test:
        if i > testmax:
            testmax = i
        if i < testmin:
            testmin = i
    return testmin, testmax

def minmax():
    testmax = -float("inf")
    testmin = float("inf")
    for i in test:
        testmax = max(i, testmax)
        testmin = min(i, testmin)
    return testmin, testmax
%timeit ternary()
55.4 µs ± 3.26 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
%timeit plainif()
50.6 µs ± 2.23 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
%timeit minmax()
170 µs ± 6.01 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
Using an if statement vs the ternary operator is nearly equivalent (the if is a hair faster).
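On the numpy part of the question: if the stream can be consumed in batches rather than one value at a time, each batch can be reduced with the array's own min and max methods and then folded into the running values. This is a sketch under that assumption, which the question does not confirm; chunked_minmax is a hypothetical helper name, and the random data and the split into 20 chunks only simulate batched arrival.

```python
import numpy as np


def chunked_minmax(chunks):
    """Fold per-chunk min/max into a running min/max (hypothetical helper)."""
    lo, hi = np.inf, -np.inf
    for chunk in chunks:
        chunk = np.asarray(chunk)
        if chunk.size == 0:
            # Reducing an empty array raises an error, so skip empty batches.
            continue
        lo = min(lo, chunk.min())
        hi = max(hi, chunk.max())
    return lo, hi


rng = np.random.default_rng(42)
data = rng.integers(-1000, 1001, size=10_000)  # simulated stream contents
chunks = np.array_split(data, 20)              # simulated batches

lo, hi = chunked_minmax(chunks)
print(lo == data.min(), hi == data.max())  # True True
```

Here the per-element work happens inside numpy, and the Python-level loop runs once per batch, which is usually much cheaper than looping over every value. Whether it pays off depends on the batch size; for values that truly arrive one at a time, the plain if comparison remains the simpler choice.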
Answered By - jared