Issue
I have a 1D NumPy array, and I want to find the sublists/subarrays that contain almost equal values. By that I mean each solution has one central point from which none of its other points differ by more than a tolerance. For instance, if I have [1.0, 2.2, 1.4, 1.8, 1.5, 2.1] and a tolerance of 0.2, the desired outcome is [[1.4, 1.5], [2.1, 2.2]]. I think the following function does the job:
import numpy as np

def find_almost_equal(input, tol):
    sorted = np.sort(input)
    result = []
    for i, v1 in enumerate(sorted):
        result.append([])
        for j, v2 in enumerate(sorted):
            if v2 - tol < v1 < v2 + tol:
                result[i].append(v2)
    result = [r for r in result if len(r) > 1]
    for i, r1 in enumerate(result):
        for j, r2 in enumerate(result):
            if set(r2).issubset(set(r1)):
                del result[j]
    return result

test = np.array([2.6, 1.2, 1.5, 1.8, 2.0, 2.2, 2.5, 1.1, 1.4])
tolerance = 0.15
almost_equal = find_almost_equal(test, tolerance)
print(almost_equal)
The outcome is [[1.1, 1.2], [1.4, 1.5], [2.5, 2.6]]. With tolerance = 0.25 the outcome is [[1.1, 1.2, 1.4], [1.4, 1.5], [1.8, 2.0, 2.2], [2.5, 2.6]].
When a point belongs to several sublists, my algorithm does not always give the correct result. For example, with input [1.0, 1.1, 1.2, 1.3, 1.4] and tolerance = 0.2 the output is [[1.0, 1.1, 1.2], [1.2, 1.3, 1.4]], while the expected outcome is [[1.0, 1.1, 1.2], [1.1, 1.2, 1.3], [1.2, 1.3, 1.4]].
The question: Is there an easier way to do this (preferably in numpy)? And how can I do this correctly?
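As a side note on the wrong result: the cleanup loop in the function above appears to lose [1.1, 1.2, 1.3] because it compares each list with itself (a set is always a subset of itself) and deletes from `result` while iterating over it, so indices shift mid-loop. A minimal sketch of a cleanup step that avoids both problems follows. The helper name `keep_maximal` is hypothetical, and it assumes the candidate lists have already been built.

```python
def keep_maximal(groups):
    """Return the groups not contained in another group, without duplicates."""
    sets = [frozenset(g) for g in groups]
    kept = []
    seen = set()
    for group, s in zip(groups, sets):
        if s in seen:
            continue  # exact duplicate of a group that was already handled
        seen.add(s)
        # "s < other" is a proper-subset test, so a set never eliminates itself
        if not any(s < other for other in sets):
            kept.append(list(group))
    return kept


# Candidate lists for [1.0, 1.1, 1.2, 1.3, 1.4] with tolerance 0.2
candidates = [
    [1.0, 1.1],
    [1.0, 1.1, 1.2],
    [1.1, 1.2, 1.3],
    [1.2, 1.3, 1.4],
    [1.3, 1.4],
]
print(keep_maximal(candidates))
```

Because the input list is never modified, the order of the comparisons no longer affects which groups survive.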
Solution
A slightly more concise way of doing this, using a bit more NumPy functionality, is to build an array of pairwise differences (as shown in this answer), like:
import numpy as np

def find_almost_equal(inp, tol):
    # create array of differences
    inpa = np.sort(inp)
    diff = np.abs(np.subtract.outer(inpa, inpa))

    # get precision of float type
    prec = np.finfo(inpa.dtype).eps * 10

    # loop over rows in diff array (except first and last)
    l = []
    for row in diff[1:-1]:
        # get values within tolerance (accounting for floating point precision)
        r = inpa[row + prec < tol].tolist()

        if len(r) > 1:
            for i, prev in enumerate(l):
                # make sure list isn't subset of previous lists
                if set(r).issubset(prev):
                    break
                elif set(prev).issubset(r):
                    # add in longer lists
                    del l[i]
                    l.append(r)
                    break
            else:
                # no break: r neither duplicates nor extends a previous list
                l.append(r)
    return l
This gives:
>>> find_almost_equal([2.6, 1.2, 1.5, 1.8, 2.0, 2.2, 2.5, 1.1, 1.4], 0.15)
[[1.1, 1.2], [1.4, 1.5], [2.5, 2.6]]
>>> find_almost_equal([2.6, 1.2, 1.5, 1.8, 2.0, 2.2, 2.5, 1.1, 1.4], 0.25)
[[1.1, 1.2, 1.4], [1.2, 1.4, 1.5], [1.8, 2.0, 2.2], [2.5, 2.6]]
>>> find_almost_equal([1.0, 1.1, 1.2, 1.3, 1.4], 0.2)
[[1.0, 1.1, 1.2], [1.1, 1.2, 1.3], [1.2, 1.3, 1.4]]
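To see what the function is working with, the difference matrix and the tolerance mask can be inspected directly. This is only a small sketch on the five-value example; the names `values`, `mask`, and `candidates` are illustrative and not part of the function above.

```python
import numpy as np

values = np.sort(np.array([1.0, 1.1, 1.2, 1.3, 1.4]))
tol = 0.2

# diff[i, j] is |values[i] - values[j]|, so row i holds the distances
# from values[i] (the "central point") to every value in the array
diff = np.abs(np.subtract.outer(values, values))

# small margin so a difference that lands just below tol through
# rounding is not counted as being within tolerance
prec = np.finfo(values.dtype).eps * 10

# boolean mask for all rows at once; row i marks the neighbours of values[i]
mask = diff + prec < tol

# one candidate list per central point, before subsets are filtered out
candidates = [values[row].tolist() for row in mask]
print(candidates)
```

The margin matters here: in double precision `1.3 - 1.1` evaluates to slightly less than `0.2`, so without `prec` the row for 1.1 would also pick up 1.3.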
Answered By - Matt Pitkin