Sunday, November 12, 2023

[FIXED] Find lists of almost equal values in list

Issue

I have a 1D NumPy array, and I want to find the sublists (subarrays) whose values are almost equal. By that I mean each sublist has one central point from which no other member differs by more than a tolerance. For instance, given [1.0, 2.2, 1.4, 1.8, 1.5, 2.1] and a tolerance of 0.2, the desired outcome is [[1.4, 1.5], [2.1, 2.2]]. I think the following function does the job:

import numpy as np


def find_almost_equal(input, tol):
    sorted = np.sort(input)
    result = []
    for i, v1 in enumerate(sorted):
        result.append([])
        for j, v2 in enumerate(sorted):
            if v2 - tol < v1 < v2 + tol:
                result[i].append(v2)

    result = [r for r in result if len(r) > 1]

    for i, r1 in enumerate(result):
        for j, r2 in enumerate(result):
            if set(r2).issubset(set(r1)):
                del result[j]

    return result


test = np.array([2.6, 1.2, 1.5, 1.8, 2.0, 2.2, 2.5, 1.1, 1.4])
tolerance = 0.15

almost_equal = find_almost_equal(test, tolerance)
print(almost_equal)

The outcome is [[1.1, 1.2], [1.4, 1.5], [2.5, 2.6]]. With tolerance = 0.25 the outcome is [[1.1, 1.2, 1.4], [1.4, 1.5], [1.8, 2.0, 2.2], [2.5, 2.6]].

When a point belongs to several sublists my algorithm does not always give the correct result. For example with input [1.0, 1.1, 1.2, 1.3, 1.4] and tolerance = 0.2 the output is [[1.0, 1.1, 1.2], [1.2, 1.3, 1.4]], while the expected outcome is [[1.0, 1.1, 1.2], [1.1, 1.2, 1.3], [1.2, 1.3, 1.4]].

The question: Is there an easier way to do this (preferably in numpy)? And how can I do this correctly?
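
The expected outcome described above can be read as: collect one candidate list per central point, then discard every list that is contained in another. A minimal sketch of that filtering step follows. The names `drop_subsets` and `candidates` and the hard-coded data are hypothetical, chosen for illustration, and not part of the original post. The sketch builds a new list rather than deleting from the list being iterated, because deleting during iteration makes a loop skip elements.

```python
def drop_subsets(groups):
    """Return the groups that are not contained in any other group."""
    sets = [frozenset(g) for g in groups]
    kept = []
    for i, current in enumerate(sets):
        contained = False
        for j, other in enumerate(sets):
            if i == j:
                continue  # never compare a group with itself
            # drop proper subsets, and all but the first copy of exact duplicates
            if current < other or (current == other and j < i):
                contained = True
                break
        if not contained:
            kept.append(list(groups[i]))
    return kept


# Hypothetical candidates: one list per central point of [1.0, 1.1, 1.2, 1.3, 1.4]
# with tolerance 0.2 (values exactly 0.2 away from the centre are excluded).
candidates = [
    [1.0, 1.1],
    [1.0, 1.1, 1.2],
    [1.1, 1.2, 1.3],
    [1.2, 1.3, 1.4],
    [1.3, 1.4],
]
print(drop_subsets(candidates))
# [[1.0, 1.1, 1.2], [1.1, 1.2, 1.3], [1.2, 1.3, 1.4]]
```

The inner loop skips the case `i == j`, so a group is never judged to be a subset of itself.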

Solution

A slightly more concise way of doing this, using a bit more NumPy functionality, is to build an array of pairwise differences (as shown in this answer), like:

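import numpy as np


def find_almost_equal(inp, tol):
    # sort the input and create the array of pairwise absolute differences
    inpa = np.sort(inp)
    diff = np.abs(np.subtract.outer(inpa, inpa))

    # small margin based on the precision of the float type
    prec = np.finfo(inpa.dtype).eps * 10

    # loop over every row in the diff array; each row treats one value as the
    # central point (skipping the first and last rows would return nothing
    # for a two-element input)
    groups = []
    for row in diff:
        # get values within tolerance (accounting for floating point precision)
        r = inpa[row + prec < tol].tolist()
        if len(r) > 1:
            for i, prev in enumerate(groups):
                # make sure list isn't subset of previous lists
                if set(r).issubset(prev):
                    break
                elif set(prev).issubset(r):
                    # replace a shorter list with the longer one containing it
                    del groups[i]
                    groups.append(r)
                    break
            else:
                # not contained in, and not containing, any stored list
                groups.append(r)
    return groups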

This gives:

find_almost_equal([2.6, 1.2, 1.5, 1.8, 2.0, 2.2, 2.5, 1.1, 1.4], 0.15)
[[1.1, 1.2], [1.4, 1.5], [2.5, 2.6]]
find_almost_equal([2.6, 1.2, 1.5, 1.8, 2.0, 2.2, 2.5, 1.1, 1.4], 0.25)
[[1.1, 1.2, 1.4], [1.2, 1.4, 1.5], [1.8, 2.0, 2.2], [2.5, 2.6]]
find_almost_equal([1.0, 1.1, 1.2, 1.3, 1.4], 0.2)
[[1.0, 1.1, 1.2], [1.1, 1.2, 1.3], [1.2, 1.3, 1.4]]
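
The last result depends on the `prec` margin in the function. As a small illustration (a sketch using plain Python floats and the same `eps * 10` margin, not additional code from the answer), the difference between 1.3 and 1.1 is stored slightly below 0.2, so without the margin the two values would count as being within a tolerance of 0.2:

```python
import numpy as np

tol = 0.2
gap = abs(1.3 - 1.1)  # mathematically 0.2, but stored slightly below it
prec = np.finfo(np.float64).eps * 10  # same margin as in find_almost_equal

print(gap)               # 0.19999999999999996
print(gap < tol)         # True: without the margin, 1.1 and 1.3 are grouped
print(gap + prec < tol)  # False: with the margin, they are kept apart
```

The margin therefore makes the strict comparison behave as it would in exact arithmetic for inputs like these. Whether `eps * 10` is large enough for values of much greater magnitude is not covered by the examples here.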

Answered By - Matt Pitkin

This answer, collected from Stack Overflow and tested by PythonFixing community admins, is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
