Tuesday, December 19, 2023

[FIXED] How to sort vector to have minimal distance to another vector efficiently?

December 19, 2023 numpy, python No comments

Issue

I have two vectors (v1, v2). The values in the vector v2 need to be sorted so that each of them can be identified with a element in v1. The values in v1 and v2 differ slightly but might be switched. In the end I need to do this multiple times, since I need to sort the next v3 according to the sorted v2 and so on.

I thought about looking at the different permutations of v2 for every possible ordering of the values in v2. The minimal sum of the differences to v1 should be the sorting that I want to have. This works in principle but does scale terribly when v1 and v2 get bigger.

This code shows the ordering for a pair of v1, v2.

import numpy as np
import itertools

def sort(v1,v2):
    
    arr_permutations = np.array(list(itertools.permutations(v2)))

    sum_diff = np.sum(np.abs(arr_permutations - v1), axis=1)

    best_permut = arr_permutations[np.argmin(sum_diff)]

    return best_permut 

v1 = np.array([-0.99418 -0.106364j, -1.005974-0.099054j,
 -0.991923-0.107482j, -0.990868-0.107976j, -0.990558-0.108118j,
 -0.898555+0.035351j])

v2 = np.array([-1.0052  -0.10133j,  -0.993598-0.108516j,
  0.991379-0.109617j, -0.990341-0.110104j, -0.990036-0.110244j, 
 -0.898624+0.032346j])

sort(v1,v2)

Out:  np.array([-0.993598-0.108516j, -1.0052  -0.10133j, 
                -0.990341-0.110104j, -0.990036-0.110244j, 
                 0.991379-0.109617j, -0.898624+0.032346j])

In this case the correct ordering is to exchange v2[0] and v2[1] in this specific case. As the values that belong to each other all change a little bit, just looking at one value and finding the individual position closest to the value in v1 is not enough.

Solution

Here is my try:

import numpy as np
from scipy import optimize

def match(v1, v2, dist):
    assert v1.ndim == v2.ndim == 1
    assert v1.shape[0] == v2.shape[0]
    n = v1.shape[0]
    t = np.dtype(dist(v1[0], v2[0]))
    dist_matrix = np.fromiter((dist(x1, x2) for x1 in v1 for x2 in v2),
                              dtype=t, count=n*n).reshape(n, n)
    row_ind, col_ind = optimize.linear_sum_assignment(dist_matrix)
    return v2[col_ind]

v1 = np.array([-0.99418 -0.106364j, -1.005974-0.099054j, -0.991923-0.107482j,
               -0.990868-0.107976j, -0.990558-0.108118j, -0.898555+0.035351j])
v2 = np.array([-1.0052  -0.10133j,  -0.993598-0.108516j,  0.991379-0.109617j,
               -0.990341-0.110104j, -0.990036-0.110244j, -0.898624+0.032346j])

v2_matched = match(v1, v2, lambda x1, x2: abs(x1 - x2))
print(repr(v2_matched))
# =>
# array([-0.993598-0.108516j, -1.0052  -0.10133j , -0.990341-0.110104j,
#        -0.990036-0.110244j,  0.991379-0.109617j, -0.898624+0.032346j])

The output is the same as that of your sort().

As you see, you can plug in a different lambda or function for the distance.

I'm no expert of numpy, there might be a shortcut for calculating the distance matrix dist_matrix.

Thanks to @Jonas for identifying the “assignment problem”.

Answered By - Walter Tross

This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0

Tuesday, December 19, 2023

[FIXED] How to sort vector to have minimal distance to another vector efficiently?

Issue

Solution

0 comments:

Post a Comment

Popular Posts

Labels