Saturday, September 3, 2022

[FIXED] Filter two List[List] given a filter List[List] keeping list in order

Labels: numpy, python

Issue

I have two List[List] values, X and Y, that I need to filter using a third List[List], Z. The values in Y are the scores of the corresponding elements in X. For each row, I need to keep the values of X that appear in the matching row of Z, along with the scores in Y that correspond to them.

I will illustrate with an example and my current solution.

# Base matrices

X = [[2,1,3,4],
     [1,3,2,4],
     [1,2,3,4]]

Y = [[0.2,0.1,0.9,1.0],
     [0.3,0.2,0.4,0.2],
     [0.8,0.6,0.5,0.2]]

Z = [[1,2,3,4],
     [2,3],
     [1]]

# Expected results

new_x = [[2,1,3,4],
         [3,2],
         [1]]

new_y = [[0.2,0.1,0.9,1.0],
         [0.2,0.4],
         [0.8]]


# Current solution
import numpy as np

def find_idx(a,b):
    # Indexes of the elements of a that also appear in b
    r = []
    for idx, sub_a in enumerate(a):
        if sub_a in b:
            r+=[idx]
    return r

def filter(X, Y, Z):
    X = np.asarray(X)
    Y = np.asarray(Y)
    # Z is ragged (rows of different lengths), so it stays a plain list of lists

    assert len(X)==len(Y)==len(Z)
    r_x = []
    r_y = []
    for idx, sub_filter in enumerate(Z):
        x = find_idx(X[idx], sub_filter)
        r_x.append(X[idx][x].tolist())
        r_y.append(Y[idx][x].tolist())
    return r_x, r_y


r_x, r_y = filter(X,Y,Z)

I figured out I could easily do this with a collection of list comprehensions, but performance is important for this function.

Is there any way to speed up the part where I find the indexes of the values of X that are in Z, so that I can then filter X and Y by them?

Solution

This approach is more efficient when the input matrices are large:

import numpy as np

X = np.array([
    [2, 1, 3, 4],
    [1, 3, 2, 4],
    [1, 2, 3, 4],
])

Y = np.array([
    [0.2, 0.1, 0.9, 1.0],
    [0.3, 0.2, 0.4, 0.2],
    [0.8, 0.6, 0.5, 0.2],
])

Z = [[1, 2, 3, 4],
     [2, 3],
     [1],
     ]

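new_x = []
new_y = []
for i, z_row in enumerate(Z):
    # Boolean mask: True where the value in row i of X appears in z_row
    mask = np.isin(X[i], z_row)
    # Boolean indexing keeps the remaining values in their original order
    new_x.append(X[i][mask].tolist())
    new_y.append(Y[i][mask].tolist())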

When I tested it with 5000x5000 matrices, it was roughly 10x faster than the list comprehension. The list comprehension is slower because the in operator on a list scans the elements of z one by one for every value it checks.
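For comparison, the linear scan described above can also be avoided without NumPy by turning each row of Z into a set before testing membership. The following is only a minimal sketch, not part of the original answer: the function name filter_rows_with_sets is hypothetical, and no timing was measured for it, so whether it is faster or slower than np.isin on your data is something to benchmark yourself.

```python
def filter_rows_with_sets(X, Y, Z):
    """Keep the values of X (and matching scores in Y) found in the same row of Z.

    Hypothetical helper for illustration; order within each row is preserved.
    """
    new_x, new_y = [], []
    for x_row, y_row, z_row in zip(X, Y, Z):
        allowed = set(z_row)  # set membership is O(1) on average, unlike a list scan
        kept = [(x, y) for x, y in zip(x_row, y_row) if x in allowed]
        new_x.append([x for x, _ in kept])
        new_y.append([y for _, y in kept])
    return new_x, new_y


X = [[2, 1, 3, 4],
     [1, 3, 2, 4],
     [1, 2, 3, 4]]

Y = [[0.2, 0.1, 0.9, 1.0],
     [0.3, 0.2, 0.4, 0.2],
     [0.8, 0.6, 0.5, 0.2]]

Z = [[1, 2, 3, 4],
     [2, 3],
     [1]]

new_x, new_y = filter_rows_with_sets(X, Y, Z)
print(new_x)  # [[2, 1, 3, 4], [3, 2], [1]]
print(new_y)  # [[0.2, 0.1, 0.9, 1.0], [0.2, 0.4], [0.8]]
```

This version works on plain lists, so the ragged Z needs no conversion, and it assumes the values in X and Z are hashable (integers here).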

Answered By - Michal Racko

This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0
