Issue
So I have two List[List] X,Y that I need to filter given a List[List] Z. The values in Y correspond to the scores of elements in X. I need to check if values of X belong to the filter in Z and keep the scores in Y corresponding to those values.
I will illustrate with an example and my current solution.
# Base matrixes
X = [[2,1,3,4],
[1,3,2,4],
[1,2,3,4]]
Y = [[0.2,0.1,0.9,1.0],
[0.3,0.2,0.4,0.2],
[0.8,0.6,0.5,0.2]]
Z = [[1,2,3,4],
[2,3],
[1]]
# Expected results
new_x = [[2,1,3,4],
[3,2],
[1]]
new_y = [[0.2,0.1,0.9,1.0],
[0.2,0.4],
[0.8]]
# Current solution
def find_idx(a,b):
r = []
for idx, sub_a in enumerate(a):
if sub_a in b:
r+=[idx]
return r
def filter(X, Y, Z):
X = np.asarray(X)
Y = np.asarray(Y)
Z = np.asarray(Z)
assert len(X)==len(Y)==len(Z)
r_x = []
r_y = []
for idx, sub_filter in enumerate(Z):
x = find_idx(X[idx], Z[idx])
r_x.append(X[idx][x].tolist())
r_y.append(Y[idx][x].tolist())
return r_x, r_y
r_x, r_y = filter(X,Y,Z)
I figured out I could easily do this with a collection of list comprehensions, but performance is important for this function.
Is there any way to speed up the part where i find the indexes of values of X that are in Z to later filter X,Y by them?
Solution
This is a more efficient way to do that when the input matrices are big:
X = np.array([
[2, 1, 3, 4],
[1, 3, 2, 4],
[1, 2, 3, 4],
])
Y = np.array([
[0.2, 0.1, 0.9, 1.0],
[0.3, 0.2, 0.4, 0.2],
[0.8, 0.6, 0.5, 0.2],
])
Z = [[1, 2, 3, 4],
[2, 3],
[1],
]
mask = np.zeros(X.shape)
new_x = []
new_y = []
for i, z_row in enumerate(Z):
mask = np.isin(X[i], z_row)
new_x.append(X[i][mask].tolist())
new_y.append(Y[i][mask].tolist())
It was roughly 10x faster than the list comprehension when I tested it with 5000x5000 matrices. This is because the list comprehension has to loop over all elements of the list z
when using the in
operator.
Answered By - Michal Racko
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.