Issue
I have 3 arrays, x, y, and q. Arrays x and y have the same length, and q is a query array. Assume all values in x and q are unique. For each value of q, I would like to find the index of the corresponding value in x. I would then like to query that index in y. If a value from q does not appear in x, I would like to return np.nan.
As a concrete example, consider the following arrays:
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
q = np.array([2, 0])
Since only the value 2 occurs in x, the correct return value would be:
out = np.array([5, np.nan])
With for loops, this can be done like so:
out = []
for i in range(len(q)):
    for j in range(len(x)):
        if np.allclose(q[i], x[j]):
            out.append(y[j])
            break
    else:
        out.append(np.nan)
output = np.array(out)
Obviously this is quite slow. Is there a simpler way to do this with numpy builtins like np.argwhere? Or would it be easier to use pandas?
Solution
Numpy broadcasting should work.
# boolean mask of shape (len(x), len(q)); m[j, i] is True where q[i] == x[j]
m = q == x[:, None]
# where q[i] has a match, keep y * m (the matched y value plus zeros);
# where it has no match in x, fill the column with np.nan, then sum over rows
res = np.where(m.any(0), y[:, None] * m, np.nan).sum(0)
res
# array([ 5., nan])
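To make the broadcasting concrete: for the example arrays, the intermediate mask m has shape (len(x), len(q)) = (3, 2), with a single True where a query value matches:
m
# array([[False, False],
#        [ True, False],
#        [False, False]])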
I should note that this only works if x has no duplicates.
Because it relies on building a len(x)-by-len(q) array, the above solution will run into memory issues if q is large. A pandas solution will work much more efficiently in that case:
import pandas as pd

# map q to y via x
res = pd.Series(q).map(pd.Series(y, index=x)).values
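For the example arrays above, this produces the same output as the broadcasting approach:
res
# array([ 5., nan])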
If x and q are 2D, it's better to convert the Series.map() solution into a DataFrame.merge() one:
res = pd.DataFrame(q).merge(pd.DataFrame(x).assign(y=y), on=[0,1], how='left')['y'].values
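For instance, with hypothetical 2D key arrays (x2 and q2 are made-up names for illustration; y is the same as before), this gives:
x2 = np.array([[1, 10], [2, 20], [3, 30]])
q2 = np.array([[2, 20], [0, 0]])
res = pd.DataFrame(q2).merge(pd.DataFrame(x2).assign(y=y), on=[0, 1], how='left')['y'].values
# array([ 5., nan])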
Numpy broadcasting will blow up (it would require a 3D array) and will not be efficient for large arrays. Numba might do well though.
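A minimal sketch of such a Numba version, assuming numba is installed (lookup_nb is a name chosen here for illustration, not an existing function):
import numpy as np
from numba import njit

@njit
def lookup_nb(x, y, q):
    # for each query value, scan x for a match and take the corresponding y;
    # queries with no match in x stay np.nan
    out = np.full(q.shape[0], np.nan)
    for i in range(q.shape[0]):
        for j in range(x.shape[0]):
            if q[i] == x[j]:
                out[i] = y[j]
                break
    return out

lookup_nb(x, y, q)
# array([ 5., nan])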
Answered By - not a robot