Issue
I'm trying to run a loop where I build a mask and then use it to assign specific values from one array to various rows of another array. The following script works, but only when there are no duplicate values in column 0 of array y. If there are duplicates, the mask selects multiple rows in y and the error is thrown. Thanks for any help.
import numpy as np

x = np.zeros(shape=(100, 10))
x[:, 0] = np.arange(100)
# this seed = 9 produces duplicate values in column 0, which seems to cause the problem
# (no issues when there are no duplicate values in column 0 of y)
y = (np.random.default_rng(9).random((10, 7)) * 100).astype(int)
for i in range(x.shape[0]):
    mask = y[:, 0] == x[i, 0]
    y[mask, [1, 3, 4, 6]] = x[i, [1, 2, 3, 4]]
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
Input In [219], in <cell line: 2>()
2 for i in range(x.shape[0]):
3 mask = y[:,0] == x[i,0]
----> 4 y[mask,[1,3,4,6]] = x[i,[1,2,3,4]]
IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (0,) (4,)
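A minimal sketch of where the shape mismatch comes from, using a hypothetical 3-row y (not the question's data) and a helper try_assign made up for this illustration. The boolean mask is converted to the indices of its True entries, and that index array has to broadcast against the four column indices.

```python
import numpy as np

# Hypothetical stand-in for y; only column 0 (the key) matters here.
y = np.zeros((3, 7), dtype=int)
y[:, 0] = [5, 7, 7]      # key 7 appears twice
cols = [1, 3, 4, 6]

def try_assign(value):
    """Hypothetical helper: None if the assignment works, else the IndexError text."""
    mask = y[:, 0] == value
    try:
        y[mask, cols] = 1
        return None
    except IndexError as exc:
        return str(exc)

ok = try_assign(5)         # one match: row index shape (1,) broadcasts with (4,)
missing = try_assign(3)    # no match: (0,) cannot broadcast with (4,)
duplicate = try_assign(7)  # two matches: (2,) cannot broadcast with (4,)
```

Only the single-match case broadcasts. A key with no match reproduces the traceback above, and a duplicated key fails the same way with (2,).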
Solution
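Your assignment y[mask, [1, 3, 4, 6]] only does what you want when mask has exactly one True: the boolean mask is turned into an array of matching row indices, and that array must broadcast against the four column indices. The error in your traceback, with shapes (0,) (4,), comes from an iteration where no row of y matches x[i, 0]; a key that matches two rows would fail the same way with (2,) (4,). You can use an if condition to make sure mask contains at least one True, and broadcast the row and column indices so that any number of matches works: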
1. First solution: fixing the original loop
range_ = np.arange(y.shape[0], dtype=np.int64)
for i in range(x.shape[0]):
    mask = y[:, 0] == x[i, 0]
    true_counts = np.count_nonzero(mask)
    if true_counts != 0:
        # 4 is the length of [1, 2, 3, 4]
        broadcast_x = np.broadcast_to(x[i, [1, 2, 3, 4]], shape=(true_counts, 4))
        broadcast_y = np.broadcast_to([1, 3, 4, 6], shape=(true_counts, 4))
        # row indices (true_counts, 1) broadcast against column indices (true_counts, 4)
        y[range_[mask][:, None], broadcast_y] = broadcast_x
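The explicit broadcasting in the first solution can be cross-checked against np.ix_, which builds the same (rows, 1) by (1, columns) index mesh directly from a boolean mask. This is a swapped-in alternative, not part of the original answer, and the sketch assumes hypothetical nonzero values in columns 1 to 4 of x so the assignment is visible (in the question those columns are zeros).

```python
import numpy as np

x = np.zeros((100, 10))
x[:, 0] = np.arange(100)
# Assumption: illustrative nonzero values; row k gets 4k, 4k+1, 4k+2, 4k+3.
x[:, 1:5] = np.arange(400).reshape(100, 4)
y = (np.random.default_rng(9).random((10, 7)) * 100).astype(int)

y_loop = y.copy()
y_ix = y.copy()
range_ = np.arange(y.shape[0])
for i in range(x.shape[0]):
    mask = y[:, 0] == x[i, 0]
    true_counts = np.count_nonzero(mask)
    if true_counts:
        # The answer's approach: broadcast row and column indices by hand.
        rows = range_[mask][:, None]
        cols = np.broadcast_to([1, 3, 4, 6], (true_counts, 4))
        y_loop[rows, cols] = np.broadcast_to(x[i, [1, 2, 3, 4]], (true_counts, 4))
        # np.ix_ produces an equivalent open mesh from the boolean mask.
        y_ix[np.ix_(mask, [1, 3, 4, 6])] = x[i, [1, 2, 3, 4]]
```

With np.ix_, an all-False mask selects an empty (0, 4) block, so the if guard is not strictly required for that variant.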
2. Second solution: vectorized way (the best)
Instead of using loops, we can first find the intersection and then use advanced indexing:
mask = np.isin(y[:, 0], x[:, 0])  # rows of y whose key appears in x[:, 0]
y[mask, np.array([1, 3, 4, 6])[:, None]] = 0
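The [:, None] on the column indices is what makes this work for any number of matching rows. In this sketch (hypothetical 3-row y, with np.isin used in place of np.in1d, which newer NumPy versions deprecate) the mask picks two rows, giving a row index of shape (2,), and the (4, 1) column index broadcasts with it to a (4, 2) selection.

```python
import numpy as np

y = np.array([[2, 0, 0, 0, 0, 0, 0],
              [9, 0, 0, 0, 0, 0, 0],
              [4, 0, 0, 0, 0, 0, 0]])
x0 = np.arange(5)                         # hypothetical stand-in for x[:, 0]
mask = np.isin(y[:, 0], x0)               # [True, False, True]
cols = np.array([1, 3, 4, 6])[:, None]    # shape (4, 1)
selected = y[mask, cols]                  # (2,) with (4, 1) broadcasts to (4, 2)
y[mask, cols] = 5                         # fills columns 1, 3, 4, 6 of rows 0 and 2
```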
Now, if x[:, 0] is filled by np.arange and you want to assign an array instead of zero, the values have to be taken from the related rows of x. To do so, first select the corresponding rows with x[y[:, 0] - x[0, 0]], which works because row k of x holds the key x[0, 0] + k (in your case it can be just x[y[:, 0]], because np.arange starts from 0, so x[0, 0] = 0). Then apply the masks to pick out the needed values from the specified rows and columns:
mask = np.isin(y[:, 0], x[:, 0])  # rows mask for y
rows = (y[:, 0] - x[0, 0]).astype(np.int64)  # x is float, so cast the row indices to int
new_arr = x[rows][mask, np.array([1, 2, 3, 4])[:, None]]
y[mask, np.array([1, 3, 4, 6])[:, None]] = new_arr
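An end-to-end check of the arange-based version, assuming hypothetical nonzero values in columns 1 to 4 of x (in the question they are zeros, which would make every assigned value 0). The vectorized result is compared with a plain loop that uses np.ix_ (a swapped-in helper, not from the answer) so that each key may match any number of rows.

```python
import numpy as np

x = np.zeros((100, 10))
x[:, 0] = np.arange(100)
x[:, 1:5] = np.arange(400).reshape(100, 4)   # assumption: illustrative values
y = (np.random.default_rng(9).random((10, 7)) * 100).astype(int)

cols_x = np.array([1, 2, 3, 4])
cols_y = np.array([1, 3, 4, 6])

# Vectorized version from the answer, with the integer cast applied.
mask = np.isin(y[:, 0], x[:, 0])
rows = (y[:, 0] - x[0, 0]).astype(np.int64)
new_arr = x[rows][mask, cols_x[:, None]]      # shape (4, number of matching rows)
y_vec = y.copy()
y_vec[mask, cols_y[:, None]] = new_arr

# Loop reference that tolerates zero, one, or several matches per key.
y_ref = y.copy()
for i in range(x.shape[0]):
    hit = y[:, 0] == x[i, 0]
    if hit.any():
        y_ref[np.ix_(hit, cols_y)] = x[i, cols_x]
```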
If you get IndexError: arrays used as indices must be of integer (or boolean) type, make sure the indices are integers, e.g. with (y[:, 0] - x[0, 0]).astype(np.int64) or np.array([1, 2, 3, 4], dtype=np.int64). With the question's data this happens because x is a float array, so y[:, 0] - x[0, 0] is float as well.
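A small demonstration of that dtype error with a hypothetical 5-row x: subtracting a float scalar from an integer array yields a float array, which NumPy rejects as an index until it is cast.

```python
import numpy as np

x = np.zeros((5, 3))          # float64, like x in the question
x[:, 0] = np.arange(5)
y0 = np.array([3, 1, 4])      # hypothetical integer keys

offsets = y0 - x[0, 0]        # int array minus float scalar -> float64
try:
    x[offsets]
    error = None
except IndexError as exc:
    error = str(exc)

rows = x[offsets.astype(np.int64)]   # integer indices are accepted
```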
A more general approach, for when x[:, 0] is not filled by np.arange, is to find the indices of the elements the two arrays have in common. The code then becomes:
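mask = np.isin(y[:, 0], x[:, 0])
# finding common indices
unique_values, index = np.unique(x[:, 0], return_index=True)
# clip so keys larger than every x[:, 0] value don't index past the end (the mask drops those rows)
pos = np.minimum(np.searchsorted(unique_values, y[:, 0]), len(unique_values) - 1)
idx = index[pos]
new_arr = x[idx][mask, np.array([1, 2, 3, 4])[:, None]]
y[mask, np.array([1, 3, 4, 6])[:, None]] = new_arr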
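A sketch of the general lookup on hypothetical data where x[:, 0] is unsorted and not an arange, and y contains keys missing from x, one of them (99) larger than every key. np.searchsorted returns len(unique_values) for such a key, so the positions are clipped with np.minimum before indexing; the mask then discards those rows.

```python
import numpy as np

x = np.zeros((5, 5))
x[:, 0] = [40, 10, 30, 20, 50]               # hypothetical unsorted keys
x[:, 1:5] = np.arange(20).reshape(5, 4)      # hypothetical payload values
y = np.zeros((4, 7), dtype=int)
y[:, 0] = [30, 99, 10, 25]                   # 99 and 25 are not keys of x

cols_x = np.array([1, 2, 3, 4])
cols_y = np.array([1, 3, 4, 6])

mask = np.isin(y[:, 0], x[:, 0])
unique_values, index = np.unique(x[:, 0], return_index=True)
# searchsorted gives 5 for key 99; clip it to a valid position.
pos = np.minimum(np.searchsorted(unique_values, y[:, 0]), len(unique_values) - 1)
idx = index[pos]                             # row of x for each key of y
new_arr = x[idx][mask, cols_x[:, None]]
y[mask, cols_y[:, None]] = new_arr
```

Rows whose keys are not in x (99 and 25) keep their original values. If x[:, 0] has duplicate keys, return_index picks the first occurrence.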
3. Third solution: indexing (just for the prepared toy example)
For the prepared example in the question, you can do this easily with advanced indexing instead of the loop:
y[:, [1, 3, 4, 6]] = 0
This last line works on your prepared data because every value in y[:, 0] (all < 100) appears in the first column of x (which runs from 0 to 99), and columns 1 to 4 of x are all zeros.
Or, to assign an array instead of 0:
new_arr = np.array([3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
y[:, [1, 3, 4, 6]] = new_arr[:, None]
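In that variant new_arr[:, None] has shape (10, 1), so each row of y gets its own value in all four target columns. A scaled-down sketch with a hypothetical 3-row y shows the broadcast, and that the 1-D array alone does not fit the (3, 4) selection.

```python
import numpy as np

y = np.zeros((3, 7), dtype=int)
new_arr = np.array([3, 4, 5])            # hypothetical: one value per row of y

try:
    y[:, [1, 3, 4, 6]] = new_arr         # (3,) cannot broadcast into (3, 4)
    flat_ok = True
except ValueError:
    flat_ok = False

y[:, [1, 3, 4, 6]] = new_arr[:, None]    # (3, 1) broadcasts into (3, 4)
```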
Answered By - Ali_Sh