Issue
For each sublist in array b, return the values from list a that sit at the same positions as the True entries in that sublist of b.
import pandas as pd
import numpy as np

a = pd.Series([1, 3, 5, 7, 9])  # values to choose from
b = np.array([[False, True, False, True, False],  # boolean masks, one per row
              [False, False, False, False, False]])
out = []
for i, v in enumerate(b):
    out.append([])
    for j in range(len(v)):  # iterate over positions in the current row of b
        if v[j]:
            out[i].append(a[j])
# The rows differ in length, so an object dtype array is needed here.
out = np.array(out, dtype=object)  # [[3, 7], []]  # result
# In the first sublist, True is at indices 1 and 3, which correspond to values 3 and 7.
# In the second sublist, there is no True, hence it is empty.
The above seems too laborious and it is possibly not making use of numpy vectorization (it is slow on large data).
Solution
Your Series is 1d; b is a 2d array. The Series also has row indices, which a plain array does not.
In [70]: a.shape, b.shape
Out[70]: ((5,), (2, 5))
In [71]: a
Out[71]:
0    1
1    3
2    5
3    7
4    9
dtype: int64
We can use the rows of b, each a 1d array of shape (5,), to select elements from a:
In [72]: a[b[0,:]]
Out[72]:
1    3
3    7
dtype: int64
In [73]: a[b[1,:]]
Out[73]: Series([], dtype: int64)
Since the rows produce results of different lengths, we can't do that selection in one step. a[b] gives an error, with the mismatch between (5,) and (2,).
It may be simpler to work with the array version of a, also 1d, but without row indices:
In [103]: A = a.to_numpy(); A
Out[103]: array([1, 3, 5, 7, 9], dtype=int64)
Applying a row of b to index that:
In [104]: A[b[0]]
Out[104]: array([3, 7], dtype=int64)
And iteratively doing that for all rows:
In [105]: [A[row] for row in b]
Out[105]: [array([3, 7], dtype=int64), array([], dtype=int64)]
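The row-by-row selection above can be wrapped in a small helper. This is only a minimal sketch: the name select_per_row is hypothetical (it is not part of pandas or NumPy), and it assumes the mask has as many columns as there are values.

```python
import numpy as np
import pandas as pd

def select_per_row(values, mask):
    """Return one array per row of `mask`, holding the entries of
    `values` at the positions where that row is True."""
    arr = np.asarray(values)             # drop the Series index, keep the 1d data
    mask = np.asarray(mask, dtype=bool)  # make sure we index with booleans
    return [arr[row] for row in mask]    # Python-level loop over the rows

a = pd.Series([1, 3, 5, 7, 9])
b = np.array([[False, True, False, True, False],
              [False, False, False, False, False]])
result = select_per_row(a, b)
```

The result is a plain list, so rows of different lengths (including empty ones) are not a problem.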
We can make a (2,5) array from A, and apply the b boolean mask - but the result will be 1d, with no indication that the 2nd row did not select anything:
In [106]: np.vstack((A,A))
Out[106]:
array([[1, 3, 5, 7, 9],
       [1, 3, 5, 7, 9]], dtype=int64)
In [107]: np.vstack((A,A))[b]
Out[107]: array([3, 7], dtype=int64)
Indexing with a row of b, or with b itself, is what I was calling a 'whole-array' operation. But using the rows of b individually can't be done that way; it requires a Python level iteration.
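One way to recover the row boundaries from the flat masked result is to count the True values per row and split on those counts. This is a sketch of a different technique from the list comprehension, not a replacement for it: np.split still loops in Python internally when it builds the pieces, so it does not avoid iteration altogether. The variable names flat, counts and pieces are chosen here for illustration.

```python
import numpy as np

A = np.array([1, 3, 5, 7, 9])
b = np.array([[False, True, False, True, False],
              [False, False, False, False, False]])

# One boolean-mask selection over the whole (2, 5) shape; the result is flat.
flat = np.broadcast_to(A, b.shape)[b]
# The number of True values in each row tells us how long each piece is.
counts = b.sum(axis=1)
# Split the flat result at the cumulative row boundaries (all but the last).
pieces = np.split(flat, np.cumsum(counts)[:-1])
```

Here pieces is again a list of arrays, one per row of b, with an empty array for a row that selects nothing.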
There are some other ways of working with A and b:
Multiplication works, where b is treated as an array of 0s and 1s:
In [111]: A*b
Out[111]:
array([[0, 3, 0, 7, 0],
       [0, 0, 0, 0, 0]], dtype=int64)
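With multiplication, a genuine 0 in A would look the same as an unselected position. If that matters, np.where with an explicit fill value is a possible variant. This is a sketch only; the sentinel FILL = -1 is an assumption made for illustration and must be a value that cannot occur in the real data.

```python
import numpy as np

A = np.array([1, 3, 5, 7, 9])
b = np.array([[False, True, False, True, False],
              [False, False, False, False, False]])

FILL = -1  # hypothetical sentinel; must not be a legitimate value in A
# A (shape (5,)) broadcasts against b (shape (2, 5)); unselected slots get FILL.
filled = np.where(b, A, FILL)
```

The result keeps the full (2, 5) shape, so it stays a regular numeric array rather than a ragged one.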
There is also a masked array subclass of arrays:
In [112]: np.ma.masked_array(np.vstack((A,A)),~b)
Out[112]:
masked_array(
  data=[[--, 3, --, 7, --],
        [--, --, --, --, --]],
  mask=[[ True, False,  True, False,  True],
        [ True,  True,  True,  True,  True]],
  fill_value=999999,
  dtype=int64)
The [105] list of arrays can be turned into an object dtype array:
In [115]: np.array([A[row] for row in b],object)
Out[115]: array([array([3, 7], dtype=int64), array([], dtype=int64)], dtype=object)
This is 1d, with shape (2,). Sometimes it's useful, but performance-wise it is not an improvement over the list.
Answered By - hpaulj