Issue
I am using NumPy to draw a complete weighted sample without replacement, according to a list of weights, and doing this N times. In the example below, I sample from the numbers 0-3 without replacement until all four numbers are drawn, and repeat that process 10 times. Here is what I've done so far:
import numpy as np

np.random.seed(seed=123)
N = 10
samples = []
P = [0.5, 0.3, 0.1, 0.1]
for i in np.arange(N):
    # draw all four numbers without replacement, weighted by P
    picks = np.random.choice(4, size=4, replace=False, p=P)
    samples.append(picks)
samples
It produces:
[array([1, 0, 3, 2]),
 array([3, 1, 0, 2]),
 array([1, 0, 3, 2]),
 array([0, 1, 3, 2]),
 array([1, 0, 2, 3]),
 array([0, 1, 3, 2]),
 array([1, 0, 3, 2]),
 array([0, 3, 1, 2]),
 array([2, 1, 0, 3]),
 array([0, 1, 2, 3])]
Now I'd like to determine, via code, how many times each number appears in each position. For example, how many times does 0 appear in the first position? How many times does 1 appear there? Ideally, I'd like the full distribution across the four positions: e.g. I know that in the third position 0 appears twice, 1 appears once, 2 appears twice and 3 appears five times, and I want the same counts for every position.
Solution
You can use:
# make real 2D array
arr = np.vstack(samples)
# get unique values
u = np.unique(arr)
# array([0, 1, 2, 3])
# broadcast to a boolean array of shape (n_rows, n_values, n_positions), then count over the rows
out = (arr[:,None] == u[:,None]).sum(axis=0)
Output:
# col    0  1  2  3
array([[4, 4, 2, 0],   # value: 0
       [4, 5, 1, 0],   # value: 1
       [1, 0, 2, 7],   # value: 2
       [1, 1, 5, 3]])  # value: 3
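NB. The broadcast comparison builds an intermediate boolean array of shape (n_rows, n_values, n_positions), so it consumes a lot of memory on large inputs.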
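If memory is a concern, one possible alternative (a sketch, not part of the original answer) is to count each position separately with `np.bincount`, which avoids the 3D intermediate. It assumes the values are the non-negative integers 0..k-1, as in this example; the names `n_values` and `counts` are introduced here for illustration only.

```python
import numpy as np

# Same data as the stacked samples from the question.
arr = np.array([
    [1, 0, 3, 2],
    [3, 1, 0, 2],
    [1, 0, 3, 2],
    [0, 1, 3, 2],
    [1, 0, 2, 3],
    [0, 1, 3, 2],
    [1, 0, 3, 2],
    [0, 3, 1, 2],
    [2, 1, 0, 3],
    [0, 1, 2, 3],
])

# Assumption: values are the integers 0..k-1, so they can index the count rows directly.
n_values = arr.max() + 1

# counts[v, j] = number of samples in which value v sits in position j.
# Each bincount call only allocates an array of length n_values.
counts = np.stack(
    [np.bincount(arr[:, j], minlength=n_values) for j in range(arr.shape[1])],
    axis=1,
)
print(counts)
```

For this data the result is the same 4x4 table as the broadcast version, with values along the rows and positions along the columns.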
Intermediate arr:
array([[1, 0, 3, 2],
       [3, 1, 0, 2],
       [1, 0, 3, 2],
       [0, 1, 3, 2],
       [1, 0, 2, 3],
       [0, 1, 3, 2],
       [1, 0, 3, 2],
       [0, 3, 1, 2],
       [2, 1, 0, 3],
       [0, 1, 2, 3]])
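To connect the count table back to the specific questions in the issue, here is a minimal sketch of reading individual entries out of it. The final normalization step is an extra not shown in the answer, and the names `zero_first`, `one_first` and `freq` are hypothetical.

```python
import numpy as np

# Count table from the answer: rows are values, columns are positions.
out = np.array([
    [4, 4, 2, 0],  # value: 0
    [4, 5, 1, 0],  # value: 1
    [1, 0, 2, 7],  # value: 2
    [1, 1, 5, 3],  # value: 3
])

# out[value, position] answers "how often does value appear in position?"
zero_first = out[0, 0]  # 0 in the first position  -> 4
one_first = out[1, 0]   # 1 in the first position  -> 4
three_third = out[3, 2] # 3 in the third position  -> 5

# Assumed extra step: divide each column by its total (N = 10 here)
# to get the empirical distribution of values per position.
freq = out / out.sum(axis=0)
print(zero_first, one_first, three_third)
print(freq)
```

Each column of `freq` sums to 1, since every sample contributes exactly one value to each position.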
Answered By - mozway