Issue
I've trained a machine learning model using sklearn and want to simulate the outcomes by sampling predictions according to the predict_proba probabilities. I want to do something like:
samples = np.random.choice(a = possible_outcomes, size = (n_data, n_samples), p = probabilities)
where probabilities is an (n_data, n_possible_outcomes) array.
But np.random.choice only accepts a 1-D array for the p argument. I've currently worked around this with a for-loop like the following:
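from tqdm import trange  # progress-bar wrapper around range

sample_outcomes = np.zeros((len(probs), n_samples))
for i in trange(len(probs)):
    # the keyword for the number of draws is size, not s
    sample_outcomes[i, :] = np.random.choice(outcomes, size=n_samples, p=probs[i])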
but that's relatively slow. Any suggestions to speed this up would be much appreciated!
Solution
If I understood correctly, you want a vectorized way of applying choice several times, each time with a different probability vector. You could implement this by hand as follows:
import numpy as np
# for reproducibility
np.random.seed(42)
# number of samples
k = 5
# possible outcomes
outcomes = np.arange(10)
# generate a random probability matrix for 15 runs
probabilities = np.random.random((15, 10))
probs = probabilities / probabilities.sum(1)[:, None]
# perturb each probability by subtracting uniform noise and rank the results;
# outcomes with a higher value in probs are more likely to rank near the top,
# although the selection frequencies only approximate probs
choices = probs - np.random.random((15, 10))
# argpartition selects the k smallest values, so negate to get the k largest
choices = -1 * choices
# pick the top k values
res = outcomes[np.argpartition(choices, k, axis=1)][:, :k]
# flatten to match the expected output
print(res.flatten())
Output
[1 8 2 5 3 6 4 8 7 0 1 5 9 3 7 1 4 9 0 8 5 0 4 3 6 8 5 1 2 6 5 3 2 0 6 5 4
2 3 7 7 9 4 6 1 3 6 4 2 1 4 9 3 0 1 6 9 2 3 8 5 4 7 6 1 5 3 8 2 1 1 0 9 7
4]
In the above example, the code samples 5 (k) elements from a population of 10 (outcomes) 15 times, each time with a different probability vector (probs, with a shape of 15 by 10).
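Note that np.random.choice draws with replacement by default, whereas the example above returns k distinct outcomes per row. If repeated draws per row are what is needed, as in the loop from the question, one possible alternative is inverse-CDF sampling: build each row's cumulative distribution, draw uniform numbers, and count how many cumulative values each draw has passed. The following is a minimal sketch under that assumption, not part of the original answer; sample_rows and its parameter names are hypothetical, and it builds an intermediate (n_data, n_samples, n_outcomes) boolean array, so memory use may matter for large inputs.

```python
import numpy as np


def sample_rows(probs, outcomes, n_samples, rng=None):
    """Draw n_samples outcomes per row of probs, with replacement.

    Hypothetical helper: probs has shape (n_data, n_outcomes) and each
    row is assumed to be non-negative with a positive sum.
    """
    rng = np.random.default_rng() if rng is None else rng
    probs = np.asarray(probs, dtype=float)
    # cumulative distribution per row, renormalised so the last entry is 1
    cdf = np.cumsum(probs, axis=1)
    cdf /= cdf[:, -1:]
    # one uniform number in [0, 1) per requested sample
    u = rng.random((probs.shape[0], n_samples))
    # index of the chosen outcome = number of cdf entries that u has reached
    idx = (u[:, :, None] >= cdf[:, None, :]).sum(axis=2)
    # guard against an out-of-range index caused by rounding
    idx = np.minimum(idx, probs.shape[1] - 1)
    return np.asarray(outcomes)[idx]


# usage with the same shapes as the example above
rng = np.random.default_rng(42)
raw = rng.random((15, 10))
probs = raw / raw.sum(axis=1, keepdims=True)
samples = sample_rows(probs, np.arange(10), n_samples=5, rng=rng)
print(samples.shape)
```

Each row of samples is drawn from the corresponding row of probs, so the result should be comparable to what the for-loop in the question produces, without the Python-level loop.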
Answered By - Dani Mesejo