Issue
I would like to do something that is likely very simple, but is giving me difficulty. Trying to draw N samples from a multivariate normal distribution and calculate the probability of each of those randomly drawn samples. Here I attempt to use scipy, but am open to using np.random.multivariate_normal as well. Whichever is easiest.
>>> import numpy as np
>>> from scipy.stats import multivariate_normal
>>> num_samples = 10
>>> num_features = 6
>>> std = np.random.rand(num_features)
# define distribution
>>> mvn = multivariate_normal(mean = np.zeros(num_features), cov = np.diag(std), allow_singular = False, seed = 42)
# draw samples
>>> sample = mvn.rvs(size = num_samples); sample
# determine probability of each drawn sample
>>> prob = mvn.pdf(x = sample)
# print samples
>>> print(sample)
[[ 0.04816243 -0.00740458 -0.00740406 0.04967142 -0.01382643 0.06476885]
...
[-0.00977815 0.01047547 0.03084945 0.10309995 0.09312801 -0.08392175]]
# print probability density of all samples
>>> print(prob)
[26861.56848337 17002.29353025 2182.26793265 3749.65049331
42004.63147989 3700.70037411 5569.30332186 16103.44975393
14760.64667235 19148.40325233]
This is confusing for me for a number of reasons:
- For the rvs sampling function: I don't use the keyword arguments mean and cov per the docs because it seems odd to define a distribution with a mean and cov in mvn = multivariate_normal(mean = np.zeros(num_features), cov = np.diag(std), allow_singular = False, seed = 42) and then repeat that definition in the rvs call. Am I missing something?
- For the mvn.pdf call, the probability density is obviously >>>1 which isn't impossible for a continuous multivariate normal, but I would like to convert these numbers to approximate probabilities at that particular point. How can I do this?
Thanks!
Solution
I don't use the keyword arguments mean and cov per the docs... Am I missing something?
No, what you are doing is allowed. The design of the distributions allows both calling the methods with parameters (as you read in the docs) and "freezing" the distribution with parameters and calling the methods without parameters. These are equivalent:
mean = np.zeros(num_features)
cov = np.diag(std)
mvn = multivariate_normal(mean=mean, cov=cov, seed=42)
sample = mvn.rvs(size=num_samples)
pdf = mvn.pdf(sample)
sample2 = multivariate_normal.rvs(mean=mean, cov=cov, size=num_samples, random_state=42)
pdf2 = multivariate_normal.pdf(sample2, mean=mean, cov=cov)
np.testing.assert_equal(sample2, sample) # passes
np.testing.assert_equal(pdf2, pdf) # passes
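One detail of the setup above may be worth double-checking: the diagonal of cov holds variances, not standard deviations. The question does not say which one std is meant to be, so the following is only a sketch under the assumption that it holds standard deviations, with made-up values, showing how the two readings differ:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical standard deviations, chosen only for illustration.
std = np.array([0.5, 1.0, 2.0])
mean = np.zeros(std.size)

# cov=np.diag(std) treats the values as variances ...
as_variance = multivariate_normal(mean=mean, cov=np.diag(std), seed=0)
# ... while cov=np.diag(std**2) treats them as standard deviations.
as_std = multivariate_normal(mean=mean, cov=np.diag(std**2), seed=0)

n = 200_000
emp_a = as_variance.rvs(size=n).std(axis=0)  # empirical spread, near sqrt(std)
emp_b = as_std.rvs(size=n).std(axis=0)       # empirical spread, near std
print(emp_a)
print(emp_b)
```

If std is in fact a vector of variances, the original cov = np.diag(std) is already what you want.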
I would like to convert these numbers to approximate probabilities at that particular point. How can I do this?... I would like to compute the probability within a specific epsilon of the sample value.
You can define a hypercube of side length eps centered at each point and evaluate the probability mass within that hypercube (this uses the lower_limit argument of cdf, with SciPy 1.10.0+).
eps = 0.01
# upper corner of each hypercube as x, lower corner as lower_limit
mvn.cdf(sample + eps/2, lower_limit=sample - eps/2)
# array([2.87121214e-14, 1.81736055e-14, 2.33269634e-15, 4.00857084e-15,
# 4.48976867e-14, 3.95613589e-15, 5.95304832e-15, 1.72140983e-14,
# 1.57778144e-14, 2.04685939e-14])
You can get approximately the same result by multiplying the probability density by the volume of the hypercube:
vol = eps**num_features
pdf * vol
# array([2.87145307e-14, 1.81751442e-14, 2.33280494e-15, 4.00830854e-15,
# 4.49021911e-14, 3.95598175e-15, 5.95348449e-15, 1.72142965e-14,
# 1.57788643e-14, 2.04692967e-14])
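Because the covariance in this question is diagonal, the coordinates are independent, so the hypercube probability can also be cross-checked as a product of one-dimensional normal probabilities. The sketch below is self-contained and does not need lower_limit; the variances are hypothetical values chosen for illustration, not the ones from the question:

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

rng = np.random.default_rng(0)
num_samples, num_features = 10, 6

# Hypothetical variances for the diagonal covariance (an assumption).
variances = rng.uniform(0.5, 1.5, size=num_features)
mvn = multivariate_normal(mean=np.zeros(num_features),
                          cov=np.diag(variances), seed=42)
sample = mvn.rvs(size=num_samples)

eps = 0.01
sigma = np.sqrt(variances)

# Probability of [x - eps/2, x + eps/2] along each axis, then the product
# over axes (valid only because the coordinates are independent).
per_axis = (norm.cdf(sample + eps / 2, scale=sigma)
            - norm.cdf(sample - eps / 2, scale=sigma))
box_prob = per_axis.prod(axis=1)

# Density times hypercube volume, as in the answer.
approx = mvn.pdf(sample) * eps**num_features

# Largest relative difference between the two estimates.
print(np.max(np.abs(box_prob / approx - 1)))
```

The two estimates agree closely when eps is small relative to the standard deviations. With a non-diagonal covariance the product shortcut no longer applies, and the cdf call with lower_limit is the way to go.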
If you prefer a hyperspherical region, you can multiply by the volume of a hypersphere instead of that of a hypercube. For a 6-dimensional space with eps as the diameter of the hypersphere, vol = np.pi**3/6 * (eps/2)**6.
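For other dimensions, the same idea can be sketched with the general formula for the volume of an n-dimensional ball, pi**(n/2) / Gamma(n/2 + 1) * r**n. The helper name ball_volume below is made up for this example and is not part of SciPy:

```python
import numpy as np
from scipy.special import gamma


def ball_volume(dim, radius):
    """Volume of a dim-dimensional ball of the given radius."""
    return np.pi ** (dim / 2) / gamma(dim / 2 + 1) * radius ** dim


eps = 0.01
num_features = 6

# eps is the diameter, so the radius is eps / 2.
vol = ball_volume(num_features, eps / 2)

# Ratio to the 6-dimensional closed form quoted above; should be 1
# up to floating-point rounding.
print(vol / (np.pi**3 / 6 * (eps / 2) ** 6))
```

Multiplying the pdf values by vol then gives the approximate probability of landing within eps/2 of each sample.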
Answered By - Matt Haberland