Saturday, November 11, 2023

[FIXED] Drawing sample and calculating sample probability from multivariate normal distribution using scipy.stats.multivariate_normal

Labels: normal-distribution, numpy, python, scipy, scipy.stats

Issue

I would like to do something that is likely very simple but is giving me difficulty: draw N samples from a multivariate normal distribution and calculate the probability of each of those randomly drawn samples. Here I attempt to use scipy, but I am open to using np.random.multivariate_normal as well, whichever is easiest.

>>> import numpy as np
>>> from scipy.stats import multivariate_normal

>>> num_samples = 10
>>> num_features = 6
>>> std = np.random.rand(num_features)

# define distribution
>>> mvn = multivariate_normal(mean = np.zeros(num_features), cov = np.diag(std), allow_singular = False, seed = 42)

# draw samples
>>> sample = mvn.rvs(size = num_samples); sample

# determine probability of each drawn sample
>>> prob = mvn.pdf(x = sample)

# print samples
>>> print(sample)
[[ 0.04816243 -0.00740458 -0.00740406  0.04967142 -0.01382643  0.06476885]
...
 [-0.00977815  0.01047547  0.03084945  0.10309995  0.09312801 -0.08392175]]

# print probability density of all samples
>>> print(prob)
[26861.56848337 17002.29353025  2182.26793265  3749.65049331
 42004.63147989  3700.70037411  5569.30332186 16103.44975393
 14760.64667235 19148.40325233]

This is confusing to me for two reasons:

For the rvs sampling function: I don't use the keyword arguments mean and cov per the docs, because it seems odd to define a distribution with a mean and cov in mvn = multivariate_normal(mean = np.zeros(num_features), cov = np.diag(std), allow_singular = False, seed = 42) and then repeat that definition in the rvs call. Am I missing something?
For the mvn.pdf call: the probability density is obviously much greater than 1, which isn't impossible for a continuous multivariate normal, but I would like to convert these numbers to approximate probabilities at that particular point. How can I do this?

Thanks!

Solution

I don't use the keyword arguments mean and cov per the docs... Am I missing something?

No, what you are doing is allowed. The design of the distributions allows both calling the methods with parameters (as you read in the docs) and "freezing" the distribution with parameters and calling the methods without parameters. These are equivalent:

mean = np.zeros(num_features)
cov = np.diag(std)

mvn = multivariate_normal(mean=mean, cov=cov, seed=42)
sample = mvn.rvs(size=num_samples)
pdf = mvn.pdf(sample)

sample2 = multivariate_normal.rvs(mean=mean, cov=cov, size=num_samples, random_state=42)
pdf2 = multivariate_normal.pdf(sample2, mean=mean, cov=cov)

np.testing.assert_equal(sample2, sample)  # passes
np.testing.assert_equal(pdf2, pdf)  # passes

I would like to convert these numbers to approximate probabilities at that particular point. How can I do this?... I would like to compute the probability within a specific epsilon of the sample value.

You can define a hypercube of side length eps centered at each point and evaluate the probability contained in that hypercube by passing the lower_limit argument to cdf (requires SciPy 1.10.0+).

eps = 0.01
# integrate from sample - eps/2 (lower corner) to sample + eps/2 (upper corner)
mvn.cdf(sample + eps/2, lower_limit=sample - eps/2)
# array([2.87121214e-14, 1.81736055e-14, 2.33269634e-15, 4.00857084e-15,
#        4.48976867e-14, 3.95613589e-15, 5.95304832e-15, 1.72140983e-14,
#        1.57778144e-14, 2.04685939e-14])

You can get approximately the same result by multiplying the probability density by the volume of the hypercube:

vol = eps**num_features
pdf * vol
# array([2.87145307e-14, 1.81751442e-14, 2.33280494e-15, 4.00830854e-15,
#        4.49021911e-14, 3.95598175e-15, 5.95348449e-15, 1.72142965e-14,
#        1.57788643e-14, 2.04692967e-14])
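
Because the covariance in the question is diagonal, the coordinates are independent, so the hypercube probability can also be checked without the multivariate cdf: it factorizes into one-dimensional normal interval probabilities. The following is a minimal, self-contained sketch of that check. The variances, the seed, and the names box_prob_exact and box_prob_approx are illustrative assumptions, not values from the question.

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

# Illustrative variances (an assumption; the question draws them at random).
variances = np.array([0.5, 1.0, 2.0])
mean = np.zeros(variances.size)
mvn = multivariate_normal(mean=mean, cov=np.diag(variances), seed=0)

sample = mvn.rvs(size=5)  # shape (5, 3)
eps = 0.01

# With a diagonal covariance the coordinates are independent, so the
# probability of the hypercube is a product of 1-D interval probabilities.
sigma = np.sqrt(variances)
upper = norm.cdf(sample + eps / 2, loc=mean, scale=sigma)
lower = norm.cdf(sample - eps / 2, loc=mean, scale=sigma)
box_prob_exact = np.prod(upper - lower, axis=1)

# Density times hypercube volume, the approximation described above.
box_prob_approx = mvn.pdf(sample) * eps ** variances.size

print(box_prob_exact)
print(box_prob_approx)
```

The two arrays should agree closely as long as eps is small relative to the standard deviations. This shortcut only applies when the covariance matrix is diagonal; for a general covariance the multivariate cdf is still needed.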

If you prefer a hyperspherical region, you can multiply by the volume of a hypersphere instead of that of a hypercube. For a 6-dimensional space with eps as the diameter of the hypersphere, vol = np.pi**3/6 * (eps/2)**6.
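
For other dimensions, the 6-dimensional expression above is a special case of the standard n-ball volume formula, pi**(n/2) / Gamma(n/2 + 1) * r**n. A small sketch, where ball_volume is a hypothetical helper name introduced here for illustration:

```python
import math

def ball_volume(dim, radius):
    """Volume of a dim-dimensional ball of the given radius."""
    return math.pi ** (dim / 2) / math.gamma(dim / 2 + 1) * radius ** dim

eps = 0.01
# eps is the diameter, so the radius is eps / 2.
vol = ball_volume(6, eps / 2)

# Matches the closed form quoted above for 6 dimensions.
print(math.isclose(vol, math.pi ** 3 / 6 * (eps / 2) ** 6))
```

Multiplying the pdf values by this vol instead of eps**num_features gives the approximate probability of falling inside the hypersphere.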

Answered By - Matt Haberland

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
