Issue
I'm trying to create 1,000 samples, each containing two variables, where one variable is correlated with the other at r=0.85 (or whatever correlation I specify). I've been working on this for a long time and cannot figure out what I'm doing wrong. Can anyone help identify what I'm not seeing?
I don't fully understand the Cholesky decomposition, so I suspect the problem lies somewhere in that step. Here is what I've tried:
import numpy as np
from scipy import stats

# Create random normal bivariate data with r=0.85
rng = np.random.default_rng(0)
correlation = 0.85
corr_matrix = np.array([[1, correlation], [correlation, 1]])
L = np.linalg.cholesky(corr_matrix)
n = 1000
random_data = rng.normal(size=(n, 2))
synthetic_data = np.dot(random_data, L)
# Check the correlation
r = stats.pearsonr(synthetic_data.T[0], synthetic_data.T[1])[0]
# r computes to 0.646.
Solution
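Your multiplication of L and random_data isn't quite right. np.linalg.cholesky returns the lower-triangular factor L, for which L @ L.T equals corr_matrix. Because each sample is a row of random_data, the product random_data @ A has covariance A.T @ A. With A = L that is L.T @ L, which is not corr_matrix; with A = L.T it is L @ L.T, which is. Change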
synthetic_data = np.dot(random_data, L)
to
synthetic_data = np.dot(random_data, L.T)
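To see the effect of the transpose, here is a minimal sketch that reuses the variable names from the question and compares both products. The helper sample_corr is a hypothetical name introduced only for this illustration, and n is raised to 100,000 (an assumption, not part of the original question) so that sampling noise is small.

```python
import numpy as np

rng = np.random.default_rng(0)
correlation = 0.85
corr_matrix = np.array([[1.0, correlation], [correlation, 1.0]])

# np.linalg.cholesky returns the lower-triangular factor: L @ L.T == corr_matrix
L = np.linalg.cholesky(corr_matrix)

n = 100_000  # assumption: larger than the question's 1,000 to reduce noise
random_data = rng.normal(size=(n, 2))  # one sample per row


def sample_corr(data):
    """Pearson correlation between the two columns (hypothetical helper)."""
    return np.corrcoef(data, rowvar=False)[0, 1]


# Original product: the covariance of the rows is L.T @ L, not corr_matrix
r_wrong = sample_corr(random_data @ L)

# Corrected product: the covariance of the rows is L @ L.T == corr_matrix
r_fixed = sample_corr(random_data @ L.T)

# For a 2x2 correlation matrix, the uncorrected product has a theoretical
# correlation of c / sqrt(1 + c**2), where c is the requested correlation.
expected_wrong = correlation / np.sqrt(1 + correlation**2)

print(f"without transpose: {r_wrong:.3f} (theory: {expected_wrong:.3f})")
print(f"with transpose:    {r_fixed:.3f} (target: {correlation})")
```

For c = 0.85 the theoretical value without the transpose is roughly 0.648, which is consistent with the 0.646 reported in the question.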
See Generate correlated data in Python (3.3) for an alternative that uses the multivariate_normal method of the random generator. The link at the end of that answer goes to a SciPy cookbook page that is also worth checking out.
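As a rough sketch of that alternative, the following draws the samples directly with the generator's multivariate_normal method, so no explicit Cholesky step is needed. The zero means are an assumption chosen to match the standard normal data in the question, and the variable names are reused from it for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
correlation = 0.85
n = 1000

# With unit variances, the covariance matrix equals the correlation matrix.
corr_matrix = np.array([[1.0, correlation], [correlation, 1.0]])
mean = np.zeros(2)  # assumption: zero-mean variables, as in the question

# Each row of synthetic_data is one bivariate sample; the shape is (n, 2).
synthetic_data = rng.multivariate_normal(mean, corr_matrix, size=n)

# Check the sample correlation between the two columns.
r = np.corrcoef(synthetic_data, rowvar=False)[0, 1]
print(f"sample correlation: {r:.3f}")
```

The sample correlation will not be exactly 0.85 for a finite n, but it should land close to it.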
Answered By - Warren Weckesser