Issue
I have a list of numbers (floats) and I would like to estimate its mean. I also need to estimate the variability of that mean. My goal is to resample the list 100 times, so that the output is an array of length 100 in which each element is the mean of one resampled list.
Here is a simple working example of what I would like to achieve:
import numpy as np
data = np.linspace(0, 4, 5)
ndata, boot = len(data), 100
output = np.mean(np.array([data[k] for k in np.random.uniform(high=ndata, size=boot*ndata).astype(int)]).reshape((boot, ndata)), axis=1)
However, this is quite slow when I have to repeat it for many lists with a large number of elements. The method also seems clunky and un-Pythonic. What would be a better way to achieve my goal?
P.S. I am aware of scipy.stats.bootstrap, but I have a problem upgrading scipy to 1.7.1 in Anaconda to import it.
Solution
Use np.random.choice:
import numpy as np
data = np.linspace(0, 4, 5)
ndata, boot = len(data), 100
output = np.mean(
    np.random.choice(data, size=(boot, ndata)),
    axis=1)
If I understood correctly, this expression (in your question's code):
np.array([data[k] for k in np.random.uniform(high=ndata, size=boot*ndata).astype(int)]).reshape((boot, ndata))
samples with replacement, which is exactly what np.random.choice does by default.
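To make the equivalence concrete, here is a minimal sketch that draws integer indices with replacement and uses them to index the array in one vectorized step, next to the np.random.choice version. The seed value and the variable names (idx, resampled_by_index, and so on) are only illustrative assumptions, not part of the original code.

```python
import numpy as np

data = np.linspace(0, 4, 5)
ndata, boot = len(data), 100

np.random.seed(0)  # arbitrary seed, only so the sketch is repeatable

# Draw indices with replacement (upper bound is exclusive), then index
# the array once instead of looping over the indices in Python.
idx = np.random.randint(0, ndata, size=(boot, ndata))
resampled_by_index = data[idx]

# np.random.choice draws the values directly; replace=True is its default.
resampled_by_choice = np.random.choice(data, size=(boot, ndata))

# One mean per resampled row in both cases.
means_by_index = resampled_by_index.mean(axis=1)
means_by_choice = resampled_by_choice.mean(axis=1)
```

Both arrays have shape (boot, ndata) and contain only values taken from data. The individual draws differ because they come from separate random calls.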
Here are some timings for reference:
%timeit np.mean(np.array([data[k] for k in np.random.uniform(high=ndata, size=boot*ndata).astype(int)]).reshape((boot, ndata)), axis=1)
133 µs ± 3.96 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit np.mean(np.random.choice(data, size=(boot, ndata)),axis=1)
41.1 µs ± 538 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
As can be seen, np.random.choice yields roughly a 3x speedup.
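Since the question also asks for the variability of the mean, one possible way to package the resampling is sketched below. It assumes NumPy 1.17 or newer for np.random.default_rng. The helper name bootstrap_means and its parameters n_boot and seed are hypothetical, chosen here for illustration.

```python
import numpy as np

def bootstrap_means(data, n_boot=100, seed=None):
    """Return n_boot means, each from one resample of data with replacement.

    Hypothetical helper for illustration; assumes NumPy >= 1.17.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    # Each row is one resample with the same length as the input.
    samples = rng.choice(data, size=(n_boot, data.size), replace=True)
    return samples.mean(axis=1)

data = np.linspace(0, 4, 5)
output = bootstrap_means(data, n_boot=100, seed=0)

# The spread of the resampled means estimates the standard error of the mean.
std_error = output.std(ddof=1)
```

For many lists, the same function can be called once per list. The work inside it stays vectorized, so the Python-level loop is over lists only, not over elements.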
Answered By - Dani Mesejo