Issue
So I am currently putting together a script which does the following steps:
- Simulate from a poisson distribution. Say this sample results in
5
from the poisson distribution - From the value of the poisson distribution, simulate
5
values from a lognormal distribution - Repeat the above process 100k+ times
- Apply some logic to these values to get some staistics out
The fastest way I think to do this is via a large matrix which has, down the rows each simulation and across the columns every value simulated from the lognormal distribution. With this matrix in place I can then apply the metrics I need.
Here's what I've done so far, which seems to work - however i'm a beginner at python and I think there should be a way to do this "vectorised" and without having to resort to a for-loop.
Any improvements in speed would be greatly appreciated as I ideally want to run this for 1m simulations:
import pandas as pd
import numpy as np
import os
import scipy.stats as sp
n_sims = 100000
MU = 3
poisson = sp.poisson(mu=MU)
#first want to sample 10000 numbers from a poisson ## this is step 1
no_clms_arr = poisson.rvs(size=n_sims)
#now for each element sampled sample a lognormal and store the results back to a matrix
result = np.zeros((n_sims,no_clms_arr.max()))
for idx,i in enumerate(no_clms_arr):
if i == 0:
continue # no need to do anything here
sev_sample = sp.lognorm(50,25).rvs(i) ## this is step 2
sev_sample = np.pad(sev_sample,(0,no_clms_arr.max()-i),'constant',constant_values=(0,0)) ## padding out so I can append back to the result matrix
result[idx,:] = sev_sample
#now calculate some metrics ## step 4, example metric is below
lim = 5e6
xs = 2e6
np.clip(a = result-xs,a_min=0,a_max=lim).sum(axis=1).mean()
Solution
You can do it completely vectorized as the result
array is unnecessary so you just need to generate your random numbers and then do the clip and sum:
vals = sp.lognorm(50, 25).rvs(no_clms_arr.sum())
(vals - xs).clip(0, lim).sum() / n_sims
You can speed up your method with a for loop (assuming it is necessary to make the array result
) by generating all of your random numbers outside of the for loop in a single call, and then assign these values to the corresponding point in result
with a for loop, ie replace your for loop with:
vals = sp.lognorm(50, 25).rvs(no_clms_arr.sum())
start = 0
for i, j in enumerate(no_clms_arr):
if j:
result[i, :j] = vals[start: start + j]
start += j
You can also replace the .sum(axis=1).mean()
call with .sum()/n_sims
for an additional speed up.
Timings:
def op(n_sims = 10000, mu = 3, lim=5e6, xs=2e6):
poisson = sp.poisson(mu=mu)
no_clms_arr = poisson.rvs(size=n_sims)
result = np.zeros((n_sims,no_clms_arr.max()))
for idx,i in enumerate(no_clms_arr):
if i == 0:
continue # no need to do anything here
sev_sample = sp.lognorm(50,25).rvs(i) ## this is step 2
sev_sample = np.pad(sev_sample,(0,no_clms_arr.max()-i),'constant',constant_values=(0,0)) ## padding out so I can append back to the result matrix
result[idx,:] = sev_sample
return np.clip(a = result-xs,a_min=0,a_max=lim).sum(axis=1).mean()
def nin17_vectorized(n_sims = 10000, mu=3, lim=5e6, xs=2e6):
poisson = sp.poisson(mu=mu)
no_clms_arr = poisson.rvs(size=n_sims)
vals = sp.lognorm(50, 25).rvs(no_clms_arr.sum())
return (vals - xs).clip(0, lim).sum() / n_sims
def nin17_for(n_sims = 10000, mu=3, lim=5e6, xs=2e6):
poisson = sp.poisson(mu=mu)
no_clms_arr = poisson.rvs(size=n_sims)
result = np.zeros((n_sims,no_clms_arr.max()))
vals = sp.lognorm(50, 25).rvs(no_clms_arr.sum())
start = 0
for i, j in enumerate(no_clms_arr):
if j:
result[i, :j] = vals[start: start + j]
start += j
return np.clip(a = result-xs,a_min=0,a_max=lim).sum() / n_sims
state = np.random.get_state()
np.random.set_state(state)
print(op())
np.random.set_state(state)
print(nin17_for())
np.random.set_state(state)
print(nin17_vectorized())
%timeit op()
%timeit nin17_for()
%timeit nin17_vectorized()
Output:
5682372.071238058
5682372.071238058
5682372.071238059
2.54 s ± 8.79 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
5.62 ms ± 87.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
1.6 ms ± 11.2 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
Answered By - Nin17
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.