Issue
I am trying to use the finite difference method with NumPy arrays, but the slicing is incredibly slow. Using lists is not viable since I am applying math operations. Equivalent MATLAB code executes in 2 seconds, whilst the Python code takes over 1 minute to finish. My Python code is:
import numpy as np
from scipy import interpolate
def heston_explicit_nonuniform(S,V,K,T,Ns,Nv,kappa,theta,rho,sigma,r,q,dt,dv,ds):
    U = np.empty((Ns,Nv))
    for s in range(Ns):
        for v in range(Nv):
            U[s,v] = np.maximum( S[s] - K , 0)

    for t in range(Nt - 1):
        for v in range(Nv-1):
            U[0,v] = 0
            U[Ns-1,v] = np.maximum( Smax - K ,0)
        for s in range(Ns):
            U[s,Nv-1] = np.maximum(S[s] - K, 0)

        u = np.copy(U)
        for s in range(1,Ns-1):
            derV = (u[s,1] - u[s,0]) / (V[1] - V[0])
            derS = (u[s+1,0] - u[s-1,0]) / (S[s+1]-S[s-1])
            U[s,0] = u[s,0] + dt * ( -r*u[s,0] + (r-q) * S[s] * derS + kappa*theta*derV)

        u = np.copy(U)
        for s in range(1,Ns-1):
            for v in range(1,Nv-1):
                derS = 0.5*(u[s+1,v] - u[s-1,v]) / (S[s+1] - S[s-1])
                derV = 0.5*(u[s,v+1] - u[s,v-1]) / (V[v+1] - V[v-1])
                derSS = (u[s+1,v] - 2*u[s,v] + u[s-1,v]) / ( (S[s+1] - S[s])*(S[s]-S[s-1]))
                derVV = (u[s,v+1] - 2*u[s,v] + u[s,v-1]) / ( (V[v+1] - V[v])*(V[v]-V[v-1]))
                derSV = (u[s+1,v+1] + u[s-1,v-1] - u[s-1,v+1] - u[s+1,v-1]) \
                    / (4 * (S[s+1] - S[s-1]) * (V[v+1] - V[v-1]))
                A = 0.5 * V[v] * (S[s]**2) * derSS
                B = rho*sigma*V[v]*S[s] * derSV
                C = 0.5*(sigma**2)*V[v]*derVV
                D = 0.5*(r-q)*S[s]*derS
                E = kappa*(theta-V[v])*derV
                L = dt * (A + B + C + D + E - r*u[s,v])
                U[s,v] = L + u[s,v]
        if t % 100 == 0:
            print(t)
    return U
Vmax=0.5
Vmin=0
Smin=0
T=0.15
K=100.
Smax = 2*K
Vmax=0.5
r= 0.02
q=0.05
rho=-0.9
v0=0.05
sigma=0.3
kappa=1.5
theta=0.04
L=17
Nv=39
Ns=79
Nt = 3000
dt = T/Nt
Vi = np.linspace(0,Vmax,Nv)
Si = np.linspace(0,Smax,Ns)
dv = (Vmax-Vmin)/Nv
ds = (Smax-Smin)/Ns
call = heston_explicit_nonuniform(Si,Vi,K,T,Ns,Nv,kappa,theta,rho,sigma,r,q,dt,dv,ds)
data = interpolate.RectBivariateSpline(Si,Vi,call)
z_new = data(101.52,0.05412)
print('FDM: ', z_new[0,0])
Solution
The thing is, there is no slicing in your code, and that is exactly what is missing.
In NumPy, you compute things on whole arrays, not element by element.
In other words, you need to avoid for loops, which are pure Python (and therefore very slow: the only reason Python is quite fast when using NumPy is that most of the CPU time is spent in C code, not in Python code).
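As a minimal illustration of that difference (the array and the operation here are arbitrary, chosen only for timing), compare a pure-Python element-by-element loop with the equivalent whole-array expression:

import time
import numpy as np

x = np.random.rand(1_000_000)

# pure-python loop: one interpreted iteration per element
t0 = time.perf_counter()
y_loop = np.empty_like(x)
for i in range(len(x)):
    y_loop[i] = x[i]**2 + 1.0
t1 = time.perf_counter()

# whole-array computation: the loop runs inside numpy's C code
y_vec = x**2 + 1.0
t2 = time.perf_counter()

print(np.allclose(y_loop, y_vec))   # same result
print(t1 - t0, t2 - t1)             # the loop is typically orders of magnitude slower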
So for example, to focus on a specific and quite easy part of your code, this loop
for s in range(1,Ns-1):
    derV = (u[s,1] - u[s,0]) / (V[1] - V[0])
    derS = (u[s+1,0] - u[s-1,0]) / (S[s+1]-S[s-1])
    U[s,0] = u[s,0] + dt * ( -r*u[s,0] + (r-q) * S[s] * derS + kappa*theta*derV)
should not exist (or, to be more accurate, it should exist only in numpy's internal code, in C)
What you do here is: for each s between 1 and Ns-1, you compute derV, which is a combination of the s-th element of two vectors (u[:,1] and u[:,0], so two columns of u) and a scalar (V[1]-V[0]). So, all together, that is Ns-2 computations over two vectors of length Ns-2. You can let numpy do that:
derV = (u[1:-1,1] - u[1:-1,0]) / (V[1]-V[0])
And now, derV is an array of the Ns-2 values you computed.
So this part of the code can be turned into
derV = (u[1:-1,1] - u[1:-1,0]) / (V[1]-V[0])
for s in range(1,Ns-1):
    derS = (u[s+1,0] - u[s-1,0]) / (S[s+1]-S[s-1])
    U[s,0] = u[s,0] + dt * ( -r*u[s,0] + (r-q) * S[s] * derS + kappa*theta*derV[s-1])
So we have "vectorized" the computation of the Ns-2 derV values.
Likewise
derS = (u[2:,0] - u[:-2,0]) / (S[2:]-S[:-2])
So
derV = (u[1:-1,1] - u[1:-1,0]) / (V[1]-V[0])
derS = (u[2:,0] - u[:-2,0]) / (S[2:]-S[:-2])
for s in range(1,Ns-1):
    U[s,0] = u[s,0] + dt * ( -r*u[s,0] + (r-q) * S[s] * derS[s-1] + kappa*theta*derV[s-1])
And that last iterated computation of U[s,0] is also just Ns-2 independent computations, from the Ns-2 u[s,0] values, the Ns-2 derS values, the Ns-2 derV values, the Ns-2 S values, and some scalars.
So it can also be turned into
derV = (u[1:-1,1] - u[1:-1,0]) / (V[1]-V[0])
derS = (u[2:,0] - u[:-2,0]) / (S[2:]-S[:-2])
U[1:-1,0] += dt * ( -r*u[1:-1,0] + (r-q) * S[1:-1] * derS + kappa*theta*derV)
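To convince yourself that the loop and the fully vectorized form really do the same thing, here is a small self-contained sanity check on arbitrary random data (the grid sizes and coefficient values are made up; only the formulas match the ones above):

import numpy as np

# made-up small grid and coefficients, just to compare the two formulations
Ns, Nv = 8, 5
rng = np.random.default_rng(0)
u = rng.random((Ns, Nv))
S = np.sort(rng.random(Ns))
V = np.sort(rng.random(Nv))
r, q, kappa, theta, dt = 0.02, 0.05, 1.5, 0.04, 1e-3

# loop version
U_loop = u.copy()
for s in range(1, Ns-1):
    derV = (u[s,1] - u[s,0]) / (V[1] - V[0])
    derS = (u[s+1,0] - u[s-1,0]) / (S[s+1] - S[s-1])
    U_loop[s,0] = u[s,0] + dt * (-r*u[s,0] + (r-q)*S[s]*derS + kappa*theta*derV)

# vectorized version
U_vec = u.copy()
derV = (u[1:-1,1] - u[1:-1,0]) / (V[1] - V[0])
derS = (u[2:,0] - u[:-2,0]) / (S[2:] - S[:-2])
U_vec[1:-1,0] += dt * (-r*u[1:-1,0] + (r-q)*S[1:-1]*derS + kappa*theta*derV)

print(np.allclose(U_loop, U_vec))  # True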
If I continue this for the whole of your code, I get:
import numpy as np
from scipy import interpolate
def heston_explicit_nonuniform(S,V,K,T,Ns,Nv,kappa,theta,rho,sigma,r,q,dt,dv,ds):
    U = np.empty((Ns,Nv))
    # initial condition: payoff, broadcast over the whole (S,V) grid
    U[:,:] = np.maximum(S-K, 0)[:,None]
    for t in range(Nt - 1):
        # boundary conditions
        U[0,:] = 0
        U[Ns-1,:] = np.maximum(Smax-K, 0)
        U[:,Nv-1] = np.maximum(S-K, 0)

        # v=0 row (the part vectorized above)
        u = U.copy()
        derV = (u[1:-1,1]-u[1:-1,0]) / (V[1]-V[0])
        derS = (u[2:,0]-u[:-2,0]) / (S[2:]-S[:-2])
        U[1:-1,0] = u[1:-1,0] + dt*(-r*u[1:-1,0] + (r-q)*S[1:-1]*derS + kappa*theta*derV)

        # interior points: all derivatives computed on whole sub-arrays at once
        u = U.copy()
        derS = 0.5*(u[2:,1:-1] - u[:-2,1:-1]) / (S[2:]-S[:-2])[:,None]
        derV = 0.5*(u[1:-1,2:] - u[1:-1,:-2]) / (V[2:]-V[:-2])[None,:]
        derSS = (u[2:,1:-1] - 2*u[1:-1,1:-1] + u[:-2,1:-1]) / ((S[2:]-S[1:-1]) * (S[1:-1]-S[:-2]))[:,None]
        derVV = (u[1:-1,2:] - 2*u[1:-1,1:-1] + u[1:-1,:-2]) / ((V[2:]-V[1:-1]) * (V[1:-1]-V[:-2]))[None,:]
        derSV = (u[2:,2:] + u[:-2,:-2] - u[:-2,2:] - u[2:,:-2]) \
            / 4 / (S[2:]-S[:-2])[:,None] / (V[2:]-V[:-2])[None,:]
        A = 0.5*V[None,1:-1] * (S[1:-1,None]**2) * derSS
        B = rho*sigma*V[None,1:-1]*S[1:-1,None] * derSV
        C = 0.5*(sigma**2)*V[None,1:-1]*derVV
        D = 0.5*(r-q)*S[1:-1,None]*derS
        E = kappa*(theta-V[None,1:-1])*derV
        L = dt * (A+B+C+D+E - r*u[1:-1,1:-1])
        U[1:-1,1:-1] += L
    return U
Vmax=0.5
Vmin=0
Smin=0
T=0.15
K=100.
Smax = 2*K
Vmax=0.5
r= 0.02
q=0.05
rho=-0.9
v0=0.05
sigma=0.3
kappa=1.5
theta=0.04
L=17
Nv=39
Ns=79
Nt = 3000
dt = T/Nt
Vi = np.linspace(0,Vmax,Nv)
Si = np.linspace(0,Smax,Ns)
dv = (Vmax-Vmin)/Nv
ds = (Smax-Smin)/Ns
call = heston_explicit_nonuniform(Si,Vi,K,T,Ns,Nv,kappa,theta,rho,sigma,r,q,dt,dv,ds)
data = interpolate.RectBivariateSpline(Si,Vi,call)
z_new = data(101.52,0.05412)
print('FDM: ', z_new[0,0])
It returns the exact same result (down to the very last decimal place), because it is the exact same computation. The iterations are just done inside numpy's internal code, not in Python.
Note that I haven't even tried to understand the algorithm. I just translated it as I would any array handling, because all the computations I rewrote were batches of Ns-2, Nv-2, or (Ns-2)(Nv-2) independent computations (the order wasn't important).
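If you want to check that equivalence yourself, one simple way (assuming you keep the original loop-based function around under another name, say heston_explicit_nonuniform_loops, a name used here only for the comparison) is to run both on the same inputs at the end of the script:

# hypothetical comparison: heston_explicit_nonuniform_loops is assumed to be
# the original loop-based function, kept under a different name
call_loops = heston_explicit_nonuniform_loops(Si,Vi,K,T,Ns,Nv,kappa,theta,rho,sigma,r,q,dt,dv,ds)
call_vec = heston_explicit_nonuniform(Si,Vi,K,T,Ns,Nv,kappa,theta,rho,sigma,r,q,dt,dv,ds)
print(np.allclose(call_loops, call_vec))  # True: same numbers, only the looping moved into numpy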
It would probably be interesting to go deeper into the algorithm itself, not just its coding. For example, I am pretty sure at least one of the two copy() calls could be avoided.
It would be harder (but probably not impossible) to also vectorize the for t loop, because this loop is different: the computation done at the t-th iteration depends on the result of the (t-1)-th iteration, which was not the case for the for s and for v loops.
That may be possible using cumulative operators, which allow, in some specific cases, numpy "vectorization" (that is, hijacking numpy's internal for loop to replace yours) even when each iteration depends on the previous one. For example, cumsum or cumprod.
(Memory-wise, that would imply keeping a big array of all values for all (t,s,v) combinations. But from your example, that would be barely 10 million values, which is really not a problem nowadays.)
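As a toy illustration of that idea (completely unrelated to the Heston scheme, just to show the principle), a running sum is a loop where each step depends on the previous one, and cumsum removes the Python loop while producing the same values:

import numpy as np

x = np.random.rand(10_000)

# dependent loop: y[t] needs y[t-1]
y = np.empty_like(x)
y[0] = x[0]
for t in range(1, len(x)):
    y[t] = y[t-1] + x[t]

# same recurrence with a cumulative operator: numpy's internal loop does the work
y_vec = np.cumsum(x)

print(np.allclose(y, y_vec))  # True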
But still, without even making any effort to adapt to the problem, by simply translating for loops into numpy batch computations, we already gained a ×100 factor (on my slow machine, it went from precisely 100 seconds to 1 second; it is not every day that the timing ratio is that easy to compute :-) ).
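If you want to reproduce that kind of measurement (the absolute numbers obviously depend on the machine), wrapping each call in time.perf_counter is enough; heston_explicit_nonuniform_loops is again just an assumed name for the original loop version:

import time

t0 = time.perf_counter()
heston_explicit_nonuniform_loops(Si,Vi,K,T,Ns,Nv,kappa,theta,rho,sigma,r,q,dt,dv,ds)
t1 = time.perf_counter()
heston_explicit_nonuniform(Si,Vi,K,T,Ns,Nv,kappa,theta,rho,sigma,r,q,dt,dv,ds)
t2 = time.perf_counter()

print(f"loops:      {t1 - t0:.2f} s")
print(f"vectorized: {t2 - t1:.2f} s")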
Answered By - chrslg