Issue
So I'm programming a Basin Hopping algorithm from scratch to find the potential minima of a system of interacting particles. I have obtained good results so far, but as I increase the number of particles in the system, the code takes more time to run. I am using SciPy's conjugate gradient method for finding local minima. I'm not getting any error messages and the program seems to work fine, but I was wondering how I can optimize the code so the computing time is reduced.
I defined the Basin Hopping routine as a function:
import numpy as np
from scipy import optimize

def bhmet(n, itmax, pot, derpot, T):
    i = 1
    # Random starting positions on a sphere of radius 3.
    phi = np.random.rand(n) * 2 * np.pi
    theta = np.random.rand(n) * np.pi
    x = 3 * np.sin(theta) * np.cos(phi)
    y = 3 * np.sin(theta) * np.sin(phi)
    z = 3 * np.cos(theta)
    xyzat = np.hstack((x, y, z))
    vmintot = 0
    while i <= itmax:
        print(i)
        plmin = optimize.fmin_cg(pot, xyzat, derpot, gtol=1e-5)  # positions of the local minimum
        # Metropolis acceptance criterion at temperature T.
        if pot(plmin) < vmintot or np.exp((1 / T) * (vmintot - pot(plmin))) > np.random.rand():
            vmintot = pot(plmin)
        # Random displacement of up to 0.3 in each coordinate.
        xyzat = plmin + 2 * 0.3 * (np.random.rand(len(plmin)) - 0.5)
        i = i + 1
    return plmin, vmintot
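For reference, a hypothetical call, using the ljpot and gradpot functions defined below (the argument values here are purely illustrative), would be:

plmin, vmintot = bhmet(10, 100, ljpot, gradpot, 1.0)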
I tried defining the initial condition (the first xyzat) as a matrix, but scipy.optimize.fmin_cg requires its parameter to be a flat array (and that's why, inside the functions, I reshape the array into a matrix).
The function I'm searching the global minimum for is:
def ljpot(posiciones):
    r = np.array([])
    # Rebuild the (n, 3) position matrix from the flat array [x..., y..., z...].
    matpos = np.zeros((int((1 / 3) * len(posiciones)), 3))
    matpos[:, 0] = posiciones[0:int((1 / 3) * len(posiciones))]
    matpos[:, 1] = posiciones[int((1 / 3) * len(posiciones)):int((2 / 3) * len(posiciones))]
    matpos[:, 2] = posiciones[int((2 / 3) * len(posiciones)):]
    # Distance between every unordered pair of particles.
    for j in range(0, np.shape(matpos)[0]):
        for k in range(j + 1, np.shape(matpos)[0]):
            ri = np.sqrt(sum((matpos[k, :] - matpos[j, :]) ** 2))
            r = np.append(r, ri)
    # Lennard-Jones potential in reduced units, summed over all pairs.
    V = 4 * ((1 / r) ** 12 - (1 / r) ** 6)
    vt = sum(V)
    return vt
And its gradient would be:
def gradpot(posiciones):
    gradv = np.array([])
    # Rebuild the (n, 3) position matrix from the flat array.
    matposg = np.zeros((int((1 / 3) * len(posiciones)), 3))
    matposg[:, 0] = posiciones[:int((1 / 3) * len(posiciones))]
    matposg[:, 1] = posiciones[int((1 / 3) * len(posiciones)):int((2 / 3) * len(posiciones))]
    matposg[:, 2] = posiciones[int((2 / 3) * len(posiciones)):]
    for w in range(0, np.shape(matposg)[1]):  # index over columns (coordinates)
        for k in range(0, np.shape(matposg)[0]):  # index over rows (particles)
            rkj = np.array([])
            xkj = np.array([])
            for j in range(0, np.shape(matposg)[0]):  # also iterates over rows
                if j != k:
                    r = np.sqrt(sum((matposg[j, :] - matposg[k, :]) ** 2))
                    rkj = np.append(rkj, r)
                    xkj = np.append(xkj, matposg[j, w])
            # Derivative of the potential with respect to coordinate w of particle k.
            dEdxj = sum(4 * (6 * (1 / rkj) ** 8 - 12 * (1 / rkj) ** 14) * (matposg[k, w] - xkj))
            gradv = np.append(gradv, dEdxj)
    return gradv
The reason I transform the array input into a matrix is that each particle has three coordinates x, y, z, so the columns of the matrix are the x's, y's, and z's of every particle. I tried to do this using np.reshape(), but it seemed to give wrong results for systems for which the program had already produced correct ones.
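As a side note on the reshape issue: since np.hstack((x, y, z)) lays the flat array out as all x's, then all y's, then all z's, a plain reshape into (n, 3) reads the values row by row and scrambles the coordinates, which would explain the wrong results. A minimal sketch of the layout-aware conversion (the toy array is illustrative):

posiciones = np.arange(9.0)           # toy flat array: [x0 x1 x2 y0 y1 y2 z0 z1 z2]
wrong = posiciones.reshape(-1, 3)     # rows are (x0, x1, x2), ... -- scrambled
right = posiciones.reshape(3, -1).T   # rows are (x0, y0, z0), ... -- correct
flat_again = right.T.ravel()          # back to the flat layout fmin_cg expects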
The code seems to work fine, but as I increase the number of particles the running time grows very quickly. I know global optimization can take a long time, but maybe I'm making some mistake in the code. I don't know if the way to reduce the running time is obvious; I'm rather new to code optimization, so sorry if that's the case. Of course, any advice is welcome. Thank you all very much!
Solution
Two things I noticed after a quick look where you can definitely save some time. Both require a little more thought up front, but afterwards you will be rewarded with optimised, cleaner code.
1. Try to avoid using append
np.append is highly inefficient. You start with an empty array, and afterwards need to allocate more memory each time. This leads to inefficient memory handling, as you copy your array each time you append a number. The longer the array gets, the more inefficient append becomes.
Alternative: Use np.zeros((m, n)) to initialise the array, with m and n being the size it will end up with. Then you need a counter that puts the new values in the corresponding places. If the size of the array is not known before the calculation, you can initialise it generously large and cut it down afterwards. For the pairwise distances here the final size is known, as in the sketch below.
2. Try to avoid using for loops
They are generally very slow, especially when iterating through large matrices, as you need to index each entry individually.
Alternative: Matrix operations are generally much faster. For example, instead of computing r = np.sqrt(sum((matposg[j,:] - matposg[k,:])**2)) inside two for loops, you could first build two matrices A and B whose rows correspond to the matposg[j,:] and matposg[k,:] pairs (this should be possible without loops!) and then simply use r = np.linalg.norm(A - B, axis=1), which returns all the pair distances at once.
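To make this concrete, here is a minimal sketch of what a fully vectorised version of the potential might look like, using NumPy broadcasting to form all pairwise differences at once (the function name vectorised_ljpot is illustrative, not from the original code):

import numpy as np

def vectorised_ljpot(posiciones):
    # Rebuild the (n, 3) matrix from the flat [x..., y..., z...] layout.
    pos = posiciones.reshape(3, -1).T
    # All pairwise difference vectors via broadcasting: shape (n, n, 3).
    diff = pos[:, None, :] - pos[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Keep each unordered pair once (upper triangle, excluding the diagonal).
    iu = np.triu_indices(pos.shape[0], k=1)
    r = dist[iu]
    return np.sum(4 * ((1 / r) ** 12 - (1 / r) ** 6))

The same diff and dist arrays contain every matposg[j,:] - matposg[k,:] term that the triple loop in gradpot recomputes, so the gradient can be vectorised along the same lines.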
Answered By - Mafu