Issue
Is there a more time-efficient way to implement the following toy code, i.e., is there a NumPy function (or anything else) I can use instead of the nested for loops?
import numpy as np
import time
x = np.asarray([[0.13528165, 0.75680003, 0.34140618, 0.27220936],
                [0.05577458, 0.10935562, 0.10391207, 0.27655284],
                [0.69261246, 0.473227, 0.74132719, 0.49142857],
                [0.47410374, 0.16312079, 0.32911195, 0.9621932],
                [0.68697019, 0.29684091, 0.90821942, 0.17157798],
                [0.62682866, 0.50055864, 0.86398873, 0.70907045],
                [0.73800433, 0.92377443, 0.98588321, 0.84503027],
                [0.38787016, 0.13099305, 0.47687691, 0.54611905],
                [0.40795951, 0.43677015, 0.49634543, 0.1169693],
                [0.96947452, 0.64037515, 0.81471111, 0.85956936]])
clusters = [0, 0, 1, 0, 1, 1, 1, 0, 0, 1]
centroids = [[0.29219793, 0.31940793, 0.34953051, 0.43480875],
             [0.74277803, 0.56695522, 0.86282593, 0.61533533]]
tic = time.perf_counter()
d = np.zeros(x.shape[1])
for k in range(d.size):
    d[k] = 0
    for j in range(x.shape[0]):
        d[k] += abs(x[j][k] - centroids[clusters[j]][k])
print(d)
print(time.perf_counter() - tic)
# d = [1.24007222 1.96998689 0.88754977 2.41271772]  # output
How do I replace these nested for loops if there are if statements in between?
import numpy as np
d = np.array( [1.24007222, 1.96998689, 0.88754977, 2.41271772])
weights = np.array([0.25, 0.25, 0.25, 0.25])
new_weights = np.zeros(weights.size)
eps = 1e-3
beta = 1.2
for k in range(new_weights.size):
    if abs(d[k]) < eps:
        continue
    for current_d in d:
        if abs(current_d) < eps:
            continue
        new_weights[k] += (d[k] / current_d) ** (1 / (beta - 1))
    new_weights[k] = 1 / new_weights[k]
weights = new_weights
print(weights)
# weights = [0.15482004 0.0153019 0.82432504 0.00555302]  # output
Solution
Working this out from reading your code:
First, cast your cluster labels and your centroids as arrays:
clusters = np.array([0, 0, 1, 0, 1, 1, 1, 0, 0, 1])
centroids = np.array([[0.29219793, 0.31940793, 0.34953051, 0.43480875],
[0.74277803, 0.56695522, 0.86282593, 0.61533533]])
then:
d = np.sum(np.abs(x - centroids[clusters]), axis=0)
# d = array([1.24007222, 1.9699869 , 0.88754978, 2.4127177 ])
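Why this works: indexing `centroids` with the integer array `clusters` picks, for each row of `x`, the centroid of that row's cluster, so the result has the same shape as `x` and can be subtracted elementwise. Here is a minimal sketch that checks this against the loop version. The names `x_small`, `labels` and `cents` and their values are made up for illustration and are not from the question.

```python
import numpy as np

# Illustrative data (not from the question): 4 points, 2 features, 2 clusters.
x_small = np.array([[1.0, 2.0],
                    [3.0, 5.0],
                    [0.0, 1.0],
                    [4.0, 4.0]])
labels = np.array([0, 1, 0, 1])
cents = np.array([[0.5, 1.5],
                  [3.5, 4.5]])

# Integer-array indexing: row j of per_point is cents[labels[j]].
per_point = cents[labels]
print(per_point.shape == x_small.shape)

# Vectorized per-feature sum of absolute deviations.
d_vec = np.sum(np.abs(x_small - per_point), axis=0)

# Reference: the nested-loop version from the question.
d_loop = np.zeros(x_small.shape[1])
for k in range(d_loop.size):
    for j in range(x_small.shape[0]):
        d_loop[k] += abs(x_small[j][k] - cents[labels[j]][k])

print(d_vec)
print(np.allclose(d_vec, d_loop))
```

Note that this indexing needs `clusters` to be an integer NumPy array (or a list of ints) and `centroids` to be an array, which is why both are cast first.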
I see a speedup from 47 microseconds to 7 microseconds.
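The exact numbers will vary by machine and array size. If you want to reproduce the comparison yourself, something along these lines with `timeit` should do. The helper names `loop_version` and `vectorized_version` are mine, and the data is random with the same shapes as in the question rather than the question's actual values.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((10, 4))                 # same shape as the question's data
clusters = rng.integers(0, 2, size=10)  # random labels in {0, 1}
centroids = rng.random((2, 4))

def loop_version(x, clusters, centroids):
    # Nested loops, as in the question.
    d = np.zeros(x.shape[1])
    for k in range(d.size):
        for j in range(x.shape[0]):
            d[k] += abs(x[j][k] - centroids[clusters[j]][k])
    return d

def vectorized_version(x, clusters, centroids):
    # Fancy indexing plus a single reduction over the rows.
    return np.sum(np.abs(x - centroids[clusters]), axis=0)

n = 1000  # number of repetitions to average over
t_loop = timeit.timeit(lambda: loop_version(x, clusters, centroids), number=n) / n
t_vec = timeit.timeit(lambda: vectorized_version(x, clusters, centroids), number=n) / n
print(f"loop:       {t_loop * 1e6:.1f} microseconds per call")
print(f"vectorized: {t_vec * 1e6:.1f} microseconds per call")
```

Averaging over many calls is more reliable than a single `time.perf_counter()` measurement for code this fast.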
Edit
Answering your further question, a vectorized way of doing your loop:
new_weights = np.where(d < eps, 0, 1. / np.sum((d[None, :] / np.where(d < eps, np.inf, d)[:, None]) ** (1 / (beta - 1)), axis=0))
# = array([0.15482004, 0.0153019 , 0.82432504, 0.00555302])
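The one-liner is dense, so here is the same idea split into steps and checked against the loop. Treat it as a sketch: the function names `weights_loop` and `weights_vectorized` are mine, and I use `np.abs(d) < eps` to mirror the `abs(...)` tests in the loop (the two agree here because `d` is a sum of absolute values and so never negative). Broadcasting a `(1, n)` row against an `(n, 1)` column builds the full table of ratios; replacing near-zero denominators with `np.inf` makes those terms 0, which plays the role of the inner `continue`.

```python
import numpy as np

def weights_loop(d, eps=1e-3, beta=1.2):
    # Reference implementation: the nested loops from the question.
    new_weights = np.zeros(d.size)
    for k in range(d.size):
        if abs(d[k]) < eps:
            continue
        for current_d in d:
            if abs(current_d) < eps:
                continue
            new_weights[k] += (d[k] / current_d) ** (1 / (beta - 1))
        new_weights[k] = 1 / new_weights[k]
    return new_weights

def weights_vectorized(d, eps=1e-3, beta=1.2):
    small = np.abs(d) < eps
    # Near-zero denominators become inf, so their ratio is 0 and drops out of the sum.
    safe_d = np.where(small, np.inf, d)
    # ratios[j, k] = (d[k] / d[j]) ** (1 / (beta - 1)), built by broadcasting (1, n) / (n, 1).
    ratios = (d[None, :] / safe_d[:, None]) ** (1 / (beta - 1))
    totals = ratios.sum(axis=0)
    # Entries with a near-zero d[k] keep weight 0, like the loop's first `continue`.
    # np.where evaluates both branches, so silence the 1/0 warning for those entries.
    with np.errstate(divide="ignore"):
        return np.where(small, 0.0, 1.0 / totals)

d = np.array([1.24007222, 1.96998689, 0.88754977, 2.41271772])
print(weights_vectorized(d))
print(np.allclose(weights_loop(d), weights_vectorized(d)))

# A made-up input with an exact zero, to exercise both `continue` branches.
d_with_zero = np.array([0.0, 2.0, 1.0])
print(np.allclose(weights_loop(d_with_zero), weights_vectorized(d_with_zero)))
```

Keep in mind that this builds an n-by-n intermediate array, so memory grows quadratically with the number of features; for the 4-element `d` here that is negligible.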
Answered By - Learning is a mess