Issue
The Goal
I have an M-length array of 3-dimensional coordinate points as floats. I would like to create a 3-dimensional numpy array of a predefined shape filled with ellipsoids of a given float radius centered on these points. Because this is intended for image manipulation, I call each value in the array a "pixel". If these ellipsoids overlap, I would like to assign the pixel to the closer center by euclidean distance. The final output will be a numpy array with a background of 0 and pixels within ellipsoids numbered 1, 2,... M, as corresponding to the initial list of coordinates, similar to the output of scipy's ndimage.label(...).
Below, I have a naive approach which considers every position in the output array and compares it to every defined center, creating a binary array with a value of 1 for any pixel inside any ellipsoid. It then uses scikit-image to watershed this binary array. While this code works, it is prohibitively slow for my use, both because it considers every pixel and center pair and because it performs watershedding seperately. How can I speed up this code?
Naive Example
def define_centromeres(template_image, centers_of_mass, xradius = 4.5, yradius = 4.5, zradius = 3.5):
""" Creates a binary N-dimensional numpy array of ellipsoids.
:param template_image: An N-dimensional numpy array of the same shape as the output array.
:param centers_of_mass: A list of lists of N floats defining the centers of spots.
:param zradius: A float defining the radius in pixels in the z direction of the ellipsoids.
:param xradius: A float defining the radius in pixels in the x direction of the ellipsoids.
:param yradius: A float defining the radius in pixels in the y direction of the ellipsoids.
:return: A binary N-dimensional numpy array.
"""
out = np.full_like(template_image, 0, dtype=int)
for idx, val in np.ndenumerate(template_image):
z, x, y = idx
for point in centers_of_mass:
pz, px, py = point[0], point[1], point[2]
if (((z - pz)/zradius)**2 + ((x - px)/xradius)**2 + ((y - py)/yradius)**2) <= 1:
out[z, x, y] = 1
break
return out
Scikit-image's watershed function; it is unlikely to find a speed-up by changing the code inside this method:
def watershed_image(binary_input_image):
bg_distance = ndi.distance_transform_edt(binary_input_image,
return_distances=True,
return_indices=False)
local_maxima = peak_local_max(bg_distance, min_distance=1, labels=binary_input_image)
bg_mask = np.zeros(bg_distance.shape, dtype=bool)
bg_mask[tuple(local_maxima.T)] = True
marks, _ = ndi.label(bg_mask)
output_watershed = watershed(-bg_distance, marks, mask=binary_input_image)
return output_watershed
Small-scale example data:
zdim, xdim, ydim = 15, 100, 100
example_shape = np.zeros((zdim,xdim,ydim))
example_points = np.random.random_sample(size=(10,3))*np.array([zdim,xdim,ydim])
center_spots_image = define_centromeres(example_shape, example_points)
watershed_spots = watershed_image(center_spots_image)
Output:
center_spots_image, max projected to 2d
watershed_spots, max projected to 2d
Note that these images are only 2d representations of the final 3d output array.
Additional Notes
The typical size of the output array will be 31x512x512, or a total of 8.1e6 values, and the typical size of the input coordinates will be 40 3-dimensional coordinate points. I would like to optimize the speed of this procedure for this scale.
I am using numpy, scipy, and scikit-image in this project, and I must stick to these and other well-maintained and -documented packages.
Apologies for mistakes in the accessibility of the above code or for a lack of clarity in my explanation. I am a research scientist with little formalized computer science training.
Solution
Both the algorithm and its implementation do indeed leave some room for improvement. @Eric Johnson has covered the latter, now allow me to demonstrate another 20-fold speedup on the example by using a better algorithm.
Improvements: 1.Before masking restrict to an easily computed bounding box. 2. For overlap resolution recycle the distance calculations already done for ellipsoid computation.
Code (assumes Eric's function is alreaady defined):
import numpy as np
def rasterise(template_image, centers_of_mass, xradius=4.5, yradius=4.5, zradius=3.5):
"""Creates a labeled N-dimensional numpy array of ellipsoids.
:param template_image: An N-dimensional numpy array of the same shape as the output array.
:param centers_of_mass: A list of lists of N floats defining the centers of spots.
:param zradius: A float defining the radius in pixels in the z direction of the ellipsoids.
:param xradius: A float defining the radius in pixels in the x direction of the ellipsoids.
:param yradius: A float defining the radius in pixels in the y direction of the ellipsoids.
:return: An N-dimensional numpy array, with label `n` for the ellipsoid at index `n-1`.
"""
sh = template_image.shape
out = np.zeros(sh,int)
aux = np.zeros(sh)
radii = np.array([zradius,xradius,yradius])
for j,com in enumerate(centers_of_mass,1):
bboxl = np.floor(com-radii).clip(0,None).astype(int)
bboxh = (np.ceil(com+radii)+1).clip(None,sh).astype(int)
roi = out[tuple(map(slice,bboxl,bboxh))]
roiaux = aux[tuple(map(slice,bboxl,bboxh))]
logrid = *map(np.square,np.ogrid[tuple(
map(slice,(bboxl-com)/radii,(bboxh-com-1)/radii,1j*(bboxh-bboxl)))]),
dst = (1-sum(logrid)).clip(0,None)
mask = dst>roiaux
roi[mask] = j
np.copyto(roiaux,dst,where=mask)
return out
zdim, xdim, ydim = 15, 100, 100
example_shape = np.zeros((zdim,xdim,ydim))
example_points = np.random.random_sample(size=(10,3))*np.array([zdim,xdim,ydim])
center_spots_image = define_centromeres_labeled(example_shape, example_points)
csi = rasterise(example_shape, example_points)
print("number of pixels dfferent",np.count_nonzero(csi != center_spots_image),"out of",csi.size)
from timeit import timeit
print("Eric",timeit(lambda:define_centromeres_labeled(example_shape, example_points),number=10))
print("loopy",timeit(lambda:rasterise(example_shape, example_points),number=10))
Sample run:
number of pixels dfferent 0 out of 150000
Eric 0.37984768400201574
loopy 0.019632569048553705
Caveat:
Overlap resolution is sightly different between Eric's and my code. For example:
The difference is (I think) that Eric (top panel) uses standard Euclidean metric whereas I (bottom panel) use the metric suggested by the ellipsoids mostly out of opportunism but also because it may even be the right thing to do. Switching this over would be possible but at a speed penalty.
Answered By - loopy walt
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.