Issue
I have a set of images that come from a diffraction experiment. Each pixel of these images has three associated values that depend on its position. What I need is the intensity of each pixel, but as a function of these three values. The way I'm currently implementing this in Python is with a bunch of for loops that find the pixels whose values fall within a narrow range, sum them together, and save the result as a point in my new "space". Here is a basic implementation with some dummy values:
import numpy as np

def MaskApply(HKLvals, h, k, ll, dh, dk, dl):
    # Build a 0/1 mask selecting pixels whose H, K, L values fall inside
    # the (h, h+dh), (k, k+dk), (ll, ll+dl) windows
    ans = (
        np.where(HKLvals[2].flatten() < ll + dl, 1, 0)
        * np.where(HKLvals[2].flatten() > ll, 1, 0)
        * np.where(HKLvals[1].flatten() > k, 1, 0)
        * np.where(HKLvals[1].flatten() < k + dk, 1, 0)
        * np.where(HKLvals[0].flatten() < h + dh, 1, 0)
        * np.where(HKLvals[0].flatten() > h, 1, 0)
    )
    ans = np.reshape(ans, HKLvals[2].shape, order="C")
    return ans
data = np.random.rand(120, 1000, 500)  # a collection of 120 images, each 1000x500
# (this is read from a file in my actual code)
HKLvals = np.ones((120, 3, 1000, 500))  # values connected to the images: each of the 1000x500 pixels has three values, for each of the 120 images
H, dh = np.linspace(0, 10, 1000, retstep=True)
K, dk = np.linspace(0, 10, 1000, retstep=True)
L, dl = np.linspace(0, 5, 100, retstep=True)

outData = np.zeros((len(H), len(K)))
for i in range(len(L)):
    l = L[i]
    for j in range(len(H)):
        h = H[j]
        for t in range(len(K)):
            k = K[t]
            maskArr = np.zeros(np.shape(data))
            pixel_number = 0
            for m in range(np.shape(data)[0]):
                mask = MaskApply(HKLvals[m], h, k, l, dh, dk, dl)
                pixel_number += np.count_nonzero(mask)
                maskArr[m] = mask
            maskArr = maskArr.astype(bool)
            outData[j, t] = np.sum(data[maskArr]) / pixel_number
print(outData)  # of course in reality this is saved in an .h5
This is the gist of my code at the moment. I'd love it if anybody had any input on this and could point out some efficiency mistakes, or perhaps suggest a different algorithm that I'm not thinking of.
I suppose the easiest way to improve the efficiency would be to stop using Python and switch to something more performant like C++ or Julia. I'm also looking into that, but I'm not very well versed in those languages, so in the meantime any input would be great.
Thanks to all who will read.
EDIT: fixed the code which is now running properly
Solution
Given our conversation in the question comments, I propose the following solution, which addresses a few points mentioned by other people, namely:
- Mask creation: instead of rebuilding the masks for every (h, k, l) combination, they are created just once per image.
- Flattening and reshaping: there is no need for that, since NumPy handles comparisons on multi-dimensional arrays directly.
- Mask combination: the boolean masks are combined with the appropriate tools (the & operator).
- Result aggregation: you can just compute the average, so there is no need to keep track of the number of selected pixels.
Here is the code:
import numpy as np
from itertools import product

def process_image(image, HKL_values, H, K, L, dh, dk, dl):
    Nh, Nk, Nl = map(len, (H, K, L))
    # One boolean mask per step along each dimension, computed once per image
    h_mask = np.array(
        [np.logical_and(
            HKL_values[0] > h,
            HKL_values[0] < h + dh
        ) for h in H]
    )
    k_mask = np.array(
        [np.logical_and(
            HKL_values[1] > k,
            HKL_values[1] < k + dk
        ) for k in K]
    )
    l_mask = np.array(
        [np.logical_and(
            HKL_values[2] > ll,
            HKL_values[2] < ll + dl
        ) for ll in L]
    )
    intensities = np.zeros((Nh, Nk, Nl))
    for h, k, l in product(range(Nh), range(Nk), range(Nl)):
        mask = h_mask[h] & k_mask[k] & l_mask[l]
        intensities[h, k, l] = np.average(image[mask])
    return intensities
# Load data
data = ...      # load images
HKL_vals = ...  # load HKL values for those images

# Define HKL steps
H, dh = np.linspace(0, 10, 1000, retstep=True)
K, dk = np.linspace(0, 10, 1000, retstep=True)
L, dl = np.linspace(0, 5, 100, retstep=True)

for i in range(len(data)):
    image = data[i]
    vals = HKL_vals[i]
    result = process_image(image, vals, H, K, L, dh, dk, dl)
    # save result for that image
The process_image function creates the masks once, then uses the itertools function product to form all possible combinations of the H, K, and L masks (one mask per value step along each dimension), selecting the corresponding pixels from the image and averaging their intensity. This produces a 3D array indexed along the H, K, and L steps, where each entry holds the average intensity of the pixels that correspond to that particular HKL configuration.
Note also that x_mask is the collection of all masks for dimension x, and x_mask[i] is the mask corresponding to the i-th step of dimension x.
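Since the question mentions that in reality the results go into an .h5 file, here is a minimal sketch of the per-image driver loop writing each result to HDF5. It reuses data, HKL_vals, H, K, L and process_image from the snippets above; the use of h5py, the file name and the dataset names are only assumptions for illustration:

import h5py

# Hypothetical output file; names and layout are illustrative only
with h5py.File("intensities.h5", "w") as f:
    for i in range(len(data)):
        result = process_image(data[i], HKL_vals[i], H, K, L, dh, dk, dl)
        # one (Nh, Nk, Nl) dataset per image, compressed to save disk space
        f.create_dataset(f"image_{i:03d}", data=result, compression="gzip")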
Important note before running
Since NumPy stores each boolean in a byte (8 bits), the memory consumption of the pre-calculated masks is 8x heavier than this answer originally assumed. As pointed out by @SamMason, h_mask alone will occupy ~500 MB! If you have RAM limitations (in the end the masks alone demand ~1.05 GB, and you still need the 1000x500 HKL values for each image), you will inevitably have to break the solution down, pre-computing only the smallest set of masks and computing the others as needed. The body of the adapted process_image then becomes:
# Only the L masks (the smallest set) are precomputed
l_mask = np.array(
    [np.logical_and(
        HKL_values[2] > ll,
        HKL_values[2] < ll + dl
    ) for ll in L]
)
intensities = np.zeros((Nh, Nk, Nl))
for h in range(Nh):
    h_mask = np.logical_and(
        HKL_values[0] > H[h],
        HKL_values[0] < H[h] + dh
    )
    for k in range(Nk):
        k_mask = np.logical_and(
            HKL_values[1] > K[k],
            HKL_values[1] < K[k] + dk
        )
        for l in range(Nl):
            mask = h_mask & k_mask & l_mask[l]
            intensities[h, k, l] = np.average(image[mask])
return intensities
Note that the precomputed masks are used in the inner-most for-loop, which is the one that runs the most times.
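As a rough sanity check on those memory figures, here is a quick back-of-the-envelope calculation, assuming 1000x500-pixel images with 1000, 1000 and 100 steps along H, K and L as in the snippets above (one byte per boolean per pixel per mask):

pixels_per_image = 1000 * 500          # one boolean byte per pixel per mask
h_masks = 1000 * pixels_per_image      # ~500 MB for the 1000 H masks
k_masks = 1000 * pixels_per_image      # ~500 MB for the 1000 K masks
l_masks = 100 * pixels_per_image       # ~50 MB for the 100 L masks

total_bytes = h_masks + k_masks + l_masks
print(total_bytes / 1e9)               # ~1.05 GB for all precomputed masks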
Answered By - Lourenço Monteiro Rodrigues