Issue
I am trying to build an adjacency matrix from pixels of an elevation raster. The raster is a GeoTIFF image with specified nodata values outside a watershed.
The solution I have now is working but is far from optimal.
What I tried:
- Scikit-learn:
sklearn.feature_extraction.image.img_to_graph
works ok, but it only gets 4 way neighborhood (rook neighborhood) and I need it 8 way (queen neighborhood). - Scipy: the solution I have now. To iterate over a numpy 2d array and iterate over a 3x3 window, comparing the center with its neighbors, searching for non nodata pixels and feeding a scipy sparse matrix.
The problem is, I can run this for 2000x2000 images but I need to test it with a much larger image (more than 25000 pixels on each side).
There is any way to optimize this algorithm before I can try it in a more powerful computer?
Thanks in advance.
This is the code I have now:
def image_to_adjacency_matrix(image_array, nodata = None):
image_flat = image_array.flatten()
height, width = image_array.shape
adjacency_matrix = scipy.sparse.lil_matrix((height * width, height * width), dtype=int)
for i in range(height):
for j in range(width):
current_pixel = i * width + j
if image_flat[current_pixel] == nodata:
continue
for ni in range(i - 1, i + 2):
for nj in range(j - 1, j + 2):
if 0 <= ni < height and 0 <= nj < width:
neighbor_pixel = ni * width + nj
if image_flat[neighbor_pixel] == nodata:
continue
if current_pixel != neighbor_pixel:
adjacency_matrix[current_pixel, neighbor_pixel] = 1
return adjacency_matrix.tocsr()
Solution
Made an attempt at vectorizing this using NumPy.
Note: I implemented a simplified version of your problem, which avoids wrapping around the edge of the raster by only creating edges for 1 to width - 1, rather than from 0 to width. This simplified the logic enough that I could solve the problem with NumPy indexing.
def image_to_adjacency_matrix_opt(image_array, nodata = None):
image_flat = image_array.flatten()
height, width = image_array.shape
N = height * width
image_has_data = image_flat != nodata
index_dtype = np.int32 if N < 2 ** 31 else np.int64
adjacents = np.array([
-width - 1, -width, -width + 1,
-1, 1,
width - 1, width, width + 1
], dtype=index_dtype)
row_idx, col_idx = np.meshgrid(
np.arange(1, height - 1, dtype=index_dtype),
np.arange(1, width - 1, dtype=index_dtype),
indexing='ij'
)
row_idx = row_idx.reshape(-1)
col_idx = col_idx.reshape(-1)
pixel_idx = row_idx * width + col_idx
pixels_with_data = image_has_data[pixel_idx]
pixel_idx = pixel_idx[pixels_with_data]
neighbors = pixel_idx.reshape(-1, 1) + adjacents.reshape(1, -1)
neighbors_with_data = image_has_data[neighbors]
row = np.repeat(pixel_idx, repeats=neighbors_with_data.sum(axis=1))
col = neighbors[neighbors_with_data]
data = np.ones(len(row), dtype='uint8')
adjacency_matrix = scipy.sparse.coo_matrix((data, (row, col)), dtype=int, shape=(N, N))
return adjacency_matrix.tocsr()
I found this was roughly 100x faster for a 500x500 matrix with 50% of its entries randomly set to nodata.
Example test dataset:
N = 500
image = np.random.randint(0, 100, size=(N, N))
image[np.random.rand(N, N) < 0.5] = -1
image = image.astype('int8')
Answered By - Nick ODell
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.