Issue
Can this matrix math be done faster?
I'm using Python to render 3D points in perspective. Speed is important, because it will translate directly to frame rate at some point down the line.
I tried to use NumPy functions, but I couldn't clear out two pesky list comprehensions. 90% of my program's runtime is during the list comprehensions, which makes sense since they contain all of the math, so I want to find a faster method if possible.
- The first list comprehension happens when making pos: it does a sum and a matrix multiplication for every individual row of vert_array.
- The second one, persp, multiplies the x and y values of each row based on that specific row's z value.
Can I replace those list comprehensions with something from NumPy? I read about numpy.einsum and numpy.fromfunction, but I was struggling to understand whether they're even relevant to my problem.
Here is the function that does the main rendering calculations:
I want to make pos and persp faster:
import time
from random import randint
import numpy as np
def render_all_verts(vert_array):
    """
    :param vert_array: a 2-dimensional numpy array of float32 values of
        size n x 3, formatted as follows, where each row represents one
        vertex's coordinates in world-space coordinates:
        [[vert_x_1, vert_y_1, vert_z_1],
         [vert_x_2, vert_y_2, vert_z_2],
         ...
         [vert_x_n, vert_y_n, vert_z_n]]
    :return: a 2-dimensional numpy array of the same data type, size
        and format as vert_array, but in screen-space coordinates
    """
    # unit_vec is a 3x3 2D array that represents the rotation matrix
    # for the camera after some rotation (there's no rotation in this example)
    unit_vec = np.array([[1, 0, 0],
                         [0, 1, 0],
                         [0, 0, 1]], dtype='float32')
    # shift is a 3-element 1D array that represents the position
    # vector (x, y, z) of the camera in world-space coordinates
    shift = np.array([0, 0, 10], dtype='float32')
    # PURPOSE: This converts vert_array, with its coordinates relative
    # to the world-space axes and origin, into coordinates relative
    # to camera-space axes and origin (at the camera).
    # MATH DESCRIPTION: For every row, shift is added, then matrix
    # multiplication is performed with that sum (1x3) and unit_vec (3x3).
    pos = np.array([np.matmul(unit_vec, row + shift) for row in vert_array], dtype='float32')
    # ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    # This is a constant used to solve for the perspective
    focus = 5
    # PURPOSE: This calculation does the math to change the vertex coordinates,
    # which are relative to the camera, into a representation of how they'll
    # appear on screen in perspective. The x and y values are scaled based on
    # the z value (distance from the camera).
    # MATH DESCRIPTION: Each row's first two columns are multiplied
    # by a scalar, which is derived from that row's third column value.
    persp = np.array([np.multiply(row, np.array([focus / abs(row[2]), focus / abs(row[2]), 1])) for row in pos])
    # ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    return persp
I wrote this code to time render_all_verts and to generate a list of random vertex coordinates to run through it repeatedly:
# TESTING RENDERING SPEED
start_time = time.time()
# The next few lines make an array similar to the 3D points I'd be rendering.
# It contains n vertices with random coordinate values from -m to m
n = 1000
m = 50
example_vertices = np.array([(randint(-m, m), randint(-m, m), randint(-m, m)) for i in range(n)])
# This empty array has the same shape as example_vertices. The results are saved here.
rendered_verts = np.empty(example_vertices.shape)
print('Example vertices:\n', example_vertices)
# This loop renders the example vertices many times
render_times = 2000
for i in range(render_times):
    rendered_verts = render_all_verts(example_vertices)
print('\n\nLast calculated render of vertices:\n', rendered_verts)
print(f'\n\nThis program took an array of {n} vertices with randomized coordinate')
print(f'values between {-m} and {m} and rendered them {render_times} times.')
print(f'--- {time.time() - start_time} seconds ---')
Finally, here's one instance of the terminal output:
C:\...\simplified_demo.py
Example vertices:
[[-45 4 -43]
[ 42 27 28]
[-33 24 -18]
...
[ -5 48 5]
[-17 -17 29]
[ -5 -46 -24]]
C:\...\simplified_demo.py:45: RuntimeWarning: divide by zero encountered in divide
persp = np.array([np.multiply(row, np.array([focus / abs(row[2]), focus / abs(row[2]), 1]))
Last calculated render of vertices:
[[ -6.81818182 0.60606061 -33. ]
[ 5.52631579 3.55263158 38. ]
[-20.625 15. -8. ]
...
[ -1.66666667 16. 15. ]
[ -2.17948718 -2.17948718 39. ]
[ -1.78571429 -16.42857143 -14. ]]
This program took an array of 1000 vertices with randomized coordinate
values between -50 and 50 and rendered them 2000 times.
--- 15.910243272781372 seconds ---
Process finished with exit code 0
P.S. NumPy seems to handle the division by zero and overflow values fine for now, so I'm not worried about the RuntimeWarning. I replaced my file paths with ...
P.P.S. Yes, I know I could just use OpenGL or any other existing rendering engine that would already handle all this math, but I'm more interested in reinventing this wheel. It's mostly an experiment for me to study Python and NumPy.
Solution
An initial speedup can be made by using vectorization:
def render_all_verts(vert_array):
    """
    :param vert_array: a 2-dimensional numpy array of float32 values of
        size n x 3, formatted as follows, where each row represents one
        vertex's coordinates in world-space coordinates:
        [[vert_x_1, vert_y_1, vert_z_1],
         [vert_x_2, vert_y_2, vert_z_2],
         ...
         [vert_x_n, vert_y_n, vert_z_n]]
    :return: a 2-dimensional numpy array of the same data type, size
        and format as vert_array, but in screen-space coordinates
    """
    # unit_vec is a 3x3 2D array that represents the rotation matrix
    # for the camera after some rotation (there's no rotation in this example)
    unit_vec = np.array([[1, 0, 0],
                         [0, 1, 0],
                         [0, 0, 1]], dtype='float32')
    # shift is a 3-element 1D array that represents the position
    # vector (x, y, z) of the camera in world-space coordinates
    shift = np.array([0, 0, 10], dtype='float32')
    # PURPOSE: This converts vert_array, with its coordinates relative
    # to the world-space axes and origin, into coordinates relative
    # to camera-space axes and origin (at the camera).
    # MATH DESCRIPTION: Add shift to every row at once, then apply the
    # 3x3 matrix to all rows in a single matmul call.
    pos2 = np.matmul(unit_vec, (vert_array + shift).T).T
    # The original list comprehension, kept commented out together with the
    # assertions used to check that the vectorized version matches it:
    """
    pos = np.array([np.matmul(unit_vec, row + shift) for row in vert_array], dtype='float32')
    print(vert_array.shape, unit_vec.shape)
    assert pos2.shape == pos.shape, (pos2.shape, pos.shape)
    assert np.all(pos2 == pos), np.sum(pos - pos2)
    """
    # This is a constant used to solve for the perspective
    focus = 5
    # PURPOSE: This calculation does the math to change the vertex coordinates,
    # which are relative to the camera, into a representation of how they'll
    # appear on screen in perspective. The x and y values are scaled based on
    # the z value (distance from the camera).
    # MATH DESCRIPTION: Compute all the per-row scale factors at once,
    # then multiply them in with broadcasting.
    x = focus / np.abs(pos2[:, 2])
    persp2 = np.multiply(pos2, np.dstack([x, x, np.ones(x.shape)]))
    """
    persp = np.array([np.multiply(row, np.array([focus / abs(row[2]), focus / abs(row[2]), 1])) for row in pos2])
    assert persp.shape == persp2.shape, (persp.shape, persp2.shape)
    assert np.all(persp == persp2), np.sum(persp - persp2)
    """
    return persp2
This runs in 0.18 s.
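Since the question asks about numpy.einsum: the same vectorized math can also be written with it. A minimal sketch (the function name is mine, and for a fixed 3x3 matrix einsum is usually comparable in speed to np.matmul rather than faster):

```python
import numpy as np

def render_all_verts_einsum(vert_array):
    # Same camera setup as above: identity rotation, camera at (0, 0, 10)
    unit_vec = np.eye(3, dtype=np.float32)
    shift = np.array([0, 0, 10], dtype=np.float32)
    focus = 5
    # 'ij,nj->ni': apply the 3x3 matrix to every row of the shifted array
    pos = np.einsum('ij,nj->ni', unit_vec, vert_array + shift)
    # Scale x and y by focus / |z| via broadcasting; leave z untouched
    persp = pos.copy()
    persp[:, :2] *= focus / np.abs(pos[:, 2:3])
    return persp
```

The einsum subscripts make the per-row matrix multiplication explicit, which may help when studying how the loop-free version corresponds to the original list comprehension.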
Then we can remove the unneeded transpose steps for about 50% more performance:
def render_all_verts_2(vert_array):
    unit_vec = np.array([[1, 0, 0],
                         [0, 1, 0],
                         [0, 0, 1]], dtype='float32')
    shift = np.array([0, 0, 10], dtype='float32')
    # Work with 3xN columns throughout and transpose only once, at the end
    pos2 = np.matmul(unit_vec, (vert_array + shift).T)
    focus = 5
    x = focus / np.abs(pos2[2])
    persp2 = np.multiply(pos2, np.vstack([x, x, np.ones(x.shape)]))
    return persp2.T[np.newaxis,]
This version runs in 0.12 seconds on my system.
Finally, we can use numba for a further 4x speedup, down to 0.03 s:
from numba import njit

@njit
def render_all_verts_2(vert_array):
    unit_vec = np.array([[1, 0, 0],
                         [0, 1, 0],
                         [0, 0, 1]], dtype=np.float32)
    shift = np.array([0, 0, 10], dtype=np.float32)
    pos2 = np.dot(unit_vec, (vert_array + shift).T)
    focus = 5
    x = focus / np.abs(pos2[2])
    persp2 = np.multiply(pos2, np.vstack((x, x, np.ones(x.shape, dtype=np.float32))))
    return persp2.T.reshape((1, persp2.shape[1], persp2.shape[0]))
That is roughly 800 times faster than the 26 seconds the original was taking on my machine.
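The timings above come from a loop like the one in the question. A sketch of a reusable harness (the helper name is mine); note the warm-up call, which keeps numba's one-off JIT compilation time out of the measurement:

```python
import time
import numpy as np

def time_render(render_fn, vert_array, repeats=2000):
    """Return the total seconds taken by `repeats` calls to render_fn."""
    render_fn(vert_array)  # warm-up: triggers numba JIT compilation, if any
    start = time.perf_counter()
    for _ in range(repeats):
        render_fn(vert_array)
    return time.perf_counter() - start

# Example usage with a trivial stand-in for a render function:
# time_render(lambda a: a * 2.0, np.random.rand(1000, 3).astype(np.float32))
```

time.perf_counter() is preferable to time.time() for benchmarking because it uses the highest-resolution clock available and is guaranteed monotonic.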
Even more optimized, as suggested by @Nin17 (though I couldn't measure any impact). I also removed the 3D reshape; somewhere while testing I had asserted that the output shape should be 3D, and that is changed here.
# Best njit version with transpose - 0.03s
@njit
def render_all_verts(vert_array):
    unit_vec = np.array([[1, 0, 0],
                         [0, 1, 0],
                         [0, 0, 1]], dtype=np.float32)
    shift = np.array([0, 0, 10], dtype=np.float32)
    focus = 5
    data = (vert_array + shift).T
    data = np.dot(unit_vec, data)
    data[:2] *= focus / np.abs(data[2:3])
    return data.T
# Best normal version with transpose - 0.10s
def render_all_verts(vert_array):
    unit_vec = np.array([[1, 0, 0],
                         [0, 1, 0],
                         [0, 0, 1]], dtype=np.float32)
    shift = np.array([0, 0, 10], dtype=np.float32)
    focus = 5
    data = (vert_array + shift).T
    data = np.dot(unit_vec, data)
    data[:2] *= focus / np.abs(data[2:3])
    return data.T
# Without transpose is slower, probably because of the BLAS
# implementation, memory contiguity, etc.
# Without transpose, normal - 0.14s (slowest)
def render_all_verts(vert_array):
    unit_vec = np.array([[1, 0, 0],
                         [0, 1, 0],
                         [0, 0, 1]], dtype=np.float32)
    shift = np.array([0, 0, 10], dtype=np.float32)
    focus = 5
    pos2 = (vert_array + shift)
    pos2 = np.dot(pos2, unit_vec)
    pos2[:, :2] *= focus / np.abs(pos2[:, 2:3])
    return pos2
# njit without transpose (second fastest) - 0.06s
@njit
def render_all_verts_2(vert_array):
    unit_vec = np.array([[1, 0, 0],
                         [0, 1, 0],
                         [0, 0, 1]], dtype=np.float32)
    shift = np.array([0, 0, 10], dtype=np.float32)
    focus = 5
    pos2 = (vert_array + shift)
    pos2 = np.dot(pos2, unit_vec)
    pos2[:, :2] *= focus / np.abs(pos2[:, 2:3])
    return pos2
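The in-place scaling trick these versions share relies on broadcasting: pos2[:, 2:3] (or data[2:3] in the transposed versions) keeps a length-1 axis, so the single per-row factor is stretched across the x and y columns. A tiny standalone illustration:

```python
import numpy as np

a = np.array([[2., 4., 10.],
              [3., 6., -5.]])
focus = 5
# a[:, 2:3] has shape (2, 1); each row's factor broadcasts over columns 0 and 1
a[:, :2] *= focus / np.abs(a[:, 2:3])
# a is now [[1., 2., 10.], [3., 6., -5.]]
```

Using a[:, 2] instead would give a shape-(2,) array, which does not broadcast against the (2, 2) slice; keeping the length-1 axis with the 2:3 slice is what makes the one-liner work.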
Finally, here is the fastest implementation in plain Python/NumPy that I could manage (it uses features numba doesn't support). It is basically on par with numba for larger vertex arrays, and faster than numba for even larger ones.
def render_all_verts(vert_array):
    unit_vec = np.array([[1, 0, 0],
                         [0, 1, 0],
                         [0, 0, 1]], dtype=np.float32)
    shift = np.array([0, 0, 10], dtype=np.float32)
    focus = 5
    # Copy into a C-contiguous float32 3xN buffer, then work in place
    data = vert_array.T.astype(np.float32, order='C')
    data += shift[:, np.newaxis]
    np.dot(unit_vec, data, out=data)
    data[:2] *= focus / np.abs(data[2:3])
    return data.T
Answered By - arrmansa