Issue
I have a big matrix whose values vary by many orders of magnitude. To calculate the sum as accurately as possible, my approach would be to reshape the ndarray into a 1-dimensional array, sort it and then add it up, starting with the smallest entries. Is there a better / more efficient way to do this?
Solution
I think that, given floating point precision problems, the best known algorithm for your task is Kahan summation. For practical purposes, Kahan summation has an error bound that is independent of the number of summands, while naive summation has an error bound that grows linearly with the number of summands.
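For reference, here is a minimal pure-Python sketch of compensated (Kahan) summation; it is intended only as a precision baseline, since a Python-level loop will be far slower than `np.sum`:

```python
def kahan_sum(values):
    """Compensated (Kahan) summation of an iterable of floats.

    A running compensation term captures the low-order bits that are
    lost when a small value is added to a large running total.
    """
    total = 0.0
    compensation = 0.0                   # running error term
    for value in values:
        y = float(value) - compensation
        t = total + y                    # low-order digits of y may be lost here
        compensation = (t - total) - y   # recover what was just lost
        total = t
    return total
```

The standard library's `math.fsum` is an even stronger baseline (it returns a correctly rounded sum), if a pure-Python reference is acceptable.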
NumPy does not use Kahan summation, and there is no easy way of implementing it without a big performance tradeoff. It does, however, use the next best thing: pairwise summation, whose error grows, under some reasonable assumptions, like the square root of the logarithm of the number of summands.
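To see the effect, here is a small illustrative experiment (the array size, dtype and constant value are arbitrary choices for the demonstration) comparing a naive left-to-right float32 accumulation with NumPy's pairwise summation:

```python
import numpy as np

# One million float32 copies of 0.1; the true sum of the stored values
# is close to 100000.  Naive sequential float32 accumulation drifts
# noticeably, while NumPy's pairwise summation stays close to a
# float64 reference.
x = np.full(10**6, 0.1, dtype=np.float32)

naive = np.float32(0.0)
for v in x:                          # slow pure-Python loop, for illustration only
    naive += v

pairwise = x.sum()                   # NumPy's pairwise summation in float32
reference = x.sum(dtype=np.float64)  # accumulate in double precision

print(f"naive float32:    {naive:.3f}")
print(f"pairwise float32: {pairwise:.3f}")
print(f"float64 ref:      {reference:.3f}")
```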
So it is very likely that NumPy on its own already provides sufficiently good precision for your problem. To validate this, I would run a few sample cases through Kahan summation (the pseudocode on the Wikipedia page translates almost directly to Python; see the sketch above), take that as the golden, best-possible result, and compare it against:
- Calling `np.sum` on your matrix as is.
- Calling `np.sum` on your matrix after reshaping it to 1D, which may give better results if your matrix is not contiguous in memory.
- Calling `np.sum` on a sorted version of that 1D array.
For most cases these last three options should behave similarly, but the only way to know is to actually test it; a small comparison sketch along those lines follows.
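A minimal comparison harness might look like the sketch below; it reuses the `kahan_sum` function from the earlier sketch, and the randomly generated matrix `a` is just a stand-in for your data:

```python
import numpy as np

# Stand-in data: values spread over many orders of magnitude.
rng = np.random.default_rng(0)
a = rng.uniform(-1.0, 1.0, size=(1000, 1000)) * 10.0 ** rng.integers(-10, 10, size=(1000, 1000))

flat = a.ravel()                      # 1-D view (copies only if non-contiguous)

# "Golden" reference: kahan_sum from the earlier sketch (slow pure-Python loop);
# math.fsum(flat) would be an even stricter, correctly rounded alternative.
baseline = kahan_sum(flat)

candidates = {
    "np.sum(a)":             np.sum(a),
    "np.sum(a.ravel())":     np.sum(flat),
    "np.sum(np.sort(flat))": np.sum(np.sort(flat)),
}

for name, value in candidates.items():
    print(f"{name:<24} {value:+.17e}   abs. error vs Kahan: {abs(value - baseline):.3g}")
```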
Answered By - Jaime