Issue
I am trying to evaluate this function:
import scipy as sci
import numpy as np
np.pi*sci.special.gamma(2*i-1)/((2**(2*(i-1)))*(sci.special.gamma(i)**2))
which is running into trouble with "divide by zero", so I split it up in parts using a log to see, where the divide by zero appears.
The approach looks then in pseudo code:
log(f) = log(pi)+log(gamma(...))-log(2**...)-log(gamma(...))
The problem appears in the third part.
I want to solve this term for a range of i, so interestingly, this appears, when trying a loop:
N = np.arange(2,21)
for i in N:
f = np.log(2**(2*(i-1)))
print(i, f)
Out:
2 1.3862943611198906
3 2.772588722239781
4 4.1588830833596715
5 5.545177444479562
6 6.931471805599453
7 8.317766166719343
8 9.704060527839234
9 11.090354888959125
10 12.476649250079015
11 13.862943611198906
12 15.249237972318797
13 16.635532333438686
14 18.021826694558577
15 19.408121055678468
16 20.79441541679836
17 -inf
18 -inf
19 -inf
20 -inf
__main__:2: RuntimeWarning: divide by zero encountered in log
So it seems, that the problem occurs, when the argument becomes too large (i=17).
On the other hand, when I just calculate the single value for i=17, it works (and also for even larger i):
np.log(2**(2*(17-1)))
Out: 22.18070977791825
Where is the division by zero, and how can I generally overcome this problem when reaching to large arguments?
Solution
The main problem comes from an integer overflow due to the implicit type of np.arange
. Indeed, np.arange(2,21)
appears to result in an array of int32
-typed values, and so i
is of type int32
too, and so 2**(2*(i-1))
is. 2**(2*(i-1))
with i>=17
is greater than the limit of 2**31 (of the int32
type). The overflow cause the value to be truncated to 0, hence the Numpy error when the log is computed (since the log function is not defined for the input 0).
One simple solution is to change the type to int64 values, but this is not great as the output exponent quickly grow. Using floats is a bit better. However, this is far from being a great idea since the maximum exponent will be quickly reached with i
values close to 500 or above. The root of the issue is the numerical stability of the computation.
A good solution is to fix the numerical stability issue by using mathematical transformations. Indeed, log2(2**n) = n
and log(n) = log2(n)*log(2)
. Thus, log(2**(2*(i-1))) = log2(2**(2*(i-1)))*log(2) = (2*(i-1))*log(2)
. As a result the following code should give equivalent results but without numerical issues:
N = np.arange(2,21)
log2 = np.log(2)
for i in N:
f = (2*(i-1))*log2
print(i, f)
Note that this code is actually faster to compute and simpler. Note also that there is no problem with huge i
values like the one bigger than 1,000,000 with this method! Here is the result:
2 1.3862943611198906
3 2.772588722239781
4 4.1588830833596715
5 5.545177444479562
6 6.931471805599453
7 8.317766166719343
8 9.704060527839234
9 11.090354888959125
10 12.476649250079015
11 13.862943611198906
12 15.249237972318797
13 16.635532333438686
14 18.021826694558577
15 19.408121055678468
16 20.79441541679836
17 22.18070977791825
18 23.56700413903814
19 24.95329850015803
20 26.33959286127792
Answered By - Jérôme Richard
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.