Issue
The docs say:
Compatibility Guarantee: A fixed seed and a fixed series of calls to ‘RandomState’ methods using the same parameters will always produce the same results up to roundoff error except when the values were incorrect. Incorrect values will be fixed and the NumPy version in which the fix was made will be noted in the relevant docstring. Extension of existing parameter ranges and the addition of new parameters is allowed as long the previous behavior remains unchanged.
There is no mention of operating systems.
If I call np.random.seed(42) on Windows and Linux, will the random numbers generated afterwards be the same?
Will they be the same across different versions of 64-bit Ubuntu?
I assume that the RNG uses system libraries, so the code is probably not portable. If that's true, is there a fix? I know that this would probably be ugly, like swapping the Linux RNG for something that emulates the Windows one. But I'm ready for creative solutions.
Solution
Update as of numpy v1.17 (mid-2019):
The results should be the same across platforms, but not across numpy versions.
np.random.seed is described as a "convenience, legacy function". Neither it nor the more recent, recommended alternative np.random.default_rng can be relied on to produce the same result across numpy versions, unless you specifically use the legacy/compatibility API provided by np.random.RandomState. While the RandomState class is guaranteed to provide consistent results, it is not updated with algorithmic (or correctness) improvements, and its use is discouraged outside of unit testing and backwards compatibility.
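As a rough illustration of the legacy path, here is a minimal sketch that draws from a private np.random.RandomState instance instead of seeding the global state. The helper name legacy_draws is made up for this example, and the snippet only checks that two runs in the same environment agree. It cannot by itself demonstrate the cross-version guarantee.

```python
import numpy as np


def legacy_draws(seed, n=5):
    """Draw n integers in [0, 100) from a private legacy RandomState.

    legacy_draws is a hypothetical helper for illustration only.
    """
    # A dedicated instance avoids touching the global state that
    # np.random.seed() would modify.
    rs = np.random.RandomState(seed)
    return rs.randint(0, 100, size=n)


first = legacy_draws(42)
second = legacy_draws(42)

# Same seed and same sequence of calls give the same stream.
print(np.array_equal(first, second))
```

Keeping the RandomState instance local to the function also means other code that calls np.random functions cannot shift the stream between runs.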
See NEP 0019: Random number generator policy. It's actually a decent read :) The abstract reads:
For the past decade, NumPy has had a strict backwards compatibility policy for the number stream of all of its random number distributions. Unlike other numerical components in numpy, which are usually allowed to return different when results when they are modified if they remain correct, we have obligated the random number distributions to always produce the exact same numbers in every version. The objective of our stream-compatibility guarantee was to provide exact reproducibility for simulations across numpy versions in order to promote reproducible research. However, this policy has made it very difficult to enhance any of the distributions with faster or more accurate algorithms. After a decade of experience and improvements in the surrounding ecosystem of scientific software, we believe that there are now better ways to achieve these objectives. We propose relaxing our strict stream-compatibility policy to remove the obstacles that are in the way of accepting contributions to our random number generation capabilities.
This has been implemented in numpy. As of this writing (numpy version 1.22), numpy.random.default_rng() constructs a new Generator with the default BitGenerator. But the description of np.random.Generator comes with the following guidance:
No Compatibility Guarantee
Generator does not provide a version compatibility guarantee. In particular, as better algorithms evolve the bit stream may change.
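One way to reduce (not eliminate) exposure to a changing default is to name the BitGenerator explicitly rather than relying on whatever default_rng picks. The sketch below assumes PCG64 is the desired bit generator. Note that this only pins the source of raw bits; per the guidance above, the distribution methods on Generator may still change between versions.

```python
import numpy as np
from numpy.random import Generator, PCG64

# Let numpy choose the default BitGenerator.
rng_default = np.random.default_rng(42)

# Name the BitGenerator explicitly (PCG64 is an assumption made for this sketch).
rng_explicit = Generator(PCG64(42))

# Show which BitGenerator each Generator actually wraps.
print(type(rng_default.bit_generator).__name__)
print(type(rng_explicit.bit_generator).__name__)

# Within one numpy version, the same seed gives the same stream.
a = Generator(PCG64(42)).random(5)
b = Generator(PCG64(42)).random(5)
print(np.array_equal(a, b))
```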
Therefore, np.random.default_rng() with a fixed seed will produce the same random numbers across platforms for a given numpy version, but not necessarily across versions. The best practice for ensuring reproducibility is to preserve your exact environment, e.g. in a Docker container. Short of this, storing randomly generated data and using the saved results in downstream workflows can help with reproducibility, though of course this does not protect you from API changes later in your workflow the way a Docker container would.
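The "store the results" approach might look something like the following sketch. The helper names save_sample and load_sample, the file names, and the metadata fields are all invented for illustration. The idea is just to persist the generated array together with the seed and the numpy version that produced it, so downstream steps read the file instead of regenerating.

```python
import json
import tempfile
from pathlib import Path

import numpy as np


def save_sample(directory, seed, size):
    """Generate a sample and store it with the metadata needed to trace it."""
    rng = np.random.default_rng(seed)
    sample = rng.normal(size=size)
    directory = Path(directory)
    np.save(directory / "sample.npy", sample)
    # Record how the data was produced, including the numpy version.
    meta = {"seed": seed, "size": size, "numpy_version": np.__version__}
    (directory / "sample_meta.json").write_text(json.dumps(meta))
    return sample


def load_sample(directory):
    """Load the stored sample instead of regenerating it."""
    directory = Path(directory)
    sample = np.load(directory / "sample.npy")
    meta = json.loads((directory / "sample_meta.json").read_text())
    return sample, meta


with tempfile.TemporaryDirectory() as tmp:
    original = save_sample(tmp, seed=42, size=1000)
    restored, meta = load_sample(tmp)
    print(np.array_equal(original, restored), meta["numpy_version"])
```

In a real workflow the directory would be a versioned data location rather than a temporary one.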
Answered By - Michael Delgado