Issue
In PyTorch's RMSprop implementation we are given the parameter alpha, which according to the documentation:
alpha (float, optional) – smoothing constant (default: 0.99)
On the other hand, TF's implementation has the parameter rho (formerly named decay):
rho – Discounting factor for the history/coming gradient. Defaults to 0.9.
Are these parameters the same under different names, or are they actually different? I couldn't find any information regarding the differences.
Solution
If you compare the source code of PyTorch (here) and that of TensorFlow (on a forked build), you will see that alpha and rho are indeed the same.
However, unlike TensorFlow, PyTorch is explicit about the underlying logic of its module: it maintains an exponential moving average of the squared gradients, square_avg = alpha * square_avg + (1 - alpha) * grad**2, where alpha is the smoothing/discounting factor.
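To see that the two names are interchangeable, here is a minimal sketch (with hypothetical gradient values, not the full optimizer) of one step of that squared-gradient average; swapping alpha for rho changes nothing:

    import torch

    # Both frameworks keep an exponential moving average of squared gradients:
    #   PyTorch:     square_avg  = alpha * square_avg  + (1 - alpha) * grad**2
    #   TensorFlow:  mean_square = rho   * mean_square + (1 - rho)   * grad**2

    grad = torch.tensor([0.5, -1.0, 2.0])  # hypothetical gradient
    square_avg = torch.zeros_like(grad)    # running average of squared gradients

    alpha = 0.9  # PyTorch's name for the decay rate
    rho = 0.9    # TensorFlow's name for the same decay rate

    # One PyTorch-style update step:
    square_avg_pt = alpha * square_avg + (1 - alpha) * grad ** 2

    # One TensorFlow-style update step (identical formula, renamed parameter):
    square_avg_tf = rho * square_avg + (1 - rho) * grad ** 2

    assert torch.allclose(square_avg_pt, square_avg_tf)
    print(square_avg_pt)  # tensor([0.0250, 0.1000, 0.4000])

Note that while the parameter is the same, the defaults quoted above differ (0.99 in PyTorch vs. 0.9 in TensorFlow), so when porting code between the frameworks you should set the value explicitly.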
Answered By - Ivan