Issue
I'm studying "Deep Reinforcement Learning" and built my own example based on PyTorch's Reinforcement Learning (DQN) Tutorial.
I implement the actor's strategy as follows:
1. call model.eval()
2. get the best action from the model
3. call self.net.train()
The question is: does going back and forth between eval() and train() modes cause any damage to the optimization process?
The model contains only Linear and BatchNorm1d layers. As far as I know, when using BatchNorm1d one must call model.eval() before using the model for inference, because it produces different results in eval() and train() modes (a small standalone check follows the code below).
When training a classification neural network, model.eval() is called only after training is finished, but in deep reinforcement learning it is usual to use the strategy to act and then continue the optimization process.
I'm wondering whether going back and forth between the modes is "harmless" to the optimization process.
def strategy(self, state):
    # Explore or exploit (epsilon-greedy)
    if self.epsilon > random():
        action = choice(self.actions)
    else:
        self.net.eval()
        # Pick the action with the highest Q-value
        action = self.net(state.unsqueeze(0)).max(1)[1].detach()
        self.net.train()
    return action
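For reference, here is a small standalone check (the layer sizes and batch size are arbitrary assumptions) showing that the same input gives different BatchNorm1d outputs in train() and eval() modes:

import torch
import torch.nn as nn

# Standalone check (arbitrary sizes): same input, different modes
net = nn.Sequential(nn.Linear(4, 8), nn.BatchNorm1d(8))
x = torch.randn(16, 4)

net.train()
out_train = net(x)          # normalized with this batch's statistics

net.eval()
out_eval = net(x)           # normalized with the running mean/variance

print(torch.allclose(out_train, out_eval))  # generally False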
Solution
eval() puts the model in evaluation mode. In evaluation mode, a Dropout layer just acts as a "passthrough" layer.
During training, a BatchNorm layer keeps a running estimate of its computed mean and variance; the running estimates are updated with a default momentum of 0.1. During evaluation, this running mean/variance is used for normalization.
So, going back and forth between eval() and train() modes does not cause any damage to the optimization process.
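To make that concrete, here is a minimal sketch of the pattern from the question (the network shape, optimizer, and targets are made-up placeholders, not the asker's actual setup): the weights and running statistics survive the mode switch, and the subsequent training step proceeds as usual:

import torch
import torch.nn as nn
import torch.nn.functional as F

net = nn.Sequential(nn.Linear(4, 16), nn.BatchNorm1d(16), nn.ReLU(), nn.Linear(16, 2))
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)

# Acting: eval mode, no gradients, pick the greedy action
net.eval()
with torch.no_grad():
    state = torch.randn(4)
    action = net(state.unsqueeze(0)).max(1)[1].item()
net.train()

# Learning: train mode, ordinary optimization step on a batch
batch_states = torch.randn(32, 4)
q_values = net(batch_states)              # batch statistics are used again here
targets = torch.zeros_like(q_values)      # placeholder targets for the sketch
loss = F.mse_loss(q_values, targets)
optimizer.zero_grad()
loss.backward()
optimizer.step()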
Answered By - Wasi Ahmad