Issue
I am trying to implement a DDPG agent to control Gym's Pendulum environment.
Since I am new to Gym, I was wondering whether the state data collected via env.step(action)
is already normalized, or whether I should do that manually. Also, should the action
be normalized, or kept in the [-2, 2] range?
Thanks
Solution
env.step(action) returns the tuple (observation, reward, done, info). If you're referring to the data in observation, the answer is no, it is not normalized. In accordance with the environment's observation space, it holds three coordinates: values in [-1, 1] for the first two (the cosine and sine of the angle) and [-8, 8] for the last one (the angular velocity). The action should be scaled to the [-2, 2] range, though it will additionally be clipped to that range by the environment anyway.
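As a concrete illustration, below is a minimal sketch of how the observation could be normalized by hand and how an actor output in [-1, 1] (e.g. a tanh-squashed policy head) could be scaled to the torque range. The environment id (Pendulum-v1), the classic 4-tuple step API shown above, and the helper names normalize_obs / scale_action are assumptions for this example, not part of the original answer; adjust for your Gym version.

    import gym
    import numpy as np

    env = gym.make("Pendulum-v1")  # assumed id; classic 4-tuple step API

    def normalize_obs(obs):
        # Observation is [cos(theta), sin(theta), theta_dot]; the first two
        # are already in [-1, 1], so only theta_dot (in [-8, 8]) needs rescaling.
        return obs / np.array([1.0, 1.0, 8.0])

    def scale_action(raw_action):
        # Map an agent output in [-1, 1] to the torque range [-2, 2];
        # env.step clips out-of-range values again anyway.
        return 2.0 * np.clip(raw_action, -1.0, 1.0)

    obs = env.reset()
    for _ in range(200):
        raw = np.random.uniform(-1.0, 1.0, size=1)  # stand-in for the actor output
        obs, reward, done, info = env.step(scale_action(raw))
        state = normalize_obs(obs)                  # feed this to the networks
        if done:
            obs = env.reset()

Keeping the actor's output in [-1, 1] and scaling it once at the environment boundary is a common convention in DDPG implementations, since it decouples the network from the environment's action bounds.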
Answered By - draw