Issue
Today, while trying to implement an RL agent in an OpenAI Gym environment, I noticed that every episode appears to start from the same initial state produced by env.reset(), i.e.
import gym

env = gym.make("CartPole-v0")
initial_observation = env.reset()  # <-- Note
done = False
while not done:
    action = env.action_space.sample()
    next_observation, reward, done, info = env.step(action)
env.close()  # close the environment
So the agent naturally follows the route env.reset() -(action)-> next_state -(action)-> next_state -(action)-> ... -(action)-> done
, which is one episode. But how can an agent start from a specific state, such as a middle state, and take an action from there? For example, suppose I sample an experience (s, a, r, ns, done)
from the replay buffer. What if I want to train the agent starting directly from the state ns
, choose an action with a Q-Network
, and then roll forward for n
steps? Something like this:
import gym

env = gym.make("CartPole-v0")
initial_observation = ns  # not env.reset()
done = False
while not done:
    action = DQN(ns)
    next_observation, reward, done, info = env.step(action)
    # break n steps later or when done is True
env.close()  # close the environment
But even if I set a variable initial_observation
to ns
, the agent and the env
will not be aware of it at all. How can I tell the gym.env
that I want to set the initial observation to ns
, so that the agent knows the specific start state and training can continue directly from that specific observation (i.e. the environment actually starts in that state)?
Solution
AFAIK, the current implementation of most OpenAI gym envs (including the CartPole-v0 you have used in your question) doesn't provide any mechanism to initialize the environment in a given state.
However, it shouldn't be too complex to modify the CartPoleEnv.reset()
method so that it accepts an optional parameter acting as the initial state.
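A minimal sketch of what such a modification could look like, assuming the classic (pre-0.26) gym API where CartPoleEnv keeps its state in self.state, resets steps_beyond_done, and returns the observation directly from reset(). The subclass name SettableCartPoleEnv and the init_state parameter are illustrative, not part of gym:

import numpy as np
import gym
from gym.envs.classic_control.cartpole import CartPoleEnv

class SettableCartPoleEnv(CartPoleEnv):
    """CartPole variant whose reset() accepts an optional initial state."""

    def reset(self, init_state=None):
        if init_state is None:
            # Fall back to the default random initialization.
            return super().reset()
        # Assumption: init_state is a length-4 array-like
        # (cart position, cart velocity, pole angle, pole angular velocity).
        self.state = np.array(init_state, dtype=np.float64)
        self.steps_beyond_done = None
        return np.array(self.state, dtype=np.float32)

# Usage sketch: restart an episode from a stored next-state `ns`
# taken from a replay buffer (values here are made up).
env = SettableCartPoleEnv()
ns = [0.01, 0.0, 0.02, 0.0]
observation = env.reset(init_state=ns)
done = False
while not done:
    action = env.action_space.sample()  # replace with your DQN's greedy action
    observation, reward, done, info = env.step(action)
env.close()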
Answered By - Pablo EM