Issue
Today, while trying to implement an RL agent in an OpenAI Gym environment, I noticed that every episode appears to start from the same initial state produced by env.reset(), i.e.
import gym

env = gym.make("CartPole-v0")
initial_observation = env.reset()  # <-- Note
done = False
while not done:
    action = env.action_space.sample()
    next_observation, reward, done, info = env.step(action)
env.close()  # close the environment
So the agent naturally follows the route env.reset() -(action)-> next_state -(action)-> next_state -(action)-> ... -(action)-> done
, which is one episode. But how can an agent start from a specific state, such as a middle state, and take an action from there? For example, suppose I sample an experience (s, a, r, ns, done)
from the replay buffer. What if I want to train the agent starting directly from the state ns
, choose an action with a Q-Network
, and then roll forward for n
steps? Something like this:
import gym

env = gym.make("CartPole-v0")
initial_observation = ns  # not env.reset()
done = False
while not done:
    action = DQN(ns)
    next_observation, reward, done, info = env.step(action)
    # break n steps later or when done is True
env.close()  # close the environment
But even if I set a variable initial_observation
to ns
, the agent and the env
will not be aware of it at all. How can I tell the gym.env
that I want to set the initial observation to ns
, so that the agent knows the specific start state and training can continue directly from that specific observation (i.e. the environment actually starts in that state)?
Solution
AFAIK, the current implementation of most OpenAI gym envs (including the CartPole-v0 you have used in your question) doesn't provide any mechanism to initialize the environment in a given state.
However, it shouldn't be too complex to modify the CartPoleEnv.reset()
method so that it accepts an optional parameter acting as the initial state.
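A minimal sketch of what such a modification could look like, assuming the classic (pre-0.26) gym API where CartPoleEnv keeps its state in self.state, resets steps_beyond_done, and returns the observation directly from reset(). The subclass name SettableCartPoleEnv and the init_state parameter are illustrative, not part of gym:

import numpy as np
import gym
from gym.envs.classic_control.cartpole import CartPoleEnv

class SettableCartPoleEnv(CartPoleEnv):
    """CartPole variant whose reset() accepts an optional initial state."""

    def reset(self, init_state=None):
        if init_state is None:
            # Fall back to the default random initialization.
            return super().reset()
        # Assumption: init_state is a length-4 array-like
        # (cart position, cart velocity, pole angle, pole angular velocity).
        self.state = np.array(init_state, dtype=np.float64)
        self.steps_beyond_done = None
        return np.array(self.state, dtype=np.float32)

# Usage sketch: restart an episode from a stored next-state `ns`
# taken from a replay buffer (values here are made up).
env = SettableCartPoleEnv()
ns = [0.01, 0.0, 0.02, 0.0]
observation = env.reset(init_state=ns)
done = False
while not done:
    action = env.action_space.sample()  # replace with your DQN's greedy action
    observation, reward, done, info = env.step(action)
env.close()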
Answered By - Pablo EM