Issue
I'm working on a PPO agent that plays (well, should play) Doom using TF-Agents. As input to the agent, I am trying to give it a stack of 4 images. My complete code is in the following link: https://colab.research.google.com/drive/1chrlrLVR_rwAeIZhL01LYkpXsusyFyq_?usp=sharing
Unfortunately, my code does not run: it raises a TypeError at the line shown below (it is being run in Google Colaboratory).
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-10-d1571cbbda6b> in <module>()
8 t_step = tf_env.reset()
9 while (episode_steps <= max_steps_per_episode or (not t_step.is_last())):
---> 10 policy_step = agent.policy.action(t_step)
11 t_step = tf_env.step(policy_step.action)
12 episode_steps += 1
5 frames
/usr/local/lib/python3.7/dist-packages/tf_agents/utils/nest_utils.py in assert_same_structure(nest1, nest2, check_types, expand_composites, message)
112 str2 = tf.nest.map_structure(
113 lambda _: _DOT, nest2, expand_composites=expand_composites)
--> 114 raise exception('{}:\n {}\nvs.\n {}'.format(message, str1, str2))
115
116
TypeError: policy_state and policy_state_spec structures do not match:
()
vs.
{'actor_network_state': ListWrapper([., .])}
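For context, the two structures in the message can be inspected directly on the policy; this is just a quick check using the same agent object as in the loop above:
# Just to see both sides of the mismatch: what the policy expects as
# policy_state ...
print(agent.policy.policy_state_spec)
# ... versus what a properly constructed initial state looks like
# (presumably the recurrent state of the actor network).
print(agent.policy.get_initial_state(batch_size=1))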
The thing about this error is that, from what I've read in the TF-Agents documentation, the user is not supposed to do anything with policy_state, since it is generated automatically based on the agent's networks.
Here is a similar error I found; it didn't solve my problem, but it pointed me towards one of the solutions I tried: py_environment 'time_step' doesn't match 'time_step_spec'
After reading that question and its answer, I realized I was promising an observation_spec like this:
self._observation_spec = array_spec.BoundedArraySpec(shape=(4, 160, 260, 3), dtype=np.float32, minimum=0, maximum=1, name='screen_observation')
But what I was actually passing was a list of 4 np.arrays with shape (160, 260, 3):
self._stacked_frames = []
for _ in range(4):
    new_frame = np.zeros((160, 260, 3), dtype=np.float32)
    self._stacked_frames.append(new_frame)
I did this because I thought the "shape" of my data wouldn't change, since the list always has the same number of elements as the first dimension of the observation_spec. With a list it was easier to delete past frames and add new ones, like this:
def stack_frames(self):
    # This gets the current frame of the game
    new_frame = self.preprocess_frame()
    if self._game.is_new_episode():
        for frame in range(4):
            self._stacked_frames.append(new_frame)
            # This pop was meant to clear an empty frame that was already in the list
            self._stacked_frames.pop(0)
    else:
        self._stacked_frames.append(new_frame)
        self._stacked_frames.pop(0)
    return self._stacked_frames
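If the observation really has to be a single (4, 160, 260, 3) array, I imagine the list bookkeeping could be kept and simply converted on the way out, e.g. with np.stack. This is an untested sketch, assuming preprocess_frame() returns a (160, 260, 3) float32 array:
import numpy as np

def stack_frames(self):
    new_frame = self.preprocess_frame()
    if self._game.is_new_episode():
        # Fill every slot with the first frame of the new episode
        self._stacked_frames = [new_frame] * 4
    else:
        # Drop the oldest frame and append the newest one
        self._stacked_frames = self._stacked_frames[1:] + [new_frame]
    # np.stack turns the list of four (160, 260, 3) arrays into a single
    # (4, 160, 260, 3) array, matching the declared observation_spec
    return np.stack(self._stacked_frames)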
I had tried using only np.arrays before, but was not able to delete past frames and add new ones. I was probably not doing it right, but it felt like self._stacked_frames was born with the same shape as the observation_spec and could not simply have arrays deleted from or added to it.
self._stacked_frames = np.zeros((4, 160, 260, 3), dtype=np.float32)
def stack_frames(self):
    new_frame = self.preprocess_frame()
    if self._game.is_new_episode():
        for frame in range(4):
            # This delete was meant to clear an empty frame that was already in the array
            self._stacked_frames = np.delete(self._stacked_frames, 0, 0)
            # I tried "np.concatenate((self._stacked_frames, new_frame))" as well
            self._stacked_frames = np.vstack((self._stacked_frames, new_frame))
    else:
        self._stacked_frames = np.delete(self._stacked_frames, 0, 0)
        # I tried "np.concatenate((self._stacked_frames, new_frame))" as well
        self._stacked_frames = np.vstack((self._stacked_frames, new_frame))
    return self._stacked_frames
The approach above did not work. Like I said, I was probably doing it wrong. I see three ways out of this stalemate:
- I declare the observation_spec as a list of four frames, each an np.array with shape (160, 260, 3);
- I keep the observation_spec as I declared it, but delete and add frames to self._stacked_frames the right way (I'm not sure this is possible, since self._stacked_frames is declared with shape (4, 160, 260, 3), and I don't know whether it can temporarily become (3, 160, 260, 3) or (5, 160, 260, 3) before going back to (4, 160, 260, 3));
- I keep the observation_spec as I declared it and do not delete or add frames. Instead, I make a loop that copies the second frame (the one that enters the stack_frames function in the second slot) into the first slot, the third frame into the second slot, the fourth frame into the third slot, and finally the new frame into the fourth slot. An illustration follows:
self._stacked_frames Slot: 1 | 2 | 3 | 4
Game image inside self._stacked_frames: A | B | C | D
New game image: E
New game image's positions (step 1): B | B | C | D
New game image's positions (step 2): B | C | C | D
New game image's positions (step 3): B | C | D | D
New game image's positions (step 4): B | C | D | E
New self._stacked_frames: B | C | D | E
This last one seemed like the surest way to work around my problem, assuming I'm right about what the problem is. I tried it, but the TypeError persisted. I tried it like this:
self._stacked_frames = np.zeros((self._frame_stack_size, 160, 260, 3), dtype=np.float32)
and then:
def stack_frames(self):
    new_frame = self.preprocess_frame()
    if self._game.is_new_episode():
        for frame in range(self._frame_stack_size):
            self._stacked_frames[frame] = new_frame
    else:
        for frame in range(self._frame_stack_size - 1):
            self._stacked_frames[frame] = self._stacked_frames[frame + 1]
        self._stacked_frames[self._frame_stack_size - 1] = new_frame
    return self._stacked_frames
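While writing this up, I also sketched how that same shift could be done without the explicit copy loop, e.g. with np.roll; this is untested and makes the same assumptions about preprocess_frame() as above:
import numpy as np

def stack_frames(self):
    new_frame = self.preprocess_frame()
    if self._game.is_new_episode():
        # Broadcasting writes the first frame of the episode into all four slots
        self._stacked_frames[:] = new_frame
    else:
        # Shift every frame one slot towards the front (the oldest frame wraps
        # around to the last slot) ...
        self._stacked_frames = np.roll(self._stacked_frames, shift=-1, axis=0)
        # ... and overwrite that last slot with the newest frame
        self._stacked_frames[-1] = new_frame
    return self._stacked_frames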
Two questions then:
- Assuming I'm right about the cause of the TypeError, which of the three ways of fixing it is best? And is there anything wrong with the way I tried my solution for the 3rd possibility?
- If I'm not right about the TypeError, what is this error about then?
Solution
I had the same issue, and it occurred when calling policy.action(time_step). action takes an optional parameter policy_state, which defaults to ().
I fixed the issue by calling
policy.action(time_step, policy.get_initial_state(batch_size=BATCH_SIZE))
I'm just starting out with TF-Agents, so I hope this doesn't have any undesired side effects.
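Applied to the loop from the traceback in the question, that would look roughly like this (an untested sketch; I also carry forward the state returned in each policy_step, and use and instead of or in the loop condition, which seems to be what was intended):
policy = agent.policy
# A policy whose actor network is recurrent needs its state threaded
# through the rollout, starting from an initial state of the right batch size
policy_state = policy.get_initial_state(batch_size=tf_env.batch_size)

t_step = tf_env.reset()
episode_steps = 0
while episode_steps <= max_steps_per_episode and not t_step.is_last():
    policy_step = policy.action(t_step, policy_state)
    policy_state = policy_step.state  # carry the network state to the next call
    t_step = tf_env.step(policy_step.action)
    episode_steps += 1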
Answered By - user1369611