Saturday, June 4, 2022

[FIXED] Keras Agent Training Takes Too Much Time

June 04, 2022 deep-learning, keras, machine-learning, reinforcement-learning, tensorflow

Issue

I'm fairly new to reinforcement learning, and I've built an agent that feeds two inputs to its neural network. The first input is a tuple of two numbers representing the agent's current position; the second is an array of numbers ranging from 0 to 3 representing the type of requests the agent receives from the environment. The network outputs which movement is best (move forwards, backwards, sideways, etc.).

Each episode has 300 steps, and the for loop inside train_pos_nn() takes more than 5 s (each call to predict() takes about 20 ms and each call to fit() takes about 7 ms). That amounts to more than 25 minutes per episode, which is too much: finishing the 1000 episodes required to converge would take about 17 days. It takes the same amount of time on Google Colab. (Edit: even when using the GPU option; the GPU cannot be set up on my local machine.)
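The time budget above follows directly from the per-step cost. A quick back-of-the-envelope check, using only the numbers stated in the question (about 5 s per train_pos_nn() call, one call per step, 300 steps per episode, 1000 episodes); training_budget is a hypothetical helper, not part of the original code:

```python
# Numbers taken from the question; replace them with your own measurements.
SECONDS_PER_TRAIN_CALL = 5.0   # the loop in train_pos_nn() takes ~5 s
STEPS_PER_EPISODE = 300        # train_pos_nn() is called once per step
N_EPISODES = 1000


def training_budget(sec_per_step, steps, episodes):
    """Return (minutes per episode, days for all episodes)."""
    sec_per_episode = sec_per_step * steps
    minutes_per_episode = sec_per_episode / 60
    days_total = sec_per_episode * episodes / 86400  # 86400 s per day
    return minutes_per_episode, days_total


minutes, days = training_budget(SECONDS_PER_TRAIN_CALL, STEPS_PER_EPISODE, N_EPISODES)
print(f"{minutes:.0f} min per episode, {days:.1f} days in total")
```

Plugging in the roughly 2 s per call reported later in the answer's update gives about 10 minutes per episode and about 7 days in total, which shows how directly the per-step cost drives the overall training time.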

Is there any way I can reduce the amount of time it takes the agent to train?

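import random
import time
from collections import deque

import numpy as np
from tensorflow.keras import layers
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tqdm import tqdm

# REPLAY_MEMORY_SIZE, MIN_REPLAY_MEMORY_SIZE, DISCOUNT, env, normalize_pos_state,
# n_episodes and agent_dqn are defined elsewhere in the original code (not shown).
n_possible_movements = 9
MINIBATCH_SIZE = 32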

class DQNAgent(object):
    def __init__(self):
        #self.gamma = 0.95 
        self.epsilon = 1.0
        self.epsilon_decay = 0.8
        self.epsilon_min = 0.1
        self.learning_rate = 10e-4 
        self.tau = 1e-3
                        
        # Main models
        self.model_uav_pos = self._build_pos_model()

        # Target networks
        self.target_model_uav_pos = self._build_pos_model()
        # Copy weights
        self.target_model_uav_pos.set_weights(self.model_uav_pos.get_weights())

        # An array with last n steps for training
        self.replay_memory_pos_nn = deque(maxlen=REPLAY_MEMORY_SIZE)
        
    def _build_pos_model(self): # compile the DNN
        # create the DNN model
        dnn = self.create_pos_dnn()
        
        opt = Adam(learning_rate=self.learning_rate) #, decay=self.epsilon_decay)
        dnn.compile(loss="categorical_crossentropy", optimizer=opt, metrics=['accuracy'])
        
        return dnn
    
    def create_pos_dnn(self): 
        # initialize the input shape (The shape of an array is the number of elements in each dimension)
        pos_input_shape = (2,)
        requests_input_shape = (len(env.ues),)
        # How many possible outputs we can have
        output_nodes = n_possible_movements
        
        # Initialize the inputs
        uav_current_position = Input(shape=pos_input_shape, name='pos')
        ues_requests = Input(shape=requests_input_shape, name='requests')
        
        # Put them in a list
        list_inputs = [uav_current_position, ues_requests]
        
        # Merge all input features into a single large vector
        x = layers.concatenate(list_inputs)
        
        # Add a 1st Hidden (Dense) Layer
        dense_layer_1 = Dense(512, activation="relu")(x)
        
        # Add a 2nd Hidden (Dense) Layer
        dense_layer_2 = Dense(512, activation="relu")(dense_layer_1)
        
        # Add a 3rd Hidden (Dense) Layer
        dense_layer_3 = Dense(256, activation="relu")(dense_layer_2)
        
        # Output layer
        output_layer = Dense(output_nodes, activation="softmax")(dense_layer_3)

        model = Model(inputs=list_inputs, outputs=output_layer)
                        
        # return the DNN
        return model
    
    def remember_pos_nn(self, state, action, reward, next_state, done):
        self.replay_memory_pos_nn.append((state, action, reward, next_state, done)) 
        
    def act_upon_choosing_a_new_position(self, state): # state is a tuple (uav_position, requests_array)
        if np.random.rand() <= self.epsilon: # if acting randomly, take random action
            return random.randrange(n_possible_movements)
        pos =  np.array([state[0]])
        reqs =  np.array([state[1]])
        act_values = self.model_uav_pos.predict(x=[pos, reqs]) # if not acting randomly, predict reward value based on current state
        return np.argmax(act_values[0]) 
        
    def train_pos_nn(self):
        print("In Training..")

        # Start training only if certain number of samples is already saved
        if len(self.replay_memory_pos_nn) < MIN_REPLAY_MEMORY_SIZE:
            print("Exiting Training: Replay Memory Not Full Enough...")
            return

        # Get a minibatch of random samples from memory replay table
        minibatch = random.sample(self.replay_memory_pos_nn, MINIBATCH_SIZE)

        start_time = time.time()
        # Enumerate our batches
        for index, (current_state, action, reward, new_current_state, done) in enumerate(minibatch):
            print('...Starting Training...')
            target = 0
            pos =  np.array([current_state[0]])
            reqs =  np.array([current_state[1]])
            pos_next = np.array([new_current_state[0]])
            reqs_next = np.array([new_current_state[1]])
    
            if not done:
                target = reward + DISCOUNT * np.amax(self.target_model_uav_pos.predict(x=[pos_next, reqs_next]))
            else:
                target = reward

            # Update Q value for given state
            target_f = self.model_uav_pos.predict(x=[pos, reqs])
            target_f[0][action] = target

            self.model_uav_pos.fit([pos, reqs],
                                   target_f,
                                   verbose=2,
                                   shuffle=False,
                                   callbacks=None,
                                   epochs=1)
        end_time = time.time()
        print("Time", end_time - start_time)
        # Update target network counter every episode
        self.target_train()
        
    def target_train(self):
        weights = self.model_uav_pos.get_weights()
        target_weights = self.target_model_uav_pos.get_weights()
        for i in range(len(target_weights)):
            target_weights[i] = weights[i] * self.tau + target_weights[i] * (1 - self.tau)
        self.target_model_uav_pos.set_weights(target_weights)

# Main 
SIZE = 100 # size of the grid the agent is in
for episode in tqdm(range(1, n_episodes + 1), ascii=True, unit='episodes'):  
    # Reset environment and get initial state
    current_state = env.reset(SIZE)

    # Reset flag and start iterating until episode ends
    done = False
    steps_n = 300

    for t in range(steps_n): 
        # Normalize the input (the current state)
        current_state_normalized = normalize_pos_state(current_state)
        
        # Get new position for the agent
        action_pos = agent_dqn.act_upon_choosing_a_new_position(current_state_normalized)
        
        new_state, reward, done, _ = env.step(action_pos)
        
        agent_dqn.remember_pos_nn(current_state_normalized, action_pos, reward, normalize_pos_state(new_state), done)

        current_state = new_state # not normalized
        
        agent_dqn.train_pos_nn()

    # Decay epsilon
    if episode % 50 == 0:
        if agent_dqn.epsilon > agent_dqn.epsilon_min:
            agent_dqn.epsilon *= agent_dqn.epsilon_decay
            agent_dqn.epsilon = max(agent_dqn.epsilon, agent_dqn.epsilon_min)
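target_train() in the code above performs a soft (Polyak) update of the target network: each target weight moves a fraction tau toward the online weight, w_target = tau * w + (1 - tau) * w_target. A minimal NumPy sketch of the same rule, where soft_update is a hypothetical helper and the plain arrays stand in for the lists returned by get_weights():

```python
import numpy as np


def soft_update(online_weights, target_weights, tau):
    """Same rule as target_train(): w_target <- tau * w + (1 - tau) * w_target."""
    return [tau * w + (1.0 - tau) * wt for w, wt in zip(online_weights, target_weights)]


# Hypothetical weights: two "layers" with different shapes.
online = [np.ones((2, 2)), np.full(3, 2.0)]
target = [np.zeros((2, 2)), np.zeros(3)]

target = soft_update(online, target, tau=1e-3)
print(target[0][0, 0], target[1][0])
```

With tau = 1e-3, as in the question, each call closes 0.1% of the gap between the two networks. If the online weights stayed fixed, the remaining gap after k calls would be (1 - tau)^k of the original, roughly 74% after one 300-step episode.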

Solution

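One performance optimization in your training loop is to call the model directly (its call method) instead of calling predict, and to wrap call with tf.function. predict is designed for batch inference and carries some per-call overhead, so for single samples a direct call will likely be faster. The Keras documentation for Model.predict makes the same point: the method is not intended for loops that process small numbers of inputs at a time, and for inputs that fit in one batch it recommends calling the model directly, e.g. model(x). For your purposes, the change could look like this: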

class DQNAgent(object):

    def _build_pos_model(self): # compile the DNN
        # create the DNN model
        dnn = self.create_pos_dnn()
        
        opt = Adam(learning_rate=self.learning_rate) #, decay=self.epsilon_decay)
        dnn.compile(loss="categorical_crossentropy", optimizer=opt, metrics=['accuracy'])
        dnn.call = tf.function(dnn.call)
        
        return dnn

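Then change every call of self.model_uav_pos.predict(...) and self.target_model_uav_pos.predict(...) to self.model_uav_pos(...) and self.target_model_uav_pos(...), respectively, passing the inputs positionally (e.g. self.model_uav_pos([pos, reqs])), since calling the model does not accept predict's x= keyword. Note also that calling the model returns a tf.Tensor rather than a NumPy array, and tensors do not support item assignment, so in train_pos_nn convert the result before updating it: target_f = self.model_uav_pos([pos, reqs]).numpy().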

A further potential optimization is to JIT-compile the TF function by passing jit_compile=True to the tf.function wrapper, e.g.:

dnn.call = tf.function(dnn.call, jit_compile=True)
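To see how much each step helps on a given machine, the three variants can be timed side by side. This is a rough sketch assuming TensorFlow 2.x with tf.keras. The network mirrors the question's hidden-layer sizes, but N_UES = 4, the random inputs, forward and ms_per_call are hypothetical stand-ins. Rather than reassigning model.call as the answer does, it wraps the forward pass in a separate tf.function, which compiles the same computation without modifying the model object. Absolute numbers depend on hardware and TensorFlow version, so no particular timings are claimed here:

```python
import time

import numpy as np
import tensorflow as tf

N_ACTIONS = 9  # n_possible_movements in the question
N_UES = 4      # hypothetical stand-in for len(env.ues)

pos_in = tf.keras.Input(shape=(2,), name="pos")
reqs_in = tf.keras.Input(shape=(N_UES,), name="requests")
h = tf.keras.layers.concatenate([pos_in, reqs_in])
for units in (512, 512, 256):  # same hidden sizes as the question's network
    h = tf.keras.layers.Dense(units, activation="relu")(h)
out = tf.keras.layers.Dense(N_ACTIONS, activation="softmax")(h)
model = tf.keras.Model([pos_in, reqs_in], out)


@tf.function(jit_compile=True)
def forward(pos, reqs):
    # Graph- and XLA-compiled forward pass, without predict()'s batching machinery.
    return model([pos, reqs], training=False)


pos = np.random.rand(1, 2).astype(np.float32)
reqs = np.random.randint(0, 4, size=(1, N_UES)).astype(np.float32)


def ms_per_call(fn, n=50):
    fn()  # warm-up call: triggers tracing / XLA compilation outside the timed loop
    start = time.perf_counter()
    for _ in range(n):
        fn()
    return (time.perf_counter() - start) / n * 1000


timings = {
    "predict": ms_per_call(lambda: model.predict([pos, reqs], verbose=0)),
    "model(...)": ms_per_call(lambda: model([pos, reqs], training=False)),
    "tf.function + XLA": ms_per_call(lambda: forward(pos, reqs)),
}
for name, ms in timings.items():
    print(f"{name:>18}: {ms:.2f} ms per call")
```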

Update

It looks like using the call method instead of predict, wrapping call in tf.function, and enabling JIT compilation improved performance by more than 2x (5s -> 2s), which is an appreciable difference. I don't expect further optimizations to bring the time down much more, but rather than wrapping only call, the computations that follow it can also be wrapped in tf.function, so that they all become one callable TensorFlow graph. For example:

        act_values = self.model_uav_pos([pos, reqs])
        return np.argmax(act_values[0])

Rather than calling np.argmax after call, we could use tf.argmax and wrap both in a tf.function. The revised implementation could then be:

class DQNAgent(object):
    def __init__(self):
        #self.gamma = 0.95 
        self.epsilon = 1.0
        self.epsilon_decay = 0.8
        self.epsilon_min = 0.1
        self.learning_rate = 10e-4 
        self.tau = 1e-3
                        
        # Main models
        self.model_uav_pos = self._build_pos_model()
        # argmax over the actions of the single sample, like np.argmax(act_values[0])
        self.pred_model_uav = tf.function(lambda x: tf.argmax(self.model_uav_pos(x)[0]), jit_compile=True)

        # Target networks
        self.target_model_uav_pos = self._build_pos_model()
        # Copy weights
        self.target_model_uav_pos.set_weights(self.model_uav_pos.get_weights())
        self.pred_target_model_uav = tf.function(lambda x: tf.reduce_max(self.target_model_uav_pos(x)), jit_compile=True)

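Then replace every call changed in the original solution with the corresponding new function (e.g. instead of self.model_uav_pos(...), call self.pred_model_uav(...), and instead of self.target_model_uav_pos(...), call self.pred_target_model_uav(...)), and remove the NumPy calls that followed the predictions. The functions return tensors, so wrap the result in int(...) where a plain Python integer is needed, e.g. for env.step. Note that in this implementation dnn.call = tf.function(dnn.call) is removed from _build_pos_model, since the wrapping now happens in __init__.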
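One detail worth checking when moving the argmax into the graph: tf.argmax reduces over axis=0 by default, and for the model's (1, 9) output that is the batch axis, so tf.argmax(output) returns a vector of nine zeros rather than the chosen action. The original np.argmax(act_values[0]) selects the sample first and then reduces over the actions. np.argmax with an explicit axis behaves like tf.argmax here, so the difference can be shown with NumPy alone (the Q-values are made up):

```python
import numpy as np

# Hypothetical network output for one sample: shape (1, 9).
q_values = np.array([[0.1, 0.7, 0.2, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]])

wrong = np.argmax(q_values, axis=0)      # what tf.argmax(q) does by default: one index per column
right = np.argmax(q_values, axis=-1)[0]  # reduce over actions, then take the single sample
print(wrong)  # [0 0 0 0 0 0 0 0 0]
print(right)  # 1
```

In TensorFlow the corresponding form is tf.argmax(q, axis=-1)[0], or equivalently tf.argmax(q[0]).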

The benefit of this approach is that, by JIT-compiling the computations applied to the model's output (argmax and max) together with the forward pass, XLA can potentially optimize the graph further by fusing operations. The TensorFlow XLA documentation describes this kind of operation fusion in more detail.

As I said, I don't think this will result in a drastic further improvement, but it may shave off some additional time in the loop.
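A different technique, not part of this answer but commonly used in DQN code, targets the number of calls rather than the cost of each one: the loop in train_pos_nn issues two predictions and one fit per sample, 32 times per step. Stacking the minibatch into arrays lets each network be evaluated once per step and fit run once on the whole batch. A minimal NumPy sketch, where batched_targets is a hypothetical helper, linear_q is a hypothetical stand-in for both models, and gamma plays the role of DISCOUNT:

```python
import numpy as np


def batched_targets(minibatch, q_online, q_target, gamma):
    """Build (inputs, targets) for a single fit() call from a replay minibatch.

    minibatch: list of (state, action, reward, next_state, done) tuples,
    with state = (position, requests) as in the question.
    q_online / q_target: callables mapping [pos_batch, reqs_batch] -> (B, n_actions).
    """
    pos = np.array([s[0] for s, _, _, _, _ in minibatch], dtype=np.float32)
    reqs = np.array([s[1] for s, _, _, _, _ in minibatch], dtype=np.float32)
    pos_next = np.array([ns[0] for _, _, _, ns, _ in minibatch], dtype=np.float32)
    reqs_next = np.array([ns[1] for _, _, _, ns, _ in minibatch], dtype=np.float32)
    actions = np.array([a for _, a, _, _, _ in minibatch])
    rewards = np.array([r for _, _, r, _, _ in minibatch], dtype=np.float32)
    dones = np.array([d for _, _, _, _, d in minibatch], dtype=np.float32)

    # One forward pass per network for the whole batch instead of 2 * B single-sample calls.
    next_max = np.max(q_target([pos_next, reqs_next]), axis=1)
    targets = np.array(q_online([pos, reqs]), dtype=np.float32, copy=True)
    # Bellman target for the taken action; terminal transitions keep only the reward.
    targets[np.arange(len(minibatch)), actions] = rewards + gamma * next_max * (1.0 - dones)
    return [pos, reqs], targets


# Hypothetical stand-in network: 2 position + 4 request features -> 9 actions.
rng = np.random.default_rng(0)
W = rng.normal(size=(6, 9))


def linear_q(inputs):
    pos, reqs = inputs
    return np.concatenate([pos, reqs], axis=1) @ W


batch = [((rng.random(2), rng.integers(0, 4, size=4)), int(rng.integers(9)), 1.0,
          (rng.random(2), rng.integers(0, 4, size=4)), False) for _ in range(32)]
x, y = batched_targets(batch, linear_q, linear_q, gamma=0.95)
print(x[0].shape, x[1].shape, y.shape)
```

With the real Keras models, the returned pair could be passed to one self.model_uav_pos.fit(x, y, verbose=0) or train_on_batch(x, y) call per step.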

Update 2

I will revise my suggestion from the previous update, because I realized that model_uav_pos is used for inference in two places: in act_upon_choosing_a_new_position, where its output is followed by the argmax, and in train_pos_nn, where the raw output is used. I would therefore suggest also wrapping the call method of model_uav_pos with tf.function after defining self.pred_model_uav, so that both inference paths are compiled into TensorFlow graphs:

class DQNAgent(object):
    def __init__(self):
        #self.gamma = 0.95 
        self.epsilon = 1.0
        self.epsilon_decay = 0.8
        self.epsilon_min = 0.1
        self.learning_rate = 10e-4 
        self.tau = 1e-3
                        
        # Main models
        self.model_uav_pos = self._build_pos_model()
        # argmax over the actions of the single sample, like np.argmax(act_values[0])
        self.pred_model_uav = tf.function(lambda x: tf.argmax(self.model_uav_pos(x)[0]), jit_compile=True)
        self.model_uav_pos.call = tf.function(self.model_uav_pos.call, jit_compile=True)

...

Then use self.pred_model_uav in act_upon_choosing_a_new_position, and in train_pos_nn call self.model_uav_pos directly, as described in the original solution.
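The arrangement from Update 2 can be condensed into a small self-contained sketch, assuming TensorFlow 2.x with tf.keras. SketchAgent, its method names, the tiny model, N_UES = 4 and GAMMA = 0.95 are hypothetical stand-ins for the question's agent. Instead of reassigning model.call, it wraps each inference path in its own JIT-compiled tf.function: one fuses the forward pass with the argmax for acting, one returns raw Q-values for building training targets, and one takes the max over the target network's output:

```python
import numpy as np
import tensorflow as tf

N_ACTIONS, N_UES, GAMMA = 9, 4, 0.95  # hypothetical sizes / discount


def build_model():
    pos = tf.keras.Input(shape=(2,))
    reqs = tf.keras.Input(shape=(N_UES,))
    x = tf.keras.layers.Dense(64, activation="relu")(tf.keras.layers.concatenate([pos, reqs]))
    model = tf.keras.Model([pos, reqs], tf.keras.layers.Dense(N_ACTIONS)(x))
    model.compile(loss="mse", optimizer="adam")
    return model


class SketchAgent:
    def __init__(self):
        self.model = build_model()
        self.target_model = build_model()
        self.target_model.set_weights(self.model.get_weights())
        # Acting: forward pass and argmax compiled together into one XLA graph.
        self.greedy_action = tf.function(
            lambda pos, reqs: tf.argmax(self.model([pos, reqs], training=False), axis=-1)[0],
            jit_compile=True)
        # Training: raw Q-values of the online net and the max Q-value of the target net.
        self.q_values = tf.function(
            lambda pos, reqs: self.model([pos, reqs], training=False), jit_compile=True)
        self.max_target_q = tf.function(
            lambda pos, reqs: tf.reduce_max(self.target_model([pos, reqs], training=False)),
            jit_compile=True)

    def act(self, pos, reqs):
        return int(self.greedy_action(pos, reqs))  # tensor -> plain Python int

    def train_one(self, pos, reqs, action, reward, pos_next, reqs_next, done):
        target = reward if done else reward + GAMMA * float(self.max_target_q(pos_next, reqs_next))
        target_f = self.q_values(pos, reqs).numpy()  # tensor -> writable NumPy array
        target_f[0][action] = target
        self.model.fit([pos, reqs], target_f, epochs=1, verbose=0)


agent = SketchAgent()
pos = np.random.rand(1, 2).astype(np.float32)
reqs = np.random.randint(0, 4, size=(1, N_UES)).astype(np.float32)
a = agent.act(pos, reqs)
agent.train_one(pos, reqs, a, 1.0, pos, reqs, False)
```

Keeping the acting path and the training path as separate compiled functions means the argmax is only fused where it is actually needed, while train_pos_nn still gets the full Q-value vector it has to modify.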

Answered By - danielcahall

This answer was collected from Stack Overflow and tested by PythonFixing community admins; it is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
