Issue
I am trying to train a Recurrent Neural Network using TensorFlow (r0.10, Python 3.5) on a toy classification problem, but I am getting confusing results.
I want to feed a sequence of zeros and ones into an RNN, and have the target class for a given element of the sequence be the number represented by the current and previous values of the sequence, treated as a two-digit binary number. For example:
input sequence: [0, 0, 1, 0, 1, 1]
binary digits : [-, [0,0], [0,1], [1,0], [0,1], [1,1]]
target class : [-, 0, 1, 2, 1, 3]
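To be concrete, these targets can be generated with a one-line NumPy shift; this mirrors the data-generation line used in the training loop below, where the missing previous digit at position 0 is treated as 0 (hence class 0 rather than the '-' placeholder):

import numpy as np

seq = np.array([0, 0, 1, 0, 1, 1])            # the input sequence above
# class = current digit + 2 * previous digit; the first "previous digit" is taken to be 0
target = seq + np.concatenate(([0], seq[:-1] * 2))
print(target)                                  # [0 0 1 2 1 3]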
It seems like this is something an RNN should be able to learn quite easily, but instead my model is only able to distinguish classes [0,2] from [1,3]. In other words, it is able to distinguish the classes whose current digit is 0 from those whose current digit is 1. This is leading me to believe that the RNN model is not correctly learning to look at the previous value(s) of the sequence.
There are several tutorials and examples ([1], [2], [3]) that demonstrate how to build and use Recurrent Neural Networks (RNNs) in TensorFlow, but after studying them I still do not see my problem (it does not help that all the examples use text as their source data).
I am inputting my data to tf.nn.rnn() as a list of length T, whose elements are [batch_size x input_size] sequences. Since my sequence is one-dimensional, input_size is equal to one, so essentially I believe I am inputting a list of sequences of length batch_size (the documentation is unclear to me about which dimension is being treated as the time dimension). Is that understanding correct? If that is the case, then I don't understand why the RNN model is not learning correctly.
It's hard to boil my full RNN down to a small runnable example, but this is the best I could do (it is mostly adapted from the PTB model here and the char-rnn model here):
import tensorflow as tf
import numpy as np
input_size = 1
batch_size = 50
T = 2
lstm_size = 5
lstm_layers = 2
num_classes = 4
learning_rate = 0.1
lstm = tf.nn.rnn_cell.BasicLSTMCell(lstm_size, state_is_tuple=True)
lstm = tf.nn.rnn_cell.MultiRNNCell([lstm] * lstm_layers, state_is_tuple=True)
x = tf.placeholder(tf.float32, [T, batch_size, input_size])
y = tf.placeholder(tf.int32, [T * batch_size * input_size])
init_state = lstm.zero_state(batch_size, tf.float32)
inputs = [tf.squeeze(input_, [0]) for input_ in tf.split(0,T,x)]
outputs, final_state = tf.nn.rnn(lstm, inputs, initial_state=init_state)
w = tf.Variable(tf.truncated_normal([lstm_size, num_classes]), name='softmax_w')
b = tf.Variable(tf.truncated_normal([num_classes]), name='softmax_b')
output = tf.concat(0, outputs)
logits = tf.matmul(output, w) + b
probs = tf.nn.softmax(logits)
cost = tf.reduce_mean(tf.nn.seq2seq.sequence_loss_by_example(
[logits], [y], [tf.ones_like(y, dtype=tf.float32)]
))
optimizer = tf.train.GradientDescentOptimizer(learning_rate)
tvars = tf.trainable_variables()
grads, _ = tf.clip_by_global_norm(tf.gradients(cost, tvars),
10.0)
train_op = optimizer.apply_gradients(zip(grads, tvars))
init = tf.initialize_all_variables()
with tf.Session() as sess:
    sess.run(init)
    curr_state = sess.run(init_state)
    for i in range(3000):
        # Create toy data where the true class is the value represented
        # by the current and previous values, treated as a binary number
        train_x = np.random.randint(0, 2, (T * batch_size * input_size))
        train_y = train_x + np.concatenate(([0], (train_x[:-1] * 2)))
        # Reshape into T x batch_size x input_size
        train_x = np.reshape(train_x, (T, batch_size, input_size))
        feed_dict = {
            x: train_x, y: train_y
        }
        for j, (c, h) in enumerate(init_state):
            feed_dict[c] = curr_state[j].c
            feed_dict[h] = curr_state[j].h
        fetch_dict = {
            'cost': cost, 'final_state': final_state, 'train_op': train_op
        }
        # Evaluate the graph
        fetches = sess.run(fetch_dict, feed_dict=feed_dict)
        curr_state = fetches['final_state']
        if i % 300 == 0:
            print('step {}, train cost: {}'.format(i, fetches['cost']))
    # Test
    test_x = np.array([[0], [0], [1], [0], [1], [1]] * (T * batch_size * input_size))
    test_x = test_x[:(T * batch_size * input_size), :]
    probs_out = sess.run(probs, feed_dict={
        x: np.reshape(test_x, [T, batch_size, input_size]),
        init_state: curr_state
    })
    # Get the softmax outputs for the points in the sequence
    # that have [0, 0], [0, 1], [1, 0], [1, 1] as their
    # last two values.
    for i in [1, 2, 3, 5]:
        print('{}: [{:.4f} {:.4f} {:.4f} {:.4f}]'.format(
            [1, 2, 3, 5].index(i), *list(probs_out[i, :]))
        )
The final output here is:
0: [0.4899 0.0007 0.5080 0.0014]
1: [0.0003 0.5155 0.0009 0.4833]
2: [0.5078 0.0011 0.4889 0.0021]
3: [0.0003 0.5052 0.0009 0.4936]
which indicates that it is only learning to distinguish [0,2] from [1,3]. Why isn't this model learning to use the previous value in the sequence?
Solution
Figured it out, with the help of this blog post (it has wonderful diagrams of the input tensors). It turns out that I was not understanding the shape of the inputs to tf.nn.rnn() correctly:
Let's say you've got batch_size number of sequences. Each sequence has input_size dimensions and has length T (these names were chosen to match the documentation of tf.nn.rnn() here). Then you need to split your input into a T-length list where each element has shape batch_size x input_size. This means that your contiguous sequence will be spread out across the elements of the list. I thought that contiguous sequences would be kept together, so that each element of the list inputs would be an example of one sequence.
This makes sense in retrospect, since we wish to parallelize each step through the sequence, so we want to do the first step of each sequence (first element in the list), then the second step of each sequence (second element in the list), etc.
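To make this concrete, here is a small NumPy-only sketch contrasting the layout my original code produced with the layout tf.nn.rnn() expects (the values are just placeholder markers chosen to make positions easy to track, not part of the actual task):

import numpy as np

# Two contiguous length-3 sequences stored back to back in one flat array.
flat = np.array([10, 11, 12,   # sequence A
                 20, 21, 22])  # sequence B

# What my original code did: reshape to [T, batch_size, input_size]
# and split along dimension 0.
wrong = flat.reshape(3, 2, 1)
wrong_steps = [np.squeeze(s, axis=0) for s in np.split(wrong, 3, axis=0)]
# wrong_steps[0] is [[10], [11]]: two consecutive values of sequence A,
# presented to the RNN as time step 0 of two different sequences. Each
# batch row then strides through the flat array (10, 12, 21), so the RNN
# never sees an actual contiguous sequence.

# What tf.nn.rnn() expects: reshape to [batch_size, T, input_size] and
# split along the time dimension (dimension 1), so each list element holds
# time step t for every sequence in the batch.
right = flat.reshape(2, 3, 1)
right_steps = [np.squeeze(s, axis=1) for s in np.split(right, 3, axis=1)]
# right_steps[0] is [[10], [20]]: time step 0 of sequence A and sequence B.
# right_steps[1] is [[11], [21]], right_steps[2] is [[12], [22]].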
Working version of the code:
import tensorflow as tf
import numpy as np
sequence_size = 50
batch_size = 7
num_features = 1
lstm_size = 5
lstm_layers = 2
num_classes = 4
learning_rate = 0.1
lstm = tf.nn.rnn_cell.BasicLSTMCell(lstm_size, state_is_tuple=True)
lstm = tf.nn.rnn_cell.MultiRNNCell([lstm] * lstm_layers, state_is_tuple=True)
x = tf.placeholder(tf.float32, [batch_size, sequence_size, num_features])
y = tf.placeholder(tf.int32, [batch_size * sequence_size * num_features])
init_state = lstm.zero_state(batch_size, tf.float32)
inputs = [tf.squeeze(input_, [1]) for input_ in tf.split(1,sequence_size,x)]
outputs, final_state = tf.nn.rnn(lstm, inputs, initial_state=init_state)
w = tf.Variable(tf.truncated_normal([lstm_size, num_classes]), name='softmax_w')
b = tf.Variable(tf.truncated_normal([num_classes]), name='softmax_b')
output = tf.reshape(tf.concat(1, outputs), [-1, lstm_size])
logits = tf.matmul(output, w) + b
probs = tf.nn.softmax(logits)
cost = tf.reduce_mean(tf.nn.seq2seq.sequence_loss_by_example(
[logits], [y], [tf.ones_like(y, dtype=tf.float32)]
))
# Now optimize on that cost
optimizer = tf.train.GradientDescentOptimizer(learning_rate)
tvars = tf.trainable_variables()
grads, _ = tf.clip_by_global_norm(tf.gradients(cost, tvars),
10.0)
train_op = optimizer.apply_gradients(zip(grads, tvars))
init = tf.initialize_all_variables()
with tf.Session() as sess:
    sess.run(init)
    curr_state = sess.run(init_state)
    for i in range(3000):
        # Create toy data where the true class is the value represented
        # by the current and previous values, treated as a binary number
        train_x = np.random.randint(0, 2, (batch_size * sequence_size * num_features))
        train_y = train_x + np.concatenate(([0], (train_x[:-1] * 2)))
        # Reshape into batch_size x sequence_size x num_features
        train_x = np.reshape(train_x, [batch_size, sequence_size, num_features])
        feed_dict = {
            x: train_x, y: train_y
        }
        for j, (c, h) in enumerate(init_state):
            feed_dict[c] = curr_state[j].c
            feed_dict[h] = curr_state[j].h
        fetch_dict = {
            'cost': cost, 'final_state': final_state, 'train_op': train_op
        }
        # Evaluate the graph
        fetches = sess.run(fetch_dict, feed_dict=feed_dict)
        curr_state = fetches['final_state']
        if i % 300 == 0:
            print('step {}, train cost: {}'.format(i, fetches['cost']))
    # Test
    test_x = np.array([[0], [0], [1], [0], [1], [1]] * (batch_size * sequence_size * num_features))
    test_x = test_x[:(batch_size * sequence_size * num_features), :]
    probs_out = sess.run(probs, feed_dict={
        x: np.reshape(test_x, [batch_size, sequence_size, num_features]),
        init_state: curr_state
    })
    # Get the softmax outputs for the points in the sequence
    # that have [0, 0], [0, 1], [1, 0], [1, 1] as their
    # last two values.
    for i in [1, 2, 3, 5]:
        print('{}: [{:.4f} {:.4f} {:.4f} {:.4f}]'.format(
            [1, 2, 3, 5].index(i), *list(probs_out[i, :]))
        )
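In short, the key changes from the original code are: the placeholder x is now shaped [batch_size, sequence_size, num_features] (batch-major), the input list is created by splitting along dimension 1 (the time dimension), and the concatenated outputs are reshaped back to [batch_size * sequence_size, lstm_size] before the softmax layer so that the logits line up with the flat label vector y.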
Answered By - kbrose