Issue
I am currently modifying this gist to save the state of the neural network using numpy .npz files. The problematic code uses the variables:
Wxh = np.random.randn(hidden_size, vocab_size) * 0.01 # input to hidden
Whh = np.random.randn(hidden_size, hidden_size) * 0.01 # hidden to hidden
Why = np.random.randn(vocab_size, hidden_size) * 0.01 # hidden to output
bh = np.zeros((hidden_size, 1)) # hidden bias
by = np.zeros((vocab_size, 1)) # output bias
They are, of course, changed while the network runs. After saving them with:
memFile = open(memoryFileName, "w")
np.savez(memFile, Wxh, Whh, Why, bh, by)
memFile.close()
I tried to test opening them by:
loaded = np.load(getfile_local(memoryFileName))
Wxh, Whh, Why, bh, by = 0,0,0,0,0
varList = [Wxh, Whh, Why, bh, by]
for index, name in enumerate(loaded.files):
    print loaded[name]
Running this raised the following exception:
Traceback (most recent call last):
File "D:/Python27/projects/poemGen/charRNN/char_rnn.py", line 58, in <module>
testDict.update(loaded)
File "D:\Python27\lib\site-packages\numpy\lib\npyio.py", line 224, in __getitem__
pickle_kwargs=self.pickle_kwargs)
File "D:\Python27\lib\site-packages\numpy\lib\format.py", line 664, in read_array
data = _read_bytes(fp, read_size, "array data")
File "D:\Python27\lib\site-packages\numpy\lib\format.py", line 803, in _read_bytes
r = fp.read(size - len(data))
File "D:\Python27\lib\zipfile.py", line 632, in read
data = self.read1(n - len(buf))
File "D:\Python27\lib\zipfile.py", line 672, in read1
self._update_crc(data, eof=(self._compress_left==0))
File "D:\Python27\lib\zipfile.py", line 647, in _update_crc
raise BadZipfile("Bad CRC-32 for file %r" % self.name)
zipfile.BadZipfile: Bad CRC-32 for file 'arr_1.npy'
As far as I understand, it seems like this is either a problem within numpy or zipfile, but I'd be glad to hear that it was my mistake.^^
Solution
By doing this:
memFile = open(memoryFileName, "w")
you're passing a text-mode handle to a function that writes binary data. Python 2 uses the same str type for text and binary data, so the write doesn't fail, but because you're running Windows, it "corrupts" the file:
when a linefeed (ASCII 10) is encountered, text mode automatically prepends a carriage return (ASCII 13), changing the binary contents of the file. When the file is later loaded, it is opened in binary mode, and the extra carriage return bytes mean the data no longer matches the CRC-32 checksum stored in the zip archive.
On Unix-like systems the problem doesn't appear because text mode equals binary mode (no end-of-line conversion is done behind the scenes).
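The effect can be imitated on any platform without touching a file. The sketch below is only an illustration: simulate_windows_text_write is a hypothetical helper that mimics the linefeed translation, and the sample bytes are made up rather than taken from a real .npz file.

```python
import zlib


def simulate_windows_text_write(data):
    """Hypothetical helper: mimic a Windows text-mode write (LF -> CR LF)."""
    return data.replace(b"\n", b"\r\n")


# Made-up binary payload that happens to contain linefeed bytes (0x0a).
original = b"\x93NUMPY\x01\x00 header\n" + b"\x00\x0a\x0d\x0a\xff"
corrupted = simulate_windows_text_write(original)

# A zip archive stores a CRC-32 of each member's original bytes.
crc_original = zlib.crc32(original) & 0xFFFFFFFF
crc_corrupted = zlib.crc32(corrupted) & 0xFFFFFFFF

print("bytes added:", len(corrupted) - len(original))
print("checksums match:", crc_original == crc_corrupted)
```

Every linefeed in the payload gains one extra byte, so the checksum computed on read no longer agrees with the one recorded when the archive was written.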
Fix:
memFile = open(memoryFileName, "wb")
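A fuller round trip might look like the sketch below. It is written for Python 3, uses small made-up sizes and a temporary path instead of the original memoryFileName, and passes the arrays as keywords so they come back under their own names rather than arr_0, arr_1, and so on; none of these choices come from the question.

```python
import os
import tempfile

import numpy as np

# Hypothetical small sizes, only for the demonstration.
hidden_size, vocab_size = 4, 3

params = {
    "Wxh": np.random.randn(hidden_size, vocab_size) * 0.01,   # input to hidden
    "Whh": np.random.randn(hidden_size, hidden_size) * 0.01,  # hidden to hidden
    "Why": np.random.randn(vocab_size, hidden_size) * 0.01,   # hidden to output
    "bh": np.zeros((hidden_size, 1)),                         # hidden bias
    "by": np.zeros((vocab_size, 1)),                          # output bias
}

# Temporary location standing in for memoryFileName.
path = os.path.join(tempfile.mkdtemp(), "memory.npz")

# "wb" keeps the bytes untouched on every platform.
with open(path, "wb") as mem_file:
    np.savez(mem_file, **params)

# Read every array back into a plain dict before the archive is closed.
with np.load(path) as loaded:
    restored = {name: loaded[name] for name in loaded.files}

print(sorted(restored))
```

Passing the file name itself to np.savez instead of a handle should also avoid the problem, since numpy then opens the file on its own.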
Note that Python 3 distinguishes between text and binary streams: writing bytes to a text-mode handle raises a TypeError, so the problem would have been detected more easily.
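As a small illustration of that difference, the sketch below (Python 3 only, with a made-up temporary file name) tries to write bytes through a text-mode handle and records the error instead of silently producing a damaged file.

```python
import os
import tempfile

# Temporary file used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "demo.bin")

error_message = None
with open(path, "w") as text_handle:  # text mode, as in the question
    try:
        text_handle.write(b"\x93NUMPY\n")  # bytes, not str
    except TypeError as exc:
        error_message = str(exc)

print("write failed with:", error_message)
```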
Answered By - Jean-François Fabre