Issue
I have a question that I can't find any answers to online. I have trained a model whose checkpoint file is about 20 GB. Since I do not have enough RAM on my system (nor on Colaboratory/Kaggle, where the limit is 16 GB), I can't use my model for predictions.
I know that the model has to be loaded into memory for inference to work. However, is there a workaround or a method that can:
- Save enough memory that the model can be loaded within 16 GB of RAM (for CPU), or within the memory of a TPU/GPU
- Work with either framework, TensorFlow + Keras or PyTorch (which I am using right now), since I will be working with both
Is such a method even possible in either of these libraries? One of my tentative ideas was to load the model in chunks, essentially maintaining a buffer for the model weights and biases and performing the calculations accordingly - though I haven't found any implementations of that.
I would also like to add that I wouldn't mind the performance slowdown, since that is to be expected with low-specification hardware. As long as it doesn't take more than two weeks :) I can definitely wait that long...
Solution
You can try the following:
- split the model into two parts
- load the weights into each part separately by calling
model.load_weights(checkpoint_path, by_name=True)
- call the first model on your input
- call the second model on the output of the first model (a sketch of the full flow follows below)
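Below is a minimal sketch of this approach in TensorFlow/Keras. The architecture, the layer names, and the checkpoint path "full_model.h5" are placeholders standing in for your real model. The important points are that each half reuses the original layer names (so that by_name=True can match the saved weights from an HDF5 weights file) and that the first half is deleted before the second half is built, so only one part is resident in memory at a time.

import gc
import numpy as np
from tensorflow import keras

CHECKPOINT = "full_model.h5"  # hypothetical path to the trained weights

def build_first_half():
    # First half of the original architecture, keeping the original layer names
    # so by_name=True can find the matching weights in the checkpoint.
    inp = keras.Input(shape=(784,), name="input")
    x = keras.layers.Dense(4096, activation="relu", name="dense_1")(inp)
    x = keras.layers.Dense(4096, activation="relu", name="dense_2")(x)
    return keras.Model(inp, x)

def build_second_half():
    # Second half; its input shape is the output shape of the first half.
    inp = keras.Input(shape=(4096,), name="intermediate")
    x = keras.layers.Dense(4096, activation="relu", name="dense_3")(inp)
    out = keras.layers.Dense(10, activation="softmax", name="output")(x)
    return keras.Model(inp, out)

data = np.random.rand(32, 784).astype("float32")  # stand-in for real inputs

# Stage 1: only the first half is in memory.
first = build_first_half()
first.load_weights(CHECKPOINT, by_name=True)   # loads dense_1, dense_2 only
intermediate = first.predict(data)

# Free the first half before building the second one.
del first
keras.backend.clear_session()
gc.collect()

# Stage 2: only the second half is in memory.
second = build_second_half()
second.load_weights(CHECKPOINT, by_name=True)  # loads dense_3, output only
predictions = second.predict(intermediate)

The point of the split is that peak memory is roughly halved: at any moment only one part of the model plus the intermediate activations has to fit in RAM, at the cost of building and loading twice. If two halves are still too large, the same idea extends to more pieces, feeding each part's output into the next.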
Answered By - Andrey