Issue
I've been looking around here and on the Internet, but it seems that I'm the first one having this question.
I'd like to train an ML model (let's say something with PyTorch) and write it to an Apache Kafka cluster. On the other side, it should be possible to load the model again from the received array of bytes. It seems that almost all frameworks only offer methods to load from a path, that is, from a file.
The only constraint I'm trying to satisfy is to not save the model as a file, so that I won't need any storage.
Am I missing something? Do you have any idea how to solve it?
Solution
One reason to avoid this is that Kafka messages are limited to about 1 MB by default, so sending models around in topics isn't the best idea. Instead, you could store the model files in a shared filesystem and send their URIs (as strings) through Kafka, so that the consumer clients download the files themselves.
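A minimal sketch of that decision on the producer side, using only the standard library. The names `MAX_MESSAGE_BYTES` and `build_record`, the example URI, and the use of `pickle` as a stand-in for the framework's own serializer are all assumptions for illustration. The actual limit depends on your broker and producer configuration.

```python
import pickle

# Assumption: roughly Kafka's default limit of about 1 MB per message.
# Check your cluster's configuration for the real value.
MAX_MESSAGE_BYTES = 1_000_000


def build_record(model_obj, model_uri):
    """Return (kind, payload): the model inline if small, otherwise its URI."""
    payload = pickle.dumps(model_obj)  # stand-in for the framework's serializer
    if len(payload) <= MAX_MESSAGE_BYTES:
        return "inline", payload
    # Too large for one message: send a pointer to the stored file instead.
    return "uri", model_uri.encode("utf-8")


# Hypothetical models: a tiny one and one with a 2 MB blob of weights.
small_model = {"weights": [0.1, 0.2, 0.3]}
large_model = {"weights": bytes(2_000_000)}
uri = "file:///shared/models/model-v1.bin"  # hypothetical location

small_kind, small_payload = build_record(small_model, uri)
large_kind, large_payload = build_record(large_model, uri)
```

The returned bytes would be the record value handed to whichever Kafka producer client you use. The consumer needs to know which kind it received, for example from a record header or from separate topics.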
For small model files, nothing prevents you from dumping the Kafka record bytes to a local file on the consumer. However, if you change the model's input parameters, you'd still need to edit the consumer code.
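Dumping the record bytes to a local file could look roughly like the sketch below. It assumes the record value arrives as `bytes` and uses a hypothetical `load_from_path` function (here backed by `pickle`) in place of a framework loader that only accepts a path. The file is temporary and is removed as soon as the model is loaded.

```python
import os
import pickle
import tempfile


def load_from_path(path):
    """Hypothetical stand-in for a framework loader that only accepts a path."""
    with open(path, "rb") as f:
        # Only unpickle data from producers you trust.
        return pickle.load(f)


def load_model_from_bytes(record_value):
    """Write the record bytes to a temporary file, load it by path, clean up."""
    fd, path = tempfile.mkstemp(suffix=".bin")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(record_value)
        return load_from_path(path)
    finally:
        os.remove(path)


# Simulated Kafka record value, as a producer might have serialized it.
record_value = pickle.dumps({"weights": [0.1, 0.2, 0.3]})
model = load_model_from_bytes(record_value)
```

If the loader accepts file-like objects, as `torch.load` does, wrapping the bytes in `io.BytesIO` avoids the temporary file altogether.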
Or you can embed the models in other stream processing engines (still on local filesystems), as linked in the comments.
Answered By - OneCricketeer