Issue
Is there a fast way to serialize a DataFrame?
I have a grid system that can run pandas analyses in parallel. In the end, I want to collect the result (a DataFrame) from each grid job and aggregate them all into one giant DataFrame.
How can I save a DataFrame in a binary format that can be loaded rapidly?
Solution
The easiest way is to use to_pickle (which serializes the DataFrame as a pickle); see pickling in the docs API page:
df.to_pickle(file_name)
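For the grid use case in the question, a minimal sketch of the round trip might look like the following; the file names and the per-job DataFrames are just placeholders, and pd.read_pickle / pd.concat do the loading and aggregation:
import pandas as pd

# Hypothetical per-job results, one DataFrame per grid job.
results = [pd.DataFrame({"job": i, "value": [1.0, 2.0]}) for i in range(3)]

# Each job writes its result to a binary pickle file.
for i, df in enumerate(results):
    df.to_pickle(f"result_{i}.pkl")

# The collector loads each pickle and concatenates into one giant DataFrame.
combined = pd.concat(
    (pd.read_pickle(f"result_{i}.pkl") for i in range(3)),
    ignore_index=True,
)
print(combined)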
Another option is to use HDF5 (built on PyTables). It is slightly more work to get started but much richer for querying.
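As a rough sketch of the HDF5 route (this assumes the PyTables package is installed; the file name "results.h5" and key "results" are placeholders), writing in table format allows querying rows on disk:
import pandas as pd

df = pd.DataFrame({"job": [0, 1, 2], "value": [0.5, 1.5, 2.5]})

# Write in table format with data columns so the file supports on-disk queries.
df.to_hdf("results.h5", key="results", mode="w", format="table", data_columns=True)

# Read everything back, or pull only the rows matching a condition.
all_rows = pd.read_hdf("results.h5", "results")
subset = pd.read_hdf("results.h5", "results", where="value > 1.0")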
Answered By - Andy Hayden