Issue
I wanted to compare memory consumption for the same dataset. I ran the same SQL query against an Oracle DB with both pandas and polars. Memory usage is almost the same, and pandas' execution time is 2 times faster than polars'. I expected polars to be more memory efficient.
Can anyone explain this? And are there any suggestions for reducing memory usage for the same dataset?
result(polars) and data(pandas) shapes:
and lastly memory usages:
Solution
One of the big advantages of Polars is query optimisation.
If you're loading all data into memory with read_database, and only doing that, then there will be no difference.
On the other hand, if you make the dataframe you read in lazy (DataFrame.lazy), then perform some other operations, and then collect the results (LazyFrame.collect), then that's where you'll see Polars shine.
Note: usually you'll want to read the data in lazily directly (e.g. scan_parquet instead of read_parquet), but for read_database there is no scan_ equivalent.
Answered By - ignoring_gravity