Issue
I have a pandas DataFrame containing large volumes of text in each row, and it takes up 1.6 GB of space when saved as a .pkl file. Now I want to build a list of words from this DataFrame, and I thought something as simple as [word for text in df.text for word in text.split()] should suffice. However, this expression eats up all 16 GB of RAM in about 10 seconds and then the process dies. I find this really interesting: why is it not just a bit above 1.6 GB? I know that lists over-allocate memory so they can grow, so I tried tuples instead - same result. I even tried writing everything to a file as tuples like ('one', 'two', 'three') and then reading the file back and calling eval on it - still the same result. Why does that happen? Does pandas compress its data, or is Python just that inefficient? And what is a better way to do this?
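For a sense of scale, a single short word already costs far more as a Python object than as raw bytes (a quick check; exact byte counts depend on the CPython build):

    import sys

    word = "three"
    print(sys.getsizeof(word))        # ~54 bytes on CPython 3.x: ~49-byte object header + 1 byte per ASCII char
    print(len(word.encode("utf-8")))  # 5 bytes: what the raw text costs on disk

    # A list of words also pays ~8 bytes per slot for the pointer to each
    # str object, on top of the per-object headers above.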
Solution
The blow-up happens because every word becomes a separate Python str object with roughly 50 bytes of fixed overhead, so hundreds of millions of short words cost far more in RAM than the raw text does on disk. You can avoid materializing them all at once by using a generator. A generator expression yields one word at a time, and map(func, iterable) is likewise lazy in Python 3: for example, map(str.split, df.text) produces each row's word list on demand instead of building everything up front.
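A minimal sketch of the lazy approach, assuming df has a text column of strings as in the question; the Counter here is just a stand-in for whatever per-word processing is actually needed:

    from collections import Counter

    import pandas as pd

    # Toy stand-in for the real 1.6 GB DataFrame.
    df = pd.DataFrame({"text": ["one two three", "two three", "three"]})

    # Generator expression: yields one word at a time and never builds the full list.
    words = (word for text in df.text for word in text.split())

    # Consuming the generator needs memory proportional to the vocabulary,
    # not to the total number of words in the corpus.
    counts = Counter(words)
    print(counts)  # Counter({'three': 3, 'two': 2, 'one': 1})

The key point is that the generator is consumed as it is produced, so at no moment do all the words exist in memory at once.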
Answered By - Saliou DJIBRILA