Issue
Below is the code that I ran:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(10, 5))
df.columns = ['a', 'b', 'c', 'd', 'e']
df['p'] = 2
df.to_parquet('s3://my_bucket/test01/boo.parquet', engine='fastparquet', compression='gzip', partition_cols=['p'])
The Parquet dataset is saved to S3 as expected. However, my working directory now also contains a directory named "s3:", which mirrors the full path structure parsed from the S3 URL.
Solution
OK, I realized that this is a fastparquet quirk.
It only happens when partition_cols is provided and engine='fastparquet'. If partition_cols is omitted, or if I use the default engine (engine='auto', which tries pyarrow first), the empty directory artifact does not appear.
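If you need to stay on fastparquet with partition_cols, one option is to tidy up the stray local directory after the write. The snippet below is only a sketch of that cleanup, not a fix for the quirk itself. It assumes the artifact is the "s3:" directory in the current working directory and that it contains nothing but empty directories, as described above. The helper name remove_empty_tree is made up for this example; it refuses to delete anything if it finds a file or symlink.

```python
import os
from pathlib import Path


def remove_empty_tree(root):
    """Delete `root` only if it holds nothing but empty directories.

    Hypothetical helper for this example. Returns True if the tree
    was removed, False if it was missing or left untouched.
    """
    root = Path(root)
    if not root.is_dir():
        return False

    # First pass: bail out if any file or symlink exists below root,
    # so real data is never deleted by accident.
    for dirpath, dirnames, filenames in os.walk(root):
        if filenames:
            return False
        if any(Path(dirpath, d).is_symlink() for d in dirnames):
            return False

    # Second pass: walk bottom-up so children are removed before parents.
    for dirpath, _dirnames, _filenames in os.walk(root, topdown=False):
        os.rmdir(dirpath)
    return True


if __name__ == "__main__":
    # Assumption: the artifact is named "s3:" and sits in the working dir.
    removed = remove_empty_tree("s3:")
    print("removed stray directory" if removed else "nothing removed")
```

Calling remove_empty_tree("s3:") right after df.to_parquet(...) should clear the artifact while leaving any directory that actually contains files alone.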
Answered By - michaelgbj