Issue
I have a pandas dataframe and I want to write it to a parquet file in S3. I need a working sample of code for this. I tried to google it, but I could not find a working sample.
Solution
First, ensure that you have pyarrow or fastparquet installed along with pandas.
Then install boto3 and the AWS CLI. Use the AWS CLI to set up the config and credentials files, located in the ~/.aws folder.
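As a quick sanity check that the setup works, a minimal sketch like the following (assuming credentials were already configured with `aws configure`) asks STS who you are, which confirms boto3 can find your credentials:

import boto3

# hypothetical check: fails with NoCredentialsError if ~/.aws is not set up
sts = boto3.client("sts")
print(sts.get_caller_identity()["Account"])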
Here is a simple script using pyarrow and boto3 to create a temporary parquet file and then send it to AWS S3.
Sample code, including the imports:
import boto3
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq


def write_pandas_parquet_to_s3(df, bucketName, keyName, fileName):
    # convert the dataframe to an Arrow table and write it to a local parquet file
    table = pa.Table.from_pandas(df)
    pq.write_table(table, fileName)

    # upload the local file to S3 (read as bytes, since parquet is binary)
    s3 = boto3.client("s3")
    with open(fileName, "rb") as f:
        object_data = f.read()
    s3.put_object(Body=object_data, Bucket=bucketName, Key=keyName)


def main():
    # dummy dataframe
    data = {0: {"data1": "value1"}}
    df = pd.DataFrame.from_dict(data, orient="index")
    write_pandas_parquet_to_s3(
        df, "bucket", "folder/test/file.parquet", ".tmp/file.parquet")


if __name__ == "__main__":
    main()
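If you would rather skip the temporary file, a variation along these lines (bucket and key names are placeholders, and it assumes a reasonably recent pandas that accepts a file-like object in to_parquet) serializes the dataframe to an in-memory buffer and uploads that directly:

import io

import boto3
import pandas as pd


def write_parquet_to_s3_in_memory(df, bucket, key):
    # serialize the dataframe to parquet bytes in memory (uses pyarrow or fastparquet)
    buffer = io.BytesIO()
    df.to_parquet(buffer, index=True)

    # upload the buffer contents directly to S3, no local file needed
    s3 = boto3.client("s3")
    s3.put_object(Body=buffer.getvalue(), Bucket=bucket, Key=key)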
Answered By - andreas