Issue
I am trying to use a function I found in this previous question, Reading multiple csv files from S3 bucket with boto3, but I keep getting ValueError: DataFrame constructor not properly called!
This is the code:
import boto3
import pandas as pd

s3 = boto3.resource('s3', aws_access_key_id=ACCESS_KEY, aws_secret_access_key=SECRET_KEY)
bucket = s3.Bucket('test_bucket')
prefix_objs = bucket.objects.filter(Prefix=prefix)

prefix_df = []
for obj in prefix_objs:
    key = obj.key
    body = obj.get()['Body'].read()
    df = pd.DataFrame(body)
When I print body, all I get is a bytes string starting with b'.
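For context: the b' prefix means body is raw bytes, and passing bytes directly to the DataFrame constructor is what raises this ValueError. A minimal sketch of the same loop with the bytes wrapped in io.BytesIO so pd.read_csv can parse them, assuming the objects are plain CSV files:

import io

for obj in prefix_objs:
    body = obj.get()['Body'].read()      # raw bytes, e.g. b'col1,col2\n...'
    df = pd.read_csv(io.BytesIO(body))   # wrap the bytes in a file-like object
    prefix_df.append(df)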
Solution
I use this and it works well if all your files are under one prefix path. Basically, you create the s3 client, then iterate over each object under the prefix, reading each file into a DataFrame and appending it to a list that pandas then concatenates.
import boto3
import pandas as pd

# create the s3 client
s3 = boto3.client("s3",
                  region_name=region_name,
                  aws_access_key_id=aws_access_key_id,
                  aws_secret_access_key=aws_secret_access_key)

# list every object under the prefix
response = s3.list_objects(Bucket="my-bucket", Prefix="datasets/")

# read each object into a DataFrame and collect them in a list
df_list = []
for file in response["Contents"]:
    obj = s3.get_object(Bucket="my-bucket", Key=file["Key"])
    obj_df = pd.read_csv(obj["Body"])
    df_list.append(obj_df)

# concatenate everything into a single DataFrame
df = pd.concat(df_list)
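One caveat worth noting: list_objects returns at most 1,000 keys per call, so a prefix with more files than that would be silently truncated. A sketch of the same loop built on a boto3 paginator, using the same placeholder bucket and prefix as above, purely as an assumption about how you might scale it:

# Paginate so prefixes with more than 1,000 objects are fully covered.
paginator = s3.get_paginator("list_objects_v2")

df_list = []
for page in paginator.paginate(Bucket="my-bucket", Prefix="datasets/"):
    for item in page.get("Contents", []):      # "Contents" is absent on empty pages
        obj = s3.get_object(Bucket="my-bucket", Key=item["Key"])
        df_list.append(pd.read_csv(obj["Body"]))

df = pd.concat(df_list, ignore_index=True)     # ignore_index resets the row index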
Answered By - thePurplePython