Issue
I'm trying to read multiple CSV files from blob storage using python.
The code that I'm using is:
blob_service_client = BlobServiceClient.from_connection_string(connection_str)
container_client = blob_service_client.get_container_client(container)
blobs_list = container_client.list_blobs(folder_root)
for blob in blobs_list:
    blob_client = blob_service_client.get_blob_client(container=container, blob="blob.name")
    stream = blob_client.download_blob().content_as_text()
I'm not sure of the correct way to store the CSV data I read in a pandas DataFrame.
I tried to use:
df = df.append(pd.read_csv(StringIO(stream)))
but this gives me an error.
Any idea how I can do this?
Solution
You could download the file from blob storage, then read the data into a pandas DataFrame from the downloaded file.
from azure.storage.blob import BlockBlobService  # legacy azure-storage-blob (< 12)
import time

import pandas as pd

STORAGEACCOUNTNAME = <storage_account_name>
STORAGEACCOUNTKEY = <storage_account_key>
LOCALFILENAME = <local_file_name>
CONTAINERNAME = <container_name>
BLOBNAME = <blob_name>

# download from blob
t1 = time.time()
blob_service = BlockBlobService(account_name=STORAGEACCOUNTNAME, account_key=STORAGEACCOUNTKEY)
blob_service.get_blob_to_path(CONTAINERNAME, BLOBNAME, LOCALFILENAME)
t2 = time.time()
print(("It takes %s seconds to download " + BLOBNAME) % (t2 - t1))

# LOCALFILENAME is the local file path
dataframe_blobdata = pd.read_csv(LOCALFILENAME)
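The read step itself needs nothing from Azure; a self-contained check, using a temporary file to stand in for the downloaded blob (the file contents here are made up for illustration):

```python
import os
import tempfile

import pandas as pd

# Write a small CSV to a temporary file, standing in for LOCALFILENAME
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as f:
    f.write("col1,col2\n1,a\n2,b\n")
    local_path = f.name

# Same call as above: pandas reads the local file into a DataFrame
dataframe_blobdata = pd.read_csv(local_path)
print(dataframe_blobdata.shape)  # (2, 2)

os.remove(local_path)
```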
If you want to do the conversion directly in memory, the following code will help: get the content from the blob with get_blob_to_text, so there's no need for a local file.
from io import StringIO
blobstring = blob_service.get_blob_to_text(CONTAINERNAME,BLOBNAME).content
df = pd.read_csv(StringIO(blobstring))
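Since the question reads multiple CSV files, note that df.append (removed in pandas 2.0) returns a new DataFrame rather than modifying in place, which is a common source of errors; collecting each blob's text and concatenating once is more robust. A minimal sketch of that pattern — the helper is plain pandas, and in the question's loop each string would come from blob_client.download_blob().content_as_text() (passing blob=blob.name, not the literal string "blob.name"):

```python
from io import StringIO

import pandas as pd

def frame_from_csv_texts(texts):
    """Parse each CSV string and concatenate the results into one DataFrame."""
    return pd.concat((pd.read_csv(StringIO(t)) for t in texts), ignore_index=True)

# Demonstration with in-memory CSV strings standing in for downloaded blobs:
sample_texts = ["a,b\n1,2\n3,4\n", "a,b\n5,6\n"]
df = frame_from_csv_texts(sample_texts)
print(len(df))  # 3 rows
```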
Answered By - unknown