Issue
I'm trying to read multiple CSV files from blob storage using python.
The code that I'm using is:
blob_service_client = BlobServiceClient.from_connection_string(connection_str)
container_client = blob_service_client.get_container_client(container)
blobs_list = container_client.list_blobs(folder_root)
for blob in blobs_list:
    blob_client = blob_service_client.get_blob_client(container=container, blob="blob.name")
    stream = blob_client.download_blob().content_as_text()
I'm not sure of the correct way to store the CSV data I read in a pandas DataFrame.
I tried to use:
df = df.append(pd.read_csv(StringIO(stream)))
but this gives me an error.
Any idea how I can do this?
Solution
You could download the file from blob storage, then read the data into a pandas DataFrame from the downloaded file.
from azure.storage.blob import BlockBlobService  # legacy azure-storage-blob (< 12)
import time

import pandas as pd

STORAGEACCOUNTNAME = <storage_account_name>
STORAGEACCOUNTKEY = <storage_account_key>
LOCALFILENAME = <local_file_name>
CONTAINERNAME = <container_name>
BLOBNAME = <blob_name>

# download from blob
t1 = time.time()
blob_service = BlockBlobService(account_name=STORAGEACCOUNTNAME, account_key=STORAGEACCOUNTKEY)
blob_service.get_blob_to_path(CONTAINERNAME, BLOBNAME, LOCALFILENAME)
t2 = time.time()
print(("It takes %s seconds to download " + BLOBNAME) % (t2 - t1))

# LOCALFILENAME is the local file path
dataframe_blobdata = pd.read_csv(LOCALFILENAME)
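The read step itself needs nothing from Azure; a self-contained check, using a temporary file to stand in for the downloaded blob (the file contents here are made up for illustration):

```python
import os
import tempfile

import pandas as pd

# Write a small CSV to a temporary file, standing in for LOCALFILENAME
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as f:
    f.write("col1,col2\n1,a\n2,b\n")
    local_path = f.name

# Same call as above: pandas reads the local file into a DataFrame
dataframe_blobdata = pd.read_csv(local_path)
print(dataframe_blobdata.shape)  # (2, 2)

os.remove(local_path)
```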
If you want to do the conversion directly in memory, the following code will help: get the content from the blob with get_blob_to_text, so there's no need for a local file.
from io import StringIO
blobstring = blob_service.get_blob_to_text(CONTAINERNAME,BLOBNAME).content
df = pd.read_csv(StringIO(blobstring))
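Since the question reads multiple CSV files, note that df.append (removed in pandas 2.0) returns a new DataFrame rather than modifying in place, which is a common source of errors; collecting each blob's text and concatenating once is more robust. A minimal sketch of that pattern — the helper is plain pandas, and in the question's loop each string would come from blob_client.download_blob().content_as_text() (passing blob=blob.name, not the literal string "blob.name"):

```python
from io import StringIO

import pandas as pd

def frame_from_csv_texts(texts):
    """Parse each CSV string and concatenate the results into one DataFrame."""
    return pd.concat((pd.read_csv(StringIO(t)) for t in texts), ignore_index=True)

# Demonstration with in-memory CSV strings standing in for downloaded blobs:
sample_texts = ["a,b\n1,2\n3,4\n", "a,b\n5,6\n"]
df = frame_from_csv_texts(sample_texts)
print(len(df))  # 3 rows
```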
Answered By - unknown