Issue
I have this path in S3: object1/object2/object3/object4/
Inside object4/ I have a list of objects, for example:
directory1/directory2/directory3/directory4/2022-30-09-15h21/
directory1/directory2/directory3/directory4/2023-20-12-12h30/
directory1/directory2/directory3/directory4/2022-31-12-09h34/
directory1/directory2/directory3/directory4/2023-12-08-14h56/
I would like to select the most recently created directory in directory4/
and then download all the files inside it.
I wrote this script to do it:
import boto3
from datetime import datetime
session_root = boto3.Session(region_name='eu-west-3', profile_name='my_profile')
s3_client = session_root.client('s3')
bucket_name = 'my_bucket'
prefix = 'object1/object2/object3/object4/'
# List objects in the bucket
response = s3_client.list_objects_v2(Bucket=bucket_name, Prefix=prefix)
# Extract the object names and convert them to datetime objects
objects_with_dates = [(obj['Key'], datetime.strptime(obj['LastModified'].strftime('%Y-%m-%d %H:%M:%S'), '%Y-%m-%d %H:%M:%S')) for obj in response.get('Contents', [])]
# Find the latest created object
latest_object = max(objects_with_dates, key=lambda x: x[1])
print("Last created S3 object:", latest_object[0]) # the returned value is: object1/object2/object3/object4/2023-20-12-12h30/my_file.csv
My script selects the last created directory in directory4/
but picks only the last created file inside it. The result of my script is: directory1/directory2/directory3/directory4/2023-20-12-12h30/my_file.csv
But I would like to download all the files inside.
Do you have any idea how I can modify my script so that it selects the last created directory in directory4/
and downloads all the files inside it?
Thanks
Solution
It appears that your requirement is:
- List all sub-directories for a given prefix (e.g. all sub-directories under directory1/directory2/directory3/directory4/)
- Of those sub-directories, find the one that represents the latest date, using the sub-directory name, which includes a timestamp in YYYY-DD-MM-HHhmm format
- Download all the objects in that sub-directory
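The second step depends on reading YYYY-DD-MM-HHhmm names correctly, because a plain string sort would compare the day before the month. One way to do this, shown here only as a minimal sketch, is to parse each name with `datetime.strptime`. The helper name `parse_folder_date` is hypothetical, and the sample names are the ones from the question.

```python
from datetime import datetime

# Sub-directory names from the question, in YYYY-DD-MM-HHhmm format
names = [
    '2022-30-09-15h21',
    '2023-20-12-12h30',
    '2022-31-12-09h34',
    '2023-12-08-14h56',
]

def parse_folder_date(name):
    """Hypothetical helper: parse a name such as '2022-30-09-15h21'."""
    # %d comes before %m because the day precedes the month in these names
    return datetime.strptime(name.strip('/'), '%Y-%d-%m-%Hh%M')

latest = max(names, key=parse_folder_date)
print(latest)
```

Parsing also rejects malformed names with a `ValueError`, which may or may not be desirable depending on what else lives under the prefix.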
Here is a sample program that uses the list of CommonPrefixes returned by S3, which is effectively a list of sub-directories.
import boto3
BUCKET = 'my-bucket'
PREFIX = 'directory1/directory2/directory3/directory4/'
# Custom date sorter to handle YYYY-DD-MM-HHhmm format
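def date_sorter(prefix):
    # Use only the last path component (e.g. '2022-30-09-15h21') so that
    # hyphens elsewhere in the prefix do not shift the date parts
    date_parts = prefix.rstrip('/').split('/')[-1].split('-')
    return (date_parts[0], date_parts[2], date_parts[1], date_parts[3])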
# Obtain a list of CommonPrefixes in the given Bucket and Prefix
# Use a paginator in case there are more than 1000 objects
s3_client = boto3.client('s3')
paginator = s3_client.get_paginator('list_objects_v2')
result = paginator.paginate(Bucket=BUCKET, Delimiter='/', Prefix=PREFIX)
# Get the 'latest' CommonPrefix but it is in the format YYYY-DD-MM-HHhmm
prefixes = [item['Prefix'] for item in result.search('CommonPrefixes') if item]
latest_prefix = sorted(prefixes, key=date_sorter)[-1]
# Download all objects from that prefix
s3_resource = boto3.resource('s3')
for obj in s3_resource.Bucket(BUCKET).objects.filter(Prefix=latest_prefix):
    # Download to local directory using just the filename
    filename = obj.key.split('/')[-1]
    if not filename:
        continue  # skip "folder" placeholder keys that end in '/'
    print(f'Downloading {obj.key}')
    obj.Object().download_file(filename)
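The loop above saves every object under its bare filename, so two objects with the same name in different nested sub-directories would overwrite each other locally. If that matters, one option is to keep the part of the key that follows the chosen prefix. The following is a sketch only: the helper `local_path_for` and the `downloads` destination directory are assumptions, not part of the original answer.

```python
import os

def local_path_for(key, prefix, dest_dir='downloads'):
    """Hypothetical helper: map an S3 key to a local path below dest_dir.

    Keeps the part of the key after `prefix` and returns None for
    "folder" placeholder keys (empty remainder or trailing '/').
    """
    relative = key[len(prefix):] if key.startswith(prefix) else key
    if not relative or relative.endswith('/'):
        return None
    return os.path.join(dest_dir, *relative.split('/'))

prefix = 'directory1/directory2/directory3/directory4/2023-20-12-12h30/'
print(local_path_for(prefix + 'my_file.csv', prefix))
print(local_path_for(prefix + 'sub/my_file.csv', prefix))
```

Inside the download loop, the returned path would replace the bare filename, with `os.makedirs(os.path.dirname(path), exist_ok=True)` called before `download_file(path)` so that the parent directory exists.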
Answered By - John Rotenstein