Issue
I'm following this example notebook to learn SageMaker's processing jobs API: https://github.com/aws/amazon-sagemaker-examples/blob/master/sagemaker_processing/scikit_learn_data_processing_and_model_evaluation/scikit_learn_data_processing_and_model_evaluation.ipynb
I'm trying to modify their code to avoid using the default S3 bucket, namely: s3://sagemaker-<region>-<account_id>/
For their data processing step, which uses the .run method:
from sagemaker.processing import ProcessingInput, ProcessingOutput

sklearn_processor.run(
    code="preprocessing.py",
    inputs=[ProcessingInput(source=input_data, destination="/opt/ml/processing/input")],
    outputs=[
        ProcessingOutput(output_name="train_data", source="/opt/ml/processing/train"),
        ProcessingOutput(output_name="test_data", source="/opt/ml/processing/test"),
    ],
    arguments=["--train-test-split-ratio", "0.2"],
)
I was able to modify it to use my own S3 bucket via the destination parameter, like this:
sklearn_processor.run(
    code=output_bucket_uri + "preprocessing.py",
    inputs=[
        ProcessingInput(
            source=input_bucket_uri + "census-income.csv",
            destination=path + "input/",
        )
    ],
    outputs=[
        ProcessingOutput(
            output_name="train_data",
            source=path + "train/",
            destination=output_bucket_uri + "train/",
        ),
        ProcessingOutput(
            output_name="test_data",
            source=path + "test/",
            destination=output_bucket_uri + "test/",
        ),
    ],
    arguments=["--train-test-split-ratio", "0.2"],
)
But for the .fit method:
sklearn.fit({"train": preprocessed_training_data})
I have not been able to find a parameter that makes the output artifacts save to an S3 bucket I specify instead of the default bucket s3://sagemaker-<region>-<account_id>/.
Solution
You specify the output artifacts' bucket when you create the SKLearn estimator. SKLearn is a subclass of Framework, which is a subclass of EstimatorBase, which accepts an output_path argument.
Below is a snippet from the SageMaker Examples that uses the PyTorch estimator, but the idea is the same:
est = PyTorch(
    entry_point="train.py",
    source_dir="code",  # directory of your training script
    role=role,
    framework_version="1.5.0",
    py_version="py3",
    instance_type=instance_type,
    instance_count=1,
    output_path=output_path,
    hyperparameters={"batch-size": 128, "epochs": 1, "learning-rate": 1e-3, "log-interval": 100},
)
est.fit(...)
Answered By - Murilo Cunha