Issue
I am trying to deploy a model trained with sklearn to an endpoint and serve it as an API for predictions. All I want to use SageMaker for is to deploy and serve a model I have already serialised with joblib, nothing more. Every blog I have read and the SageMaker Python documentation showed that a sklearn model had to be trained on SageMaker in order to be deployed on SageMaker.
When I was going through the SageMaker documentation, I learned that SageMaker does let users load a serialised model stored in S3, as shown below:
def model_fn(model_dir):
    clf = joblib.load(os.path.join(model_dir, "model.joblib"))
    return clf
And this is what the documentation says about the model_dir argument:
SageMaker will inject the directory where your model files and sub-directories, saved by save, have been mounted. Your model function should return a model object that can be used for model serving.
This, again, implies that training has to be done on SageMaker.
So, is there a way I can just specify the S3 location of my serialised model and have SageMaker de-serialise (or load) the model from S3 and use it for inference?
EDIT 1:
I used the code from the answer in my application and got the error below when trying to deploy from a SageMaker Studio notebook. I believe SageMaker is complaining that training wasn't done on SageMaker.
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-4-6662bbae6010> in <module>
1 predictor = model.deploy(
2 initial_instance_count=1,
----> 3 instance_type='ml.m4.xlarge'
4 )
/opt/conda/lib/python3.7/site-packages/sagemaker/estimator.py in deploy(self, initial_instance_count, instance_type, serializer, deserializer, accelerator_type, endpoint_name, use_compiled_model, wait, model_name, kms_key, data_capture_config, tags, **kwargs)
770 """
771 removed_kwargs("update_endpoint", kwargs)
--> 772 self._ensure_latest_training_job()
773 self._ensure_base_job_name()
774 default_name = name_from_base(self.base_job_name)
/opt/conda/lib/python3.7/site-packages/sagemaker/estimator.py in _ensure_latest_training_job(self, error_message)
1128 """
1129 if self.latest_training_job is None:
-> 1130 raise ValueError(error_message)
1131
1132 delete_endpoint = removed_function("delete_endpoint")
ValueError: Estimator is not associated with a training job
My code:
import sagemaker
from sagemaker import get_execution_role
# from sagemaker.pytorch import PyTorchModel
from sagemaker.sklearn import SKLearn
from sagemaker.predictor import RealTimePredictor, json_serializer, json_deserializer

sm_role = sagemaker.get_execution_role()  # IAM role to run SageMaker, access S3 and ECR

model_file = "s3://sagemaker-manual-bucket/sm_model_artifacts/model.tar.gz"  # Must be ".tar.gz" suffix

class AnalysisClass(RealTimePredictor):
    def __init__(self, endpoint_name, sagemaker_session):
        super().__init__(
            endpoint_name,
            sagemaker_session=sagemaker_session,
            serializer=json_serializer,
            deserializer=json_deserializer,  # To be able to use JSON serialization
            content_type='application/json'  # To be able to send JSON as HTTP body
        )

model = SKLearn(model_data=model_file,
                entry_point='inference.py',
                name='rf_try_1',
                role=sm_role,
                source_dir='code',
                framework_version='0.20.0',
                instance_count=1,
                instance_type='ml.m4.xlarge',
                predictor_cls=AnalysisClass)

predictor = model.deploy(initial_instance_count=1,
                         instance_type='ml.m4.xlarge')
Solution
Yes, you can. The AWS documentation focuses on the end-to-end flow from training to deployment in SageMaker, which gives the impression that training has to be done on SageMaker. Unfortunately, the AWS documentation and examples are poorly organised: they should clearly separate training with an Estimator, saving and loading a model, and deploying a model to a SageMaker endpoint, but they do not.
SageMaker Model
You need to create an AWS::SageMaker::Model resource, which refers to the "model" you have trained, and more. AWS::SageMaker::Model is documented in CloudFormation, but that page only explains which AWS resource you need.
The CreateModel API creates the SageMaker model resource. Its parameters specify the Docker image to use, the model location in S3, the IAM role to use, etc. See How SageMaker Loads Your Model Artifacts.
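For reference, a low-level sketch of that call with boto3 is below (the model name, image URI, bucket, and role ARN are placeholders; in practice the SageMaker Python SDK shown later builds this call for you):
import boto3

sm_client = boto3.client("sagemaker")

# Register a SageMaker Model resource: which container image to run,
# where the model.tar.gz lives in S3, and which IAM role to assume.
sm_client.create_model(
    ModelName="my-sklearn-model",  # placeholder name
    PrimaryContainer={
        "Image": "<account>.dkr.ecr.<region>.amazonaws.com/<inference-image>:<tag>",
        "ModelDataUrl": "s3://YOUR_BUCKET/YOUR_FOLDER/model.tar.gz",
    },
    ExecutionRoleArn="arn:aws:iam::<account>:role/<sagemaker-execution-role>",
)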
Docker image
Obviously, you need the framework (e.g. scikit-learn, TensorFlow, PyTorch, etc.) that you used to train the model in order to get inferences, plus an HTTP front end that responds to prediction calls. You need a Docker image that provides both. See SageMaker Inference Toolkit and Using the SageMaker Training and Inference Toolkits.
Building such an image yourself is not easy, so AWS provides pre-built images called AWS Deep Learning Containers; the available images are listed on GitHub.
If your framework and version are listed there, you can use one of them as the image. Otherwise you need to build one yourself. See Building a docker container for training/deploying our classifier.
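If you are on version 2 of the SageMaker Python SDK, the image_uris.retrieve helper can look up the pre-built image URI for you. A minimal sketch follows; the region, framework version, and instance type are example values and must correspond to a container that actually exists:
from sagemaker import image_uris

# Look up the pre-built scikit-learn inference container for a given
# region and framework version (example values; adjust to your setup).
image_uri = image_uris.retrieve(
    framework="sklearn",
    region="eu-west-1",
    version="0.23-1",
    image_scope="inference",
    instance_type="ml.m5.xlarge",
)
print(image_uri)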
SageMaker Python SDK for Frameworks
The AWS SageMaker Python SDK provides utilities to create SageMaker models for several frameworks. See Frameworks for the available ones. If yours is not there, you may still be able to use sagemaker.model.FrameworkModel and Model to load your trained model. For your case, see Using Scikit-learn with the SageMaker Python SDK.
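For the scikit-learn case in the question, the key point is to use the SKLearnModel class rather than the SKLearn estimator; calling deploy on an estimator that never trained is what raises "Estimator is not associated with a training job". A minimal sketch, reusing the bucket, entry point, role, and predictor class from the question:
from sagemaker.sklearn import SKLearnModel

sk_model = SKLearnModel(
    model_data="s3://sagemaker-manual-bucket/sm_model_artifacts/model.tar.gz",
    role=sm_role,                  # SageMaker execution role from get_execution_role()
    entry_point="inference.py",    # your model_fn / predict_fn live here
    source_dir="code",
    framework_version="0.20.0",    # must match an available pre-built sklearn container
    predictor_cls=AnalysisClass,
)

predictor = sk_model.deploy(
    initial_instance_count=1,
    instance_type="ml.m4.xlarge",
)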
model.tar.gz
The saved model artifact needs to be packaged into model.tar.gz. Each framework (e.g. TensorFlow, PyTorch, etc.) has its own structure to comply with.
Suppose you used PyTorch and saved the model as model.pth, and inference.py is your prediction code; then the structure inside model.tar.gz is explained in the Model Directory Structure section of the SDK documentation for the PyTorch framework. See Create the directory structure for your model files.
|- model.pth             # model file sits at the root of the archive
|- code/                 # code artefacts must be inside /code
   |- inference.py       # your inference code for the framework
   |- requirements.txt   # only for versions 1.3.1 and higher; name must be "requirements.txt"
If you use TensorFlow, you need to look at the SageMaker TensorFlow Serving Container. As is common with AWS, the documentation is not clear and not always up to date.
If you work on Windows, beware of CRLF vs. LF line endings, as SageMaker runs in a *NIX environment.
Save the model.tar.gz file in S3. Make sure the SageMaker execution role has the IAM permissions to access the S3 bucket and objects.
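As an illustration, the archive can be built and uploaded from Python; this is only a sketch, and the file paths, bucket, and key names are placeholders:
import tarfile
import boto3

# Package the artefacts with the directory layout shown above.
with tarfile.open("model.tar.gz", "w:gz") as tar:
    tar.add("model.pth")   # the serialised model at the archive root
    tar.add("code")        # code/inference.py (and optional requirements.txt)

# Upload to S3; the SageMaker execution role must be able to read this object.
boto3.client("s3").upload_file(
    "model.tar.gz", "YOUR_BUCKET", "YOUR_FOLDER/model.tar.gz"
)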
Loading the model and getting inferences
See Create a PyTorchModel object. When instantiating the PyTorchModel class, SageMaker automatically selects the AWS Deep Learning Container image for PyTorch matching the version specified in framework_version. If no image exists for that version, it fails. This is not documented by AWS, but you need to be aware of it. SageMaker then internally calls the CreateModel API with the S3 model file location and the AWS Deep Learning Container image URL.
import sagemaker
from sagemaker import get_execution_role
from sagemaker.pytorch import PyTorchModel
from sagemaker.predictor import RealTimePredictor, json_serializer, json_deserializer

role = sagemaker.get_execution_role()  # IAM role to run SageMaker, access S3 and ECR

model_file = "s3://YOUR_BUCKET/YOUR_FOLDER/model.tar.gz"  # Must be ".tar.gz" suffix

class AnalysisClass(RealTimePredictor):
    def __init__(self, endpoint_name, sagemaker_session):
        super().__init__(
            endpoint_name,
            sagemaker_session=sagemaker_session,
            serializer=json_serializer,
            deserializer=json_deserializer,  # To be able to use JSON serialization
            content_type='application/json'  # To be able to send JSON as HTTP body
        )

model = PyTorchModel(
    model_data=model_file,
    name='YOUR_MODEL_NAME_WHATEVER',
    role=role,
    entry_point='inference.py',
    source_dir='code',             # Location of the inference code
    framework_version='1.5.0',     # An available AWS Deep Learning PyTorch container version must be specified
    predictor_cls=AnalysisClass    # To specify the HTTP request body format (application/json)
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type='ml.m5.xlarge'
)

test_data = {"body": "YOUR PREDICTION REQUEST"}

prediction = predictor.predict(test_data)
By default, SageMaker uses NumPy as the serialization format. To use JSON, you need to specify the serializer and content_type. Instead of subclassing RealTimePredictor (as AnalysisClass does above), you can set them directly on the returned predictor:
predictor.serializer = json_serializer
predictor.predict(test_data)
Or:
import json

predictor.serializer = None  # As the serializer is None, the predictor won't serialize the data
serialized_test_data = json.dumps(test_data)
predictor.predict(serialized_test_data)
Inference code sample
See Process Model Input, Get Predictions from a PyTorch Model, and Process Model Output. The prediction request is sent as JSON in the HTTP request body in this example.
import os
import sys
import datetime
import json

import torch
import numpy as np

CONTENT_TYPE_JSON = 'application/json'

def model_fn(model_dir):
    # SageMaker automatically loads the model.tar.gz from S3 and
    # mounts its contents inside the Docker container. 'model_dir'
    # points to the root of the extracted tar.gz file.
    model_path = f'{model_dir}/'

    # Load the model.
    # You can load it from wherever you like: the Internet, S3, etc.  <--- Answer to your question
    # There is NO need to use the model inside the tar.gz; you can place a dummy model file there.
    ...
    return model

def predict_fn(input_data, model):
    # Do your inference
    ...

def input_fn(serialized_input_data, content_type=CONTENT_TYPE_JSON):
    input_data = json.loads(serialized_input_data)
    return input_data

def output_fn(prediction_output, accept=CONTENT_TYPE_JSON):
    if accept == CONTENT_TYPE_JSON:
        return json.dumps(prediction_output), accept
    raise Exception('Unsupported content type')
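Once deployed, the endpoint can also be invoked outside the SageMaker Python SDK, for example from any application via boto3. This is only a sketch; the endpoint name is whatever model.deploy created (visible in the SageMaker console):
import json
import boto3

runtime = boto3.client("sagemaker-runtime")

response = runtime.invoke_endpoint(
    EndpointName="YOUR_ENDPOINT_NAME",   # name of the deployed endpoint
    ContentType="application/json",      # handled by input_fn above
    Accept="application/json",           # handled by output_fn above
    Body=json.dumps({"body": "YOUR PREDICTION REQUEST"}),
)
result = json.loads(response["Body"].read())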
Related
TensorFlow Bring Your Own Model: Train locally and deploy on SageMaker.
An example of deploying a TensorFlow-trained model to a SageMaker endpoint.
Note
The SageMaker team keeps changing the implementation, and the documentation is not always updated accordingly, or is too disorganized to follow. When you are sure you followed the documentation and it still does not work, obsolete or incorrect documentation is quite likely the cause. In such cases, clarify with AWS support or open an issue on GitHub.
Answered By - mon