Issue
I am implementing an anomaly detection web service using MLflow
and sklearn.pipeline.Pipeline()
. The aim of the model is to detect web crawlers using server log and response_length
column is one of my features. After serving model, for testing the web service I send below request that contains the 20 first columns of the train data.
$ curl --location --request POST '127.0.0.1:8000/invocations'
--header 'Content-Type: text/csv' \
--data-binary 'datasets/test.csv'
But response of the web server has status code 400 (BAD REQUEST) and this JSON body:
{
"error_code": "BAD_REQUEST",
"message": "Incompatible input types for column response_length. Can not safely convert float64 to <U0."
}
Here is the model compilation MLflow Tracking component log:
[Pipeline] ......... (step 1 of 3) Processing transform, total=11.8min
[Pipeline] ............... (step 2 of 3) Processing pca, total= 4.8s
[Pipeline] ........ (step 3 of 3) Processing rule_based, total= 0.0s
2021/07/16 04:55:12 WARNING mlflow.sklearn: Training metrics will not be recorded because training labels were not specified. To automatically record training metrics, provide training labels as inputs to the model training function.
2021/07/16 04:55:12 WARNING mlflow.utils.autologging_utils: MLflow autologging encountered a warning: "/home/matin/workspace/Rahnema College/venv/lib/python3.8/site-packages/mlflow/models/signature.py:129: UserWarning: Hint: Inferred schema contains integer column(s). Integer columns in Python cannot represent missing values. If your input data contains missing values at inference time, it will be encoded as floats and will cause a schema enforcement error. The best way to avoid this problem is to infer the model schema based on a realistic data sample (training dataset) that includes missing values. Alternatively, you can declare integer columns as doubles (float64) whenever these columns may have missing values. See `Handling Integers With Missing Values <https://www.mlflow.org/docs/latest/models.html#handling-integers-with-missing-values>`_ for more details."
Logged data and model in run: 8843336f5c31482c9e246669944b1370
---------- logged params ----------
{'memory': 'None',
'pca': 'PCAEstimator()',
'rule_based': 'RuleBasedEstimator()',
'steps': "[('transform', <log_transformer.LogTransformer object at "
"0x7f05a8b95760>), ('pca', PCAEstimator()), ('rule_based', "
'RuleBasedEstimator())]',
'transform': '<log_transformer.LogTransformer object at 0x7f05a8b95760>',
'verbose': 'True'}
---------- logged metrics ----------
{}
---------- logged tags ----------
{'estimator_class': 'sklearn.pipeline.Pipeline', 'estimator_name': 'Pipeline'}
---------- logged artifacts ----------
['model/MLmodel',
'model/conda.yaml',
'model/model.pkl',
'model/requirements.txt']
Could anyone tell me exactly how I can fix this model serve problem?
Solution
The problem caused by mlflow.utils.autologging_utils
WARNING.
When the model is created, data input signature is saved on the MLmodel
file with some.
You should change response_length
signature input type from string
to double
by replacing
{"name": "response_length", "type": "double"}
instead of
{"name": "response_length", "type": "string"}
so it doesn't need to be converted. After serving the model with edited MLmodel
file, the web server worked as expected.
Answered By - Matin Zivdar
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.