Tuesday, July 26, 2022

[FIXED] MLflow webserver returns 400 status, "Incompatible input types for column X. Can not safely convert float64 to <U0."

Labels: mlflow, scikit-learn, webserver

Issue

I am implementing an anomaly detection web service using MLflow and sklearn.pipeline.Pipeline(). The aim of the model is to detect web crawlers from server logs, and the response_length column is one of my features. After serving the model, I tested the web service by sending the request below, which contains the first 20 columns of the training data.

$ curl --location --request POST '127.0.0.1:8000/invocations' \
        --header 'Content-Type: text/csv' \
        --data-binary '@datasets/test.csv'

However, the web server responds with status code 400 (BAD REQUEST) and this JSON body:

{
    "error_code": "BAD_REQUEST",
    "message": "Incompatible input types for column response_length. Can not safely convert float64 to <U0."
}

Here is the MLflow Tracking log from when the model was trained and logged:

[Pipeline] ......... (step 1 of 3) Processing transform, total=11.8min
[Pipeline] ............... (step 2 of 3) Processing pca, total=   4.8s
[Pipeline] ........ (step 3 of 3) Processing rule_based, total=   0.0s
2021/07/16 04:55:12 WARNING mlflow.sklearn: Training metrics will not be recorded because training labels were not specified. To automatically record training metrics, provide training labels as inputs to the model training function.
2021/07/16 04:55:12 WARNING mlflow.utils.autologging_utils: MLflow autologging encountered a warning: "/home/matin/workspace/Rahnema College/venv/lib/python3.8/site-packages/mlflow/models/signature.py:129: UserWarning: Hint: Inferred schema contains integer column(s). Integer columns in Python cannot represent missing values. If your input data contains missing values at inference time, it will be encoded as floats and will cause a schema enforcement error. The best way to avoid this problem is to infer the model schema based on a realistic data sample (training dataset) that includes missing values. Alternatively, you can declare integer columns as doubles (float64) whenever these columns may have missing values. See `Handling Integers With Missing Values <https://www.mlflow.org/docs/latest/models.html#handling-integers-with-missing-values>`_ for more details."
Logged data and model in run: 8843336f5c31482c9e246669944b1370

---------- logged params ----------
{'memory': 'None',
 'pca': 'PCAEstimator()',
 'rule_based': 'RuleBasedEstimator()',
 'steps': "[('transform', <log_transformer.LogTransformer object at "
          "0x7f05a8b95760>), ('pca', PCAEstimator()), ('rule_based', "
          'RuleBasedEstimator())]',
 'transform': '<log_transformer.LogTransformer object at 0x7f05a8b95760>',
 'verbose': 'True'}

---------- logged metrics ----------
{}

---------- logged tags ----------
{'estimator_class': 'sklearn.pipeline.Pipeline', 'estimator_name': 'Pipeline'}

---------- logged artifacts ----------
['model/MLmodel',
 'model/conda.yaml',
 'model/model.pkl',
 'model/requirements.txt']

Could anyone tell me exactly how I can fix this model serving problem?

Solution

The problem is related to the mlflow.utils.autologging_utils WARNING in the log above: the input schema of the model was inferred automatically by autologging.

When the model is created, the input signature of the data is saved in the MLmodel file. You should change the declared input type of response_length in that signature from string to double, so that the entry reads

{"name": "response_length", "type": "double"}

instead of

{"name": "response_length", "type": "string"}

With this change the column no longer needs to be converted. After I served the model with the edited MLmodel file, the web server worked as expected.
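If you would rather not edit the entry by hand, the change can be scripted. The following is only a minimal sketch, not part of the original answer. It assumes that the signature inputs are stored in the MLmodel file as a JSON string (a list of {"name": ..., "type": ...} objects) and that you copy that string out and paste the result back yourself. The helper name set_column_type and the extra "status" column are hypothetical and used only for illustration.

```python
import json


def set_column_type(inputs_json, column, new_type):
    """Return the signature inputs JSON with `column` retyped to `new_type`.

    `inputs_json` is assumed to be the JSON string stored under the
    signature inputs of an MLmodel file. This helper is hypothetical
    and is not part of MLflow.
    """
    columns = json.loads(inputs_json)
    matched = False
    for spec in columns:
        if spec.get("name") == column:
            spec["type"] = new_type
            matched = True
    if not matched:
        # Fail loudly instead of silently returning an unchanged signature.
        raise KeyError(f"column {column!r} not found in signature")
    return json.dumps(columns)


# Illustrative value only: "status" is a made-up second column.
original = (
    '[{"name": "response_length", "type": "string"}, '
    '{"name": "status", "type": "long"}]'
)
updated = set_column_type(original, "response_length", "double")
print(updated)
```

The printed string can then replace the old inputs value in the MLmodel file before the model is served again. Raising an error for an unknown column is a design choice here, so that a typo in the column name does not go unnoticed.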

Answered By - Matin Zivdar

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
