Issue
I am writing a converter and a shape calculator to convert my custom sklearn transformers into ONNX models. I need to calculate the median of my data points. Interesting point: ONNX has no operator to calculate the median (at least I didn't find anything here).
So, I am using the TopK operator to calculate the median. Now, since it returns two outputs, it's a bit tricky to use:
I tried to use it like this:
def mt_transformer_converter(scope, operator, container):
    op = operator.raw_operator
    opv = container.target_opset
    out = operator.outputs
    X = operator.inputs[0]
    n = operator.inputs[0].get_second_dimension()
    dtype = guess_numpy_type(X.type)
    # This is the line of focus
    Y = OnnxTopK(X, np.array([3]), op_version=opv, output_names=out[:1])
    Y.add_to(scope, container)
It threw this error:
ValueError: Unexpected index 1 in operator name 'TopK' with .output names ['variable']
It was apparent this would be the case, since TopK returns not one but two outputs: the top K values and their corresponding indices. So, the next obvious option was to change the output names as follows:
Y = OnnxTopK(X, np.array([3]), op_version=opv, output_names=['values', 'indices'])
It threw this error:
RuntimeError: After 2 iterations for 2 nodes, still unable to sort names {'variable'}. The graph may be
disconnected. List of operators:
Cast(variable) -> [Y]
--
--all-nodes--
--
TopK|To_TopK(X#0, To_TopKcst#0) -> [values, indices]
Cast|Cast(variable) -> [Y]
The code above disconnects the graph, because if you check operator.outputs, you will see [Variable('variable', 'variable', type=FloatTensorType(shape=[None, 3]))]. Thus, the graph expects an output named variable, which is no longer produced by any node, hence the error.
Now, there are two things:
- I just need the top K "values" and not the "indices".
- I can make the code run somehow (although it doesn't serve my purpose) by using a different operator, say ReduceSum:
Y = OnnxReduceSum(OnnxTopK(X, np.array([3]), op_version=opv)[0], op_version=opv, output_names=out[:1])
This makes the code run fine. On closer inspection, one would notice that we have used indexing ([0]) to complete objective 1, and, since TopK is now not the last operator in the graph, we have used output_names=out[:1], where out = operator.outputs, to complete objective 2.
But still, we haven't got the output of TopK itself. If only we could replace ReduceSum with some operator that passes the result of TopK through unchanged. Thankfully, ONNX has Identity!
So, we can finally modify the line to:
Y = OnnxIdentity(OnnxTopK(X, np.array([3]), op_version=opv)[0], op_version=opv, output_names=out[:1])
This gives us the desired result. Now, the question is: is there a cleaner, more straightforward way to do this?
PS - The complete MWE (minimal working example) is as follows:
import numpy as np
import pandas as pd
from onnxruntime import InferenceSession
from sklearn.base import BaseEstimator, TransformerMixin
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType, DoubleTensorType, guess_numpy_type
from skl2onnx.algebra.onnx_ops import (
    OnnxReduceSum,
    OnnxTopK,
    OnnxIdentity
)
from skl2onnx import update_registered_converter


def mt_transformer_shape_calculator(operator):
    op = operator.raw_operator
    input_type = operator.inputs[0].type.__class__
    input_dim = operator.inputs[0].get_first_dimension()
    n = operator.inputs[0].get_second_dimension()
    output_type = input_type([input_dim, 3])
    operator.outputs[0].type = output_type


def mt_transformer_converter(scope, operator, container):
    op = operator.raw_operator
    opv = container.target_opset
    out = operator.outputs
    X = operator.inputs[0]
    n = operator.inputs[0].get_second_dimension()
    dtype = guess_numpy_type(X.type)
    Y = OnnxIdentity(OnnxTopK(X, np.array([3]), op_version=opv)[0],
                     op_version=opv, output_names=out[:1])
    Y.add_to(scope, container)


class MedianTransformer(BaseEstimator, TransformerMixin):
    def fit(self, X, y=None):
        return self

    def transform(self, X):
        pass


data = pd.DataFrame(
    [[1, 2, 3, 4], [4, 5, 6, 5]]
)

update_registered_converter(
    MedianTransformer, "MTTransformer",
    mt_transformer_shape_calculator,
    mt_transformer_converter)

mt = MedianTransformer()
onx = convert_sklearn(mt, name='test',
                      initial_types=[("X", FloatTensorType([None, 4]))],
                      final_types=[("Y", DoubleTensorType([None, 3]))])
sess = InferenceSession(onx.SerializeToString())
sess.run(None, {'X': data.values.astype(np.float32)})[0]
Output:
array([[4., 3., 2.],
[6., 5., 5.]])
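As a quick sanity check (not part of the original post), the same top-3-per-row values can be reproduced with plain NumPy and compared against the ONNX output above:

import numpy as np

X = np.array([[1, 2, 3, 4], [4, 5, 6, 5]], dtype=np.float32)

# TopK with k=3 returns the three largest values of each row in descending
# order, so sorting each row and taking the last three entries reversed
# should reproduce the ONNX output above.
expected = np.sort(X, axis=1)[:, ::-1][:, :3]
print(expected)
# [[4. 3. 2.]
#  [6. 5. 5.]]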
Solution
I'm the main author of the API you used. I agree that the expression Y = OnnxIdentity(OnnxTopK(X, np.array([3]), op_version=opv)[0], op_version=opv, output_names=out[:1]) is cumbersome, but that's what I would recommend. The runtime usually removes the unnecessary Identity operator; that's why it is used in many places to rename a result. There is no real plan to improve this API. The next step would probably be to use https://github.com/microsoft/onnxscript, which is a lot easier to use for writing tests and loops in ONNX.
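For reference, here is a rough, untested sketch of what the same top-3 extraction could look like in onnxscript. The @script decorator, opset import, tensor annotations, and to_model_proto call follow onnxscript's documented examples, but the exact API may differ between versions, so treat this as an illustration rather than working code:

from onnxscript import FLOAT, script
from onnxscript import opset18 as op


@script()
def top3(X: FLOAT[...]) -> FLOAT[...]:
    # TopK expects K as a 1-D int64 tensor.
    k = op.Constant(value_ints=[3])
    values, indices = op.TopK(X, k)
    # Only the values are needed; the indices are discarded.
    return values


# An ONNX ModelProto can then be obtained directly:
model = top3.to_model_proto()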
Now for the median. Unless you know the shape of X, you should get it with the Shape operator. TopK does more than is required, but it should work.
from skl2onnx.algebra.onnx_ops import (
    OnnxDiv, OnnxGather, OnnxIdentity, OnnxShape, OnnxTopK
)


def mt_transformer_converter(scope, operator, container):
    op = operator.raw_operator
    opv = container.target_opset
    out = operator.outputs
    X = operator.inputs[0]
    shape = OnnxShape(X, op_version=opv)
    # You should add dtype; the default value is not always int64 depending on the OS.
    first_dim = OnnxGather(shape, np.array([0], dtype=np.int64), op_version=opv)
    k = OnnxDiv(first_dim, np.array([2], dtype=np.int64), op_version=opv)
    Y = OnnxIdentity(OnnxTopK(X, k, op_version=opv)[0],
                     op_version=opv, output_names=out[:1])
    Y.add_to(scope, container)
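Note that both converters stop at the top-k values rather than an actual median. Purely as a hedged sketch (not part of the original answer), one way to finish the job along the last axis is to take k = (number of columns + 1) // 2 and then gather column k - 1 of the sorted TopK output, which is the median element when the number of columns is odd (averaging the two middle values for an even count is omitted here). The converter name below is hypothetical, and it assumes a matching shape calculator that declares an output of shape [None, 1]:

import numpy as np
from skl2onnx.algebra.onnx_ops import (
    OnnxAdd, OnnxDiv, OnnxGather, OnnxShape, OnnxSub, OnnxTopK
)


def median_transformer_converter(scope, operator, container):
    # Hypothetical converter: reduces each row to its middle element.
    opv = container.target_opset
    out = operator.outputs
    X = operator.inputs[0]

    # k = (number of columns + 1) // 2, computed from the runtime shape of X.
    shape = OnnxShape(X, op_version=opv)
    n_cols = OnnxGather(shape, np.array([1], dtype=np.int64), op_version=opv)
    k = OnnxDiv(OnnxAdd(n_cols, np.array([1], dtype=np.int64), op_version=opv),
                np.array([2], dtype=np.int64), op_version=opv)

    # Top-k values of each row in descending order, then pick column k - 1,
    # i.e. the k-th largest value of the row.
    values = OnnxTopK(X, k, op_version=opv)[0]
    last = OnnxSub(k, np.array([1], dtype=np.int64), op_version=opv)
    Y = OnnxGather(values, last, axis=1, op_version=opv,
                   output_names=out[:1])
    Y.add_to(scope, container)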
ONNX is constantly updating and adding new operators. You may propose yours: https://github.com/onnx/sigs/tree/main/operators.
Answered By - user12471066