Issue
I am writing a converter and a shape calculator to convert my custom sklearn transformers into ONNX models. I need to calculate the median of my data points. Interesting point: ONNX has no operator to calculate the median (at least I didn't find anything here).
So, I am using the TopK operator to calculate the median. Now, since it returns two outputs, it's a bit tricky to use:
I tried to use it like this:
def mt_transformer_converter(scope, operator, container):
    op = operator.raw_operator
    opv = container.target_opset
    out = operator.outputs
    X = operator.inputs[0]
    n = operator.inputs[0].get_second_dimension()
    dtype = guess_numpy_type(X.type)
    # This is the line of focus
    Y = OnnxTopK(X, np.array([3]), op_version=opv, output_names=out[:1])
    Y.add_to(scope, container)
It threw this error:
ValueError: Unexpected index 1 in operator name 'TopK' with .output names ['variable']
It was apparent this would be the case, since TopK returns not one but two outputs: the top K values and their corresponding indices. So, the next obvious option was to change the output names as follows:
Y = OnnxTopK(X, np.array([3]), op_version=opv, output_names=['values', 'indices'])
It threw this error:
RuntimeError: After 2 iterations for 2 nodes, still unable to sort names {'variable'}. The graph may be
disconnected. List of operators:
Cast(variable) -> [Y]
--
--all-nodes--
--
TopK|To_TopK(X#0, To_TopKcst#0) -> [values, indices]
Cast|Cast(variable) -> [Y]
The code above disconnects the graph, because if you check operator.outputs, you will see [Variable('variable', 'variable', type=FloatTensorType(shape=[None, 3]))]. Thus, the graph expects an output named variable, which is no longer produced by any node, hence the error.
Now, there are two things:
- I just need the top K "values" and not the "indices".
- I can make the code run somehow (although it doesn't serve my purpose) by using a different operator, say ReduceSum:
Y = OnnxReduceSum(OnnxTopK(X, np.array([3]), op_version=opv)[0], op_version=opv, output_names=out[:1])
This makes the code run fine. On closer inspection, one would notice that we have used indexing ([0]) to complete objective 1, and, since TopK is now not the last operator in the graph, we have used output_names=out[:1], where out = operator.outputs, to complete objective 2.
But still, we haven't got the output of TopK itself. If only we could replace ReduceSum with some operator that passes the result of TopK through unchanged. Thankfully, ONNX has Identity!
So, we can finally modify the line to:
Y = OnnxIdentity(OnnxTopK(X, np.array([3]), op_version=opv)[0], op_version=opv, output_names=out[:1])
This gives us the desired result. Now, the question is: is there a cleaner, more straightforward way to do this?
PS - The complete MWE (minimal working example) is as follows:
import numpy as np
import pandas as pd
from onnxruntime import InferenceSession
from sklearn.base import BaseEstimator, TransformerMixin
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType, DoubleTensorType, guess_numpy_type
from skl2onnx.algebra.onnx_ops import (
    OnnxReduceSum,
    OnnxTopK,
    OnnxIdentity
)
from skl2onnx import update_registered_converter


def mt_transformer_shape_calculator(operator):
    op = operator.raw_operator
    input_type = operator.inputs[0].type.__class__
    input_dim = operator.inputs[0].get_first_dimension()
    n = operator.inputs[0].get_second_dimension()
    output_type = input_type([input_dim, 3])
    operator.outputs[0].type = output_type


def mt_transformer_converter(scope, operator, container):
    op = operator.raw_operator
    opv = container.target_opset
    out = operator.outputs
    X = operator.inputs[0]
    n = operator.inputs[0].get_second_dimension()
    dtype = guess_numpy_type(X.type)
    Y = OnnxIdentity(OnnxTopK(X, np.array([3]), op_version=opv)[0],
                     op_version=opv, output_names=out[:1])
    Y.add_to(scope, container)


class MedianTransformer(BaseEstimator, TransformerMixin):
    def fit(self, X, y=None):
        return self

    def transform(self, X):
        pass


data = pd.DataFrame(
    [[1, 2, 3, 4], [4, 5, 6, 5]]
)

update_registered_converter(
    MedianTransformer, "MTTransformer",
    mt_transformer_shape_calculator,
    mt_transformer_converter)

mt = MedianTransformer()
onx = convert_sklearn(mt, name='test',
                      initial_types=[("X", FloatTensorType([None, 4]))],
                      final_types=[("Y", DoubleTensorType([None, 3]))])
sess = InferenceSession(onx.SerializeToString())
sess.run(None, {'X': data.values.astype(np.float32)})[0]
Output:
array([[4., 3., 2.],
[6., 5., 5.]])
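As a quick sanity check (not part of the original post), the same top-3-per-row values can be reproduced with plain NumPy and compared against the ONNX output above:

import numpy as np

X = np.array([[1, 2, 3, 4], [4, 5, 6, 5]], dtype=np.float32)

# TopK with k=3 returns the three largest values of each row in descending
# order, so sorting each row and taking the last three entries reversed
# should reproduce the ONNX output above.
expected = np.sort(X, axis=1)[:, ::-1][:, :3]
print(expected)
# [[4. 3. 2.]
#  [6. 5. 5.]]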
Solution
I'm the main author of the API you used. I agree that the expression Y = OnnxIdentity(OnnxTopK(X, np.array([3]), op_version=opv)[0], op_version=opv, output_names=out[:1]) is cumbersome, but that's what I would recommend. The runtime usually removes the unnecessary Identity operator; that's why it is used in many places to rename a result. There is no real plan to improve this API. The next step would probably be to use https://github.com/microsoft/onnxscript, which is a lot easier to use for writing tests and loops in ONNX.
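For reference, here is a rough, untested sketch of what the same top-3 extraction could look like in onnxscript. The @script decorator, opset import, tensor annotations, and to_model_proto call follow onnxscript's documented examples, but the exact API may differ between versions, so treat this as an illustration rather than working code:

from onnxscript import FLOAT, script
from onnxscript import opset18 as op


@script()
def top3(X: FLOAT[...]) -> FLOAT[...]:
    # TopK expects K as a 1-D int64 tensor.
    k = op.Constant(value_ints=[3])
    values, indices = op.TopK(X, k)
    # Only the values are needed; the indices are discarded.
    return values


# An ONNX ModelProto can then be obtained directly:
model = top3.to_model_proto()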
Now for the median. Unless you know the shape of X, you should get it with the Shape operator. TopK does more than is required, but it should work.
from skl2onnx.algebra.onnx_ops import (
    OnnxDiv, OnnxGather, OnnxIdentity, OnnxShape, OnnxTopK
)


def mt_transformer_converter(scope, operator, container):
    op = operator.raw_operator
    opv = container.target_opset
    out = operator.outputs
    X = operator.inputs[0]
    shape = OnnxShape(X, op_version=opv)
    # You should add dtype; the default value is not always int64 depending on the OS.
    first_dim = OnnxGather(shape, np.array([0], dtype=np.int64), op_version=opv)
    k = OnnxDiv(first_dim, np.array([2], dtype=np.int64), op_version=opv)
    Y = OnnxIdentity(OnnxTopK(X, k, op_version=opv)[0],
                     op_version=opv, output_names=out[:1])
    Y.add_to(scope, container)
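Note that both converters stop at the top-k values rather than an actual median. Purely as a hedged sketch (not part of the original answer), one way to finish the job along the last axis is to take k = (number of columns + 1) // 2 and then gather column k - 1 of the sorted TopK output, which is the median element when the number of columns is odd (averaging the two middle values for an even count is omitted here). The converter name below is hypothetical, and it assumes a matching shape calculator that declares an output of shape [None, 1]:

import numpy as np
from skl2onnx.algebra.onnx_ops import (
    OnnxAdd, OnnxDiv, OnnxGather, OnnxShape, OnnxSub, OnnxTopK
)


def median_transformer_converter(scope, operator, container):
    # Hypothetical converter: reduces each row to its middle element.
    opv = container.target_opset
    out = operator.outputs
    X = operator.inputs[0]

    # k = (number of columns + 1) // 2, computed from the runtime shape of X.
    shape = OnnxShape(X, op_version=opv)
    n_cols = OnnxGather(shape, np.array([1], dtype=np.int64), op_version=opv)
    k = OnnxDiv(OnnxAdd(n_cols, np.array([1], dtype=np.int64), op_version=opv),
                np.array([2], dtype=np.int64), op_version=opv)

    # Top-k values of each row in descending order, then pick column k - 1,
    # i.e. the k-th largest value of the row.
    values = OnnxTopK(X, k, op_version=opv)[0]
    last = OnnxSub(k, np.array([1], dtype=np.int64), op_version=opv)
    Y = OnnxGather(values, last, axis=1, op_version=opv,
                   output_names=out[:1])
    Y.add_to(scope, container)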
ONNX is constantly updating and adding new operators. You may propose yours: https://github.com/onnx/sigs/tree/main/operators.
Answered By - user12471066