Issue
Goal: Use this notebook to perform quantization on the albert-base-v2 model.
Kernel: conda_pytorch_p36
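For context, the quantization step presumably follows the standard dynamic-quantization recipe for transformer models. A minimal sketch, assuming the notebook uses torch.quantization.quantize_dynamic on the transformers AlbertModel (the notebook's exact code is not shown in the post):

import torch
from transformers import AlbertModel

# Load the full-precision model.
model = AlbertModel.from_pretrained("albert-base-v2")

# Dynamically quantize the Linear layers to int8 weights.
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)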
Outputs in Sections 1.2 & 2.2 show that:
- Converting vanilla BERT from PyTorch to ONNX keeps the model size the same: 417.6 MB.
- The quantized models are smaller than vanilla BERT: 173.0 MB for PyTorch and 104.8 MB for ONNX.
However, when running ALBERT:
- The PyTorch and ONNX model sizes are different.
- The quantized model sizes are larger than the vanilla model.
I suspect this is why both quantized versions of ALBERT perform worse than vanilla ALBERT.
PyTorch:
Size (MB): 44.58906650543213 (vanilla)
Size (MB): 22.373255729675293 (quantized)
ONNX:
ONNX full precision model size (MB): 341.64233207702637
ONNX quantized model size (MB): 85.53886985778809
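The PyTorch figures above are likely produced with a helper along these lines (a sketch of the common PyTorch tutorial pattern; the name print_size_of_model and the temp file are illustrative, not necessarily the notebook's exact code):

import os
import torch

def print_size_of_model(model):
    # Serialize the state dict to disk and report the file size.
    torch.save(model.state_dict(), "temp.p")
    print("Size (MB):", os.path.getsize("temp.p") / 1e6)
    os.remove("temp.p")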
Why might exporting ALBERT from PyTorch to ONNX increase the model size, but not BERT?
Please let me know if there's anything else I can add to the post.
Solution
Explanation
The ALBERT model shares weights across its layers, but torch.onnx.export
writes each shared weight to a separate tensor, which makes the exported model larger.
A number of GitHub issues about this phenomenon have been marked as solved.
The most common solution is to remove the shared weights, that is, to remove initializer tensors that contain exactly the same values.
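The effect can be reproduced on a toy module with tied weights (TiedPair is a hypothetical example; whether the exporter actually duplicates the tensor depends on the torch.onnx.export version):

import torch
import onnx

class TiedPair(torch.nn.Module):
    # Two linear layers that share one weight tensor, ALBERT-style.
    def __init__(self):
        super().__init__()
        self.a = torch.nn.Linear(8, 8)
        self.b = torch.nn.Linear(8, 8)
        self.b.weight = self.a.weight  # tie the weights

    def forward(self, x):
        return self.b(self.a(x))

torch.onnx.export(TiedPair(), torch.randn(1, 8), "tied.onnx")
m = onnx.load("tied.onnx")
# If the exporter stored the tied weight twice, it shows up as two initializers.
print([t.name for t in m.graph.initializer])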
Solutions
Section "Removing shared weights" in onnx_remove_shared_weights.ipynb.
import onnx
from onnxruntime.transformers.onnx_model import OnnxModel

# path: the exported ONNX model; output_path: where to write the deduplicated model.
model = onnx.load(path)
onnx_model = OnnxModel(model)

# Find initializers (weight tensors) that hold exactly the same values.
count = len(model.graph.initializer)
same = [-1] * count
for i in range(count - 1):
    if same[i] >= 0:
        continue
    for j in range(i + 1, count):
        if OnnxModel.has_same_value(model.graph.initializer[i], model.graph.initializer[j]):
            same[j] = i

# Point every node that reads a duplicate tensor at the first copy instead.
for i in range(count):
    if same[i] >= 0:
        onnx_model.replace_input_of_all_nodes(model.graph.initializer[i].name, model.graph.initializer[same[i]].name)

# Prune the now-unreferenced duplicate initializers and save.
onnx_model.update_graph()
onnx_model.save_model_to_file(output_path)
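After saving, the deduplication can be sanity-checked by comparing the file sizes of the two models, for example:

import os
print("before (MB):", os.path.getsize(path) / 1e6)
print("after (MB):", os.path.getsize(output_path) / 1e6)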
Answered By - StressedBoi_69420