oneflow-inc / oneflow_convert
OneFlow->ONNX
When calling convert_to_onnx_and_check:

convert_to_onnx_and_check(t5_graph,
                          external_data=False,
                          opset=None,
                          flow_weight_dir=None,
                          onnx_model_path="./",
                          dynamic_batch_size=False)

the following error is raised:
onnxruntime.capi.onnxruntime_pybind11_state.InvalidGraph: [ONNXRuntimeError] : 10 : INVALID_GRAPH : Load model from ./model.onnx failed:This is an invalid model. In Node, ("model.t5_model.encoder.layers.0.self_attention-arange-18", Range, "", -1) : ("start85": tensor(int64),"limit80": tensor(int64),"delta81": tensor(int64),) -> ("model.t5_model.encoder.layers.0.self_attention-arange-18/out_0",) , Error No Op registered for Range with domain_version of 10
After searching online (see NVIDIA/TensorRT#1658), this appears to be caused by the ONNX opset version: the Range op is not registered for opset 10.
Changing the call to:

convert_to_onnx_and_check(t5_graph,
                          external_data=False,
                          opset=11,
                          flow_weight_dir=None,
                          onnx_model_path="./",
                          dynamic_batch_size=False)

produces a new error:
Traceback (most recent call last):
  File "libai/onnx_export/t5_to_onnx.py", line 57, in <module>
    convert_to_onnx_and_check(t5_graph,
  File "/home/chengpeng/miniconda3/envs/libai/lib/python3.8/site-packages/oneflow_onnx-0.5.5-py3.8.egg/oneflow_onnx/oneflow2onnx/util.py", line 99, in convert_to_onnx_and_check
  File "/home/chengpeng/miniconda3/envs/libai/lib/python3.8/site-packages/oneflow_onnx-0.5.5-py3.8.egg/oneflow_onnx/oneflow2onnx/util.py", line 29, in run_onnx
  File "/home/chengpeng/miniconda3/envs/libai/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 347, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/home/chengpeng/miniconda3/envs/libai/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 384, in _create_inference_session
    sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)
onnxruntime.capi.onnxruntime_pybind11_state.InvalidGraph: [ONNXRuntimeError] : 10 : INVALID_GRAPH : Load model from ./model.onnx failed:This is an invalid model. Type Error: Type 'tensor(int64)' of input parameter (model.t5_model.encoder.layers.0.self_attention-scalar_add-25/out_0) of operator (Sum) in node (model.t5_model.encoder.layers.0.self_attention-add_n-39) is invalid.
These will be fixed in two separate PRs.
This appears to be a bug in ONNX itself. Reproduction code:
"""
Copyright 2020 The OneFlow Authors. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
"""
import tempfile
import oneflow as flow
from oneflow_onnx.oneflow2onnx.util import convert_to_onnx_and_check


class Conv2d(flow.nn.Module):
    def __init__(self) -> None:
        super(Conv2d, self).__init__()
        self.conv = flow.nn.Conv2d(3, 16, 3)

    def forward(self, x: flow.Tensor) -> flow.Tensor:
        return self.conv(x)


conv_module = Conv2d()


class Conv2dOpGraph(flow.nn.Graph):
    def __init__(self):
        super().__init__()
        self.m = conv_module

    def build(self, x):
        out = self.m(x)
        return out


def test_conv2d():
    conv_graph = Conv2dOpGraph()
    conv_graph._compile(flow.randn(1, 3, 224, 224))

    with tempfile.TemporaryDirectory() as tmpdirname:
        flow.save(conv_module.state_dict(), tmpdirname)
        convert_to_onnx_and_check(conv_graph, flow_weight_dir=tmpdirname, onnx_model_path="/tmp", opset=14)


test_conv2d()
Error message:
ONNX Failed to infer shapes and dtypes for [m.conv-bias_add-131, type: Unsqueeze]
Traceback (most recent call last):
  File "/home/zhangxiaoyu/oneflow_convert/oneflow_onnx/schemas.py", line 184, in InferOnnxShapeDtype
    inferred_model = shape_inference.infer_shapes(model_proto)
  File "/home/zhangxiaoyu/miniconda3/envs/clang10/lib/python3.8/site-packages/onnx/shape_inference.py", line 41, in infer_shapes
    inferred_model_str = C.infer_shapes(model_str, check_type, strict_mode, data_prop)
RuntimeError: Input 1 is out of bounds.
Yesterday, Chi hit a one-off conversion failure when using oneflow->onnx (the result's precision error was > 1e-4). Today I confirmed that the onnx file is written in "wb" mode, so it overwrites any existing onnx model; the failure cannot have been caused by appending to the model file. So some op likely has a conversion precision issue, which will need repeated local runs to reproduce and fix.
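The element-wise check that flags such precision errors can be sketched as follows (the tolerance defaults here are assumptions for illustration, not the library's actual values):

```python
import numpy as np

# Sketch of a compare_result-style check: an element fails when its absolute
# difference exceeds atol + rtol * |reference|, mirroring np.allclose.
def compare_result(a, b, rtol=1e-4, atol=1e-5, print_outlier=False):
    a, b = a.flatten(), b.flatten()
    if print_outlier:
        for i in range(len(a)):
            if np.abs(a[i] - b[i]) > atol + rtol * np.abs(b[i]):
                print("a[{}]={}, b[{}]={}".format(i, a[i], i, b[i]))
    return np.allclose(a, b, rtol=rtol, atol=atol)

a = np.array([1.0, 2.0, 3.0], dtype=np.float32)
assert compare_result(a, a + 1e-6)       # tiny deviation passes
assert not compare_result(a, a + 1e-3)   # an op drifting by > 1e-4 fails
```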
Module import error for from oneflow_onnx.oneflow2onnx.util import export_onnx_model?
from oneflow_onnx.oneflow2onnx.util import export_onnx_model

export_onnx_model(graph,
                  external_data=False,
                  opset=None,
                  flow_weight_dir=None,
                  onnx_model_path="/tmp",
                  dynamic_batch_size=False)
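One way to debug such an import error is to list what the module actually exports, since symbols move between oneflow_onnx versions. A generic sketch (demonstrated on a stdlib module, as oneflow_onnx may not be importable here):

```python
import importlib

def exported_names(module_name, needle):
    # List public names in a module that contain `needle` — handy for checking
    # whether export_onnx_model still lives in oneflow_onnx.oneflow2onnx.util.
    mod = importlib.import_module(module_name)
    return sorted(n for n in dir(mod) if needle in n.lower() and not n.startswith("_"))

print(exported_names("json", "dump"))  # → ['dump', 'dumps']
```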
python 3.8
oneflow 0.8.1.dev20221201+cu112
oneflow-onnx 0.6.1
onnx 1.12.0
onnx-simplifier 0.4.10
onnxoptimizer 0.3.2
onnxruntime-gpu 1.13.1
How can the logical_slice_assign op be supported?
Example:
import tempfile
import oneflow as flow
from oneflow_onnx.oneflow2onnx.util import convert_to_onnx_and_check


class logicalSliceAssign(flow.nn.Module):
    def __init__(self) -> None:
        super(logicalSliceAssign, self).__init__()

    def forward(self, x: flow.Tensor) -> flow.Tensor:
        x[:, 0:2] += x
        return x


logical_slice = logicalSliceAssign()


class logicalSliceOpGraph(flow.nn.Graph):
    def __init__(self):
        super().__init__()
        self.m = logical_slice

    def build(self, x):
        out = self.m(x)
        return out


def test_logical_slice():
    logical_slice_graph = logicalSliceOpGraph()
    logical_slice_graph._compile(flow.randn(1, 2, 1, 1))
    print(logical_slice_graph._ops_repr)

    with tempfile.TemporaryDirectory() as tmpdirname:
        # save the module instance's state dict (not the builtin `slice`)
        flow.save(logical_slice.state_dict(), tmpdirname)
        convert_to_onnx_and_check(logical_slice_graph, flow_weight_dir=tmpdirname, onnx_model_path="/tmp")


test_logical_slice()
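One plausible ONNX lowering for logical_slice_assign is ScatterND (this mapping is an assumption for discussion, not the converter's actual implementation). A NumPy sketch of the semantics for the module above:

```python
import numpy as np

def scatter_nd(data, indices, updates):
    # NumPy emulation of ONNX ScatterND: copy `data`, then write each slice
    # of `updates` at the coordinates given by `indices`.
    out = data.copy()
    k = indices.shape[-1]
    flat_idx = indices.reshape(-1, k)
    flat_upd = updates.reshape(-1, *data.shape[k:])
    for idx, upd in zip(flat_idx, flat_upd):
        out[tuple(idx)] = upd
    return out

# The in-place "x[:, 0:2] += x" from the module above, expressed via ScatterND:
x = np.array([[[[1.0]], [[2.0]]]])          # shape (1, 2, 1, 1)
indices = np.array([[0, 0], [0, 1]])        # every (batch, channel) in x[:, 0:2]
updates = (x[:, 0:2] + x).reshape(2, 1, 1)  # the slice values after the add
out = scatter_nd(x, indices, updates)
assert np.allclose(out, 2 * x)              # here the slice covers the whole tensor
```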
Source: flowvision rexnet
Unsupported ops: Counter({'silu': 57, 'narrow': 9, 'max_pool_2d': 3, 'scalar_pow': 3, 'upsample_nearest_2d': 2})
Unsupported ops: Counter({'broadcast_matmul': 89, 'scalar_div': 4, 'gather': 4, 'elementwise_minimum': 3, 'fill_': 2, 'scalar_logical_less': 2, 'where': 2, 'scalar_logical_greater': 1})
Reproduction code:
import tempfile
import oneflow as flow
from oneflow_onnx.oneflow2onnx.util import convert_to_onnx_and_check
from flowvision.models.face_recognition import iresnet50

model = iresnet50().to("cuda")


class ModelGraph(flow.nn.Graph):
    def __init__(self, model):
        super().__init__()
        self.config.allow_fuse_add_to_output(True)
        self.backbone = model

    def build(self, x):
        x = x.to("cuda")
        out = self.backbone(x)
        return out


model.eval()
model_graph = ModelGraph(model)
model_graph._compile(flow.randn(1, 3, 112, 112))

with tempfile.TemporaryDirectory() as tmpdirname:
    flow.save(model.state_dict(), tmpdirname)
    convert_to_onnx_and_check(
        model_graph, flow_weight_dir=tmpdirname, onnx_model_path="./", print_outlier=True)
Error:
File ~/miniconda/lib/python3.9/site-packages/oneflow_onnx/oneflow2onnx/util.py:102, in compare_result(a, b, rtol, atol, print_outlier)
100 if np.abs(a[i] - b[i]) > atol + rtol * np.abs(b[i]):
101 print("a[{}]={}, b[{}]={}".format(i, a[i], i, b[i]))
--> 102 assert np.allclose(a, b, rtol=rtol, atol=atol)
AssertionError:
However, commenting out the line self.config.allow_fuse_add_to_output(True) makes the conversion succeed.
First, remove all decorators containing @flow, along with flow.experimental, flow.checkpoint, and so on.
Then migrate the model to Graph:
# assume class AlexNet exists
def alexnet(pretrained: bool = False, progress: bool = True, **kwargs: Any) -> AlexNet:
    model = AlexNet(**kwargs)
    return model


def test_alexnet():
    alexnet_module = alexnet()
    alexnet_module.eval().to("cuda")

    # Graph is added to the oneflow_onnx/oneflow2onnx/util.py file
    alexnet_graph = Graph(alexnet_module)

    # the job argument is replaced with a graph here
    convert_to_onnx_and_check(
        alexnet_graph,
        flow_weight_dir="./examples/oneflow2onnx/models/alexnet_oneflow_model",
        onnx_model_path="/tmp"
    )
where Graph is:
class Graph(flow.nn.Graph):
    def __init__(self, module):
        super().__init__()
        self.m = module

    def build(self, x):
        out = self.m(x)
        return out
There is also a class named Graph here; I plan to rename it to OneflowGraph, consistent with tvm.
About the convert_to_onnx_and_check() function
About the export_onnx_model() function, the main conversion function
About the ProcessFlowGraph function
shapes = {}
dtypes = {}
graph_str = repr(graph)
# print(graph_str)
size_where = 2
if "cuda" in graph_str:
    size_where = 3

p_size = re.compile(r"size=\(.*?\)", re.S)
p_type = re.compile(r"dtype=.*?,", re.S)
types = ["INPUT", "PARAMETER", "BUFFER", "OUTPUT"]
for t in types:
    data = re.finditer(t + ":.*", graph_str)
    for i in data:
        attrs = i.group().split(":")
        size_str = re.findall(p_size, attrs[size_where])
        type_str = re.findall(p_type, attrs[size_where])
        assert size_str != [] or type_str != [], \
            "size should not be None, please check your inputs dtype"
        size_attr = size_str[0].replace("size=", "")
        type_attr = type_str[0].replace("dtype=", "")
        if size_attr[-2] == ",":
            size_attr = size_attr.replace(",", "")
        if type_attr[-1] == ",":
            type_str = type_attr.replace(",", "")
        data_size = tuple(map(int, size_attr[1:-1].split(", ")))
        node_name = attrs[1]
        shapes[node_name] = data_size
        dtypes[node_name] = STR_TO_FLOW[type_str]
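The size/dtype regexes can be sanity-checked against a sample line shaped like repr(graph) output (the exact line format used here is an assumption for illustration):

```python
import re

# Sample repr(graph) line for a non-cuda graph, so size_where == 2 and the
# size/dtype info sits in the third ":"-separated field.
graph_str = "(INPUT:_x:tensor(size=(1, 3, 224, 224), dtype=oneflow.float32,))"
p_size = re.compile(r"size=\(.*?\)", re.S)
p_type = re.compile(r"dtype=.*?,", re.S)

attrs = graph_str.split(":")
size_attr = re.findall(p_size, attrs[2])[0].replace("size=", "")
type_attr = re.findall(p_type, attrs[2])[0].replace("dtype=", "")
data_size = tuple(map(int, size_attr[1:-1].split(", ")))

assert data_size == (1, 3, 224, 224)
assert type_attr.rstrip(",") == "oneflow.float32"
```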
关于nodes的存放,现有的oneflow2onnx是list模式,用helper.make_node()暂时不清楚会有什么问题
同时其中的get_inputs和get_outputs这些函数可以使用node.user_conf.node_name进行改动,也就是对应这里
当然我觉得这个可能可以并入这里,在tvm中是这么做的
graph._is_compiled()
def parse_attr(attr):
    # Parse node_attr
    attrs = {}
    for a in attr:
        attr_str = str(attr[a])
        if attr_str[0:7] == "at_list":
            attr_str_ = attr_str.split(" ")[0]
            if attr_str_ == "at_list_float":
                attrs[a] = tuple(attr[a].at_list_float.val)
            elif attr_str_ == "at_list_int32":
                attrs[a] = tuple(attr[a].at_list_int32.val)
            elif attr_str_ == "at_list_int64":
                attrs[a] = tuple(attr[a].at_list_int64.val)
        elif attr_str.split(":")[0] == "at_string":
            attrs[a] = attr[a].at_string
        elif attr_str.split(" ")[0] == "at_shape":
            attrs[a] = tuple(list(attr[a].at_shape.dim))
        else:
            attr_str_ = attr_str.split(":")[0]
            if attr_str_ == "at_bool":
                attrs[a] = attr[a].at_bool
            elif attr_str_ == "at_double":
                attrs[a] = attr[a].at_double
            elif attr_str_ == "at_float":
                attrs[a] = attr[a].at_float
            elif attr_str_ == "at_int32":
                attrs[a] = attr[a].at_int32
            elif attr_str_ == "at_int64":
                attrs[a] = attr[a].at_int64
    return attrs
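The dispatch in parse_attr can be exercised with a small stand-in for the protobuf attr objects it consumes (condensed here to two attr kinds; the str() format mimicking protobuf's oneof printing is an assumption):

```python
class _ListVal:
    def __init__(self, val):
        self.val = val

class FakeAttr:
    # Stand-in for a protobuf attr value: str() starts with the set oneof
    # field name ("at_float: 1e-05" or "at_list_int32 { ... }"), and the
    # value is reachable as an attribute of that same name.
    def __init__(self, field, val):
        self.field = field
        setattr(self, field, _ListVal(val) if field.startswith("at_list") else val)

    def __str__(self):
        if self.field.startswith("at_list"):
            return "{} {{ ... }}".format(self.field)
        return "{}: {}".format(self.field, getattr(self, self.field))

# Condensed dispatch covering two of the attr kinds handled above:
def parse_attr(attr):
    attrs = {}
    for a in attr:
        attr_str = str(attr[a])
        if attr_str.startswith("at_list_int32"):
            attrs[a] = tuple(attr[a].at_list_int32.val)
        elif attr_str.split(":")[0] == "at_float":
            attrs[a] = attr[a].at_float
    return attrs

out = parse_attr({"kernel_size": FakeAttr("at_list_int32", [3, 3]),
                  "epsilon": FakeAttr("at_float", 1e-5)})
assert out == {"kernel_size": (3, 3), "epsilon": 1e-5}
```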
I have not looked at the optimizer in detail; it likely has some issues similar to the above, such as renaming and extracting attrs information from ops.