pfnet-research / chainer-compiler

Experimental toolchain to compile and run Chainer models

License: MIT License

CMake 1.65% Python 61.13% C++ 35.63% C 0.71% Shell 0.43% Ruby 0.05% Dockerfile 0.39% PureBasic 0.01% Emacs Lisp 0.01%

chainer-compiler's Issues

Reconsider the API to access Graph

Currently, Graph has state. Some methods like params and flops are valid only after compile is called (note params may change by Conv+BN merge). Probably, we should fix this issue by introducing another class structure in chainer_compiler.py to encapsulate the raw APIs.
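One way to picture the proposed split is to let compile return a separate object, so that params and flops cannot be read before compilation. The sketch below is hypothetical: the class names, fields, and the fake Conv+BN merge are illustrations only and do not exist in chainer_compiler.py.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class CompiledGraph:
    """Hypothetical result of compilation; params and flops live only here."""
    params: Dict[str, int]
    flops: int


@dataclass
class Graph:
    """Hypothetical uncompiled graph; it deliberately has no params/flops."""
    raw_params: Dict[str, int] = field(default_factory=dict)

    def compile(self) -> CompiledGraph:
        # Stand-in for the real compiler passes. Dropping the 'bn' entries
        # mimics the fact that a Conv+BN merge may change the parameters.
        merged = {k: v for k, v in self.raw_params.items()
                  if not k.startswith('bn')}
        # The flops value is a made-up placeholder formula.
        return CompiledGraph(params=merged, flops=2 * sum(merged.values()))


g = Graph({'conv/W': 27, 'bn/gamma': 3})
compiled = g.compile()
```

With this shape, calling flops on an uncompiled graph is an AttributeError rather than a silently invalid value.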

Add custom ONNX domain for chainer

https://github.com/onnx/onnx/blob/5972eed96752551a0ea8b932904fedd568c48d3b/docs/IR.md#models

Models MUST specify a domain and use reverse domain names based on the responsible organization's identity, the same convention that is traditionally used for naming Java packages.

Define somewhere and replace the domain in:

chainer-compiler/compiler/custom_onnx_ops.cc

Line 3 in 4ea7172

 #define ONNX_CHAINER_OPERATOR_SET_SCHEMA(name, ver, impl) ONNX_OPERATOR_SET_SCHEMA_EX(name, Onnx, ONNX_DOMAIN, ver, false, impl) 

Deprecate compiler/tensor.h

It could be replaced by chainerx::Array. At first, I thought it was a good idea not to let the compiler depend on runtime/chainerx. However, there seems to be no actual downside, and we are already using chainerx::Array for constant propagation.

Support functions and links

Functions and links should be supported as they are in CH2O.

CH2O Testcase

https://github.com/pfnet-research/chainer-compiler/tree/master/ch2o/tests/node

elichika functions

https://github.com/pfnet-research/chainer-compiler/blob/master/elichika/elichika/parser/functions_builtin.py

https://github.com/pfnet-research/chainer-compiler/blob/master/elichika/elichika/functions_buildin.py

How to add a function

See #166 for an example.

elichika links

https://github.com/pfnet-research/chainer-compiler/blob/master/elichika/elichika/parser/links_builtin.py

https://github.com/pfnet-research/chainer-compiler/blob/master/elichika/elichika/links_builtin.py

Remaining tests

Name	Type	Status	CH2O
EmbedID.py	Links		https://github.com/pfnet-research/chainer-compiler/blob/master/ch2o/ch2o/links.py#L455
NStepBiLSTM.py	Links		https://github.com/pfnet-research/chainer-compiler/blob/master/ch2o/ch2o/links.py#L312
NStepLSTM.py	Links		https://github.com/pfnet-research/chainer-compiler/blob/master/ch2o/ch2o/links.py#L177
Roi.py		does not pass
Sum.py	Functions		https://github.com/pfnet-research/chainer-compiler/blob/master/ch2o/ch2o/funcs.py#L483

pytest python fails with chainer==6.0.0b2

For some reason, pybind11 started refusing to translate a Python chainerx.ndarray to a C++ std::shared_ptr<chainerx::internal::ArrayBody>.

Debug build fails in ONNX

$ cmake .. -DCHAINERX_BUILD_PYTHON=OFF -DCHAINERX_BUILD_CUDA=OFF -G Ninja -DCMAKE_BUILD_TYPE=Debug
$ ninja
$ build-dbg/tools/run_onnx --test third_party/onnx/onnx/backend/test/data/node/test_abs
Initializing ChainerX...
Loading model...
Loading data...
Found 1 test cases
Constructing model...
terminate called after throwing an instance of 'chainer_compiler_onnx::assert_error'
  what():  ../third_party/onnx/onnx/defs/schema.cc:824: SchemasRegisterer: Assertion `dbg_registered_schema_count == DbgOperatorSetTracker::Instance().GetCount()` failed: 236 schema were exposed from operator sets and automatically placed into the static registry.  248 were expected based on calls to registration macros. Operator set functions may need to be updated.

Probably, the right solution is to create a custom ONNX op registry instead of adding custom ops to opset 9, but I'm going to disable the assertion for now.

Reorganize directory structure

This issue tries to collect issues related to the current directory structure.

Redesign dtype_inference.cc

The current dtype_inference.cc handles coercion like Mul(x:int32, y:float32) => z:float32. However, the backprop of this op will be Mul(gz:float32, x:int32) => gy:float32 (OK) and Mul(gz:float32, y:float32) => gx:int32 (fail). This issue was exposed in #247. I think we should probably insert Cast operations automatically instead of allowing coercion.
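The mismatch can be reproduced with a toy dtype checker. This is a minimal sketch that works on dtype names only; mul_dtype and grad_dtype_mismatches are hypothetical helpers, and the coercion rule is a simplification of what dtype_inference.cc does.

```python
def mul_dtype(a, b):
    # Assumed coercion rule for this sketch: float32 wins over int32.
    return 'float32' if 'float32' in (a, b) else a


def grad_dtype_mismatches(x, y):
    """List gradients whose inferred dtype differs from the dtype of the
    value they belong to, for z = Mul(x, y)."""
    gz = mul_dtype(x, y)
    inferred = {'gx': mul_dtype(gz, y), 'gy': mul_dtype(gz, x)}
    required = {'gx': x, 'gy': y}
    return sorted(k for k in inferred if inferred[k] != required[k])


# Implicit coercion: gx is inferred as float32 but x is int32.
with_coercion = grad_dtype_mismatches('int32', 'float32')

# Explicit Cast(x, to=float32) in front of Mul: Mul only sees float32.
with_cast = grad_dtype_mismatches('float32', 'float32')
```

With the explicit Cast, the dtype change is isolated in one op, and the gradient of Mul has the same dtype as its input.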

Move string type to `Type::Kind`

Rel: #403

Resize needs some fixes

#499 is incomplete.

https://github.com/pfnet-research/chainer-compiler/pull/499/files#diff-7ebd2d3af353e0fc6ab9b75bca7ade97R43
Improve the performance of upsample
Measure the improvement from the above
Get linear upsample to pass
Get downsample tests to pass

Run tests generated by ONNX-chainer

It would be nice to run these tests with scripts/runtests.py.

Also, it would be better to fix the failing tests:

Dilation
HardSigmoid
PRelu
ReduceProd
MaxRoiPool
Tile
MaxPool: not sure, but probably padding related

Quantized ops

Some quantized ops are being added. Summary:

(A) add quantize ops: onnx/onnx#1872
(B) add quantizer: onnx/onnx#1892
(C) onnxruntime
(D) ngraph

QuantizeLinear: A B C D
QLinearConv: B C D
QLinearMatMul: B C
ConvInteger: B C
MatMulInteger: B C
DequantizeLinear: A B C D

ONNX header generate order issue

When including ONNX headers, the protoc-generated headers are sometimes not generated yet (for example, on the first build with -j4) because the build dependencies are wrong.

Cannot build due to AttributeError: module 'chainer.functions' has no attribute 'roi_max_pooling_2d'

Hello, I was trying to build chainer-compiler but ran into this issue:

[  0%] Generating stamp_out/ch2o_node_SoftmaxClossEntropy
Traceback (most recent call last):
  File "/home/ruizhe/projects/chainer-compiler/ch2o/tests/node/SoftmaxClossEntropy.py", line 18, in <module>
    import ch2o
  File "/home/ruizhe/projects/chainer-compiler/ch2o/ch2o/__init__.py", line 1, in <module>
    from ch2o.chainer2onnx import compile_model
  File "/home/ruizhe/projects/chainer-compiler/ch2o/ch2o/chainer2onnx.py", line 25, in <module>
    from ch2o.funcs import Func, Func2NodeClass, Function_Concat, Function_Dummy, castto
  File "/home/ruizhe/projects/chainer-compiler/ch2o/ch2o/funcs.py", line 585, in <module>
    (F.roi_max_pooling_2d, Function_ROIMaxPool2d),
AttributeError: module 'chainer.functions' has no attribute 'roi_max_pooling_2d'
CMakeFiles/node_SoftmaxClossEntropy.dir/build.make:71: recipe for target 'stamp_out/ch2o_node_SoftmaxClossEntropy' failed
make[2]: *** [stamp_out/ch2o_node_SoftmaxClossEntropy] Error 1
CMakeFiles/Makefile2:67: recipe for target 'CMakeFiles/node_SoftmaxClossEntropy.dir/all' failed
make[1]: *** [CMakeFiles/node_SoftmaxClossEntropy.dir/all] Error 2
Makefile:140: recipe for target 'all' failed
make: *** [all] Error 2

I am not sure why this happens and I will investigate later on.

Other info:

commit id: 3a95f55259d1e1ad96fb12974e47714e08ed9a41
Not using CUDA

Elichika builtins for static if/for

Examples:

if elichika.ignore_this_branch(hparam['debug_mode'] > 1):
  if value > 2: raise RuntimeError('foo')

# This is from EspNet's decoder
for l in elichika.unroll(six.moves.range(1, self.dlayers)):
     c_new, z_new = self['lstm%d' % l](c_list[l], z_list[l], z_list_new[-1])
     c_list_new.append(c_new)
     z_list_new.append(z_new)

They should be just an identity function (e.g., def unroll(x): return x) when the code is evaluated by Python, but should be handled specially by elichika.
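On the plain-Python side, the two builtins could be as small as the sketch below. The names unroll and ignore_this_branch come from the examples above; where they would live inside elichika is an assumption, and the special handling by the elichika parser is not shown.

```python
def unroll(iterable):
    """Identity under plain Python; elichika would treat the loop over the
    returned iterable as one to unroll statically."""
    return iterable


def ignore_this_branch(cond):
    """Identity under plain Python; elichika would skip the guarded branch."""
    return cond


# Under plain Python both behave as if they were not there.
total = 0
for i in unroll(range(1, 4)):
    total += i

branch_taken = False
if ignore_this_branch(total > 100):
    branch_taken = True
```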

Add PythonFunctionNode and PythonLink op

python/oniku.py allows you to use C++ code from Python. There should be a way to call from C++ back into Python, too.

Use standard ROI ONNX operators

Operators that need to be changed:

MaxRoiPool
RoiAlign
- Modes avg and max are defined
It seems AverageRoiPool isn't defined yet

Things needed for the change:

Define new ONNX style operator in ChxVM
Remove ChainerROI* operators
Migrate ONNX generation in ch2o and elichika

Estimate FLOPS

It would help us understand how fast or slow it is.

This looks great: https://github.com/belltailjp/chainer_computational_cost

GemmOp which directly uses clblas

Fill back shape information from XCVM run

Discard unused inputs/outputs of Loop/If in middle-end

Without subgraphs, unnecessary computation nodes are discarded thanks to the topological sort. However, the current middle-end assumes that all inputs and outputs of subgraphs are necessary without checking their graph bodies, so unnecessary computation may run. This might be hurting the performance of the backward computation of EspNet, but I'm not sure.

Compile MaskRCNNFPNResNet50 with elichika

chainer/onnx-chainer#128 is a long-standing issue of onnx-chainer. Although this issue can be worked around with as_funcnode or fake_as_funcnode in onnx-chainer, these APIs are difficult to use. It'd be great if we could export ONNX from this complex model with a reasonable amount of ignore_branch and for_unroll.

Add continue/break/return to elichika

A GSoC student is working on it.

Refactoring keyargs parser

Refactor this code:

https://github.com/pfnet-research/chainer-compiler/blob/master/elichika/elichika/parser/functions.py#L204

Dead code removal in canonicalizer.

Implement something like Vulture's NodeVisitor to detect and purge unreachable dead code, unused imports, and unused functions. It would prevent the generation of unused DAGs in ONNX.

The current canonicalizer translates this code snippet below

def func():
    do_something1()
    return
    do_something2()  # Should be purged from AST as it will never be hit.

into the following:

def func():
    returned_value = None
    returned1 = False
    do_something1()
    returned_value = None
    returned1 = True
    if not returned1:
        do_something2()  # This dead code is being processed in Elichika.
    return returned_value

If we implement a dead code removal canonicalizer, the above can be avoided.
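As a rough illustration of such a pass, the sketch below uses Python's standard ast.NodeTransformer to drop statements that follow a return, raise, break, or continue in the same block. DropUnreachable is a hypothetical name, not part of elichika, and the sketch covers only this one kind of dead code (not unused imports or functions). ast.unparse needs Python 3.9 or later.

```python
import ast


class DropUnreachable(ast.NodeTransformer):
    """Drop statements that follow a terminator within the same block."""

    _TERMINATORS = (ast.Return, ast.Raise, ast.Break, ast.Continue)

    def _trim(self, stmts):
        kept = []
        for stmt in stmts:
            kept.append(stmt)
            if isinstance(stmt, self._TERMINATORS):
                break  # everything after this statement is unreachable
        return kept

    def generic_visit(self, node):
        # Visit children first so nested blocks are trimmed as well.
        super().generic_visit(node)
        for name in ('body', 'orelse', 'finalbody'):
            stmts = getattr(node, name, None)
            if isinstance(stmts, list):  # skip expression-valued fields
                setattr(node, name, self._trim(stmts))
        return node


src = '''
def func():
    do_something1()
    return
    do_something2()
'''
tree = DropUnreachable().visit(ast.parse(src))
cleaned = ast.unparse(tree)
```

Running such a pass before the return-flag rewriting would mean the guarded do_something2() branch is never generated.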

Remove unused inputs and outputs from elichika if, for, listcomp

Elichika's if, for, and listcomp nodes contain unused inputs and outputs.
They should be removed.

pass syntax tests in elichika

If.py
Lazy* is wrong.
For.py
Many things
ForAndIf.py
LinkInFor.py
ChinerFunctionNode.py
Many things

Build protobuf by ourselves

Like Chainer and ONNX, it'd be better to have protobuf as a submodule of this project so that users won't run into ABI-related issues. For chainer_compiler_core.so and other Python modules, we probably need to link protobuf statically.

Mandate clang-format

Gradient output does not match with that of chainer in vgg19 model

It seems that the gradient output is slightly different from that of the Chainer model for vgg19.

Code to generate the .onnx file with onnx_chainer (the latest master branch is used here):

import chainer
import chainer.links as L
import chainer.functions as F
import numpy as np

import onnx_chainer

class Wrapper(chainer.Chain):
    def __init__(self, predictor, key=None):
        super().__init__()

        with self.init_scope():
            self.predictor = predictor
        self.key = key

    def __call__(self, x, t):
        if self.key is None:
            y = self.predictor(x)
        else:
            y = self.predictor(x, layers=[self.key])[self.key]
        y = F.softmax_cross_entropy(y, t)
        return y


model = L.VGG19Layers(pretrained_model=None)
model = Wrapper(model, 'fc8')
x = np.random.uniform(size=(4, 3, 224, 224)).astype('f')
x = chainer.as_variable(x)
t = np.random.randint(size=(4,), low=0, high=1000).astype(np.int32)
t = chainer.as_variable(t)

onnx_chainer.export_testcase(model, (x, t), 'vgg19', output_grad=True)

Execution command on chainer-compiler:
./build/tools/run_onnx --test ../onnx-chainer/vgg19 --backprop

Obtained result:

Initializing ChainerX...
Loading model...
Loading data...
Found 1 test cases
Constructing model...
WARNING: int32 Take is not supported by ChainerX, could be slow
Generate code...
Running for ../onnx-chainer/vgg19/test_data_set_0
Verifying the result...
OK: ReduceMean_0
OK: grad_out@/predictor/conv1_1/W
OK: grad_out@/predictor/conv1_1/b
OK: grad_out@/predictor/conv1_2/W
OK: grad_out@/predictor/conv1_2/b
OK: grad_out@/predictor/conv2_1/W
OK: grad_out@/predictor/conv2_1/b
OK: grad_out@/predictor/conv2_2/W
OK: grad_out@/predictor/conv2_2/b
OK: grad_out@/predictor/conv3_1/W
FAIL(value): grad_out@/predictor/conv3_1/b
Expected: (256,) [0,20]=array([0.        , -0.00000018, 0.00000047, -0.00000005, -0.00000013, -0.00000067, -0.00000018, 0.00000037, 0.        , 0.0000001 ,
       -0.00000008, 0.00000008, -0.00000016, 0.00000017, 0.00000002, 0.00000027, 0.        , -0.00000006, 0.        , -0.00000029], shape=(20,), dtype=float32, device='native:0')
Actual: (256,) [0,20]=array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], shape=(20,), dtype=float32, device='native:0')
OK: grad_out@/predictor/conv3_2/W
FAIL(value): grad_out@/predictor/conv3_2/b
Expected: (256,) [0,20]=array([0.00000053, -0.00000273, -0.00000009, -0.00000041, 0.00000276, -0.00000068, -0.00000006, -0.00000001, -0.0000009 , 0.00000137,
       0.0000002 , -0.00000072, -0.0000003 , 0.0000013 , -0.00000276, 0.00000116, 0.00000016, 0.00000026, 0.00000205, 0.00000003], shape=(20,), dtype=float32, device='native:0')
Actual: (256,) [0,20]=array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], shape=(20,), dtype=float32, device='native:0')
OK: grad_out@/predictor/conv3_3/W
...
OK: grad_out@/predictor/fc8/W
OK: grad_out@/predictor/fc8/b
Elapsed: 115062 msec
Check `ok_cnt' == `test_case->outputs.size()' failed! in RunMain at /home/mkusumoto/chainer-compiler/tools/run_onnx.cc:629: (25 vs 39)
[1]    15009 abort (core dumped)  ./build/tools/run_onnx --test ../onnx-chainer/vgg19 --backprop

This may be just numerical error, the effect of dropout, or a sign that some computation is incorrect.
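To get a feel for the size of the difference, the first few expected values from the log can be compared against zeros with and without an absolute tolerance. This is only a sketch: the rtol and atol values are assumptions and are not the tolerances used by run_onnx, and passing the loose check does not rule out an incorrect computation.

```python
import numpy as np

# First values of the expected conv3_2/b gradient, copied from the log.
expected = np.array([0.00000053, -0.00000273, -0.00000009], dtype=np.float32)
actual = np.zeros_like(expected)

# Purely relative comparison: any non-zero expected value fails against 0.
strict_ok = np.allclose(actual, expected, rtol=1e-4, atol=0.0)

# Adding an absolute tolerance (1e-5 is an assumed value for this sketch).
loose_ok = np.allclose(actual, expected, rtol=1e-4, atol=1e-5)

# Largest absolute difference among the sampled entries.
max_abs_diff = float(np.max(np.abs(actual - expected)))
```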

Improve float16

Add np.float16 to

        for dtype in (np.float32, np.float64):

in scripts/gen_backprop_tests_oc.py and fix tests

Release docker image

Build with CUDA on Travis

I don't even know if it's possible, though

Support `None` type

Currently they are treated as False in ch2o:

chainer-compiler/ch2o/ch2o/utils.py

Line 112 in 2298109

# TODO(hamaji): Revisit to check if this decision is OK.

How to check:

diff --git a/ch2o/tests/syntax/Cmp.py b/ch2o/tests/syntax/Cmp.py
index fa8913b..e1ab3fc 100644
--- a/ch2o/tests/syntax/Cmp.py
+++ b/ch2o/tests/syntax/Cmp.py
@@ -67,6 +67,7 @@ if __name__ == '__main__':
         for x, y in [(None, None),
                      (42, None),
                      (True, None),
+                     (False, None),
                      (True, False),
                      (True, True),
                      (False, False),
@@ -75,4 +76,3 @@ if __name__ == '__main__':
                      (np.array(42), np.array(42))]:
             ch2o.generate_testcase(cls(), [x, y],
                                    subname='%s_%s_%s' % (x, name, y))
-

$ python3 scripts/ch2ocheck.py ch2o/tests/syntax/Cmp.py

Thoughts to implement:

Rename current Null kind of ChxVMVar to Undefined
Mark a special opaque kind with name None in ONNX representation

Create pip package

Add support for `F.shift`

Relay

My impression was they don't fully support dynamic shape, but I could be wrong. If we really use Relay, there'd be two options:

ONNX+ => Relay: we may end up needing to extend ONNX+ until it becomes a superset of Relay
Chainer => Relay: more straightforward, but we'll lose collaboration with the rest of the ONNX ecosystem (nGraph, ONNX runtime, etc.)

Do not retain tensors when only shapes are needed

For example,

void ReshapeGradFn(GradientOpContext* gc) {
    GraphBuilder gb{gc->builder(0)};
    Value* t0 = gb.Op(Node::kShape, {gc->x(0)});
    gc->GradOp(Node::kReshape, 0, {gc->gy(0), t0});
}

should be something like

void ReshapeGradFn(GradientOpContext* gc) {
    GraphBuilder gb{gc->builder(0)};
    Value* t0 = gc->RetainXShape(0);
    gc->GradOp(Node::kReshape, 0, {gc->gy(0), t0});
}

Use more abseil components

Now that we have third_party/abseil, I think we should start using it, especially in the compiler directory (where we don't need to care about binary size).

debugging: stacktrace and symbolize
strings: str_cat, etc.
types/variant: should be used by other union-like classes such as Type
types/optional: I think we should NOT use it for now to be consistent with ChainerX

Better way to detach node

When removing multiple nodes from the graph, we need to either refresh the result of GetLiveNodes or check whether each node is detached.
This is OK for now, since the only place that needs the check or the update is here.
However, other places may need one or the other in the future, and doing it by hand every time is not a good idea.

Update the live node list after every node mutation
Keep the inputs/outputs of a detached node and remove it at the right time (may need some modification in the ONNX generator)

Unify how flags/configs are organized

It's a shame that compile/runtime flags are scattered, and it's hard to keep them consistent.

Maybe we want a code generator that emits the code to handle them.

Add scalar/shape types to XCVMVar

As of Jan 2019, all tensor values are encoded by chainerx::Array, which is inefficient. It'd be better to introduce kScalar and kShape to handle these values without chainerx::Array. Also, we probably want to

introduce std::variant
change XCVMState::variables_ from unique_ptr to non-unique_ptr

Replace NDArray Shape

Replace it with a better one.

Contributing to chainer-compiler

Hi @shinh
I have built chainer-compiler from source and have been going through the code base. I would like to contribute but needed some directions to start.

For adding Chainer function support (like issue #67), should the workflow first add support for the function in onnx-chainer (like PR chainer/onnx-chainer#44) using ops provided by ONNX, or can functions be implemented as done in PR #62?

Moreover, I am also interested in working with the Python AST for ch2o and elichika. Could you please suggest some basic feature improvements that I could try working on to get an idea about it?

Enable pytest on Travis

Test tools/train_imagenet and examples/imagenet

Notably, examples/imagenet is currently broken, I believe.

Automatic device assignment

When -d cuda is specified, the following heuristics are currently used:

All inputs are allocated on the GPU, except ones that are directly fed to a Reshape op as its output shape.
Results of the Shape op are stored on the host.
If the right-hand side of Div is a single float value, chainerx::AsScalar is called internally in XCVM's Div (this weird hack is for y / x.shape[0], where x.shape[0] is the batch size).

These work as a mitigation for now, but we should design a more sophisticated device assignment. The principles would be:

Shapes should be on the CPU
Make it possible to run cross-device binary ops when one of their inputs is a scalar.
If requirements cannot be satisfied, insert a custom op (say OnikuxDeviceCopy) which explicitly copies data between devices, probably showing a warning to users.
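The last two principles can be sketched as a small planning function for binary ops. Everything here is hypothetical: Val, plan_binary_op, and the choice to copy towards the GPU are assumptions for illustration, not the compiler's actual data structures or policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Val:
    """Hypothetical value descriptor used only in this sketch."""
    device: str            # 'cpu' or 'cuda'
    is_scalar: bool = False


def plan_binary_op(lhs, rhs):
    """Return (output_device, operands_needing_an_explicit_copy)."""
    if lhs.device == rhs.device:
        return lhs.device, []
    # Cross-device is fine when one side is a scalar: run the op on the
    # device of the non-scalar operand.
    if lhs.is_scalar:
        return rhs.device, []
    if rhs.is_scalar:
        return lhs.device, []
    # Otherwise an explicit device copy (the OnikuxDeviceCopy idea) is
    # needed. Copying the CPU operand to the GPU is an assumed policy.
    to_copy = 'lhs' if lhs.device != 'cuda' else 'rhs'
    return 'cuda', [to_copy]


# y / x.shape[0]: y on the GPU, the batch size as a CPU scalar.
div_plan = plan_binary_op(Val('cuda'), Val('cpu', is_scalar=True))
# Two full tensors on different devices need a copy.
add_plan = plan_binary_op(Val('cpu'), Val('cuda'))
```

A real pass would also emit the warning mentioned above whenever the copy list is non-empty.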

Export and run all models in ChainerCV

100% coverage for https://github.com/chainer/chainercv/tree/master/chainercv/links/model could be a great goal for both elichika and compiler+runtime.

$ cd chainercv/links/model && grep -ho 'F\.\w\+' **/*.py | sort | uniq

Looking at the result of the above command, the following ops look important.

F.huber_loss
F.normalize
F.roi_pooling_2d
F.unpooling_2d
F.upsampling_2d
L.DilatedConvolution2D

Some of the above have standard ONNX ops. It would also be great to add support for them in ONNX-chainer.
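The grep survey above relies on the shell expanding ** recursively. A portable equivalent can be sketched with the standard library; used_functions is a hypothetical helper name, and the pattern is the same F\.\w+ used in the command.

```python
import re
from pathlib import Path


def used_functions(root):
    """Return the sorted, de-duplicated F.xxx names found under root."""
    pattern = re.compile(r'F\.\w+')
    found = set()
    for path in Path(root).rglob('*.py'):
        found.update(pattern.findall(path.read_text(encoding='utf-8')))
    return sorted(found)
```

Calling used_functions('chainercv/links/model') from a ChainerCV checkout should give the same kind of list as the command, assuming the files are UTF-8.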

Optimize `ReplaceConv`

Currently, ReplaceConv just splits the convolution, which is not efficient for a large number of groups (e.g., depthwise conv). This significantly slows down recent lightweight models. Replace the implementation with Chainer's algorithm: https://github.com/chainer/chainer/blob/master/chainer/functions/connection/convolution_2d.py#L207
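The cost difference can be illustrated with NumPy for the special case of 1x1 kernels. This is a simplified sketch, not Chainer's algorithm and not the ReplaceConv code: the split version issues one contraction per group, while the batched version handles all groups in a single contraction. The function names are hypothetical, and the weight is stored as (O, C/G) with the 1x1 kernel axes dropped.

```python
import numpy as np


def grouped_conv1x1_split(x, w, groups):
    """Split approach: one small convolution per group, then concatenate."""
    cg = x.shape[1] // groups   # input channels per group
    og = w.shape[0] // groups   # output channels per group
    ys = []
    for g in range(groups):
        x_g = x[:, g * cg:(g + 1) * cg]
        w_g = w[g * og:(g + 1) * og]
        ys.append(np.einsum('nchw,oc->nohw', x_g, w_g))
    return np.concatenate(ys, axis=1)


def grouped_conv1x1_batched(x, w, groups):
    """Single batched contraction over all groups at once."""
    n, c, h, width = x.shape
    o = w.shape[0]
    x_r = x.reshape(n, groups, c // groups, h, width)
    w_r = w.reshape(groups, o // groups, c // groups)
    y = np.einsum('ngchw,goc->ngohw', x_r, w_r)
    return y.reshape(n, o, h, width)


rng = np.random.default_rng(0)
x = rng.standard_normal((2, 8, 5, 5))
w_depthwise = rng.standard_normal((8, 1))   # groups=8 (depthwise)
w_two_groups = rng.standard_normal((8, 4))  # groups=2

same_depthwise = np.allclose(grouped_conv1x1_split(x, w_depthwise, 8),
                             grouped_conv1x1_batched(x, w_depthwise, 8))
same_two_groups = np.allclose(grouped_conv1x1_split(x, w_two_groups, 2),
                              grouped_conv1x1_batched(x, w_two_groups, 2))
```

For depthwise convolution the split version runs as many tiny ops as there are channels, which is the inefficiency described above.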

Fix performance regression in EspNet E2E with csj recipe

relates to: #289