wkcn / MobulaOP
A Simple & Flexible Cross Framework Operators Toolkit
License: MIT License
This is great work. Has this been tested with multiple GPUs, and can it be executed in parallel?
gcc5 couldn't compile MobulaOP because of an incomplete-type error.
The current package name in this project is mobula.
However, that name collides with the existing project mobula,
so I will rename the MobulaOP package.
Hi, I have tried the ROIAlign custom op provided by this repo with the Faster R-CNN example. I simply replaced the symbol code:
roi_pool = mx.symbol.ROIPooling(name='roi_pool', data=conv_new_1_relu, rois=rois,
                                pooled_size=(7, 7), spatial_scale=spatial_scale)
with
roi_pool = mobula.op.ROIAlign(name='roi_pool', data=conv_new_1_relu, rois=rois,
                              pooled_size=(7, 7), spatial_scale=spatial_scale, sampling_ratio=0)
The running time increased from 0.1 s to 1–2 s, and with multiple GPUs the code does not run in parallel and becomes even slower.
My MXNet version is 1.3.0-cu92, installed from pip. What might be the problem?
First of all, thanks for the work! It makes MXNet easier to use.
My question is the following:
Can an op created with Mobula be called from Gluon?
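For concreteness, here is a minimal sketch of what calling a Mobula op from a Gluon Block might look like, reusing the AdditionOP from the dynamic-import example further below; this illustrates the question, not a confirmed API:

import mxnet as mx
from mxnet.gluon import nn
import mobula

mobula.op.load('./AdditionOP')  # the tutorial custom op, loaded dynamically

class AdditionBlock(nn.Block):
    def forward(self, a, b):
        # If Mobula ops accept NDArray inputs, they can be called
        # inside forward() like any other operator.
        return mobula.op.AdditionOP(a, b)

block = AdditionBlock()
print(block(mx.nd.array([1, 2, 3]), mx.nd.array([4, 5, 6])))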
Hi @wkcn, really nice work.
I'd like to ask: is it possible for Mobula to leverage existing math helpers in deep learning frameworks, like ATen in PyTorch and mshadow in MXNet?
Writing everything in vanilla C++ is prohibitively cumbersome and thus prevents the adoption of Mobula in real practice.
Running python test_mul_func.py, an error occurs:
[10:21:12] src/engine/engine.cc:55: MXNet start using engine: NaiveEngine
/wls/tf_workspace/MobulaOP/mobula/glue/mx.py:44: UserWarning: Using asynchronous execution for MXNet failed, since /home/weishuyi/anaconda3/lib/python3.7/site-packages/mxnet/libmxnet.so: undefined symbol: MXShallowCopyNDArray
It will drop the performance.
Recommend using the latest version of MXNet
mkdir -p /wls/tf_workspace/MobulaOP/mobula/build/cpu/src
g++ /wls/tf_workspace/MobulaOP/mobula/src/defines.cpp -std=c++11 -DUSING_CUDA=0 -DUSING_HIP=0 -DUSING_OPENMP=0 -DHOST_NUM_THREADS=40 -O3 -DUSING_CBLAS=0 -I/wls/tf_workspace/MobulaOP/mobula/./ -I/wls/tf_workspace/MobulaOP/mobula/./inc -I/wls/tf_workspace/MobulaOP/mobula/../3rdparty/dlpack/include -I/wls/tf_workspace/MobulaOP/mobula/../3rdparty/tvm_packed_func -fPIC -Werror -Wall -Wextra -pedantic -Wcast-align -Wcast-qual -Wctor-dtor-privacy -Wdisabled-optimization -Wformat=2 -Winit-self -Wmissing-include-dirs -Wold-style-cast -Woverloaded-virtual -Wredundant-decls -Wshadow -Wsign-promo -Wundef -fdiagnostics-show-option -c -o /wls/tf_workspace/MobulaOP/mobula/build/cpu/src/defines.o
g++ /wls/tf_workspace/MobulaOP/mobula/src/context.cpp -std=c++11 -DUSING_CUDA=0 -DUSING_HIP=0 -DUSING_OPENMP=0 -DHOST_NUM_THREADS=40 -O3 -DUSING_CBLAS=0 -I/wls/tf_workspace/MobulaOP/mobula/./ -I/wls/tf_workspace/MobulaOP/mobula/./inc -I/wls/tf_workspace/MobulaOP/mobula/../3rdparty/dlpack/include -I/wls/tf_workspace/MobulaOP/mobula/../3rdparty/tvm_packed_func -fPIC -Werror -Wall -Wextra -pedantic -Wcast-align -Wcast-qual -Wctor-dtor-privacy -Wdisabled-optimization -Wformat=2 -Winit-self -Wmissing-include-dirs -Wold-style-cast -Woverloaded-virtual -Wredundant-decls -Wshadow -Wsign-promo -Wundef -fdiagnostics-show-option -c -o /wls/tf_workspace/MobulaOP/mobula/build/cpu/src/context.o
mkdir -p MulElemWise/build/MulElemWise/build/cpu
g++ MulElemWise/build/cpu/MulElemWise_wrapper.cpp -std=c++11 -DUSING_CUDA=0 -DUSING_HIP=0 -DUSING_OPENMP=0 -DHOST_NUM_THREADS=40 -O3 -DUSING_CBLAS=0 -I/wls/tf_workspace/MobulaOP/mobula/./ -I/wls/tf_workspace/MobulaOP/mobula/./inc -I/wls/tf_workspace/MobulaOP/mobula/../3rdparty/dlpack/include -I/wls/tf_workspace/MobulaOP/mobula/../3rdparty/tvm_packed_func -fPIC -Werror -Wall -Wextra -pedantic -Wcast-align -Wcast-qual -Wctor-dtor-privacy -Wdisabled-optimization -Wformat=2 -Winit-self -Wmissing-include-dirs -Wold-style-cast -Woverloaded-virtual -Wredundant-decls -Wshadow -Wsign-promo -Wundef -fdiagnostics-show-option -c -o MulElemWise/build/MulElemWise/build/cpu/MulElemWise_wrapper.o
In file included from /wls/tf_workspace/MobulaOP/mobula/./inc/mobula_op.h:5:0,
from MulElemWise/build/cpu/MulElemWise_wrapper.cpp:8:
/wls/tf_workspace/MobulaOP/mobula/./inc/glue_mx.h: In function ‘void RegisterMXAPI(void*, void*, void*, void*, void*)’:
/wls/tf_workspace/MobulaOP/mobula/./inc/glue_mx.h:45:76: error: ISO C++ forbids casting between pointer-to-function and pointer-to-object [-Werror=pedantic]
reinterpret_cast<decltype(MXShallowCopyNDArray)>(shallow_copy_ndarray);
^
/wls/tf_workspace/MobulaOP/mobula/./inc/glue_mx.h:46:73: error: ISO C++ forbids casting between pointer-to-function and pointer-to-object [-Werror=pedantic]
MXNDArrayFree = reinterpret_cast<decltype(MXNDArrayFree)>(ndarray_free);
^
/wls/tf_workspace/MobulaOP/mobula/./inc/glue_mx.h:48:74: error: ISO C++ forbids casting between pointer-to-function and pointer-to-object [-Werror=pedantic]
reinterpret_cast<decltype(MXNDArrayGetContext)>(ndarray_get_context);
^
/wls/tf_workspace/MobulaOP/mobula/./inc/glue_mx.h:50:70: error: ISO C++ forbids casting between pointer-to-function and pointer-to-object [-Werror=pedantic]
reinterpret_cast<decltype(MXNDArrayToDLPack)>(ndarray_to_dlpack);
^
/wls/tf_workspace/MobulaOP/mobula/./inc/glue_mx.h:52:73: error: ISO C++ forbids casting between pointer-to-function and pointer-to-object [-Werror=pedantic]
reinterpret_cast<decltype(MXEnginePushSyncND)>(engine_push_sync_nd);
May I ask which implementation MobulaOP uses to create operators?
Hi, I have tried your basic MulElemWise example, but got this kind of error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/alexhu/Source/MobulaOP/mobula/glue/common.py", line 158, in __call__
    return backend.op_gen(glue_mod, op=self.op, name=self.name)(*args, **new_kwargs)
  File "/home/alexhu/Source/MobulaOP/mobula/glue/th.py", line 41, in __call__
    return self.cache[self.name](*pars[0], **pars[1])(*inputs)
  File "/home/alexhu/anaconda3/envs/slr/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/alexhu/Source/MobulaOP/mobula/glue/th.py", line 105, in forward
    return torch_func.apply(self, *args, **kwargs)
  File "/home/alexhu/Source/MobulaOP/mobula/glue/th.py", line 59, in forward
    out = self._forward(*args, **kwargs)
  File "/home/alexhu/Source/MobulaOP/docs/tutorial/MulElemWise/MulElemWise.py", line 7, in forward
    mobula.func.mul_elemwise(a.size, a, b, self.y)
  File "/home/alexhu/Source/MobulaOP/mobula/func.py", line 148, in __call__
    data, var_dev_id, ctype = self._get_scalar_info(var, ptype)
  File "/home/alexhu/Source/MobulaOP/mobula/func.py", line 277, in _get_scalar_info
    var, ctypes.c_void_p) else ptype.ctype(var)
TypeError: an integer is required (got type builtin_function_or_method)
Too much of the code lacks comments. I need to add them.
Todo List:
It would be great if the package provided a gradient-checking tool.
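As a rough sketch of what such a tool could look like, here is a numerical gradient check that compares autograd gradients against central finite differences; the name check_gradient and the tolerances are hypothetical:

import numpy as np
import mxnet as mx

def check_gradient(op, x, eps=1e-3, rtol=1e-2, atol=1e-4):
    # Analytic gradient of sum(op(x)) via autograd.
    x.attach_grad()
    with mx.autograd.record():
        y = op(x)
    y.backward(mx.nd.ones_like(y))
    analytic = x.grad.asnumpy().ravel()

    # Numerical gradient via central finite differences.
    flat = x.asnumpy().ravel()
    numeric = np.zeros_like(flat)
    for i in range(flat.size):
        orig = flat[i]
        flat[i] = orig + eps
        y_pos = op(mx.nd.array(flat.reshape(x.shape))).asnumpy().sum()
        flat[i] = orig - eps
        y_neg = op(mx.nd.array(flat.reshape(x.shape))).asnumpy().sum()
        flat[i] = orig
        numeric[i] = (y_pos - y_neg) / (2 * eps)

    np.testing.assert_allclose(analytic, numeric, rtol=rtol, atol=atol)

check_gradient(lambda x: x * x, mx.nd.array([1.0, 2.0, 3.0]))  # d(x*x)/dx = 2x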
Hello, it's very useful! I have a problem: I defined a custom operator in MXNet (Python) and trained a model. Now I want to load the model (.json & .params) with MXNet (C++). Can you give me some advice? Thanks.
With the MXNet 1.5 nightly build (1.5.0b20181222), import mobula gives a segmentation fault.
When calling MobulaOP in a subprocess, it gets stuck.
Environment: latest MXNet nightly build and Python 3.6.5.
Example code, modified from dynamic_import_op.py, to reproduce the error:
from concurrent import futures
import sys
import mxnet as mx

def foo():
    import mobula
    # Import the custom operator dynamically
    mobula.op.load('./AdditionOP')
    AdditionOP = mobula.op.AdditionOP

    a = mx.nd.array([1, 2, 3])
    b = mx.nd.array([4, 5, 6])

    a.attach_grad()
    b.attach_grad()
    with mx.autograd.record():
        c = AdditionOP(a, b)

    dc = mx.nd.array([7, 8, 9])
    c.backward(dc)

    assert ((a + b).asnumpy() == c.asnumpy()).all()
    assert (a.grad.asnumpy() == dc.asnumpy()).all()
    assert (b.grad.asnumpy() == dc.asnumpy()).all()

    print('Okay :-)')
    print('a + b = c \n {} + {} = {}'.format(a.asnumpy(), b.asnumpy(), c.asnumpy()))

def main():
    ex = futures.ProcessPoolExecutor(1)
    r = ex.submit(foo)
    r.result()

if __name__ == "__main__":
    main()
Hi there, this issue is to summarize custom operators to be supported.
Please feel free to add to it if you want any operator :)
The following code produces incorrect output. If I change .cuda() to .cpu(), I get the correct output.
(Fix #10 is required to run this example.)
# Use the ROIAlign operator
import sys
sys.path.append('../')  # Add MobulaOP path

import numpy as np
import torch
import mobula

# Load the ROIAlign module
mobula.op.load('ROIAlign')

dtype = np.float32
N, C, H, W = 2, 3, 4, 4

data = torch.tensor(np.arange(N * C * H * W).astype(dtype).reshape((N, C, H, W))).cuda()
rois = torch.tensor(np.array([[0, 1, 1, 3, 3]], dtype=dtype)).cuda()

output = mobula.op.ROIAlign(data=data, rois=rois, pooled_size=(2, 2),
                            spatial_scale=1.0, sampling_ratio=1)
print("= OUTPUT =")
print(output)
When I test the tutorials, I get the warning:
/root/test/MobulaOP/mobula/glue/mx.py:44: UserWarning: Using asynchronous execution for MXNet failed, since /usr/local/lib/python3.6/dist-packages/mxnet/libmxnet.so: undefined symbol: MXShallowCopyNDArray
It will drop the performance.
But I have tried different versions of MXNet (1.5.0, 1.5.1, 1.6.0b20190729) and get the same warning.
The MXNet pip package is built with gcc4. I wonder: if Mobula ops are compiled with gcc5 (without the MXNet source code), will they still work?
Hi, will you release the PyTorch version? And will it support multi-GPU training?
Hi there,
I found that MobulaOP crashes when training a model on multiple GPUs.
I'm trying to fix it.
This is really nice work! It would be great if there were a tutorial on extending this package, something like the PyTorch-Encoding extension notes: http://hangzh.com/PyTorch-Encoding/notes/extending.html
The functions dot_add and linalg_gemm_?? are not thread-safe.
In this project, I use the GPL-licensed header file functional-gcc4_9.h to address the problem of ABI compatibility. I need to resolve the license problem.
I have removed the GPL-covered files from the master branch. In addition, there is a GPL branch, https://github.com/wkcn/MobulaOP/tree/master-GPL, which keeps gcc compatibility.
Is there a way to specify the data type of the outputs (other than always using float32)?
And, in general, does MobulaOP support mixed types when implementing a kernel?
Thanks!
I wrote my first demo of a Mobula op. The directory layout of my project:
mobula_test
├── main.py
└── TestOP
    └── TestOP.cpp
The contents of the files:
main.py:
import mobula
import mxnet as mx
from mxnet import nd
from tqdm import tqdm

if __name__ == '__main__':
    mobula.op.load('TestOP')

    ctx = mx.cpu()
    a = nd.ones((5000, 5000), ctx=ctx)
    b = nd.ones((5000, 5000), ctx=ctx)
    out = nd.empty(a.shape, ctx=ctx)
    print("cpu")
    for i in tqdm(range(1000)):
        mobula.func.TestOP(a.size, a, b, out)

    ctx = mx.gpu()
    a = nd.ones((5000, 5000), ctx=ctx)
    b = nd.ones((5000, 5000), ctx=ctx)
    out = nd.empty(a.shape, ctx=ctx)
    print("gpu")
    for i in tqdm(range(1000)):
        mobula.func.TestOP(a.size, a, b, out)
TestOP.cpp:
template <typename DType>
MOBULA_KERNEL TestOP_kernel(const int n, const DType* a, const DType* b, DType* out) {
  parfor(n, [&](int i) {
    out[i] = a[i] + b[i];
  });
}
Time cost: CPU 14 s, GPU 226 s on an i7-7700K & 1080 Ti. Both CPU and GPU usage are at 100%.
OS environment: Windows 10 1809, CUDA 10.0.
Is it possible to support CuPy (a NumPy-like API accelerated with CUDA)?
Traceback (most recent call last):
  File "D:\Miniconda3\envs\python35\lib\site-packages\mobula\func.py", line 208, in __call__
    var, ptype, template_mapping, using_async)
  File "D:\Miniconda3\envs\python35\lib\site-packages\mobula\func.py", line 273, in _get_tensor_info
    raise TypeError()
TypeError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "test_mul_func.py", line 9, in <module>
    mobula.func.mul_elemwise(aa.size, aa, bb, outa)
  File "D:\Miniconda3\envs\python35\lib\site-packages\mobula\func.py", line 239, in __call__
    self.name, self.func.arg_types, list(map(type, args))))
TypeError: Unmatched parameters list of the function mul_elemwise:
[const int32_t, <typename const T*>, <typename const T*>, <typename T*>]
vs
[<class 'int'>, <class 'cupy.core.core.ndarray'>, <class 'cupy.core.core.ndarray'>, <class 'cupy.core.core.ndarray'>]
Hi,
I am learning from your project and want to know where to find these MXNet keywords, like those in check_backend(b):
func_names = ['get_pointer', 'dev_id', 'wait_to_read', 'wait_to_write', 'OpGen']
Are there any pages online about the meanings of these keywords?
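As far as I can tell, these are not MXNet keywords but attribute names that each Mobula glue module (e.g. mobula/glue/mx.py) must provide; check_backend(b) verifies the backend exposes them. They map onto ordinary MXNet concepts roughly as follows (an illustrative guess, not the actual glue code):

import mxnet as mx

a = mx.nd.ones((2, 2), ctx=mx.cpu())
a.wait_to_read()            # 'wait_to_read': block until pending writes to `a` finish
print(a.context.device_id)  # 'dev_id': the device id of the array's context
# 'get_pointer' would extract the raw data pointer passed to the C++ kernel,
# and 'OpGen' generates the framework-specific operator wrapper.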