
Comments (20)


denizyuret avatar denizyuret commented on August 26, 2024

Knet7 had a cpu conv implementation written by Onur Kuru in
https://github.com/denizyuret/Knet.jl/blob/master/deprecated/src7/util/conv_pool_cpu.jl

This has not been ported / tested on Knet8 yet, it is on the todo list.

On Wed, Nov 2, 2016 at 7:03 PM niczky12 [email protected] wrote:

Just wondering, but is there a way to use `conv` and `pool` without a GPU?
I'm running a Windows machine, and even though I have an NVIDIA card
installed, I failed to install CUDA. If any of you have tips on how to get
this working, that would be appreciated.

Thanks!



from knet.jl.

denizyuret avatar denizyuret commented on August 26, 2024

Some experimental code in the cpuconv branch. Not all padding/stride options supported. Slow and not fully tested.


denizyuret avatar denizyuret commented on August 26, 2024

Onur's latest cpu conv code: https://github.com/kuruonur1/CNN.jl


denizyuret avatar denizyuret commented on August 26, 2024

This is incorporated in the latest master. We can try to make it more efficient. We should also look for open-source kernels to try, from ArrayFire, Nervana, etc., both to replace cuDNN and to inform more efficient CPU implementations. I am keeping this issue open for ongoing work.


denizyuret avatar denizyuret commented on August 26, 2024

Mocha.jl has CPU implementations; we should check out their speed.


denizyuret avatar denizyuret commented on August 26, 2024

Working on integrating the Mocha CPU conv/pool under the mochaconv branch.


denizyuret avatar denizyuret commented on August 26, 2024

The Mocha CPU conv/pool kernels have been integrated. They utilize multiple cores using OpenMP. I don't think the CPU conv/pool speed is going to get much better; they are about 10x slower than the GPU. It may be possible to use a single im2col operation for the whole batch instead of one for each image.

I am leaving this issue open for now to see if (1) we can find better CPU kernels, (2) we can find better open-source GPU kernels to replace cuDNN.
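
To illustrate the idea, here is a hedged sketch (not Knet's actual kernel) of im2col + a single GEMM over the whole batch, rather than one im2col call per image. It assumes the cross-correlation convention, stride 1, and no padding; all names are illustrative.

```julia
# Sketch: batched im2col convolution. x: (H, W, C, N), w: (KH, KW, C, O).
function conv_im2col(x::Array{T,4}, w::Array{T,4}) where {T}
    H, W, C, N = size(x)
    KH, KW, _, O = size(w)
    OH, OW = H - KH + 1, W - KW + 1
    # One patch matrix for the entire batch: (KH*KW*C) x (OH*OW*N)
    cols = Array{T}(undef, KH * KW * C, OH * OW * N)
    col = 1
    for n in 1:N, j in 1:OW, i in 1:OH
        cols[:, col] = vec(@view x[i:i+KH-1, j:j+KW-1, :, n])
        col += 1
    end
    # Single GEMM: (O x KH*KW*C) * (KH*KW*C x OH*OW*N)
    y = reshape(w, KH * KW * C, O)' * cols
    # Reorder to the usual (OH, OW, O, N) layout
    return permutedims(reshape(y, O, OH, OW, N), (2, 3, 1, 4))
end
```

The memory cost of the big `cols` matrix is the usual downside of this approach; the upside is that the whole batch goes through one large matrix multiply.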


jgbos avatar jgbos commented on August 26, 2024

For CPU, you can look at what we did for n-dimensional convolutions (we used the name conv2 when we should have used convnd) in Seep.jl here. We are currently looking into using CUDAnative.jl and LLVM for Julia 0.6 to produce efficient GPU kernels.


denizyuret avatar denizyuret commented on August 26, 2024

That's great news! I would love to try some open-source GPU kernels when you guys have something ready to test. I haven't looked at CUDAnative yet, but if I can help with benchmarking etc., let me know.

For CPU, Onur's implementation also used conv2, but it was too slow. In the latest release I adapted the C++ kernels from Mocha.jl, which use OpenMP and are pretty fast. See Knet.jl/prof/conv.jl for some benchmarking results; we should compare with the Seep.jl implementation.


jgbos avatar jgbos commented on August 26, 2024

Thanks for the CPU references. I had meant that we kept the name conv2 when in fact it is an N-dimensional implementation. We avoided doing an im2col operation because it uses too much memory when building the graph. We haven't done much benchmarking, and we are also very limited in our ability to release code updates.

You should also look at ImageFiltering.jl. Tim Holy has made a lot of optimizations for doing efficient convolutions on images with imfilter. No gradients, though.
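
For reference, a minimal example of ImageFiltering.jl's filtering API (assuming the package is installed; the box-filter kernel here is just illustrative):

```julia
using ImageFiltering

img = rand(64, 64)                  # a grayscale "image"
kern = centered(ones(3, 3) ./ 9)    # 3x3 box filter, centered at (0, 0)
out = imfilter(img, kern)           # same-size output with default border handling
```

As noted above, this gives fast forward convolutions only; there is no built-in gradient support, so it cannot directly replace a training kernel.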


denizyuret avatar denizyuret commented on August 26, 2024

The latest benchmarks from @ilkarman (https://github.com/ilkarman/DeepLearningFrameworks) show our CPU implementation to be quite inefficient. A new thread at https://discourse.julialang.org/t/on-machine-learning-and-programming-languages/7574/30 suggests alternatives. We need volunteers to reimplement the CPU convolution operations using Intel MKL.

The DyNet benchmarks by @ilkerkesen also show a similar trend for our CPU implementation of the cuDNN RNN kernels. Knet compares very well to Chainer and DyNet on the GPU, but the CPU performance is lacking. A similar volunteer effort is needed there.


denizyuret avatar denizyuret commented on August 26, 2024

Also see fb.me/83w6aHEJO
With Onur's summary from 3/28/16 (translated from Turkish):
They use Fourier or Winograd transforms for convolution. For several network configurations they show it runs 2 to 4 times faster than im2col + gemm (what we use).
The repo is here: https://github.com/Maratyszcza/NNPACK
It is written in C and can be compiled and called from Julia. However, it currently has two limitations:

  • Only convolutional layers without stride are currently supported (stride=1?)
  • Only 2x2 pooling is currently supported


DoktorMike avatar DoktorMike commented on August 26, 2024

Hey, jumping into the thread here. Are there any current plans for addressing CPU speed in Knet? I really like Knet, as it's native to Julia and nice to work with. However, I'm stuck with the CPU for a while and would like to get MXNet.jl-level performance if possible.


davidbp avatar davidbp commented on August 26, 2024

Maybe the code from https://github.com/CNugteren/CLBlast could be helpful as an alternative to BLAS/clBLAS. It supports FP16 compute. For doing convolutions via matrix multiplies, see https://arxiv.org/abs/1704.04428.


denizyuret avatar denizyuret commented on August 26, 2024

https://github.com/intel/mkl-dnn may be a good solution?


denizyuret avatar denizyuret commented on August 26, 2024

https://discourse.julialang.org/t/knet-vs-flux-etc/17057/10?u=denizyuret shows that Flux is faster in CPU convolutions. Mike Innes says: "(Flux uses) NNlib’s pure-Julia convolutions vs Knet’s threaded C++ ones, although NNlib is soon to move to NNPACK".
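
For a sense of what a pure-Julia kernel looks like, here is a minimal, hedged sketch of a direct loop-based 2-D cross-correlation (the style of kernel the pure-Julia route implies); single channel, stride 1, no padding, purely illustrative and not NNlib's actual implementation:

```julia
# Direct (non-im2col) 2-D cross-correlation over a single matrix.
function conv2d_direct(x::AbstractMatrix, w::AbstractMatrix)
    H, W = size(x)
    KH, KW = size(w)
    y = zeros(promote_type(eltype(x), eltype(w)), H - KH + 1, W - KW + 1)
    @inbounds for j in axes(y, 2), i in axes(y, 1)
        s = zero(eltype(y))
        for q in 1:KW, p in 1:KH
            s += x[i + p - 1, j + q - 1] * w[p, q]
        end
        y[i, j] = s
    end
    return y
end
```

Loops like this can be surprisingly competitive in Julia once bounds checks are elided and the compiler vectorizes the inner loop, which is presumably why the pure-Julia NNlib kernels hold up against threaded C++.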


cemilcengiz avatar cemilcengiz commented on August 26, 2024

There is a Julia wrapper for NNPACK intended to be used in NNlib.jl for Flux:
https://github.com/avik-pal/NNPACK.jl

The problem with NNPACK is that for small batch sizes it is slower than NNlib.jl's native Julia conv:
FluxML/NNlib.jl#67 (comment)

Similarly, NNPACK is also slower than PyTorch's conv at small batch sizes:
pytorch/pytorch#2826 (comment)

Apparently they don't use NNPACK for now. But if they do, it seems they will resort to a heuristic-based approach, switching between the default conv and the NNPACK implementation depending on input parameters such as batch size and number of channels.
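
Such a heuristic dispatch could be sketched as follows; `conv_default` and `conv_nnpack` are hypothetical stand-ins (returning tags here just to show the routing), and the batch-size cutoff is purely illustrative:

```julia
conv_default(x, w) = :default   # stand-in for the native Julia conv
conv_nnpack(x, w) = :nnpack     # stand-in for the NNPACK-backed conv

const NNPACK_MIN_BATCH = 32     # illustrative threshold; would need tuning

# Route to NNPACK only when the batch (last dimension) is large enough.
function conv_auto(x, w)
    batchsize = size(x, ndims(x))
    return batchsize >= NNPACK_MIN_BATCH ? conv_nnpack(x, w) : conv_default(x, w)
end
```

A real heuristic would presumably also consider channel counts and kernel sizes, per the PyTorch discussion linked above.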

There are other problems with NNPACK:
- It does not support 3D conv and pooling:
Maratyszcza/NNPACK#138 (comment)
- It does not support strided convolution for training:
Maratyszcza/NNPACK#139 (comment)


denizyuret avatar denizyuret commented on August 26, 2024

@cemilcengiz, we are trying to pass CI tests on Windows, ARM, etc. with @ianshmean, and the CPU conv kernels are causing trouble. (1) Is NNlib's pure-Julia implementation comparable in speed to our CPU kernels? (2) Does NNPACK require any compiling or library installations? (3) Has there been any progress in any of the solutions mentioned above (mkl-dnn, Seep.jl, ImageFiltering.jl, CLBlast)?

My current concern is ease of installation rather than speed. So if it is not too much slower, I'd like to go with a pure-Julia solution.


denizyuret avatar denizyuret commented on August 26, 2024

#494 switches to NNlib for CPU conv/pool.

