
Comments (20)


denizyuret avatar denizyuret commented on August 26, 2024

Knet7 had a cpu conv implementation written by Onur Kuru in
https://github.com/denizyuret/Knet.jl/blob/master/deprecated/src7/util/conv_pool_cpu.jl

This has not been ported / tested on Knet8 yet, it is on the todo list.

On Wed, Nov 2, 2016 at 7:03 PM niczky12 [email protected] wrote:

Just wondering, but is there a way to use `conv` and `pool` without a GPU?
I'm running a Windows machine, and even though I have an NVIDIA card
installed, I failed to install CUDA. If any of you have tips on how to get
this working, that would be appreciated.

Thanks!



from knet.jl.

denizyuret avatar denizyuret commented on August 26, 2024

Some experimental code in the cpuconv branch. Not all padding/stride options supported. Slow and not fully tested.


denizyuret avatar denizyuret commented on August 26, 2024

Onur's latest cpu conv code: https://github.com/kuruonur1/CNN.jl


denizyuret avatar denizyuret commented on August 26, 2024

This is incorporated in the latest master. We can try to make it more efficient. We should also look for open-source kernels to try, from ArrayFire, Nervana, etc., both to replace cuDNN and to inform more efficient CPU implementations. I am keeping this issue open for ongoing work.


denizyuret avatar denizyuret commented on August 26, 2024

Mocha.jl has CPU implementations; we should check out their speed.


denizyuret avatar denizyuret commented on August 26, 2024

Working on integrating the Mocha CPU conv/pool under the mochaconv branch.


denizyuret avatar denizyuret commented on August 26, 2024

The Mocha CPU conv/pool kernels have been integrated. They utilize multiple cores using OpenMP. I don't think the CPU conv/pool speed is going to get much better; they are about 10x slower than the GPU. It may be possible to use a single im2col operation for the whole batch instead of one for each image.

I am leaving this issue open for now to see if (1) we can find better CPU kernels, (2) we can find better open-source GPU kernels to replace cuDNN.
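
To illustrate the idea, here is a hedged sketch (not Knet's actual kernel) of im2col + a single GEMM over the whole batch, rather than one im2col call per image. It assumes the cross-correlation convention, stride 1, and no padding; all names are illustrative.

```julia
# Sketch: batched im2col convolution. x: (H, W, C, N), w: (KH, KW, C, O).
function conv_im2col(x::Array{T,4}, w::Array{T,4}) where {T}
    H, W, C, N = size(x)
    KH, KW, _, O = size(w)
    OH, OW = H - KH + 1, W - KW + 1
    # One patch matrix for the entire batch: (KH*KW*C) x (OH*OW*N)
    cols = Array{T}(undef, KH * KW * C, OH * OW * N)
    col = 1
    for n in 1:N, j in 1:OW, i in 1:OH
        cols[:, col] = vec(@view x[i:i+KH-1, j:j+KW-1, :, n])
        col += 1
    end
    # Single GEMM: (O x KH*KW*C) * (KH*KW*C x OH*OW*N)
    y = reshape(w, KH * KW * C, O)' * cols
    # Reorder to the usual (OH, OW, O, N) layout
    return permutedims(reshape(y, O, OH, OW, N), (2, 3, 1, 4))
end
```

The memory cost of the big `cols` matrix is the usual downside of this approach; the upside is that the whole batch goes through one large matrix multiply.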


jgbos avatar jgbos commented on August 26, 2024

For CPU, you can look at what we did for n-dimensional convolutions (we used the name conv2 when we should have used convnd) in Seep.jl here. We are currently looking into using CUDAnative.jl and LLVM for Julia 0.6 to produce efficient GPU kernels.


denizyuret avatar denizyuret commented on August 26, 2024

That's great news! I would love to try some open-source GPU kernels when you guys have something ready to test. I haven't looked at CUDAnative yet, but if I can help with benchmarking etc., let me know.

For CPU, Onur's implementation also used conv2, but it was too slow. In the latest release I adapted the C++ kernels from Mocha.jl, which use OpenMP and are pretty fast. See Knet.jl/prof/conv.jl for some benchmarking results; we should compare with the Seep.jl implementation.


jgbos avatar jgbos commented on August 26, 2024

Thanks for the CPU references. I had meant that we kept the name conv2 when in fact it is an N-dimensional implementation. We avoided doing an im2col operation because it uses too much memory when building the graph. We haven't done much benchmarking, and we are also very limited in our ability to release code updates.

You should also look at ImageFiltering.jl. Tim Holy has made a lot of optimizations for doing efficient convolutions on images with imfilter. No gradients, though.
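
For reference, a minimal example of ImageFiltering.jl's filtering API (assuming the package is installed; the box-filter kernel here is just illustrative):

```julia
using ImageFiltering

img = rand(64, 64)                  # a grayscale "image"
kern = centered(ones(3, 3) ./ 9)    # 3x3 box filter, centered at (0, 0)
out = imfilter(img, kern)           # same-size output with default border handling
```

As noted above, this gives fast forward convolutions only; there is no built-in gradient support, so it cannot directly replace a training kernel.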


denizyuret avatar denizyuret commented on August 26, 2024

The latest benchmarks from @ilkarman (https://github.com/ilkarman/DeepLearningFrameworks) show our CPU implementation to be quite inefficient. A new thread at https://discourse.julialang.org/t/on-machine-learning-and-programming-languages/7574/30 suggests alternatives. We need volunteers to reimplement the CPU convolution operations using Intel MKL.

The DyNet benchmarks by @ilkerkesen also show a similar trend for our CPU implementation of the cuDNN RNN kernels. Knet compares very well to Chainer and DyNet on the GPU, but the CPU performance is lacking. A similar volunteer effort is needed there.


denizyuret avatar denizyuret commented on August 26, 2024

Also see fb.me/83w6aHEJO
With Onur's summary from 3/28/16 (translated from Turkish):
They use Fourier or Winograd transforms for convolution. For several network configurations they show it runs 2 to 4 times faster than im2col + gemm (what we use).
The repo is here: https://github.com/Maratyszcza/NNPACK
It is written in C and can be compiled and called from Julia. However, it currently has two limitations:

  • Only convolutional layers without stride are currently supported (stride=1?)
  • Only 2x2 pooling is currently supported


DoktorMike avatar DoktorMike commented on August 26, 2024

Hey, jumping into the thread here. Are there any current plans for addressing CPU speed in Knet? I really like Knet, as it's native to Julia and nice to work with. However, I'm stuck with the CPU for a while and would like to get MXNet.jl-level performance if possible.


davidbp avatar davidbp commented on August 26, 2024

Maybe the code from https://github.com/CNugteren/CLBlast could be helpful as an alternative to BLAS/clBLAS. It supports FP16 compute. For doing convolutions via matrix multiplies, see https://arxiv.org/abs/1704.04428.


denizyuret avatar denizyuret commented on August 26, 2024

https://github.com/intel/mkl-dnn may be a good solution?


denizyuret avatar denizyuret commented on August 26, 2024

https://discourse.julialang.org/t/knet-vs-flux-etc/17057/10?u=denizyuret shows that Flux is faster in CPU convolutions. Mike Innes says: "(Flux uses) NNlib’s pure-Julia convolutions vs Knet’s threaded C++ ones, although NNlib is soon to move to NNPACK".
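
For a sense of what a pure-Julia kernel looks like, here is a minimal, hedged sketch of a direct loop-based 2-D cross-correlation (the style of kernel the pure-Julia route implies); single channel, stride 1, no padding, purely illustrative and not NNlib's actual implementation:

```julia
# Direct (non-im2col) 2-D cross-correlation over a single matrix.
function conv2d_direct(x::AbstractMatrix, w::AbstractMatrix)
    H, W = size(x)
    KH, KW = size(w)
    y = zeros(promote_type(eltype(x), eltype(w)), H - KH + 1, W - KW + 1)
    @inbounds for j in axes(y, 2), i in axes(y, 1)
        s = zero(eltype(y))
        for q in 1:KW, p in 1:KH
            s += x[i + p - 1, j + q - 1] * w[p, q]
        end
        y[i, j] = s
    end
    return y
end
```

Loops like this can be surprisingly competitive in Julia once bounds checks are elided and the compiler vectorizes the inner loop, which is presumably why the pure-Julia NNlib kernels hold up against threaded C++.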


cemilcengiz avatar cemilcengiz commented on August 26, 2024

There is a Julia wrapper for NNPACK intended to be used in NNlib.jl for Flux:
https://github.com/avik-pal/NNPACK.jl

The problem with NNPACK is that for small batch sizes it is slower than NNlib.jl's native Julia conv:
FluxML/NNlib.jl#67 (comment)

Similarly, NNPACK is also slower than PyTorch's conv at small batch sizes:
pytorch/pytorch#2826 (comment)

Apparently they don't use NNPACK for now. But if they do, it seems they will resort to a heuristic-based approach, switching between the default conv and the NNPACK implementation depending on input parameters such as batch size and number of channels.
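
Such a heuristic dispatch could be sketched as follows; `conv_default` and `conv_nnpack` are hypothetical stand-ins (returning tags here just to show the routing), and the batch-size cutoff is purely illustrative:

```julia
conv_default(x, w) = :default   # stand-in for the native Julia conv
conv_nnpack(x, w) = :nnpack     # stand-in for the NNPACK-backed conv

const NNPACK_MIN_BATCH = 32     # illustrative threshold; would need tuning

# Route to NNPACK only when the batch (last dimension) is large enough.
function conv_auto(x, w)
    batchsize = size(x, ndims(x))
    return batchsize >= NNPACK_MIN_BATCH ? conv_nnpack(x, w) : conv_default(x, w)
end
```

A real heuristic would presumably also consider channel counts and kernel sizes, per the PyTorch discussion linked above.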

There are other problems with NNPACK:
- It does not support 3D conv and pooling:
Maratyszcza/NNPACK#138 (comment)
- It does not support strided convolution for training:
Maratyszcza/NNPACK#139 (comment)


denizyuret avatar denizyuret commented on August 26, 2024

@cemilcengiz, we are trying to pass CI tests on Windows, ARM, etc. with @ianshmean, and the CPU conv kernels are causing trouble. (1) Is NNlib's pure-Julia implementation comparable in speed to our CPU kernels? (2) Does NNPACK require any compiling or library installations? (3) Has there been any progress in any of the solutions mentioned above (mkl-dnn, Seep.jl, ImageFiltering.jl, CLBlast)?

My current concern is ease of installation rather than speed. So if it is not too much slower, I'd like to go with a pure-Julia solution.


denizyuret avatar denizyuret commented on August 26, 2024

#494 switches to NNlib for CPU conv/pool.

