onednn's Introduction

oneAPI Deep Neural Network Library (oneDNN)

oneAPI Deep Neural Network Library (oneDNN) is an open-source cross-platform performance library of basic building blocks for deep learning applications. oneDNN is part of oneAPI. The library is optimized for Intel(R) Architecture Processors, Intel Graphics, and Arm* 64-bit Architecture (AArch64)-based processors. oneDNN has experimental support for the following architectures: NVIDIA* GPU, AMD* GPU, OpenPOWER* Power ISA (PPC64), IBMz* (s390x), and RISC-V.

oneDNN is intended for deep learning applications and framework developers interested in improving application performance on CPUs and GPUs. Deep learning practitioners should use one of the applications enabled with oneDNN.

Table of Contents

Documentation

  • Developer Guide explains the programming model, supported functionality, and implementation details, and includes annotated examples.
  • API Reference provides a comprehensive reference of the library API.

Installation

Binary distribution of this software is available in:

The packages do not include library dependencies and these need to be resolved in the application at build time. See the System Requirements section below and the Build Options section in the Developer Guide for more details on CPU and GPU runtimes.

If the configuration you need is not available, you can build the library from source.

System Requirements

oneDNN supports platforms based on the following architectures:

WARNING

Power ISA (PPC64), IBMz (s390x), and RISC-V (RV64) support is experimental with limited testing validation.

The library is optimized for the following CPUs:

  • Intel Atom(R) processor (at least Intel SSE4.1 support is required)
  • Intel Core(TM) processor (at least Intel SSE4.1 support is required)
  • Intel Core Ultra processors (formerly Meteor Lake)
  • Intel Xeon(R) processor E3, E5, and E7 family (formerly Sandy Bridge, Ivy Bridge, Haswell, and Broadwell)
  • Intel Xeon Scalable processor (formerly Skylake, Cascade Lake, Cooper Lake, Ice Lake, Sapphire Rapids, and Emerald Rapids)
  • Intel Xeon CPU Max Series (formerly Sapphire Rapids HBM)
  • future Intel Xeon Scalable processors (code name Sierra Forest and Granite Rapids)

On a CPU based on Intel 64 or on AMD64 architecture, oneDNN detects the instruction set architecture (ISA) at runtime and uses just-in-time (JIT) code generation to deploy the code optimized for the latest supported ISA. Future ISAs may have initial support in the library disabled by default and require the use of run-time controls to enable them. See CPU dispatcher control for more details.
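For example, the JIT dispatcher can be capped at a specific ISA from application code (or, equivalently, through the ONEDNN_MAX_CPU_ISA environment variable). A minimal sketch, assuming a recent oneDNN release where the dnnl.hpp C++ API exposes set_max_cpu_isa and get_effective_cpu_isa; the exact header path and enum values may differ by version:

    #include <iostream>
    #include "oneapi/dnnl/dnnl.hpp"

    int main() {
        // Cap JIT code generation at AVX2, even on CPUs that support newer ISAs.
        // This must be called before the first primitive is created.
        dnnl::set_max_cpu_isa(dnnl::cpu_isa::avx2);

        // Report the ISA the dispatcher will actually target on this machine.
        std::cout << "effective CPU ISA id: "
                  << static_cast<int>(dnnl::get_effective_cpu_isa()) << "\n";
        return 0;
    }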

On a CPU based on Arm AArch64 architecture, oneDNN can be built with Arm Compute Library (ACL) integration. ACL is an open-source library for machine learning applications and provides AArch64 optimized implementations of core functions. This functionality currently requires that ACL is downloaded and built separately; see Build from Source. oneDNN only supports Compute Library versions 23.11 or later.

WARNING

On macOS, applications that use oneDNN may need to request special entitlements if they use the hardened runtime. See the Linking Guide for more details.

The library is optimized for the following GPUs:

  • Intel Graphics for 11th-14th Generation Intel Core Processors
  • Intel Graphics for Intel Core Ultra processors (formerly Meteor Lake)
  • Intel Iris Xe MAX Graphics (formerly DG1)
  • Intel Arc(TM) graphics (formerly Alchemist)
  • Intel Data Center GPU Flex Series (formerly Arctic Sound)
  • Intel Data Center GPU Max Series (formerly Ponte Vecchio)

Requirements for Building from Source

oneDNN supports systems meeting the following requirements:

  • Operating system with Intel 64 / Arm 64 / Power / IBMz architecture support
  • C++ compiler with C++11 standard support
  • CMake 2.8.12 or later
  • Arm Compute Library (ACL) for builds using ACL on AArch64.

The following tools are required to build oneDNN documentation:

Configurations of CPU and GPU engines may introduce additional build time dependencies.

CPU Engine

The oneDNN CPU engine is used to execute primitives on Intel Architecture Processors, 64-bit Arm Architecture (AArch64) processors, 64-bit Power ISA (PPC64) processors, IBMz (s390x) processors, and compatible devices.

The CPU engine is built by default but can be disabled at build time by setting DNNL_CPU_RUNTIME to NONE. In this case, the GPU engine must be enabled. The CPU engine can be configured to use the OpenMP, TBB, or SYCL runtime. The following additional requirements apply:

Some implementations rely on OpenMP 4.0 SIMD extensions. For the best performance results on Intel Architecture Processors we recommend using the Intel C++ Compiler.
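Whichever runtime is selected at build time, application code targets it through the same engine abstraction. A minimal sketch, assuming the dnnl.hpp C++ API:

    #include "oneapi/dnnl/dnnl.hpp"

    int main() {
        // The first (index 0) CPU engine; primitives built on it execute on the
        // runtime the library was compiled with (OpenMP, TBB, or SYCL).
        dnnl::engine cpu_engine(dnnl::engine::kind::cpu, 0);
        dnnl::stream cpu_stream(cpu_engine);
        return 0;
    }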

GPU Engine

Intel Processor Graphics and Xe Architecture graphics are supported by the oneDNN GPU engine. The GPU engine is disabled in the default build configuration. The following additional requirements apply when the GPU engine is enabled:

WARNING

On Linux, the GPU driver resets the GPU when kernel execution time exceeds several seconds. You can prevent this behavior by disabling hangcheck for the Intel GPU driver. Windows has a built-in timeout detection and recovery mechanism that results in similar behavior; you can prevent it by increasing the TdrDelay value.

WARNING

NVIDIA GPU support is experimental. General information, build instructions, and implementation limitations are available in the NVIDIA backend readme.

WARNING

AMD GPU support is experimental. General information, build instructions, and implementation limitations are available in the AMD backend readme.
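Whether a GPU engine can actually be created depends on the DNNL_GPU_RUNTIME build option and on the drivers available at run time, so it can be worth probing before use. A minimal sketch, assuming the dnnl.hpp C++ API:

    #include <iostream>
    #include "oneapi/dnnl/dnnl.hpp"

    int main() {
        // Returns 0 when the build has no GPU runtime or no device is visible.
        if (dnnl::engine::get_count(dnnl::engine::kind::gpu) == 0) {
            std::cout << "no GPU engine available in this build/runtime\n";
            return 0;
        }
        dnnl::engine gpu_engine(dnnl::engine::kind::gpu, 0);
        dnnl::stream gpu_stream(gpu_engine);
        std::cout << "GPU engine created\n";
        return 0;
    }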

Runtime Dependencies

When oneDNN is built from source, the library runtime dependencies and specific versions are defined by the build environment.

Linux

Common dependencies:

  • GNU C Library (libc.so)
  • GNU Standard C++ Library v3 (libstdc++.so)
  • Dynamic Linking Library (libdl.so)
  • C Math Library (libm.so)
  • POSIX Threads Library (libpthread.so)

Runtime-specific dependencies:

Runtime configuration | Compiler | Dependency
DNNL_CPU_RUNTIME=OMP | GCC | GNU OpenMP runtime (libgomp.so)
DNNL_CPU_RUNTIME=OMP | Intel C/C++ Compiler | Intel OpenMP runtime (libiomp5.so)
DNNL_CPU_RUNTIME=OMP | Clang | Intel OpenMP runtime (libiomp5.so)
DNNL_CPU_RUNTIME=TBB | any | TBB (libtbb.so)
DNNL_CPU_RUNTIME=SYCL | Intel oneAPI DPC++ Compiler | Intel oneAPI DPC++ Compiler runtime (libsycl.so), TBB (libtbb.so), OpenCL loader (libOpenCL.so)
DNNL_GPU_RUNTIME=OCL | any | OpenCL loader (libOpenCL.so)
DNNL_GPU_RUNTIME=SYCL | Intel oneAPI DPC++ Compiler | Intel oneAPI DPC++ Compiler runtime (libsycl.so), OpenCL loader (libOpenCL.so), oneAPI Level Zero loader (libze_loader.so)

Windows

Common dependencies:

  • Microsoft Visual C++ Redistributable (msvcrt.dll)

Runtime-specific dependencies:

Runtime configuration | Compiler | Dependency
DNNL_CPU_RUNTIME=OMP | Microsoft Visual C++ Compiler | No additional requirements
DNNL_CPU_RUNTIME=OMP | Intel C/C++ Compiler | Intel OpenMP runtime (iomp5.dll)
DNNL_CPU_RUNTIME=TBB | any | TBB (tbb.dll)
DNNL_CPU_RUNTIME=SYCL | Intel oneAPI DPC++ Compiler | Intel oneAPI DPC++ Compiler runtime (sycl.dll), TBB (tbb.dll), OpenCL loader (OpenCL.dll)
DNNL_GPU_RUNTIME=OCL | any | OpenCL loader (OpenCL.dll)
DNNL_GPU_RUNTIME=SYCL | Intel oneAPI DPC++ Compiler | Intel oneAPI DPC++ Compiler runtime (sycl.dll), OpenCL loader (OpenCL.dll), oneAPI Level Zero loader (ze_loader.dll)

macOS

Common dependencies:

  • System C/C++ runtime (libc++.dylib, libSystem.dylib)

Runtime-specific dependencies:

Runtime configuration | Compiler | Dependency
DNNL_CPU_RUNTIME=OMP | Intel C/C++ Compiler | Intel OpenMP runtime (libiomp5.dylib)
DNNL_CPU_RUNTIME=TBB | any | TBB (libtbb.dylib)

Validated Configurations

CPU engine was validated on RedHat* Enterprise Linux 8 with

on Windows Server* 2019 with

on macOS 11 (Big Sur) with

GPU engine was validated on Ubuntu* 22.04 with

on Windows Server 2019 with

Applications Enabled with oneDNN

Support

Submit questions, feature requests, and bug reports on the GitHub issues page.

You can also contact oneDNN developers via UXL Foundation Slack using the #onednn channel.

Contributing

We welcome community contributions to oneDNN. If you have an idea on how to improve the library:

For additional details, see the contribution guidelines. You can also contact oneDNN developers and maintainers via UXL Foundation Slack using the #onednn channel.

This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the Contributor Covenant code of conduct.

License

oneDNN is licensed under Apache License Version 2.0. Refer to the "LICENSE" file for the full license text and copyright notice.

This distribution includes third party software governed by separate license terms.

3-clause BSD license:

2-clause BSD license:

Apache License Version 2.0:

Boost Software License, Version 1.0:

MIT License:

This third party software, even if included with the distribution of the Intel software, may be governed by separate license terms, including without limitation, third party license terms, other Intel software license terms, and open source software license terms. These separate license terms govern your use of the third party programs as set forth in the "THIRD-PARTY-PROGRAMS" file.

Security

The Security Policy outlines our guidelines and procedures for ensuring the highest level of security and trust for our users who consume oneDNN.

Trademark Information

Intel, the Intel logo, Arc, Intel Atom, Intel Core, Iris, OpenVINO, the OpenVINO logo, Pentium, VTune, and Xeon are trademarks of Intel Corporation or its subsidiaries.

* Other names and brands may be claimed as the property of others.

Microsoft, Windows, and the Windows logo are trademarks, or registered trademarks of Microsoft Corporation in the United States and/or other countries.

OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission by Khronos.

(C) Intel Corporation

onednn's People

Contributors

aaraujom, akharito, ankalinin, atkassen, densamoilov, dyoussif, dzarukin, echeresh, h-sadia, igorsafo, irinasok, kamil-andrzejewski, kealan-barbieri, kwiersch, menooker, mgouicem, msotoflo, nivas-x86, nshustrov, petercad, piotrchmiel, qyi1, rjoursler, shelleygoel, simonsays095, skazakov1, taolv, tczeszun, vpirogov, xuxinzen

onednn's Issues

mkl support for ARM architecture?

In attempting to build:

[ 17%] Building CXX object src/CMakeFiles/mkldnn.dir/cpu/cpu_engine.cpp.o
In file included from /home/bduff/dev/mkl-dnn/src/cpu/jit_generator.hpp:28:0,
                 from /home/bduff/dev/mkl-dnn/src/cpu/jit_avx512_mic_conv_kernel_f32.hpp:21,
                 from /home/bduff/dev/mkl-dnn/src/cpu/jit_avx512_mic_convolution.hpp:23,
                 from /home/bduff/dev/mkl-dnn/src/cpu/cpu_engine.cpp:26:
/home/bduff/dev/mkl-dnn/src/cpu/xbyak/xbyak_util.h:84:21: fatal error: cpuid.h: No such file or directory
   #include <cpuid.h>
                     ^
compilation terminated.

test failed

Sorry to disturb.
I followed the install instructions; however, I failed at make test.

    Start 1: api-c
1/3 Test #1: api-c ............................***Exception: Other  0.01 sec
    Start 2: test_c_symbols-c
2/3 Test #2: test_c_symbols-c .................   Passed    0.01 sec
    Start 3: tests_gtest
3/3 Test #3: tests_gtest ......................***Failed   32.28 sec

33% tests passed, 2 tests failed out of 3

Total Test time (real) =  32.31 sec

The following tests FAILED:
      1 - api-c (OTHER_FAULT)
      3 - tests_gtest (Failed)
Errors while running CTest
make: *** [test] Error 8

My CPU is an Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz.

Any help?

any format works for all layers now?

As far as I know, for the conv layer, for both forward and backward, I can get the best format if I use any, right?

How about other layers, like pool?

As for the pooling layer:

Suppose that we have a simple net: CONV1->POOL1->CONV2.

For forward: if the output format of CONV1 is nchw, then POOL1 will directly use nchw as the input format, since the pooling layer does not seem to handle a src format of any (maybe we can add it?).

Likewise for backward: CONV2 may have nchw as the botdiff format, so POOL1 will directly use it.

We do not want to see these cases, right? Users may hope that nChw8c/nChw16c can be used as the input format when appropriate.

That's for the pooling layer; how about other layers, like batch norm, fc, and concat?

Thanks very much.

convolution bwd data is very slow in jit_gemm_convolution path

Hi, our workload is unbalanced on the jit_gemm_convolution_bwd_data path. We have a 2x3x2240x2240 input that needs to be run on GoogLeNet, batch size = 2, input channels = 3. We find that the first conv layer's backward data goes into the jit_gemm_convolution path, and it forks two threads to do the sgemm and col2img (work amount = ngroup x nbatchsize), so each thread has to deal with a 3x2240x2240 input. However, our CPU is a Xeon Phi KNL; it has 68 cores, and the other 66 cores are idle during this time, so it is very inefficient. How can we solve it? Or is it possible to make it run through the jit_avx512/jit_avx2 path? The format has been set to any. Thanks

pure zero output of softmax

I got "Inf" cost in my training phase.
Finally found that there have many pure 0 as the output of softmax.
This has no impact on scoring, but when in training phase this could be an issue with std::log(0).

Is this normal? Or should I add EPS myself manually?
I thought this could be done in MKL-DNN, enhancing the robustness.
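For now, a minimal framework-side workaround sketch (the epsilon value here is only an example):

    #include <algorithm>
    #include <cmath>

    // Clamp the softmax probability before taking the log so that exact zeros
    // do not produce -inf / NaN losses.
    inline float safe_log(float p, float eps = 1e-20f) {
        return std::log(std::max(p, eps));
    }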

Thanks~

make errors

Hi,
I am new to the Xeon Phi co-processor. I was trying to install the library on a KNL machine, however, I got the following error at the "make" stage:

[ 32%] Building CXX object src/CMakeFiles/mkldnn.dir/cpu/jit_avx2_lrn.cpp.o
[ 34%] Building CXX object src/CMakeFiles/mkldnn.dir/cpu/gemm_inner_product.cpp.o
/mkl-dnn/src/cpu/gemm_inner_product.cpp:39:24: error: variable or field
‘cblas_gemm’ declared void
inline void cblas_gemm(CBLAS_LAYOUT layout,
^
/mkl-dnn/src/cpu/gemm_inner_product.cpp:39:24: error: ‘CBLAS_LAYOUT’
was not declared in this scope
....

The compiler and MKL environment were already set up. Could you help me with this error? Thanks!

Best regards,

AVX2 question

So do you need AVX2 instructions to run MKL-DNN? I've ported MKL-DNN to Windows and it died at:

0000002D5FD130E7	vfmadd231ps		ymm0,ymm12,ymm15 

It said illegal instruction, so I checked the cpuid flags: YMM is enabled in XCR0, but it seems the AVX2 flag is not set. So that means the fused multiply-add won't work, is that correct?

mkl lib for osx?

The script only downloads for Linux. Is there an OSX version?

convolution struct members

Hello,

This is a question to verify my interpretation of test_convolution_descr_t members in mkldnn_test_common.hpp

 mb; // Size of mini-batch
 ng; // What does ng stand for?
 ic, ih, iw; //Input: number of channels, height, and width
 oc, oh, ow; //Output: number of channels, height, and width
 kh, kw; // filter: height and width
 padh, padw; //padding dimensions
 strh, strw; // stride: height and width

Thanks,
Steena

Slow with 1x1 kernel conv

Found a performance issue:

I just ran only the one test case below, which is res5a_branch1_conv in ResNet-50.
PARAMS(FMT_DATA_BLOCKED, FMT_WEIGHTS_BLOCKED, FMT_NO_BIAS, FMT_DATA_BLOCKED, 64, 1, 1024, 14, 14, 2048, 7, 7, 1, 1, 0, 0, 2, 2)

BTW, I needed to change line 49 of Convolution_common.h:
#define FMT_NO_BIAS format_undef

But it takes about 10 seconds to compute backward data on an E5-2699 v4; is that normal?
The time is measured around:
stream(stream::kind::eager).submit(pipeline).wait();
I suppose it should be several ms.

Tests not completing successfully

System:
Processor: Intel(R) Xeon Phi(TM) CPU 7210 @ 1.30GHz
OS: CentOS Linux release 7.2.1511 (Core)

> [davido@knl-data build]$ make test
> Running tests...
> Test project /home/davido/mkl/mkl-dnn-master/build
>       Start  1: simple-net-c
>  1/23 Test  #1: simple-net-c ..........................***Exception: Other  0.11 sec
>       Start  2: simple-net-cpp
>  2/23 Test  #2: simple-net-cpp ........................***Exception: Other  6.68 sec
>       Start  3: api-c
>  3/23 Test  #3: api-c .................................***Exception: Other  0.25 sec
>       Start  4: test_c_symbols-c
>  4/23 Test  #4: test_c_symbols-c ......................   Passed    0.01 sec
>       Start  5: test_sum
>  5/23 Test  #5: test_sum ..............................   Passed    1.98 sec
>       Start  6: test_reorder
>  6/23 Test  #6: test_reorder ..........................   Passed    0.59 sec
>       Start  7: test_concat
>  7/23 Test  #7: test_concat ...........................   Passed    0.57 sec
>       Start  8: test_relu_forward
>  8/23 Test  #8: test_relu_forward .....................***Failed    0.26 sec
>       Start  9: test_relu_backward
>  9/23 Test  #9: test_relu_backward ....................***Failed    0.25 sec
>       Start 10: test_lrn_forward
> 10/23 Test #10: test_lrn_forward ......................***Failed    1.60 sec
>       Start 11: test_lrn_backward
> 11/23 Test #11: test_lrn_backward .....................***Failed    1.98 sec
>       Start 12: test_pooling_forward
> 12/23 Test #12: test_pooling_forward ..................***Failed    5.49 sec
>       Start 13: test_pooling_backward
> 13/23 Test #13: test_pooling_backward .................   Passed    2.19 sec
>       Start 14: test_batch_normalization_forward
> 14/23 Test #14: test_batch_normalization_forward ......***Failed   10.12 sec
>       Start 15: test_batch_normalization_backward
> 15/23 Test #15: test_batch_normalization_backward .....***Failed   15.45 sec
>       Start 16: test_inner_product_forward
> 16/23 Test #16: test_inner_product_forward ............   Passed    2.47 sec
>       Start 17: test_inner_product_backward_data
> 17/23 Test #17: test_inner_product_backward_data ......   Passed    0.87 sec
>       Start 18: test_inner_product_backward_weights
> 18/23 Test #18: test_inner_product_backward_weights ...   Passed    1.41 sec
>       Start 19: test_convolution_format_any
> 19/23 Test #19: test_convolution_format_any ...........   Passed    0.02 sec
>       Start 20: test_convolution_forward
> 20/23 Test #20: test_convolution_forward ..............***Failed   56.88 sec
>       Start 21: test_convolution_relu_forward
> 21/23 Test #21: test_convolution_relu_forward .........***Failed   40.96 sec
>       Start 22: test_convolution_backward_data
> 22/23 Test #22: test_convolution_backward_data ........   Passed   87.59 sec
>       Start 23: test_convolution_backward_weights
> 23/23 Test #23: test_convolution_backward_weights .....   Passed   90.65 sec
> 
> 48% tests passed, 12 tests failed out of 23
> 
> Total Test time (real) = 328.49 sec
> 
> The following tests FAILED:
>           1 - simple-net-c (OTHER_FAULT)
>           2 - simple-net-cpp (OTHER_FAULT)
>           3 - api-c (OTHER_FAULT)
>           8 - test_relu_forward (Failed)
>           9 - test_relu_backward (Failed)
>          10 - test_lrn_forward (Failed)
>          11 - test_lrn_backward (Failed)
>          12 - test_pooling_forward (Failed)
>          14 - test_batch_normalization_forward (Failed)
>          15 - test_batch_normalization_backward (Failed)
>          20 - test_convolution_forward (Failed)
>          21 - test_convolution_relu_forward (Failed)
> Errors while running CTest
> make: *** [test] Error 8

Dave

MKL-DNN vs MKL2017 performance difference

Dear MKL DNN developers,

Do you know the cause of the performance difference between mkl2017 and mkl-dnn on ResNet-152?
Comparing the mkl-dnn and mkl2017 performance with Intel Caffe, I have confirmed that the performance of mkl2017 was 10 times better than that of mkl-dnn. I am wondering whether AVX-512 is not being applied well. I used a Xeon Phi 7250 CPU.

I desperately want to bring mkl-dnn up to the performance of mkl2017.

Thank you,
Daejin.

padR in Convolution primitive

Hi ,

During the creation of the convolution primitive descriptor (convolution_forward::desc),
why do we need padR, which is passed as an input argument?
std::vector<int> padR = { cd.padh, cd.padw };
for (int i = 0; i < 2; ++i) {
    if ((cd.ih + cd.padh + padR[0] - cd.kh)/cd.strh + 1 != cd.oh) ++padR[0];
    if ((cd.iw + cd.padw + padR[1] - cd.kw)/cd.strw + 1 != cd.ow) ++padR[1];
}

Padding information is already passed earlier in the argument list.
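For reference, a small worked example (with hypothetical sizes) of when the loop above bumps padR:

    #include <cassert>

    int main() {
        // Hypothetical shape: ih = 112, kh = 3, padh = 0 (left), strh = 2, and the
        // framework expects oh = 56 (ceil-style rounding of the output size).
        int ih = 112, kh = 3, padh = 0, strh = 2, oh = 56;
        int padR = padh;                                      // start symmetric
        if ((ih + padh + padR - kh) / strh + 1 != oh) ++padR; // 55 != 56 -> bump right pad
        assert((ih + padh + padR - kh) / strh + 1 == oh);     // (112 + 1 - 3) / 2 + 1 == 56
        return 0;
    }

In other words, padR seems to cover cases where the symmetric padding alone cannot reproduce the output size the framework expects.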

Thanks,
G. Praveen.

Typo in Readme

Please fix the typo in the readme:
(codename Kingts Landing)

inner_product_test_float.TestsInnerProduct tests fail

Hi,

While recently building and installing on a Haswell node (Intel(R) Core(TM) i7-5960X CPU @ 3.00GHz), the following tests fail:

TestInnerProductForward/inner_product_test_float.TestsInnerProduct/0 and TestInnerProductForward/inner_product_test_float.TestsInnerProduct/1 fail on
[==========] 364 tests from 44 test cases ran. (396301 ms total)
[ PASSED ] 362 tests.
[ FAILED ] 2 tests, listed below:
[ FAILED ] TestInnerProductForward/inner_product_test_float.TestsInnerProduct/0, where GetParam() = 44-byte object <40-00 00-00 01-00 00-00 05-00 00-00 09-00 00-00 03-00 00-00 04-00 00-00 02-00 00-00 20-00 00-00 30-00 00-00 06-00 00-00 06-00 00-00>
[ FAILED ] TestInnerProductForward/inner_product_test_float.TestsInnerProduct/1, where GetParam() = 44-byte object <40-00 00-00 01-00 00-00 07-00 00-00 07-00 00-00 03-00 00-00 04-00 00-00 02-00 00-00 20-00 00-00 30-00 00-00 06-00 00-00 06-00 00-00>

Convolution Performance versus MKL 2017

What is the performance of mkl-dnn convolution versus the MKL 2017 implementation (say for a 3x3 convolution)?

I saw that Update 2 of MKL 2017 got "Significantly improved reference convolution code performance". Does this library also benefit from these improvements? Or does one pay a performance penalty using this library?

Inference performance of bvlc_alexnet is far slower on mkl-dnn

I found that the inference performance of bvlc_alexnet is far slower on mkl-dnn.
In an intel-caffe build with the mkl-dnn engine, I run the caffe/examples/cpp_classification example and collect the elapsed time of the line: net_->Forward();
It's about 835 ms.
But the result of intel-caffe with mkl-2017 is 16 ms.
Then I added the following code to caffe/examples/cpp_classification/classification.cpp:

 boost::posix_time::ptime start_cpu_;
 boost::posix_time::ptime stop_cpu_;
//first time
 start_cpu_= boost::posix_time::microsec_clock::local_time();
 net_->Forward();
 stop_cpu_ = boost::posix_time::microsec_clock::local_time();
 double first_time = (stop_cpu_ - start_cpu_).total_milliseconds();

// second time
 start_cpu_= boost::posix_time::microsec_clock::local_time();
 net_->Forward();
 stop_cpu_ = boost::posix_time::microsec_clock::local_time();
 double second_time = (stop_cpu_ - start_cpu_).total_milliseconds();

The result is:
first time: 835 ms
second time: 15.32 ms


It's weird that there is a huge gap between the first time and the second time.
In intel-caffe with mkl2017, the result is
first time: 18 ms
second time: 16 ms

I collected each layer's forward time for the first inference.
I found that the time is wasted in the first dropout layer, in:

caffe/src/caffe/mkldnn_memory.cpp => MKLDNNMemoryDescriptor<Dtype, is_diff>::on_to_cpu()
=> StreamHolder::Instance().current_stream()->wait();

The first dropout layer is connected behind an fc (fully connected) layer.

I've tried other models; the results for bvlc_reference_caffenet, vgg_16, and vgg_19 are similar to bvlc_alexnet.
They all have a dropout layer behind an fc layer.
But bvlc_googlenet does not have a large gap between the first and second time.
The dropout layer of bvlc_googlenet is not connected behind an fc layer.
Is this a known issue?


Here is my configuration:
CPU: i7-4770 @ 3.40GHz x 8
Memory: 8G
ubuntu 14.04
intel caffe latest
mkl-dnn: commit: 47bda95
(latest mkl-dnn can not work with latest intel caffe, so I can not reproduce this in latest mkl-dnn)
test image: caffe/examples/images/cat.jpg


Windows or OSX Support?

Am I correct in interpreting this to have no support for Windows or Mac OSX at the moment? If this is correct, what are the time frames for adding support?

Benchmark results?

Hi,

Thanks for putting together this library. Are there any benchmark results for the performance of this library that can be shared? Thank you.

-sz

make test stalled

Hi,
I built mkl-dnn, but make test won't run.
Running tests...
Test project /home/gaurav/GPraveen/mkl-dnn/build
Start 1: simple-net-c
It halts at the point above.

The Intel processor in my PC is an Intel(R) Core(TM) i3-2120 CPU @ 3.30GHz, without AVX2.
Is this the issue? Will it work only on the Intel Xeon processors that are mentioned?

Thanks in advance

weight or src at simple_net?

I think I found an error at line 89 of examples/simple_net.cpp.
Maybe it should be conv_prim_desc.weights_primitive_desc(), not conv_prim_desc.src_primitive_desc().

What's weird is that both can pass the test, why?

Maxpooling issue

Hi,

I am trying to use max pooling for MNIST digit classification.
I am using a reorder to convert the input from nchw to nChw8c format, then doing the max pooling with the output in nChw8c format, and then reordering the output from nChw8c back to nchw.

Here I am unable to get the desired output for the dimensions below:
input [1,20,25,25], max pool 2,2, stride 2,2, pad 0,0

However, if I use the nchw format for input/output without reorders, I get the desired result.
Please suggest if I am missing anything.

Thanks,
G.Praveen.

AVX2 functions not called ... clarification needed

Hello

We are using the latest Intel mkl-dnn from GitHub. We are compiling and testing performance on an Ubuntu 14.04 LTS PC which has AVX2 support. However, it appears that the AVX2 functions are not being invoked (it appears to invoke _ref_convolution_fwd_t in ref_convolution.cpp instead of _jit_avx2_convolution_fwd_t in jit_avx2_convolution.cpp). The memory format used is NCHW.

However, it appears that the AVX2 functions are used if the memory format for weights/inputs/outputs is changed to other formats such as OIhw8i8o.

As we cannot rearrange the layout of inputs/weights at run time, how can we ensure that the AVX2 functions are called irrespective of memory format, without impacting accuracy?

Thanks in advance
Praveen

BatchNorm layer failed with width==height==1

Found that the gtest fails for the BN layer when manually setting h==w==1.

Line 406 at tests/gtests/test_batch_normalization.cpp

INST_TEST_CASE(Simple_NCHW,
    PARAMS(nchw, nchw, 2, 10, 1, 1, EPS)
);

Program hung on MKL-DNN

Hi, I ran my program on KNL and found it hung at src/cpu/jit_gemm_convolution_utils.cpp, line 158:

        for (size_t i = 0; i < jcp.im2col_size; ++i) (*ws)[i] = 0.;

It seems to use AVX-512 to do this.
For my purposes, I just want to get past this hang point.
Will there be any issue with MKL-DNN if I comment out this line, since it just initializes the memory to 0?

What are the advantages of using a JIT for several operators?

Hi,

MKL-DNN uses xbyak to generate the code for some of the operators (conv, pooling, etc.) in every forward pass. I wonder what the advantages of doing this are.

Is it to get optimal performance by hand-coding the implementation in asm? How much is the gain compared to implementing it in C++ with SIMD instructions? Why does it generate the code on every run?

Thanks!

Differences with MKL 2017 DNN

What are the differences between this version and the DNN primitives in MKL 2017? I see that the interfaces are different.

How to use mkl-dnn?

I didn't dig much deeper into the code, just went through the Readme.
My question is how to use mkl-dnn. Is it the same as using MKL with Caffe, by defining BLAS := mkl?

BTW, are there any benchmark results (mkl-dnn vs the original MKL)?

performance drop with GooglenetV1

Before v0.5 I used padR = padding in the pooling layer; it worked well and I got very good performance with GoogLeNet scoring.
However, with this new release there is a check in Pooling.cpp:

 for (int i = 2; i <= 3; ++i)
        consistency = consistency && (
                (src_desc->dims[i] - kernel[i - 2] + padding_l[i - 2]
                 + padding_r[i - 2]) / strides[i - 2] + 1
                == dst_desc->dims[i]);
    if (!consistency) return invalid_arguments;

So I have to use padR just like the test case:

for (int i = 0; i < 2; ++i) {
        if ((pd.ih + pd.padh + padR[0] - pd.kh)/pd.strh + 1 < pd.oh) ++padR[0];
        if ((pd.iw + pd.padw + padR[1] - pd.kw)/pd.strw + 1 < pd.ow) ++padR[1];
 }

Then the FPS drops to half of what it was before.
And there are about 4 pooling cases in GoogLeNet V1 where this padR would be used, taking much more time than before.

I thought the output memory would be cleared to zero before pooling, right? So even when I used padding (instead of padR), the loss result was still correct, with much better performance.

PS: padding_kind::zero

So, is there any possible improvement?

Plans for RNN

Are there any plans to add RNN layers (compatible with the cuDNN RNN layers)? This would be exceptionally useful, given the wide usage of RNNs.

simplenet.cpp has incorrect pooling layer

In the call to pooling_forward, pool_indices_memory should appear as the 4th parameter.

net.push_back(pooling_forward(pool_pd, lrn_dst_memory, pool_indices_memory, 
    pool_dst_memory));                                                      

test_convolution_format_any failed

Hi, I just installed MKL-DNN on KNL, but test_convolution_format_any didn't pass. Any ideas what went wrong?

(py27) linpengt@knl-desk:~/workspace/cgt/mkl-dnn/build$  CC=icc CXX=icpc cmake .. &&  make -j

(py27) linpengt@knl-desk:~/workspace/cgt/mkl-dnn/build$ OMP_NUM_THREADS=60 make -j test
Running tests...
Test project /home/linpengt/workspace/cgt/mkl-dnn/build
      Start  1: simple-net-c
 1/22 Test  #1: simple-net-c ..........................   Passed    2.92 sec
      Start  2: simple-net-cpp
 2/22 Test  #2: simple-net-cpp ........................   Passed    4.56 sec
      Start  3: api-c
 3/22 Test  #3: api-c .................................   Passed    0.11 sec
      Start  4: test_c_symbols-c
 4/22 Test  #4: test_c_symbols-c ......................   Passed    0.01 sec
      Start  5: test_sum
 5/22 Test  #5: test_sum ..............................   Passed    0.17 sec
      Start  6: test_reorder
 6/22 Test  #6: test_reorder ..........................   Passed    0.86 sec
      Start  7: test_concat
 7/22 Test  #7: test_concat ...........................   Passed    0.10 sec
      Start  8: test_softmax_forward
 8/22 Test  #8: test_softmax_forward ..................   Passed    6.53 sec
      Start  9: test_relu
 9/22 Test  #9: test_relu .............................   Passed    0.72 sec
      Start 10: test_lrn_forward
10/22 Test #10: test_lrn_forward ......................   Passed    1.37 sec
      Start 11: test_lrn_backward
11/22 Test #11: test_lrn_backward .....................   Passed    2.99 sec
      Start 12: test_pooling_forward
12/22 Test #12: test_pooling_forward ..................   Passed    0.41 sec
      Start 13: test_pooling_backward
13/22 Test #13: test_pooling_backward .................   Passed    0.60 sec
      Start 14: test_batch_normalization
14/22 Test #14: test_batch_normalization ..............   Passed   11.43 sec
      Start 15: test_inner_product_forward
15/22 Test #15: test_inner_product_forward ............   Passed    0.26 sec
      Start 16: test_inner_product_backward_data
16/22 Test #16: test_inner_product_backward_data ......   Passed    0.12 sec
      Start 17: test_inner_product_backward_weights
17/22 Test #17: test_inner_product_backward_weights ...   Passed    0.11 sec
      Start 18: test_convolution_format_any
18/22 Test #18: test_convolution_format_any ...........***Failed    0.02 sec
      Start 19: test_convolution_forward

19/22 Test #19: test_convolution_forward ..............   Passed  366.04 sec
      Start 20: test_convolution_relu_forward

so many errors when compiling on windows with visual studio

Hello, I compiled the source code on Windows/Visual Studio after configuring with CMake, and there are many errors (more than 500), such as:

Severity Code Description Project File Line Suppression State
Error the global scope has no "posix_memalign" mkldnn
Severity Code Description Project File Line Suppression State
Error identifier "jit_conv_call_s" is undefined mkldnn
void operator()(jit_bnrm_call_s *arg) { jit_ker(arg); }
Severity Code Description Project File Line Suppression State
Error more than one conversion function from "lambda ->mkldnn::c_api::mkldnn_primitive_desc_t" to "" applies: simple-net-cpp C:\myproject\intel\caffe\external\mkldnn\include\mkldnn.hpp 594
............................................................

Has anyone compiled the source code successfully on Windows? Thanks.

I have installed the Intel XE 2017 (with C++ compiler) evaluation and tried to compile it with Intel C++, but it still failed.

AVX512

Hi, is this library optimized using AVX-512 intrinsics?

AVX support patch

Dear MKL DNN developers,

Would you be interested in reviewing a patch which adds AVX support to convolution forward scoring?
With this patch on AVX machines, we see about 70-80% CPU utilization in single thread with 1x1 convolutions, and about 90% with regular convolutions.

Thank you,
Evgueni.

FC depends on OMP and cblas?

Is it possible for fc not to rely on them?
That would be more convenient, so we would not be bothered by omp or iomp.

Stream usage for backpropagation or softmax usage?

We've been using MKL-DNN to implement a simple CNN, but we had a somewhat basic question about doing the softmax error calculation and backpropagation.

Should we use separate streams for different phases of execution, or do we add backpropagation primitives to our existing forward path stream? It's somewhat unclear from the unit tests in the codebase and there don't seem to be any relevant examples in the Intel caffe branch.

Thanks,
Jeff

nChw8c for relu

I found that relu only supports nchw and nc format inputs, right?

If the previous layer's output is nChw8c, we will get incorrect output, while nChw8c should have better performance than nchw.
(For a conv layer I can use conv_relu instead, but when the previous layer is batch norm we may get nChw8c output.)

Is it possible for relu to support the nChw8c format, or to give any to relu and have it return the best format?

If so, we can get better performance on ResNet.

fully connected layer optimal code

Hi,

For the fully connected layer I used sgemm (cblas_sgemm) and got a reasonable number, x ms.
When I tried to use the inner product available for fully connected layers in mkldnn, I got more than x ms.

Please suggest if I am missing anything here. Ideally, the inner product, which internally calls the AVX2 gemm, should take less than x ms?

Thanks,
Praveen.

segmentation fault (core dumped)

Hi,
I successfully completed 'make test' and 'make install' after fixing the error (https://github.com/01org/mkl-dnn/issues/40).
But when I run intel/caffe model training with '--engine=MKLDNN', a segmentation fault occurs and training stops.
If I run with '--engine=MKL2017', there is no error.

[error logs]
...
I0418 09:31:32.606019 104068 net.cpp:365] data does not need backward computation.
I0418 09:31:32.606083 104068 net.cpp:407] This network produces output loss
*** Aborted at 1492475492 (unix time) try "date -d @1492475492" if you are using GNU date ***
PC: @ 0x7f5b98ff4360 MLSL::Free()
*** SIGSEGV (@0x0) received by PID 104068 (TID 0x7f5b9c6e6a80) from PID 0; stack trace: ***
@ 0x7f5b9b1b5370 (unknown)
@ 0x7f5b98ff4360 MLSL::Free()
@ 0x7f5b9c161ceb caffe::Net<>::Init()
@ 0x7f5b9c162dae caffe::Net<>::Net()
@ 0x7f5b9c1314f2 caffe::Solver<>::InitTrainNet()
@ 0x7f5b9c131acc caffe::Solver<>::Init()
@ 0x7f5b9c131e38 caffe::Solver<>::Solver()
@ 0x7f5b9c0a92c3 caffe::Creator_SGDSolver<>()
@ 0x7f5b9c75c787 caffe::SolverRegistry<>::CreateSolver()
@ 0x7f5b9c753b5b train()
@ 0x7f5b9c74d9dc main
@ 0x7f5b90597b35 __libc_start_main
@ 0x7f5b9c74e81d (unknown)
segmentation fault (core dumped)

make test failed.

Hi.

I use an 'Intel(R) Xeon Phi(TM) CPU 7250 @ 1.40GHz'.

The following tests FAILED:
1 - simple-net-c (OTHER_FAULT)
2 - simple-net-cpp (OTHER_FAULT)
3 - simple-training-net-c (SEGFAULT)
Errors while running CTest

The others passed.

[These are the logs:]
Running tests...
Test project /home/framework/mkl-dnn/build
Start 1: simple-net-c
*** Error in `/home/framework/mkl-dnn/build/examples/simple-net-c': free(): invalid pointer: 0x00007faaee350010 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x7c503)[0x7fab3b6a6503]
/home/framework/mkl-dnn/build/examples/simple-net-c(+0x23cc)[0x7fab42f423cc]
/home/framework/mkl-dnn/build/examples/simple-net-c(+0x11ca)[0x7fab42f411ca]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7fab3b64bb35]
/home/framework/mkl-dnn/build/examples/simple-net-c(+0x1239)[0x7fab42f41239]
======= Memory map: ========
7faab66ca000-7faab66cb000 ---p 00000000 00:00 0
...
...
1/24 Test #1: simple-net-c ..........................***Exception: Other 6.10 sec
Start 2: simple-net-cpp
*** Error in `/home/framework/mkl-dnn/build/examples/simple-net-cpp': free(): invalid pointer: 0x00007f784172e040 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x7c503)[0x7f7884b6c503]
/home/framework/mkl-dnn/build/examples/simple-net-cpp(+0x70b9)[0x7f788cc2e0b9]
/home/framework/mkl-dnn/build/examples/simple-net-cpp(+0x4c76)[0x7f788cc2bc76]
/home/framework/mkl-dnn/build/examples/simple-net-cpp(+0x2a1d)[0x7f788cc29a1d]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f7884b11b35]
/home/framework/mkl-dnn/build/examples/simple-net-cpp(+0x2b3c)[0x7f788cc29b3c]
======= Memory map: ========
7f7809aa8000-7f7809aa9000 ---p 00000000 00:00 0
...
...
2/24 Test #2: simple-net-cpp ........................***Exception: Other 7.88 sec
Start 3: simple-training-net-c
3/24 Test #3: simple-training-net-c .................***Exception: SegFault 32.32 sec
Start 4: simple-training-net-cpp
4/24 Test #4: simple-training-net-cpp ............... Passed 13.11 sec
Start 5: api-c
...

$lsb_release -a
LSB Version: :core-4.1-amd64:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch
Distributor ID: CentOS
Description: CentOS Linux release 7.3.1611 (Core)
Release: 7.3.1611
Codename: Core

$gcc --version
gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-11)
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Thank you.

Issue when "make install"

Hi,

I followed the steps to install MKLDNN. However, in the last step, when I run "make install", it shows the error message below.

Install the project...
-- Install configuration: "Release"
-- Installing: /usr/local/lib/libmklml_intel.so
CMake Error at cmake_install.cmake:36 (FILE):
file INSTALL cannot copy file
"/home/curly/mkl-dnn/external/mklml_lnx_2017.0.1.20161005/lib/libmklml_intel.so"
to "/usr/local/lib/libmklml_intel.so".

Can anyone help me?

Modify Cmake to build and link libmkldnn.so only to intel OpenMP runtime when compiled with gcc

Currently, when mkldnn is compiled with gcc, libmkldnn.so shows a dependency on both the Intel OpenMP and GNU OpenMP runtimes, as shown below.

(.venv2) phthoreho@phthoreho-X9DAi:/media/phthoreho/61279ac5-fee9-489e-8593-4891a74aa922/MKL-DNN/mkl-dnn/build$ ldd /media/phthoreho/disk1/MKL-DNN/temp/lib/libmkldnn.so
	linux-vdso.so.1 =>  (0x00007ffddd6d1000)
	libmklml_intel.so => /usr/local/lib/libmklml_intel.so (0x00007f432aaa8000)
	libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f432a6f6000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f432a3ed000)
	libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1 (0x00007f432a1be000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f4329fa7000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f4329bde000)
	libiomp5.so => /opt/intel/compilers_and_libraries_2017.2.174/linux/compiler/lib/intel64/libiomp5.so (0x00007f432983b000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f432961e000)
	/lib64/ld-linux-x86-64.so.2 (0x00005654b69de000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f4329419000)

mkl-dnn engine crash on cpu without avx2 support

It's known that the mkl-dnn engine does not support machines without AVX2.
But the engine will crash with "illegal instruction".
Users will not be happy to see the program crash.

Rather than crashing, should the engine still work on machines without AVX2 support?
Maybe use another instruction set instead of AVX2 when AVX2 is not available?
