Comments (4)
Thanks for pointing this out. This benchmark was a little outdated and I have just refreshed it.
You can see the updated performance numbers in the README.md. By the way, I suggest you switch to the latest PyTorch (either a binary release or a nightly build is fine).
Also, for inference it is recommended to use jemalloc or tcmalloc together with numactl on a single NUMA node (which means a single socket for an Intel Xeon CPU); you can take run.sh as a reference.
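A minimal sketch of such a launch command (the jemalloc path, core layout, and script name below are assumptions for illustration; adjust them for your machine and check run.sh for the actual invocation):

```shell
# Preload jemalloc so it replaces the default allocator.
# The path is an assumption; locate yours with: ldconfig -p | grep jemalloc
export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so

# Bind both compute and memory to NUMA node 0 (a single socket),
# avoiding cross-socket memory traffic during inference.
numactl --cpunodebind=0 --membind=0 ./run.sh --inference --single
```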
The _mkldnn layout means that both the activations (inputs and outputs) and the weights stay in the mkldnn blocked format throughout propagation. This approach has a flaw: if the network contains an operator that does not support the mkldnn blocked format, you have to convert tensors back to the plain format manually. So it is somewhat unfriendly to use...
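The manual conversion looks roughly like this (a sketch; `to_mkldnn()` requires a PyTorch build with MKL-DNN support, which the official CPU wheels include):

```python
import torch

x = torch.randn(1, 3, 224, 224)

# Convert a plain (strided) tensor into the mkldnn blocked layout.
x_mkldnn = x.to_mkldnn()
print(x_mkldnn.layout)  # torch._mkldnn

# Before feeding an operator that lacks mkldnn support, you must
# convert back to the plain (strided) layout by hand.
x_plain = x_mkldnn.to_dense()
print(torch.equal(x, x_plain))  # True
```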
We have been working on the channels-last (NHWC) format and hopefully it will be ready soon. In the CL format the activation always stays in a plain format (NHWC), so you don't have to worry about blocked-to-plain conversion when you hit an operator mkldnn does not support. The idea is here. I will refresh this benchmark once this is fully available.
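In current PyTorch releases the channels-last format is exposed as a memory format on ordinary strided tensors; a small sketch of how it behaves:

```python
import torch

x = torch.randn(1, 3, 224, 224)

# Reinterpret the same data with NHWC strides; the logical shape
# stays NCHW, only the physical memory layout changes.
x_cl = x.contiguous(memory_format=torch.channels_last)

print(x_cl.is_contiguous(memory_format=torch.channels_last))  # True
print(x_cl.shape)  # torch.Size([1, 3, 224, 224])
```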
from convnet-benchmark-py.
Thanks for your quick reply. I reran this benchmark for resnext101. Here is my setup:
Running on device: Intel(R) Xeon(R) Gold 6142 CPU @ 2.60GHz
Running on torch: 1.6.0+cpu
Running on torchvision: 0.7.0+cpu
| tcmalloc | numactl | mkldnn | jit | time (ms) |
|:--------:|:-------:|:------:|:---:|----------:|
| × | × | × | × | 124 |
| √ | × | × | × | 150 |
| × | √ | × | × | 97 |
| √ | √ | × | × | 121 |
| × | × | √ | × | 120 |
| × | √ | √ | × | 97 |
| × | √ | √ | √ | 97 |
| √ | √ | √ | √ | 121 |
I found that only numactl helps; tcmalloc is actually harmful, and mkldnn and jit make no difference.
Did you build torch 1.7.0a0+7cc6540 from source with the Intel C compiler? (My torch was the 1.6.0 release installed via pip.) Can you give me some advice to help me find out why mkldnn and jit have no effect in my setup?
I need more info to debug this issue. Please test with --mkldnn
on and off, with mkldnn verbose enabled:
MKLDNN_VERBOSE=2 ./run.sh --inference --single
MKLDNN_VERBOSE=2 ./run.sh --inference --single --mkldnn
Verbose mode prints timing info for every mkldnn operator execution, so the log can be overwhelming. You can reduce the number of iterations, say to 10, to keep the final log small. If the log is still too long, please send it to me by email: [email protected]
By the way, this issue should have nothing to do with the compiler (PyTorch doesn't compile with icc yet). I tried the public release '1.6.0+cpu' and got results similar to my local '1.7.0a0+7cc6540'.
@mingfeima You're so kind to help me. My issue was that I wasn't actually passing the --mkldnn flag. With it set, I get the results I expected. Thanks again!