Comments (4)
Thanks for pointing this out. This benchmark was a little outdated and I have just refreshed it.
You can see the updated performance numbers in the README.md. By the way, I suggest you switch to the latest PyTorch (either a binary release or a nightly build is fine).
Also, for inference it is recommended to use jemalloc or tcmalloc together with numactl on a single NUMA node (which means a single socket for an Intel Xeon CPU); you can take run.sh as a reference.
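A minimal sketch of such a launch command (the jemalloc path, core layout, and script name below are assumptions for illustration; adjust them for your machine and check run.sh for the actual invocation):

```shell
# Preload jemalloc so it replaces the default allocator.
# The path is an assumption; locate yours with: ldconfig -p | grep jemalloc
export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so

# Bind both compute and memory to NUMA node 0 (a single socket),
# avoiding cross-socket memory traffic during inference.
numactl --cpunodebind=0 --membind=0 ./run.sh --inference --single
```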
The _mkldnn layout means that both the activations (inputs and outputs) and the weights stay in the mkldnn blocked format throughout propagation. This approach has a flaw: if the network contains an operator that does not support the mkldnn blocked format, you have to convert tensors back to the plain format manually. So it is somewhat unfriendly to use...
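The manual conversion looks roughly like this (a sketch; `to_mkldnn()` requires a PyTorch build with MKL-DNN support, which the official CPU wheels include):

```python
import torch

x = torch.randn(1, 3, 224, 224)

# Convert a plain (strided) tensor into the mkldnn blocked layout.
x_mkldnn = x.to_mkldnn()
print(x_mkldnn.layout)  # torch._mkldnn

# Before feeding an operator that lacks mkldnn support, you must
# convert back to the plain (strided) layout by hand.
x_plain = x_mkldnn.to_dense()
print(torch.equal(x, x_plain))  # True
```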
We have been working on the channels-last (NHWC) format and hopefully it will be ready soon. In the CL format the activation always stays in a plain format (NHWC), so you don't have to worry about blocked-to-plain conversion when you hit an operator mkldnn does not support. The idea is here. I will refresh this benchmark once this is fully available.
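In current PyTorch releases the channels-last format is exposed as a memory format on ordinary strided tensors; a small sketch of how it behaves:

```python
import torch

x = torch.randn(1, 3, 224, 224)

# Reinterpret the same data with NHWC strides; the logical shape
# stays NCHW, only the physical memory layout changes.
x_cl = x.contiguous(memory_format=torch.channels_last)

print(x_cl.is_contiguous(memory_format=torch.channels_last))  # True
print(x_cl.shape)  # torch.Size([1, 3, 224, 224])
```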
from convnet-benchmark-py.
Thanks for your quick reply. I reran this benchmark for resnext101. Here is my setup:
Running on device: Intel(R) Xeon(R) Gold 6142 CPU @ 2.60GHz
Running on torch: 1.6.0+cpu
Running on torchvision: 0.7.0+cpu
| tcmalloc | numactl | mkldnn | jit | time (ms) |
|:--------:|:-------:|:------:|:---:|----------:|
| × | × | × | × | 124 |
| √ | × | × | × | 150 |
| × | √ | × | × | 97 |
| √ | √ | × | × | 121 |
| × | × | √ | × | 120 |
| × | √ | √ | × | 97 |
| × | √ | √ | √ | 97 |
| √ | √ | √ | √ | 121 |
I found that only numactl helps; tcmalloc is actually harmful, and mkldnn and jit make no difference.
Did you build torch 1.7.0a0+7cc6540 from source with the Intel C compiler? (My torch was the 1.6.0 release installed via pip.) Can you give me some advice to help me find out why mkldnn and jit have no effect in my setup?
I need more info to debug this issue. Please test with --mkldnn
on and off, with mkldnn verbose enabled:
MKLDNN_VERBOSE=2 ./run.sh --inference --single
MKLDNN_VERBOSE=2 ./run.sh --inference --single --mkldnn
Verbose mode prints timing info for every mkldnn operator execution, so the log can be overwhelming. You can reduce the number of iterations, say to 10, to keep the final log small. If the log is still too long, please send it to me by email: [email protected]
By the way, this issue should have nothing to do with the compiler (PyTorch doesn't compile with icc yet). I tried the public release '1.6.0+cpu' and got results similar to my local '1.7.0a0+7cc6540'.
@mingfeima You're so kind to help me. My issue was that I wasn't actually passing the --mkldnn flag. With it set, I get the results I expected. Thanks again!