Comments (5)
For reference, the error is:
2024-05-31T23:12:43.7830534Z =========================== short test summary info ============================
2024-05-31T23:12:43.7832053Z FAILED test/integration/test_integration.py::TestSubclass::test_int8_dynamic_quant_subclass_api_1_cpu - torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
2024-05-31T23:12:43.7833246Z CppCompileError: C++ compile error
2024-05-31T23:12:43.7833517Z
2024-05-31T23:12:43.7833621Z Command:
2024-05-31T23:12:43.7840753Z g++ /tmp/torchinductor_root/ag/cag5cdm6uh26pig7xgkfwgbqqh377nc7ldeqen544wvh5totpuza.cpp -shared -fPIC -Wall -std=c++17 -Wno-unused-variable -Wno-unknown-pragmas -D_GLIBCXX_USE_CXX11_ABI=0 -I/opt/conda/envs/venv/lib/python3.9/site-packages/torch/include -I/opt/conda/envs/venv/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/envs/venv/lib/python3.9/site-packages/torch/include/TH -I/opt/conda/envs/venv/lib/python3.9/site-packages/torch/include/THC -I/opt/conda/envs/venv/include/python3.9 -L/opt/conda/envs/venv/lib/python3.9/site-packages/torch/lib -L/opt/conda/envs/venv/lib -L/opt/conda/envs/venv/lib/python3.9/site-packages/torch/lib -ltorch -ltorch_cpu -lgomp -ltorch_python -lc10 -mavx512f -mavx512dq -mavx512vl -mavx512bw -mfma -DCPU_CAPABILITY_AVX512 -O3 -DNDEBUG -ffast-math -fno-finite-math-only -fno-unsafe-math-optimizations -ffp-contract=off -march=native -fopenmp -D C10_USING_CUSTOM_GENERATED_MACROS -o /tmp/torchinductor_root/ag/cag5cdm6uh26pig7xgkfwgbqqh377nc7ldeqen544wvh5totpuza.so
2024-05-31T23:12:43.7847453Z
2024-05-31T23:12:43.7847561Z Output:
2024-05-31T23:12:43.7848972Z /tmp/torchinductor_root/ag/cag5cdm6uh26pig7xgkfwgbqqh377nc7ldeqen544wvh5totpuza.cpp: In function ‘void kernel(half*, const half*, const int8_t*, const int64_t*, const half*, half*, half*, half*, long int)’:
2024-05-31T23:12:43.7851691Z /tmp/torchinductor_root/ag/cag5cdm6uh26pig7xgkfwgbqqh377nc7ldeqen544wvh5totpuza.cpp:77:36: error: no match for ‘operator*’ (operand types are ‘at::vec::CPU_CAPABILITY::Vectorized<int>’ and ‘at::vec::CPU_CAPABILITY::Vectorized<float>’)
2024-05-31T23:12:43.7853239Z 77 | auto tmp40 = tmp37 * tmp39;
2024-05-31T23:12:43.7853703Z | ~~~~~ ^ ~~~~~
2024-05-31T23:12:43.7854122Z | | |
2024-05-31T23:12:43.7854580Z | | Vectorized<float>
2024-05-31T23:12:43.7855061Z | Vectorized<int>
2024-05-31T23:12:43.7856049Z In file included from /opt/conda/envs/venv/lib/python3.9/site-packages/torch/include/ATen/cpu/vec/vec512/vec512.h:8,
2024-05-31T23:12:43.7857334Z from /opt/conda/envs/venv/lib/python3.9/site-packages/torch/include/ATen/cpu/vec/vec.h:4,
2024-05-31T23:12:43.7858557Z from /opt/conda/envs/venv/lib/python3.9/site-packages/torch/include/ATen/cpu/vec/functional_base.h:6,
2024-05-31T23:12:43.7859815Z from /opt/conda/envs/venv/lib/python3.9/site-packages/torch/include/ATen/cpu/vec/functional.h:3,
2024-05-31T23:12:43.7861053Z from /tmp/torchinductor_root/sk/cskh5dx62fglpphcrl6723dnmowdabouerrzy3dmqcngbxwfa7bv.h:35,
2024-05-31T23:12:43.7862212Z from /tmp/torchinductor_root/ag/cag5cdm6uh26pig7xgkfwgbqqh377nc7ldeqen544wvh5totpuza.cpp:2:
2024-05-31T23:12:43.7865215Z /opt/conda/envs/venv/lib/python3.9/site-packages/torch/include/ATen/cpu/vec/vec_base.h:629:41: note: candidate: ‘template<class T> at::vec::CPU_CAPABILITY::Vectorized<T> at::vec::CPU_CAPABILITY::operator*(const at::vec::CPU_CAPABILITY::Vectorized<T>&, const at::vec::CPU_CAPABILITY::Vectorized<T>&)’
2024-05-31T23:12:43.7867719Z 629 | template <class T> Vectorized<T> inline operator*(const Vectorized<T> &a, const Vectorized<T> &b) {
2024-05-31T23:12:43.7868487Z | ^~~~~~~~
2024-05-31T23:12:43.7869739Z /opt/conda/envs/venv/lib/python3.9/site-packages/torch/include/ATen/cpu/vec/vec_base.h:629:41: note: template argument deduction/substitution failed:
2024-05-31T23:12:43.7871736Z /tmp/torchinductor_root/ag/cag5cdm6uh26pig7xgkfwgbqqh377nc7ldeqen544wvh5totpuza.cpp:77:38: note: deduced conflicting types for parameter ‘T’ (‘int’ and ‘float’)
2024-05-31T23:12:43.7872918Z 77 | auto tmp40 = tmp37 * tmp39;
2024-05-31T23:12:43.7873378Z | ^~~~~
2024-05-31T23:12:43.7874349Z In file included from /opt/conda/envs/venv/lib/python3.9/site-packages/torch/include/ATen/cpu/vec/vec_base.h:1132,
2024-05-31T23:12:43.7875777Z from /opt/conda/envs/venv/lib/python3.9/site-packages/torch/include/ATen/cpu/vec/vec512/vec512.h:8,
2024-05-31T23:12:43.7876969Z from /opt/conda/envs/venv/lib/python3.9/site-packages/torch/include/ATen/cpu/vec/vec.h:4,
2024-05-31T23:12:43.7878188Z from /opt/conda/envs/venv/lib/python3.9/site-packages/torch/include/ATen/cpu/vec/functional_base.h:6,
2024-05-31T23:12:43.7879447Z from /opt/conda/envs/venv/lib/python3.9/site-packages/torch/include/ATen/cpu/vec/functional.h:3,
2024-05-31T23:12:43.7880709Z from /tmp/torchinductor_root/sk/cskh5dx62fglpphcrl6723dnmowdabouerrzy3dmqcngbxwfa7bv.h:35,
2024-05-31T23:12:43.7881848Z from /tmp/torchinductor_root/ag/cag5cdm6uh26pig7xgkfwgbqqh377nc7ldeqen544wvh5totpuza.cpp:2:
2024-05-31T23:12:43.7884406Z /opt/conda/envs/venv/lib/python3.9/site-packages/torch/include/ATen/cpu/vec/vec_n.h:316:37: note: candidate: ‘template<class T, int N> at::vec::CPU_CAPABILITY::VectorizedN<T, N> at::vec::CPU_CAPABILITY::operator*(const at::vec::CPU_CAPABILITY::VectorizedN<T, N>&, const at::vec::CPU_CAPABILITY::VectorizedN<T, N>&)’
2024-05-31T23:12:43.7886385Z 316 | VECTORIZEDN_DEFINE_BINARY_OP_GLOBAL(operator*)
2024-05-31T23:12:43.7886877Z | ^~~~~~~~
2024-05-31T23:12:43.7888170Z /opt/conda/envs/venv/lib/python3.9/site-packages/torch/include/ATen/cpu/vec/vec_n.h:297:28: note: in definition of macro ‘VECTORIZEDN_DEFINE_BINARY_OP_GLOBAL’
2024-05-31T23:12:43.7889512Z 297 | inline VectorizedN<T, N> op( \
2024-05-31T23:12:43.7890106Z | ^~
2024-05-31T23:12:43.7891273Z /opt/conda/envs/venv/lib/python3.9/site-packages/torch/include/ATen/cpu/vec/vec_n.h:316:37: note: template argument deduction/substitution failed:
2024-05-31T23:12:43.7892428Z 316 | VECTORIZEDN_DEFINE_BINARY_OP_GLOBAL(operator*)
2024-05-31T23:12:43.7892924Z | ^~~~~~~~
2024-05-31T23:12:43.7894198Z /opt/conda/envs/venv/lib/python3.9/site-packages/torch/include/ATen/cpu/vec/vec_n.h:297:28: note: in definition of macro ‘VECTORIZEDN_DEFINE_BINARY_OP_GLOBAL’
2024-05-31T23:12:43.7895523Z 297 | inline VectorizedN<T, N> op( \
2024-05-31T23:12:43.7896112Z | ^~
2024-05-31T23:12:43.7897781Z /tmp/torchinductor_root/ag/cag5cdm6uh26pig7xgkfwgbqqh377nc7ldeqen544wvh5totpuza.cpp:77:38: note: ‘at::vec::CPU_CAPABILITY::Vectorized<int>’ is not derived from ‘const at::vec::CPU_CAPABILITY::VectorizedN<T, N>’
2024-05-31T23:12:43.7899201Z 77 | auto tmp40 = tmp37 * tmp39;
2024-05-31T23:12:43.7899663Z | ^~~~~
2024-05-31T23:12:43.7900660Z In file included from /opt/conda/envs/venv/lib/python3.9/site-packages/torch/include/ATen/cpu/vec/vec512/vec512.h:10,
2024-05-31T23:12:43.7901934Z from /opt/conda/envs/venv/lib/python3.9/site-packages/torch/include/ATen/cpu/vec/vec.h:4,
2024-05-31T23:12:43.7903154Z from /opt/conda/envs/venv/lib/python3.9/site-packages/torch/include/ATen/cpu/vec/functional_base.h:6,
2024-05-31T23:12:43.7904587Z from /opt/conda/envs/venv/lib/python3.9/site-packages/torch/include/ATen/cpu/vec/functional.h:3,
2024-05-31T23:12:43.7921001Z from /tmp/torchinductor_root/sk/cskh5dx62fglpphcrl6723dnmowdabouerrzy3dmqcngbxwfa7bv.h:35,
2024-05-31T23:12:43.7922202Z from /tmp/torchinductor_root/ag/cag5cdm6uh26pig7xgkfwgbqqh377nc7ldeqen544wvh5totpuza.cpp:2:
2024-05-31T23:12:43.7925013Z /opt/conda/envs/venv/lib/python3.9/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_bfloat16.h:790:29: note: candidate: ‘at::vec::CPU_CAPABILITY::Vectorized<c10::BFloat16> at::vec::CPU_CAPABILITY::operator*(const at::vec::CPU_CAPABILITY::Vectorized<c10::BFloat16>&, const at::vec::CPU_CAPABILITY::Vectorized<c10::BFloat16>&)’
2024-05-31T23:12:43.7927404Z 790 | Vectorized<BFloat16> inline operator*(const Vectorized<BFloat16>& a, const Vectorized<BFloat16>& b) {
2024-05-31T23:12:43.7928166Z | ^~~~~~~~
2024-05-31T23:12:43.7930114Z /opt/conda/envs/venv/lib/python3.9/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_bfloat16.h:790:67: note: no known conversion for argument 1 from ‘at::vec::CPU_CAPABILITY::Vectorized<int>’ to ‘const at::vec::CPU_CAPABILITY::Vectorized<c10::BFloat16>&’
2024-05-31T23:12:43.7932167Z 790 | Vectorized<BFloat16> inline operator*(const Vectorized<BFloat16>& a, const Vectorized<BFloat16>& b) {
2024-05-31T23:12:43.7933196Z | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
2024-05-31T23:12:43.7935472Z /opt/conda/envs/venv/lib/python3.9/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_bfloat16.h:1389:25: note: candidate: ‘at::vec::CPU_CAPABILITY::Vectorized<c10::Half> at::vec::CPU_CAPABILITY::operator*(const at::vec::CPU_CAPABILITY::Vectorized<c10::Half>&, const at::vec::CPU_CAPABILITY::Vectorized<c10::Half>&)’
2024-05-31T23:12:43.7937710Z 1389 | Vectorized<Half> inline operator*(const Vectorized<Half>& a, const Vectorized<Half>& b) {
2024-05-31T23:12:43.7938422Z | ^~~~~~~~
2024-05-31T23:12:43.7940336Z /opt/conda/envs/venv/lib/python3.9/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_bfloat16.h:1389:59: note: no known conversion for argument 1 from ‘at::vec::CPU_CAPABILITY::Vectorized<int>’ to ‘const at::vec::CPU_CAPABILITY::Vectorized<c10::Half>&’
2024-05-31T23:12:43.7942298Z 1389 | Vectorized<Half> inline operator*(const Vectorized<Half>& a, const Vectorized<Half>& b) {
2024-05-31T23:12:43.7943051Z | ~~~~~~~~~~~~~~~~~~~~~~~~^
2024-05-31T23:12:43.7943515Z
2024-05-31T23:12:43.7943536Z
2024-05-31T23:12:43.7943823Z Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information
2024-05-31T23:12:43.7944276Z
2024-05-31T23:12:43.7944280Z
2024-05-31T23:12:43.7944583Z You can suppress this exception and fall back to eager by setting:
2024-05-31T23:12:43.7945129Z import torch._dynamo
2024-05-31T23:12:43.7945526Z torch._dynamo.config.suppress_errors = True
2024-05-31T23:12:43.7946971Z FAILED test/integration/test_integration.py::TestSubclass::test_int8_dynamic_quant_subclass_api_2_cpu - torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
2024-05-31T23:12:43.7948162Z CppCompileError: C++ compile error
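The root cause visible in the log is a mixed-dtype multiply: the generated kernel multiplies a `Vectorized<int>` by a `Vectorized<float>`, and ATen's vectorized `operator*` requires both operands to have the same element type. In eager mode the equivalent tensor op works because PyTorch's type-promotion rules insert the cast implicitly — a minimal illustration (stand-in values, not the failing kernel itself):

```python
import torch

# Eager-mode analogue of the failing `tmp37 * tmp39` multiply: int values
# times float scales. Type promotion casts the int operand to float, which
# is exactly the conversion the generated vectorized C++ is missing.
q = torch.tensor([10, -3], dtype=torch.int32)          # stand-in quantized values
scale = torch.tensor([0.5, 0.25], dtype=torch.float32) # stand-in scales
out = q * scale
print(out.dtype)  # torch.float32
```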
---
BF16 passes but FP16 fails. From the lowering graph, we can see that the graph already differs between BF16 and FP16. Is this a CPU-specific issue @jerryzh168?
- Here is the readable BF16 FX graph. We convert the tensor to BF16 after `clamp_min`:
![image](https://private-user-images.githubusercontent.com/53841472/338517578-d0175c16-b185-47b1-a61f-a99a225e7fb9.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTgyMjk2OTIsIm5iZiI6MTcxODIyOTM5MiwicGF0aCI6Ii81Mzg0MTQ3Mi8zMzg1MTc1NzgtZDAxNzVjMTYtYjE4NS00N2IxLWE2MWYtYTk5YTIyNWU3ZmI5LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA2MTIlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNjEyVDIxNTYzMlomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTEzN2U2OGRiOTJmNDUyYTRjMDY0OWQwNGE5ZDZkZDY3YWI2MjZiYTExOWMzZjFiNDNmZjUxNzBmNjdjMWU5ZWYmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.v04bbIbgmPKZFwMCzgvtUsn3XanPMAMn-eTTBxodo3M)
- Here is the readable FP16 FX graph. We convert the tensor to FP32 after `clamp_min`:
![image](https://private-user-images.githubusercontent.com/53841472/338517961-c26c8285-182f-4bbd-b3c7-7e0f05889b24.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTgyMjk2OTIsIm5iZiI6MTcxODIyOTM5MiwicGF0aCI6Ii81Mzg0MTQ3Mi8zMzg1MTc5NjEtYzI2YzgyODUtMTgyZi00YmJkLWIzYzctN2UwZjA1ODg5YjI0LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA2MTIlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNjEyVDIxNTYzMlomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTg4MjRhZWQ2NGY0YmUzOTMyOTdmMTAyMjQyMjlmZDZhZjQ3NzFjMWYwOTE2NGQ5ZjViMzg0OWRjMTE4MmQ2ZTMmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.WF6YKPu5YMxPgWKjZHkC379vkB36wwGAo5nNnh2yU7c)
---
This line of code in TorchAO is suspicious: https://github.com/pytorch/ao/blob/950a89388e88e10f26bbbbe2ec0b1710ba3d33d1/torchao/quantization/quant_api.py#L413. It hardcodes the data type as `None` for BF16 but `FP32` for FP16.
==============
Update
After unifying the data type to `None` for both FP16 and BF16, this test case passes on my local system.
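The described fix can be pictured as collapsing the dtype branch so both half-precision types take the same path (a sketch with hypothetical names, not the actual patch):

```python
import torch

def convert_scale(scale, output_dtype=None):
    # With output_dtype=None (now used for FP16 and BF16 alike), the scale
    # keeps its computed dtype, so both precisions trace to the same graph.
    return scale if output_dtype is None else scale.to(output_dtype)

s_fp16 = torch.tensor(0.01, dtype=torch.float16)
s_bf16 = torch.tensor(0.01, dtype=torch.bfloat16)
print(convert_scale(s_fp16).dtype)  # torch.float16
print(convert_scale(s_bf16).dtype)  # torch.bfloat16
```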
---
This fails for both CPU and CUDA, I think.
The linked detail is important to avoid regressing performance for some internal model, I think. Why can't Inductor support this path?
---
Thanks for the reminder. After further investigation, we did find a CPP backend issue; #128498 fixes it. With that PR, I think this test case works with the CPP backend now.