Comments (9)
The problem is actually being caused by NVRTC. I am not sure what NVRTC is setting, but the result is that nl_langinfo(CODESET) returns ANSI_X3.4-1968 (ASCII). This would explain why both NNC and nvFuser have the issue but TorchInductor does not, as OAI-Triton compiles via LLVM-IR and not NVRTC.
I wrapped our specific call to nvrtcCompileProgram in executor_utils.cpp with the following code, and you can see the change:
char* locstr = setlocale(LC_CTYPE, NULL);
char* encoding = nl_langinfo(CODESET);
printf("1. Locale is %s\n", locstr);
printf("1. Encoding is %s\n", encoding);
I thought the simple fix would be to call setlocale(LC_CTYPE, "C.UTF-8"), but that does not work. I am not sure what nl_langinfo is seeing to determine that the locale is ASCII.
I discovered this while stepping through Python 3.8 with pdb: the library code in Python 3.8 calls locale.nl_langinfo(CODESET), which the emulated code in Python 3.10 does not, though I am guessing the CPython C implementation in Python 3.10 does.
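The C check above can also be run from the Python side, which is handy when bisecting where the codeset changes. This is a minimal sketch, assuming a Unix platform where locale.nl_langinfo is available; ctype_snapshot is a hypothetical helper name, not something from nvFuser or PyTorch.

```python
import locale


def ctype_snapshot():
    """Return (LC_CTYPE locale name, CODESET) as reported by the C library."""
    # Calling setlocale with only a category queries the current setting;
    # it does not modify the locale.
    loc = locale.setlocale(locale.LC_CTYPE)
    # nl_langinfo(CODESET) is the same libc query used in the C snippet above.
    codeset = locale.nl_langinfo(locale.CODESET)
    return loc, codeset


if __name__ == "__main__":
    loc, codeset = ctype_snapshot()
    print(f"Locale is {loc}")
    print(f"Encoding is {codeset}")
```

Calling ctype_snapshot() before and after the step under suspicion and comparing the two tuples shows whether the C-level view changed.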
This issue with NVRTC and nvrtcCompileProgram was also noticed on Stack Overflow.
There is also an associated NvBug 3833924.
from fuser.
Do you know how to get this encoding, i.e. the result of locale.getpreferredencoding(), in C++?
from fuser.
From the documentation, it looks like if you use the default constructor of std::locale, it copies the global locale.
std::locale loc;
from fuser.
This definitely appears to be coming from codegen, as I tried asserting around the FusionDefinition and things were fine after initially creating the unscheduled Fusion IR. I don't see us setting std::locale::global anywhere in our code. I do see a few places in PyTorch's third_party components, but none of them look like they should impact nvFuser.
import locale
import torch
from nvfuser import FusionDefinition, DataType
H = 768
inputs = [
torch.randn(H, H, device="cuda"),
torch.randn(H, device="cuda"),
]
def nvfuser_fusion_id0(fd : FusionDefinition) -> None :
    T0 = fd.define_tensor(symbolic_sizes=[-1, -1], contiguous=[True, True], dtype=DataType.Float, is_cpu=False)
    T1 = fd.define_tensor(symbolic_sizes=[-1], contiguous=[True], dtype=DataType.Float, is_cpu=False)
    T2 = fd.ops.broadcast_in_dim(T1, output_shape=[768, 768], broadcast_dims=[1])
    T3 = fd.ops.add(T0, T2)
    S4 = fd.define_constant(0.500000, dtype=DataType.Double)
    T5 = fd.ops.mul(T3, S4)
    S6 = fd.define_constant(0.797885, dtype=DataType.Double)
    T7 = fd.ops.mul(T3, S6)
    S8 = fd.define_constant(0.0447150, dtype=DataType.Double)
    T9 = fd.ops.mul(T3, S8)
    T10 = fd.ops.mul(T9, T3)
    S11 = fd.define_constant(1.00000, dtype=DataType.Double)
    T12 = fd.ops.add(T10, S11)
    T13 = fd.ops.mul(T7, T12)
    T14 = fd.ops.tanh(T13)
    S15 = fd.define_constant(1.00000, dtype=DataType.Double)
    T16 = fd.ops.add(T14, S15)
    T17 = fd.ops.mul(T5, T16)
    T18 = fd.ops.cast(T17, dtype=DataType.Float)
    fd.add_output(T18)
print("ASSERT 1")
assert locale.getpreferredencoding() == "UTF-8", f"Preferred encoding: {locale.getpreferredencoding()}"
with FusionDefinition() as fd:
    nvfuser_fusion_id0(fd)
print("ASSERT 2")
assert locale.getpreferredencoding() == "UTF-8", f"Preferred encoding: {locale.getpreferredencoding()}"
out = fd.execute(inputs)
print("ASSERT 3")
assert locale.getpreferredencoding() == "UTF-8", f"Preferred encoding: {locale.getpreferredencoding()}"
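The three checkpoints in the script above could be factored into a small context manager that records the locale state around any block and reports what changed. This is only a sketch: locale_state and watch_locale are hypothetical names introduced here, and the set of values recorded is an assumption about what is useful to compare.

```python
import locale
import os
from contextlib import contextmanager


def locale_state():
    """Collect locale-related values that are cheap to compare."""
    return {
        # Same call the asserts above use (default do_setlocale=True).
        "preferred_encoding": locale.getpreferredencoding(),
        # Query only: no second argument means nothing is modified.
        "LC_CTYPE": locale.setlocale(locale.LC_CTYPE),
        "LANG": os.getenv("LANG"),
    }


@contextmanager
def watch_locale(label):
    """Yield a dict that is filled with {name: (before, after)} on exit."""
    changes = {}
    before = locale_state()
    try:
        yield changes
    finally:
        after = locale_state()
        for name, old in before.items():
            if after[name] != old:
                changes[name] = (old, after[name])
        if changes:
            print(f"{label}: locale state changed: {changes}")
```

Wrapping the call under suspicion, for example `with watch_locale("fd.execute"): out = fd.execute(inputs)`, would then print only the entries that differ instead of relying on a hard-coded "UTF-8" assert.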
from fuser.
This problem seems very specific to Python 3.10 and how the internal function _get_locale_encoding() is implemented. If I force the code down the emulated path, I don't see this problem. If it executes the C function, whose implementation I can't find, it fails. Note that the code for Python 3.9 and 3.11 is different!
https://github.com/python/cpython/blob/3.10/Lib/locale.py#L636-L647
def _get_locale_encoding():
    if hasattr(sys, 'getandroidapilevel'):
        # On Android langinfo.h and CODESET are missing, and UTF-8 is
        # always used in mbstowcs() and wcstombs().
        return 'UTF-8'
    if sys.flags.utf8_mode:
        return 'UTF-8'
    encoding = getdefaultlocale()[1]
    if encoding is None:
        # LANG not set, default conservatively to ASCII
        encoding = 'ascii'
    return encoding
I think the C code is hitting this case but I am not sure why.
    if encoding is None:
        # LANG not set, default conservatively to ASCII
        encoding = 'ascii'
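To tell which of the two paths a given interpreter would take, one option is to check whether the private _locale module exposes a C-level _get_locale_encoding. This is a rough sketch that leans on CPython internals: the _locale module and that attribute name are private details and are assumed here to differ between Python versions; locale_encoding_backend is a hypothetical helper name.

```python
def locale_encoding_backend():
    """Report whether a C-level _get_locale_encoding appears to be available."""
    try:
        # Private CPython accelerator module behind the locale module.
        import _locale
    except ImportError:
        return {"has__locale": False, "c_get_locale_encoding": False}
    return {
        "has__locale": True,
        # If this is False, locale.py would have to use some other path,
        # such as the emulated function quoted above.
        "c_get_locale_encoding": hasattr(_locale, "_get_locale_encoding"),
    }


if __name__ == "__main__":
    print(locale_encoding_backend())
```

Running this under each Python version in question gives a quick hint about whether the emulated code is even reachable there.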
from fuser.
This is the test I used, where I print the possible places to get the locale settings. Only getpreferredencoding() in Python 3.10 is showing something different.
import locale
import torch
from nvfuser import FusionDefinition, DataType
import os
H = 768
inputs = [
torch.randn(H, H, device="cuda"),
torch.randn(H, device="cuda"),
]
def nvfuser_fusion_id0(fd : FusionDefinition) -> None :
    T0 = fd.define_tensor(symbolic_sizes=[-1, -1], contiguous=[True, True], dtype=DataType.Float, is_cpu=False)
    T1 = fd.define_tensor(symbolic_sizes=[-1], contiguous=[True], dtype=DataType.Float, is_cpu=False)
    T2 = fd.ops.broadcast_in_dim(T1, output_shape=[768, 768], broadcast_dims=[1])
    T3 = fd.ops.add(T0, T2)
    S4 = fd.define_constant(0.500000, dtype=DataType.Double)
    T5 = fd.ops.mul(T3, S4)
    S6 = fd.define_constant(0.797885, dtype=DataType.Double)
    T7 = fd.ops.mul(T3, S6)
    S8 = fd.define_constant(0.0447150, dtype=DataType.Double)
    T9 = fd.ops.mul(T3, S8)
    T10 = fd.ops.mul(T9, T3)
    S11 = fd.define_constant(1.00000, dtype=DataType.Double)
    T12 = fd.ops.add(T10, S11)
    T13 = fd.ops.mul(T7, T12)
    T14 = fd.ops.tanh(T13)
    S15 = fd.define_constant(1.00000, dtype=DataType.Double)
    T16 = fd.ops.add(T14, S15)
    T17 = fd.ops.mul(T5, T16)
    T18 = fd.ops.cast(T17, dtype=DataType.Float)
    fd.add_output(T18)
print("ASSERT 1", locale.getpreferredencoding(), locale.setlocale(locale.LC_CTYPE), locale.getlocale(), os.getenv('LANG'), os.getenv('PYTHONIOENCODING'), locale.getdefaultlocale())
assert locale.getpreferredencoding() == "UTF-8", f"Preferred encoding: {locale.getpreferredencoding()}"
with FusionDefinition() as fd:
    nvfuser_fusion_id0(fd)
print("ASSERT 2", locale.getpreferredencoding(), locale.setlocale(locale.LC_CTYPE), locale.getlocale(), os.getenv('LANG'), os.getenv('PYTHONIOENCODING'), locale.getdefaultlocale())
assert locale.getpreferredencoding() == "UTF-8", f"Preferred encoding: {locale.getpreferredencoding()}"
out = fd.execute(inputs)
print("ASSERT 3", locale.getpreferredencoding(do_setlocale=True), locale.setlocale(locale.LC_CTYPE), locale.getlocale(), os.getenv('LANG'), os.getenv('PYTHONIOENCODING'), locale.getdefaultlocale())
assert locale.getpreferredencoding() == "UTF-8", f"Preferred encoding: {locale.getpreferredencoding()}"
Output:
from fuser.
I think this is a Python 3.10 problem and not an nvFuser problem, so I am closing!
from fuser.
I am observing this issue with Python 3.8 as well @kevinstephano
from fuser.
I am going to close this issue, again, as there isn't anything we can do besides monitor the NVRTC bug.
from fuser.