
Comments (9)

kevinstephano commented on August 16, 2024

The problem is actually being caused by NVRTC. I am not sure what NVRTC is setting, but the result is that nl_langinfo(CODESET) returns ANSI_X3.4-1968 (glibc's name for ASCII). This would explain why both NNC and nvFuser have the issue but TorchInductor does not, since OAI-Triton compiles via LLVM IR rather than NVRTC.
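For context, Python's codec registry treats ANSI_X3.4-1968 as an alias of its `ascii` codec, so once a process reports that codeset, any non-ASCII text fails to encode. A minimal stdlib-only sketch of that consequence; `encodes_cleanly` is a hypothetical helper and the sample string is arbitrary:

```python
import codecs

# glibc's "ANSI_X3.4-1968" resolves to Python's ascii codec.
codec = codecs.lookup("ANSI_X3.4-1968")
print(codec.name)  # -> ascii

def encodes_cleanly(text: str, encoding: str) -> bool:
    """Hypothetical helper: True if `text` encodes with `encoding` without errors."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True

# Arbitrary sample containing one non-ASCII character (MICRO SIGN, U+00B5).
sample = "mu = \u00b5"
print(encodes_cleanly(sample, "ANSI_X3.4-1968"))  # -> False
print(encodes_cleanly(sample, "UTF-8"))           # -> True
```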

I wrapped our call to nvrtcCompileProgram in executor_utils.cpp with the following prints, and you can see the encoding change across the call:

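   #include <langinfo.h>
   #include <locale.h>
   #include <stdio.h>
   // Placed around nvrtcCompileProgram in executor_utils.cpp ("1." = before the call)
   char* locstr = setlocale(LC_CTYPE, NULL);
   char* encoding = nl_langinfo(CODESET);
   printf("1. Locale is %s\n", locstr);
   printf("1. Encoding is %s\n", encoding);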
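The same probe can be run from the Python side, for example just before and after `fd.execute(inputs)`. A minimal sketch, assuming a Unix system (`locale.nl_langinfo` and `locale.CODESET` are not available on Windows). `report_locale` is a hypothetical helper. Calling `locale.setlocale` with only a category queries the current setting without changing it, which mirrors `setlocale(LC_CTYPE, NULL)` in C:

```python
import locale
from typing import Tuple

def report_locale(tag: str) -> Tuple[str, str]:
    """Print and return the current LC_CTYPE locale name and its codeset.

    Hypothetical helper mirroring the C probe above:
      setlocale(LC_CTYPE, NULL) -> locale.setlocale(locale.LC_CTYPE)
      nl_langinfo(CODESET)      -> locale.nl_langinfo(locale.CODESET)
    """
    locstr = locale.setlocale(locale.LC_CTYPE)     # query only, nothing is changed
    encoding = locale.nl_langinfo(locale.CODESET)  # Unix-only API
    print(f"{tag}. Locale is {locstr}")
    print(f"{tag}. Encoding is {encoding}")
    return locstr, encoding

if __name__ == "__main__":
    report_locale("1")
    # ... the step under suspicion goes here, e.g. out = fd.execute(inputs) ...
    report_locale("2")
```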

I thought the simple fix would be to call setlocale(LC_CTYPE, "C.UTF-8"), but that does not work. I am not sure what nl_langinfo is seeing that makes it decide the locale is ASCII.
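The attempted fix can also be reproduced from Python to see whether `C.UTF-8` is even accepted on a given machine. A sketch under stated assumptions (Unix only; `codeset_after_setlocale` is a hypothetical helper): it sets `LC_CTYPE` to the given name, reads back `nl_langinfo(CODESET)`, restores the previous setting, and returns `None` when the C library rejects the name. That separates "C.UTF-8 is not installed" from "C.UTF-8 was set but the codeset still reads as ASCII". Note that `setlocale` changes process-wide state, so this should not be called while other threads are running.

```python
import locale
from typing import Optional

def codeset_after_setlocale(name: str) -> Optional[str]:
    """Temporarily set LC_CTYPE to `name` and return the resulting codeset.

    Hypothetical helper. Returns None if the C library rejects `name`
    (locale.Error). The previous LC_CTYPE setting is always restored.
    """
    previous = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        locale.setlocale(locale.LC_CTYPE, name)
    except locale.Error:
        return None  # locale not available; nothing was changed
    try:
        return locale.nl_langinfo(locale.CODESET)
    finally:
        locale.setlocale(locale.LC_CTYPE, previous)

if __name__ == "__main__":
    for name in ("C", "C.UTF-8"):
        print(name, "->", codeset_after_setlocale(name))
```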

I found this by stepping through Python 3.8 with pdb: the Python 3.8 library code calls locale.nl_langinfo(CODESET). The emulated (pure-Python) code in Python 3.10 does not, but I am guessing the CPython C implementation in Python 3.10 does.

This issue with NVRTC and nvrtcCompileProgram was also noticed on Stack Overflow.

There is also an associated NvBug: 3833924.

from fuser.

naoyam commented on August 16, 2024

Do you know how to get the encoding that locale.getpreferredencoding() returns, but from C++?


kevinstephano commented on August 16, 2024

From the documentation, it looks like a default-constructed std::locale is a copy of the current global locale:

#include <locale>
std::locale loc;  // default-constructed: a copy of the current global C++ locale


kevinstephano commented on August 16, 2024

This definitely appears to come from codegen: I added assertions around the FusionDefinition, and the encoding was still fine after creating the unscheduled Fusion IR. I don't see us calling std::locale::global anywhere in our code. I do see a few calls in PyTorch's third_party components, but none of them look like they should affect nvFuser.

import locale
import torch
from nvfuser import FusionDefinition, DataType

H = 768
inputs = [
    torch.randn(H, H, device="cuda"),
    torch.randn(H, device="cuda"),
]

# Bias add followed by the tanh approximation of GELU
def nvfuser_fusion_id0(fd : FusionDefinition) -> None :
    T0 = fd.define_tensor(symbolic_sizes=[-1, -1], contiguous=[True, True], dtype=DataType.Float, is_cpu=False)
    T1 = fd.define_tensor(symbolic_sizes=[-1], contiguous=[True], dtype=DataType.Float, is_cpu=False)
    T2 = fd.ops.broadcast_in_dim(T1, output_shape=[768, 768], broadcast_dims=[1])
    T3 = fd.ops.add(T0, T2)
    S4 = fd.define_constant(0.500000, dtype=DataType.Double)
    T5 = fd.ops.mul(T3, S4)
    S6 = fd.define_constant(0.797885, dtype=DataType.Double)
    T7 = fd.ops.mul(T3, S6)
    S8 = fd.define_constant(0.0447150, dtype=DataType.Double)
    T9 = fd.ops.mul(T3, S8)
    T10 = fd.ops.mul(T9, T3)
    S11 = fd.define_constant(1.00000, dtype=DataType.Double)
    T12 = fd.ops.add(T10, S11)
    T13 = fd.ops.mul(T7, T12)
    T14 = fd.ops.tanh(T13)
    S15 = fd.define_constant(1.00000, dtype=DataType.Double)
    T16 = fd.ops.add(T14, S15)
    T17 = fd.ops.mul(T5, T16)
    T18 = fd.ops.cast(T17, dtype=DataType.Float)
    fd.add_output(T18)

print("ASSERT 1")  # before building the fusion
assert locale.getpreferredencoding() == "UTF-8", f"Preferred encoding: {locale.getpreferredencoding()}"
with FusionDefinition() as fd:
    nvfuser_fusion_id0(fd)
print("ASSERT 2")  # after creating the unscheduled Fusion IR
assert locale.getpreferredencoding() == "UTF-8", f"Preferred encoding: {locale.getpreferredencoding()}"
out = fd.execute(inputs)  # scheduling, codegen, and NVRTC compilation happen here
print("ASSERT 3")
assert locale.getpreferredencoding() == "UTF-8", f"Preferred encoding: {locale.getpreferredencoding()}"
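The three copies of the assertion can be folded into a small context manager that names the step that changed the encoding. A stdlib-only sketch; `encoding_unchanged` is a hypothetical helper, and it compares against whatever `locale.getpreferredencoding()` returned on entry rather than hard-coding "UTF-8":

```python
import locale
from contextlib import contextmanager
from typing import Iterator

@contextmanager
def encoding_unchanged(stage: str) -> Iterator[str]:
    """Fail if locale.getpreferredencoding() differs after the block.

    Hypothetical helper. The baseline is the encoding on entry. If the
    block itself raises, that exception propagates and no comparison is made.
    """
    before = locale.getpreferredencoding()
    yield before
    after = locale.getpreferredencoding()
    if after != before:
        # Raise explicitly so the check still runs under `python -O`.
        raise AssertionError(f"{stage}: preferred encoding changed from {before!r} to {after!r}")

# Usage with the script above (assumes FusionDefinition, nvfuser_fusion_id0,
# and inputs are defined as in that script):
#
# with encoding_unchanged("define fusion"):
#     with FusionDefinition() as fd:
#         nvfuser_fusion_id0(fd)
# with encoding_unchanged("execute (codegen + NVRTC)"):
#     out = fd.execute(inputs)
```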


kevinstephano commented on August 16, 2024

This problem seems very specific to Python 3.10 and how its internal function _get_locale_encoding() is implemented. If I force the code down the emulated (pure-Python) path, I don't see the problem. If it executes the C function, whose implementation I can't find, it fails. Note that the code for Python 3.9 and 3.11 is different!

https://github.com/python/cpython/blob/3.10/Lib/locale.py#L636-L647

    def _get_locale_encoding():
        if hasattr(sys, 'getandroidapilevel'):
            # On Android langinfo.h and CODESET are missing, and UTF-8 is
            # always used in mbstowcs() and wcstombs().
            return 'UTF-8'
        if sys.flags.utf8_mode:
            return 'UTF-8'
        encoding = getdefaultlocale()[1]
        if encoding is None:
            # LANG not set, default conservatively to ASCII
            encoding = 'ascii'
        return encoding

I think the C code is hitting this case, but I am not sure why:

    if encoding is None:
        # LANG not set, default conservatively to ASCII
        encoding = 'ascii'
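One way to test that theory without involving nvFuser is to re-run the quoted fallback logic with the locale environment variables removed. A stdlib-only sketch, assuming a POSIX system. On such systems `locale.getdefaultlocale()` consults `LC_ALL`, `LC_CTYPE`, `LANG`, and `LANGUAGE` in that order and returns `(None, None)` when none is set, which sends the fallback to `'ascii'`. The helper names are hypothetical, the Android branch is omitted, and `getdefaultlocale()` is deprecated from Python 3.11, hence the warning filter.

```python
import locale
import os
import sys
import warnings
from contextlib import contextmanager

# Environment variables locale.getdefaultlocale() consults, in order (POSIX).
LOCALE_ENV_VARS = ("LC_ALL", "LC_CTYPE", "LANG", "LANGUAGE")

@contextmanager
def locale_env_removed():
    """Hypothetical helper: temporarily delete the locale variables from os.environ."""
    saved = {name: os.environ.pop(name) for name in LOCALE_ENV_VARS if name in os.environ}
    try:
        yield
    finally:
        os.environ.update(saved)

def emulated_locale_encoding() -> str:
    """Hypothetical re-run of the Python 3.10 fallback quoted above (no Android branch)."""
    if sys.flags.utf8_mode:
        return "UTF-8"
    with warnings.catch_warnings():
        # getdefaultlocale() emits DeprecationWarning on Python 3.11+.
        warnings.simplefilter("ignore", DeprecationWarning)
        encoding = locale.getdefaultlocale()[1]
    if encoding is None:
        # LANG not set, default conservatively to ASCII
        encoding = "ascii"
    return encoding

if __name__ == "__main__":
    with locale_env_removed():
        print("with LC_ALL/LC_CTYPE/LANG/LANGUAGE unset:", emulated_locale_encoding())
```

If the real `getpreferredencoding()` returns ASCII while `LANG` still prints as a UTF-8 locale, that points away from this fallback and toward the C path reading the process locale.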


kevinstephano commented on August 16, 2024

This is the test I used, which prints every place I know of to get the locale settings. In Python 3.10, only getpreferredencoding() shows something different.

import locale
import torch
from nvfuser import FusionDefinition, DataType
import os

H = 768
inputs = [
    torch.randn(H, H, device="cuda"),
    torch.randn(H, device="cuda"),
]

def nvfuser_fusion_id0(fd : FusionDefinition) -> None :
    T0 = fd.define_tensor(symbolic_sizes=[-1, -1], contiguous=[True, True], dtype=DataType.Float, is_cpu=False)
    T1 = fd.define_tensor(symbolic_sizes=[-1], contiguous=[True], dtype=DataType.Float, is_cpu=False)
    T2 = fd.ops.broadcast_in_dim(T1, output_shape=[768, 768], broadcast_dims=[1])
    T3 = fd.ops.add(T0, T2)
    S4 = fd.define_constant(0.500000, dtype=DataType.Double)
    T5 = fd.ops.mul(T3, S4)
    S6 = fd.define_constant(0.797885, dtype=DataType.Double)
    T7 = fd.ops.mul(T3, S6)
    S8 = fd.define_constant(0.0447150, dtype=DataType.Double)
    T9 = fd.ops.mul(T3, S8)
    T10 = fd.ops.mul(T9, T3)
    S11 = fd.define_constant(1.00000, dtype=DataType.Double)
    T12 = fd.ops.add(T10, S11)
    T13 = fd.ops.mul(T7, T12)
    T14 = fd.ops.tanh(T13)
    S15 = fd.define_constant(1.00000, dtype=DataType.Double)
    T16 = fd.ops.add(T14, S15)
    T17 = fd.ops.mul(T5, T16)
    T18 = fd.ops.cast(T17, dtype=DataType.Float)
    fd.add_output(T18)

print("ASSERT 1", locale.getpreferredencoding(), locale.setlocale(locale.LC_CTYPE), locale.getlocale(), os.getenv('LANG'), os.getenv('PYTHONIOENCODING'), locale.getdefaultlocale())
assert locale.getpreferredencoding() == "UTF-8", f"Preferred encoding: {locale.getpreferredencoding()}"
with FusionDefinition() as fd:
    nvfuser_fusion_id0(fd)
print("ASSERT 2", locale.getpreferredencoding(), locale.setlocale(locale.LC_CTYPE), locale.getlocale(), os.getenv('LANG'), os.getenv('PYTHONIOENCODING'), locale.getdefaultlocale())
assert locale.getpreferredencoding() == "UTF-8", f"Preferred encoding: {locale.getpreferredencoding()}"
out = fd.execute(inputs)
print("ASSERT 3", locale.getpreferredencoding(do_setlocale=True), locale.setlocale(locale.LC_CTYPE), locale.getlocale(), os.getenv('LANG'), os.getenv('PYTHONIOENCODING'), locale.getdefaultlocale())
assert locale.getpreferredencoding() == "UTF-8", f"Preferred encoding: {locale.getpreferredencoding()}"
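The repeated print calls can be collected into one snapshot function so that two snapshots can be diffed and only the values that changed are reported. A stdlib-only sketch; `locale_snapshot` and `changed_keys` are hypothetical helper names, the set of sources mirrors the print statements in the script above, and `getdefaultlocale()` is wrapped to silence its Python 3.11+ DeprecationWarning:

```python
import locale
import os
import warnings
from typing import Any, Dict, List

def locale_snapshot() -> Dict[str, Any]:
    """Hypothetical helper: collect the locale-related values the test script prints."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", DeprecationWarning)  # getdefaultlocale() on 3.11+
        default_locale = locale.getdefaultlocale()
    return {
        "getpreferredencoding": locale.getpreferredencoding(),
        "setlocale(LC_CTYPE)": locale.setlocale(locale.LC_CTYPE),  # query only
        "getlocale": locale.getlocale(),
        "LANG": os.getenv("LANG"),
        "PYTHONIOENCODING": os.getenv("PYTHONIOENCODING"),
        "getdefaultlocale": default_locale,
    }

def changed_keys(before: Dict[str, Any], after: Dict[str, Any]) -> List[str]:
    """Return the names of values that differ between two snapshots."""
    return [key for key in before if before[key] != after.get(key)]

# Usage with the script above (assumes fd and inputs exist):
#
# s1 = locale_snapshot()
# out = fd.execute(inputs)
# s2 = locale_snapshot()
# for key in changed_keys(s1, s2):
#     print(f"{key}: {s1[key]!r} -> {s2[key]!r}")
```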



kevinstephano commented on August 16, 2024

I think this is a Python 3.10 problem and not an nvFuser problem, so I am closing!


ksivaman commented on August 16, 2024

I am observing this issue with Python 3.8 as well, @kevinstephano.


kevinstephano commented on August 16, 2024

I am going to close this issue again, as there isn't anything we can do besides monitoring the NVRTC bug.

