Comments (8)
I am trying to create a minimal example; there is some code I cannot share right now.
from pytorch.
FYI, the code also works with CUDAGraph, so I suppose that the triton version has some conflicts.
@Luke20000429 your torch version is 2.2.2. User-defined Triton kernels were officially released in 2.3. Could you upgrade your PyTorch version and try again?
I've upgraded torch, flash-attn, and triton. Without the custom Triton operator it works fine.
Now it looks like I can run torch.compile on the model with my custom Triton operator, but the results are incorrect. My code looks like this:
model = AutoModelForCausalLM.from_pretrained("luodian/llama-7b-hf", torch_dtype=torch.float16)
model = model.to(device).eval()
# replace mlp by my triton operator
# compiled model returns correct output without this
# ['<s> My favourite condiment is ketchup. I love it on everything. I love it on my eggs, on my burg']
for layer in model.model.layers:
    layer.mlp = Triton_myMLP(layer.mlp)
tokenizer = AutoTokenizer.from_pretrained("luodian/llama-7b-hf")
prompt = "My favourite condiment is"
input_ids = tokenizer.encode(prompt, return_tensors="pt").to(device)
model.forward = torch.compile(model.forward, mode="reduce-overhead", fullgraph=True)
model.generation_config.max_new_tokens = 20
out = model.generate(input_ids, cache_implementation="static") # warmup ?
print(tokenizer.batch_decode(out.long()))
out = model.generate(input_ids, cache_implementation="static")
print(tokenizer.batch_decode(out.long())) # the result is like ['<s> My favourite condiment is plus plus plus plus plus plus plus plus plus plus plus plus plus plus plus plus plus plus plus plus']
It looks like the static cache didn't step forward.
I suppose torch.compile doesn't do a warmup and the Triton JIT might run multiple times for autotuning. So I warmed up the operator manually with
for i in range(2):
    text = generate(prompt, model, tokenizer, max_length=20)
    print(text) # this gives the correct results
and then compiled. That gives me another error:
/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_inductor/compile_fx.py:124: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled. Consider setting `torch.set_float32_matmul_precision('high')` for better performance.
warnings.warn(
Traceback (most recent call last):
File "/home/user/workarea/simple_static.py", line 100, in <module>
out = model.generate(input_ids, cache_implementation="static") # warmup ?
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/transformers/generation/utils.py", line 1736, in generate
result = self._sample(
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/transformers/generation/utils.py", line 2375, in _sample
outputs = self(
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_dynamo/eval_frame.py", line 451, in _fn
return fn(*args, **kwargs)
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_dynamo/convert_frame.py", line 921, in catch_errors
return callback(frame, cache_entry, hooks, frame_state, skip=1)
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_dynamo/convert_frame.py", line 400, in _convert_frame_assert
return _compile(
File "/home/user/anaconda3/envs/myenv/lib/python3.9/contextlib.py", line 79, in inner
return func(*args, **kwds)
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_dynamo/convert_frame.py", line 676, in _compile
guarded_code = compile_inner(code, one_graph, hooks, transform)
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_dynamo/utils.py", line 262, in time_wrapper
r = func(*args, **kwargs)
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_dynamo/convert_frame.py", line 535, in compile_inner
out_code = transform_code_object(code, transform)
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_dynamo/bytecode_transformation.py", line 1036, in transform_code_object
transformations(instructions, code_options)
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_dynamo/convert_frame.py", line 165, in _fn
return fn(*args, **kwargs)
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_dynamo/convert_frame.py", line 500, in transform
tracer.run()
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_dynamo/symbolic_convert.py", line 2149, in run
super().run()
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_dynamo/symbolic_convert.py", line 810, in run
and self.step()
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_dynamo/symbolic_convert.py", line 773, in step
getattr(self, inst.opname)(inst)
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_dynamo/symbolic_convert.py", line 2268, in RETURN_VALUE
self.output.compile_subgraph(
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_dynamo/output_graph.py", line 1001, in compile_subgraph
self.compile_and_call_fx_graph(tx, pass2.graph_output_vars(), root)
File "/home/user/anaconda3/envs/myenv/lib/python3.9/contextlib.py", line 79, in inner
return func(*args, **kwds)
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_dynamo/output_graph.py", line 1178, in compile_and_call_fx_graph
compiled_fn = self.call_user_compiler(gm)
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_dynamo/utils.py", line 262, in time_wrapper
r = func(*args, **kwargs)
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_dynamo/output_graph.py", line 1251, in call_user_compiler
raise BackendCompilerFailed(self.compiler_fn, e).with_traceback(
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_dynamo/output_graph.py", line 1232, in call_user_compiler
compiled_fn = compiler_fn(gm, self.example_inputs())
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_dynamo/repro/after_dynamo.py", line 117, in debug_wrapper
compiled_gm = compiler_fn(gm, example_inputs)
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_dynamo/repro/after_dynamo.py", line 117, in debug_wrapper
compiled_gm = compiler_fn(gm, example_inputs)
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/__init__.py", line 1731, in __call__
return compile_fx(model_, inputs_, config_patches=self.config)
File "/home/user/anaconda3/envs/myenv/lib/python3.9/contextlib.py", line 79, in inner
return func(*args, **kwds)
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_inductor/compile_fx.py", line 1102, in compile_fx
return compile_fx(
File "/home/user/anaconda3/envs/myenv/lib/python3.9/contextlib.py", line 79, in inner
return func(*args, **kwds)
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_inductor/compile_fx.py", line 1330, in compile_fx
return aot_autograd(
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_dynamo/backends/common.py", line 58, in compiler_fn
cg = aot_module_simplified(gm, example_inputs, **kwargs)
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_functorch/aot_autograd.py", line 903, in aot_module_simplified
compiled_fn = create_aot_dispatcher_function(
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_dynamo/utils.py", line 262, in time_wrapper
r = func(*args, **kwargs)
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_functorch/aot_autograd.py", line 628, in create_aot_dispatcher_function
compiled_fn = compiler_fn(flat_fn, fake_flat_args, aot_config, fw_metadata=fw_metadata)
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 443, in aot_wrapper_dedupe
return compiler_fn(flat_fn, leaf_flat_args, aot_config, fw_metadata=fw_metadata)
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 648, in aot_wrapper_synthetic_base
return compiler_fn(flat_fn, flat_args, aot_config, fw_metadata=fw_metadata)
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_functorch/_aot_autograd/jit_compile_runtime_wrappers.py", line 119, in aot_dispatch_base
compiled_fw = compiler(fw_module, updated_flat_args)
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_dynamo/utils.py", line 262, in time_wrapper
r = func(*args, **kwargs)
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_inductor/compile_fx.py", line 1257, in fw_compiler_base
return inner_compile(
File "/home/user/anaconda3/envs/myenv/lib/python3.9/contextlib.py", line 79, in inner
return func(*args, **kwds)
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_dynamo/repro/after_aot.py", line 83, in debug_wrapper
inner_compiled_fn = compiler_fn(gm, example_inputs)
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_inductor/debug.py", line 304, in inner
return fn(*args, **kwargs)
File "/home/user/anaconda3/envs/myenv/lib/python3.9/contextlib.py", line 79, in inner
return func(*args, **kwds)
File "/home/user/anaconda3/envs/myenv/lib/python3.9/contextlib.py", line 79, in inner
return func(*args, **kwds)
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_dynamo/utils.py", line 262, in time_wrapper
r = func(*args, **kwargs)
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_inductor/compile_fx.py", line 438, in compile_fx_inner
compiled_graph = fx_codegen_and_compile(
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_inductor/compile_fx.py", line 714, in fx_codegen_and_compile
compiled_fn = graph.compile_to_fn()
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_inductor/graph.py", line 1307, in compile_to_fn
return self.compile_to_module().call
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_dynamo/utils.py", line 262, in time_wrapper
r = func(*args, **kwargs)
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_inductor/graph.py", line 1254, in compile_to_module
mod = PyCodeCache.load_by_key_path(
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_inductor/codecache.py", line 2160, in load_by_key_path
exec(code, mod.__dict__, mod.__dict__)
File "/home/xueshen/tmp/torchinductor_xueshen/l2/cl2u6wpoycbpdp4aztrroprkgqtcvo5jswnhcemznnqhhe3td5yq.py", line 39, in <module>
triton_red_fused__to_copy_add_embedding_mean_mul_pow_rsqrt_0 = async_compile.triton('triton_', '''
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_inductor/codecache.py", line 2658, in triton
future = self.process_pool().submit(
File "/home/user/anaconda3/envs/myenv/lib/python3.9/concurrent/futures/process.py", line 707, in submit
raise BrokenProcessPool(self._broken)
torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
BrokenProcessPool: A child process terminated abruptly, the process pool is not usable anymore
Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information
You can suppress this exception and fall back to eager by setting:
import torch._dynamo
torch._dynamo.config.suppress_errors = True
Therefore, I finally tried giving only one configuration to my Triton operator, warming it up manually, and then compiling and running the compiled model. It still gives me the wrong output:
['<s> My favourite condiment is plus plus plus plus plus plus plus plus plus plus plus plus plus plus plus plus plus plus plus plus']
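One way to narrow down where the compiled output goes wrong is to compare the token ids from the eager run with those from the compiled run. The sketch below is only a debugging aid: `first_divergence` is a hypothetical helper name, and the ids in the example call are placeholders rather than real tokenizer output. If the two sequences already differ at the first generated token, that might point at the prefill pass; a later divergence might point at the decode steps and the static cache.

```python
def first_divergence(reference_ids, candidate_ids):
    """Return the index of the first differing token, or None if identical.

    Hypothetical helper: both arguments are plain lists of token ids, for
    example out[0].tolist() from the eager and the compiled model.generate call.
    """
    for i, (ref, cand) in enumerate(zip(reference_ids, candidate_ids)):
        if ref != cand:
            return i
    # One sequence is a strict prefix of the other.
    if len(reference_ids) != len(candidate_ids):
        return min(len(reference_ids), len(candidate_ids))
    return None


# Placeholder ids for illustration only, not real tokenizer output.
eager_ids = [1, 10, 11, 12, 13, 14]
compiled_ids = [1, 10, 11, 99, 99, 99]
print(first_divergence(eager_ids, compiled_ids))
```

Comparing this index with the prompt length tells you whether the first wrong token is the first generated one or a later one.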
I will take a look at it tomorrow, please share the exact code you're running
Here is a simple triton operator
import triton
from triton import language as tl
import torch
# `triton.jit`'ed functions can be auto-tuned by using the `triton.autotune` decorator, which consumes:
# - A list of `triton.Config` objects that define different configurations of
# meta-parameters (e.g., `BLOCK_SIZE_M`) and compilation options (e.g., `num_warps`) to try
# - An auto-tuning *key* whose change in values will trigger evaluation of all the
# provided configs
@triton.autotune(
    # configs=get_cuda_autotune_config(),
    configs=[
        triton.Config({'BLOCK_SIZE_M': 16, 'BLOCK_SIZE_N': 64, 'BLOCK_SIZE_K': 64, 'GROUP_SIZE_M': 1}, num_stages=3,
                      num_warps=16),
        triton.Config({'BLOCK_SIZE_M': 16, 'BLOCK_SIZE_N': 128, 'BLOCK_SIZE_K': 64, 'GROUP_SIZE_M': 1}, num_stages=3,
                      num_warps=16),
    ],
    key=['M', 'N', 'K'],
)
@triton.jit
def matmul_kernel(
        # Pointers to matrices
        a_ptr, b_ptr, c_ptr,
        # Matrix dimensions
        M, N, K,
        # The stride variables represent how much to increase the ptr by when moving by 1
        # element in a particular dimension. E.g. `stride_am` is how much to increase `a_ptr`
        # by to get the element one row down (A has M rows).
        stride_am, stride_ak,  #
        stride_bk, stride_bn,  #
        stride_cm, stride_cn,
        # Meta-parameters
        BLOCK_SIZE_M: tl.constexpr, BLOCK_SIZE_N: tl.constexpr, BLOCK_SIZE_K: tl.constexpr,  #
        GROUP_SIZE_M: tl.constexpr,  #
):
    """Kernel for computing the matmul C = A x B.
    A has shape (M, K), B has shape (K, N) and C has shape (M, N)
    """
    # -----------------------------------------------------------
    # Map program ids `pid` to the block of C it should compute.
    # This is done in a grouped ordering to promote L2 data reuse.
    # See above `L2 Cache Optimizations` section for details.
    pid = tl.program_id(axis=0)
    num_pid_m = tl.cdiv(M, BLOCK_SIZE_M)
    num_pid_n = tl.cdiv(N, BLOCK_SIZE_N)
    num_pid_in_group = GROUP_SIZE_M * num_pid_n
    group_id = pid // num_pid_in_group
    first_pid_m = group_id * GROUP_SIZE_M
    group_size_m = min(num_pid_m - first_pid_m, GROUP_SIZE_M)
    pid_m = first_pid_m + (pid % group_size_m)
    pid_n = (pid % num_pid_in_group) // group_size_m
    # ----------------------------------------------------------
    # Create pointers for the first blocks of A and B.
    # We will advance this pointer as we move in the K direction
    # and accumulate
    # `a_ptrs` is a block of [BLOCK_SIZE_M, BLOCK_SIZE_K] pointers
    # `b_ptrs` is a block of [BLOCK_SIZE_K, BLOCK_SIZE_N] pointers
    # See above `Pointer Arithmetic` section for details
    offs_am = (pid_m * BLOCK_SIZE_M + tl.arange(0, BLOCK_SIZE_M)) % M
    offs_bn = (pid_n * BLOCK_SIZE_N + tl.arange(0, BLOCK_SIZE_N)) % N
    offs_k = tl.arange(0, BLOCK_SIZE_K)
    a_ptrs = a_ptr + (offs_am[:, None] * stride_am + offs_k[None, :] * stride_ak)
    b_ptrs = b_ptr + (offs_k[:, None] * stride_bk + offs_bn[None, :] * stride_bn)
    # -----------------------------------------------------------
    # Iterate to compute a block of the C matrix.
    # We accumulate into a `[BLOCK_SIZE_M, BLOCK_SIZE_N]` block
    # of fp32 values for higher accuracy.
    # `accumulator` will be converted back to fp16 after the loop.
    accumulator = tl.zeros((BLOCK_SIZE_M, BLOCK_SIZE_N), dtype=tl.float32)
    for k in range(0, tl.cdiv(K, BLOCK_SIZE_K)):
        # Load the next block of A and B, generate a mask by checking the K dimension.
        # If it is out of bounds, set it to 0.
        a = tl.load(a_ptrs, mask=offs_k[None, :] < K - k * BLOCK_SIZE_K, other=0.0)
        b = tl.load(b_ptrs, mask=offs_k[:, None] < K - k * BLOCK_SIZE_K, other=0.0)
        # We accumulate along the K dimension.
        accumulator = tl.dot(a, b, accumulator)
        # Advance the ptrs to the next K block.
        a_ptrs += BLOCK_SIZE_K * stride_ak
        b_ptrs += BLOCK_SIZE_K * stride_bk
    # You can fuse arbitrary activation functions here
    # while the accumulator is still in FP32!
    c = accumulator.to(tl.float16)
    # -----------------------------------------------------------
    # Write back the block of the output matrix C with masks.
    offs_cm = pid_m * BLOCK_SIZE_M + tl.arange(0, BLOCK_SIZE_M)
    offs_cn = pid_n * BLOCK_SIZE_N + tl.arange(0, BLOCK_SIZE_N)
    c_ptrs = c_ptr + stride_cm * offs_cm[:, None] + stride_cn * offs_cn[None, :]
    c_mask = (offs_cm[:, None] < M) & (offs_cn[None, :] < N)
    tl.store(c_ptrs, c, mask=c_mask)
def matmul(a, b, activation=""):
    # Check constraints.
    assert a.shape[1] == b.shape[0], "Incompatible dimensions"
    assert a.is_contiguous(), "Matrix A must be contiguous"
    M, K = a.shape
    K, N = b.shape
    # Allocates output.
    c = torch.zeros((M, N), device=a.device, dtype=a.dtype)
    # 1D launch kernel where each block gets its own program.
    grid = lambda META: (triton.cdiv(M, META['BLOCK_SIZE_M']) * triton.cdiv(N, META['BLOCK_SIZE_N']), )
    matmul_kernel[grid](
        a, b, c,  #
        M, N, K,  #
        a.stride(0), a.stride(1),  #
        b.stride(0), b.stride(1),  #
        c.stride(0), c.stride(1),  #
    )
    return c
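The program-id arithmetic at the top of the kernel can be sanity-checked on the host without a GPU. The following is a minimal sketch in plain Python that mirrors the kernel's `pid` to `(pid_m, pid_n)` mapping and the 1D launch grid from `matmul`; the helper names `cdiv`, `tile_for_pid`, and `launch_tiles` are hypothetical and exist only for this illustration. It uses the (7, 11008) output shape that appears in the error message reported further down in this thread.

```python
def cdiv(a, b):
    """Ceiling division for positive integers (hypothetical stand-in for triton.cdiv)."""
    return (a + b - 1) // b


def tile_for_pid(pid, M, N, block_m, block_n, group_m):
    """Mirror the kernel's pid -> (pid_m, pid_n) arithmetic in plain Python."""
    num_pid_m = cdiv(M, block_m)
    num_pid_n = cdiv(N, block_n)
    num_pid_in_group = group_m * num_pid_n
    group_id = pid // num_pid_in_group
    first_pid_m = group_id * group_m
    group_size_m = min(num_pid_m - first_pid_m, group_m)
    pid_m = first_pid_m + (pid % group_size_m)
    pid_n = (pid % num_pid_in_group) // group_size_m
    return pid_m, pid_n


def launch_tiles(M, N, block_m, block_n, group_m):
    """Return the output tile visited by every program in the 1D launch grid."""
    grid = cdiv(M, block_m) * cdiv(N, block_n)
    return [tile_for_pid(pid, M, N, block_m, block_n, group_m) for pid in range(grid)]


# First autotune config from the kernel: BLOCK_SIZE_M=16, BLOCK_SIZE_N=64, GROUP_SIZE_M=1.
tiles = launch_tiles(M=7, N=11008, block_m=16, block_n=64, group_m=1)
expected = {(0, n) for n in range(cdiv(11008, 64))}
# Every output tile should be visited exactly once.
print(len(tiles) == len(set(tiles)), set(tiles) == expected)
```

With `GROUP_SIZE_M` set to 1, as in both autotune configs, the mapping reduces to `pid_m = pid // num_pid_n` and `pid_n = pid % num_pid_n`, so the grouped ordering itself is unlikely to explain the wrong output.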
And here is the inference code
from transformers import AutoModelForCausalLM, AutoTokenizer, StaticCache
import torch
from typing import Optional
from kernels.basic_gemm import matmul # change to your path
from torch import nn
device = "cuda"
class Triton_myMLP(nn.Module):
    def __init__(self, llama_mlp_layer):
        super().__init__()
        W1 = llama_mlp_layer.gate_proj.weight.cuda()
        self.W1T = torch.empty_like(W1.t()).copy_(W1.t()).contiguous().cuda()
        W2 = llama_mlp_layer.up_proj.weight.cuda()
        self.W2T = torch.empty_like(W2.t()).copy_(W2.t()).contiguous().cuda()
        W3 = llama_mlp_layer.down_proj.weight.cuda()
        self.W3T = torch.empty_like(W3.t()).copy_(W3.t()).contiguous().cuda()
    def forward(self, x):
        x = x.view(-1, 4096)
        gate = torch.nn.functional.silu(x @ self.W1T)
        up = matmul(x, self.W2T)
        c = gate*up
        output = c @ self.W3T
        return output[None, :, :]
model = AutoModelForCausalLM.from_pretrained("luodian/llama-7b-hf", torch_dtype=torch.float16)
model = model.to(device).eval()
# replace mlp
for layer in model.model.layers:
    layer.mlp = Triton_myMLP(layer.mlp)
tokenizer = AutoTokenizer.from_pretrained("luodian/llama-7b-hf")
prompt = "My favourite condiment is"
input_ids = tokenizer.encode(prompt, return_tensors="pt").to(device)
model.generation_config.max_new_tokens = 20
out = model.generate(input_ids) # warmup ?
print(tokenizer.batch_decode(out.long())) # output is correct
model.forward = torch.compile(model.forward, mode="reduce-overhead", fullgraph=True)
out = model.generate(input_ids, cache_implementation="static")
print(tokenizer.batch_decode(out.long()))
I tried to reproduce my error with a simplified version, but this simple Triton operator gives me some new errors.
The error is also mentioned in #126864:
torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
AssertionError: cannot extract sympy expressions from {'c_ptr': FakeTensor(..., device='cuda:0', size=(7, 11008), dtype=torch.float16)} <class 'torch.fx.immutable_collections.immutable_dict'>
Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information
Sometimes it also returns this error:
/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_inductor/compile_fx.py:124: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled. Consider setting `torch.set_float32_matmul_precision('high')` for better performance.
warnings.warn(
LLVM ERROR: pthread_join failed: Invalid argument
LLVM ERROR: pthread_join failed: Invalid argument
LLVM ERROR: pthread_join failed: Invalid argument
LLVM ERROR: pthread_join failed: Invalid argument
LLVM ERROR: pthread_join failed: Invalid argument
LLVM ERROR: pthread_join failed: Invalid argument
LLVM ERROR: pthread_join failed: Invalid argument
LLVM ERROR: pthread_join failed: Invalid argument
LLVM ERROR: pthread_join failed: Invalid argument
LLVM ERROR: pthread_join failed: Invalid argument
LLVM ERROR: pthread_join failed: Invalid argument
LLVM ERROR: pthread_join failed: Invalid argument
LLVM ERROR: pthread_join failed: Invalid argument
LLVM ERROR: pthread_join failed: Invalid argument
LLVM ERROR: pthread_join failed: Invalid argument
LLVM ERROR: pthread_join failed: Invalid argument
LLVM ERROR: pthread_join failed: Invalid argument
Traceback (most recent call last):
File "/home/user/workarea/simple_static.py", line 70, in <module>
out = model.generate(input_ids, cache_implementation="static")
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/transformers/generation/utils.py", line 1736, in generate
result = self._sample(
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/transformers/generation/utils.py", line 2375, in _sample
outputs = self(
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_dynamo/eval_frame.py", line 451, in _fn
return fn(*args, **kwargs)
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_dynamo/convert_frame.py", line 921, in catch_errors
return callback(frame, cache_entry, hooks, frame_state, skip=1)
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_dynamo/convert_frame.py", line 400, in _convert_frame_assert
return _compile(
File "/home/user/anaconda3/envs/myenv/lib/python3.9/contextlib.py", line 79, in inner
return func(*args, **kwds)
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_dynamo/convert_frame.py", line 676, in _compile
guarded_code = compile_inner(code, one_graph, hooks, transform)
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_dynamo/utils.py", line 262, in time_wrapper
r = func(*args, **kwargs)
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_dynamo/convert_frame.py", line 535, in compile_inner
out_code = transform_code_object(code, transform)
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_dynamo/bytecode_transformation.py", line 1036, in transform_code_object
transformations(instructions, code_options)
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_dynamo/convert_frame.py", line 165, in _fn
return fn(*args, **kwargs)
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_dynamo/convert_frame.py", line 500, in transform
tracer.run()
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_dynamo/symbolic_convert.py", line 2149, in run
super().run()
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_dynamo/symbolic_convert.py", line 810, in run
and self.step()
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_dynamo/symbolic_convert.py", line 773, in step
getattr(self, inst.opname)(inst)
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_dynamo/symbolic_convert.py", line 2268, in RETURN_VALUE
self.output.compile_subgraph(
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_dynamo/output_graph.py", line 1001, in compile_subgraph
self.compile_and_call_fx_graph(tx, pass2.graph_output_vars(), root)
File "/home/user/anaconda3/envs/myenv/lib/python3.9/contextlib.py", line 79, in inner
return func(*args, **kwds)
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_dynamo/output_graph.py", line 1178, in compile_and_call_fx_graph
compiled_fn = self.call_user_compiler(gm)
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_dynamo/utils.py", line 262, in time_wrapper
r = func(*args, **kwargs)
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_dynamo/output_graph.py", line 1251, in call_user_compiler
raise BackendCompilerFailed(self.compiler_fn, e).with_traceback(
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_dynamo/output_graph.py", line 1232, in call_user_compiler
compiled_fn = compiler_fn(gm, self.example_inputs())
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_dynamo/repro/after_dynamo.py", line 117, in debug_wrapper
compiled_gm = compiler_fn(gm, example_inputs)
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_dynamo/repro/after_dynamo.py", line 117, in debug_wrapper
compiled_gm = compiler_fn(gm, example_inputs)
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/__init__.py", line 1731, in __call__
return compile_fx(model_, inputs_, config_patches=self.config)
File "/home/user/anaconda3/envs/myenv/lib/python3.9/contextlib.py", line 79, in inner
return func(*args, **kwds)
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_inductor/compile_fx.py", line 1102, in compile_fx
return compile_fx(
File "/home/user/anaconda3/envs/myenv/lib/python3.9/contextlib.py", line 79, in inner
return func(*args, **kwds)
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_inductor/compile_fx.py", line 1330, in compile_fx
return aot_autograd(
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_dynamo/backends/common.py", line 58, in compiler_fn
cg = aot_module_simplified(gm, example_inputs, **kwargs)
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_functorch/aot_autograd.py", line 903, in aot_module_simplified
compiled_fn = create_aot_dispatcher_function(
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_dynamo/utils.py", line 262, in time_wrapper
r = func(*args, **kwargs)
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_functorch/aot_autograd.py", line 628, in create_aot_dispatcher_function
compiled_fn = compiler_fn(flat_fn, fake_flat_args, aot_config, fw_metadata=fw_metadata)
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 443, in aot_wrapper_dedupe
return compiler_fn(flat_fn, leaf_flat_args, aot_config, fw_metadata=fw_metadata)
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 648, in aot_wrapper_synthetic_base
return compiler_fn(flat_fn, flat_args, aot_config, fw_metadata=fw_metadata)
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_functorch/_aot_autograd/jit_compile_runtime_wrappers.py", line 119, in aot_dispatch_base
compiled_fw = compiler(fw_module, updated_flat_args)
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_dynamo/utils.py", line 262, in time_wrapper
r = func(*args, **kwargs)
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_inductor/compile_fx.py", line 1257, in fw_compiler_base
return inner_compile(
File "/home/user/anaconda3/envs/myenv/lib/python3.9/contextlib.py", line 79, in inner
return func(*args, **kwds)
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_dynamo/repro/after_aot.py", line 83, in debug_wrapper
inner_compiled_fn = compiler_fn(gm, example_inputs)
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_inductor/debug.py", line 304, in inner
return fn(*args, **kwargs)
File "/home/user/anaconda3/envs/myenv/lib/python3.9/contextlib.py", line 79, in inner
return func(*args, **kwds)
File "/home/user/anaconda3/envs/myenv/lib/python3.9/contextlib.py", line 79, in inner
return func(*args, **kwds)
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_dynamo/utils.py", line 262, in time_wrapper
r = func(*args, **kwargs)
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_inductor/compile_fx.py", line 438, in compile_fx_inner
compiled_graph = fx_codegen_and_compile(
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_inductor/compile_fx.py", line 714, in fx_codegen_and_compile
compiled_fn = graph.compile_to_fn()
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_inductor/graph.py", line 1307, in compile_to_fn
return self.compile_to_module().call
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_dynamo/utils.py", line 262, in time_wrapper
r = func(*args, **kwargs)
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_inductor/graph.py", line 1254, in compile_to_module
mod = PyCodeCache.load_by_key_path(
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_inductor/codecache.py", line 2160, in load_by_key_path
exec(code, mod.__dict__, mod.__dict__)
File "/home/xueshen/tmp/torchinductor_xueshen/ey/ceyqnfqif4gc2yy5lfv3qtmpyxru5dgpfigrveo5bqepci2b473c.py", line 1219, in <module>
async_compile.wait(globals())
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_inductor/codecache.py", line 2715, in wait
scope[key] = result.result()
File "/home/user/anaconda3/envs/myenv/lib/python3.9/site-packages/torch/_inductor/codecache.py", line 2522, in result
self.future.result()
File "/home/user/anaconda3/envs/myenv/lib/python3.9/concurrent/futures/_base.py", line 446, in result
return self.__get_result()
File "/home/user/anaconda3/envs/myenv/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
raise self._exception
torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.
Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information
You can suppress this exception and fall back to eager by setting:
import torch._dynamo
torch._dynamo.config.suppress_errors = True
Can you run with TORCHINDUCTOR_COMPILE_THREADS=1?
It will give you better error messages instead of the process pool nonsense.
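A small launcher can apply that setting without touching the script itself. This is a sketch under a few assumptions: `inductor_debug_env` and `run_script` are hypothetical helper names, the two logging variables are the ones suggested by the error message in the tracebacks above, and the variable is set in a child process because, as far as I know, Inductor reads it when its config module is first imported.

```python
import os
import subprocess
import sys


def inductor_debug_env(base_env=None):
    """Return a copy of the environment set up for single-threaded Inductor compilation."""
    env = dict(os.environ if base_env is None else base_env)
    # Compile in the main process instead of a worker pool; the intent is that
    # the underlying error surfaces instead of BrokenProcessPool.
    env["TORCHINDUCTOR_COMPILE_THREADS"] = "1"
    # Extra logging suggested by the error message in the tracebacks above.
    env["TORCH_LOGS"] = "+dynamo"
    env["TORCHDYNAMO_VERBOSE"] = "1"
    return env


def run_script(script_path, base_env=None):
    """Run a Python script in a child process with the debug environment."""
    return subprocess.run([sys.executable, script_path], env=inductor_debug_env(base_env))
```

For the script in the tracebacks this would be called as `run_script("simple_static.py")`; prefixing the variable on the shell command line should have the same effect.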