
Comments (28)

rrweller commented on July 30, 2024

Strange, I just had this same issue, but with the default preset as well. 3060 12GB, up-to-date Auto1111 installation, building the TensorRT engine for SDXL.

Jonseed commented on July 30, 2024

I'll keep testing different things, maybe try a reboot, close out of some programs, etc. I think I've tried pretty much every preset option, and only "default" works consistently. I got the 768-1024 dynamic preset to work only once.

Jonseed commented on July 30, 2024

I tried setting the CUDA_MODULE_LOADING environment variable to LAZY to see if it helped with memory. It did reduce VRAM usage before freezing to about 1.2GB or so, but when it hits 4% timing graph nodes the memory still jumps to 11.3GB and hangs.

How did you go about setting it to lazy?

I added `SET CUDA_MODULE_LOADING=LAZY` to my webui-user.bat file. There is probably a better way of doing it...
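
For anyone wanting to replicate that, here is a minimal webui-user.bat sketch (the COMMANDLINE_ARGS value is just a placeholder; keep whatever arguments you already use):

```bat
@echo off

set PYTHON=
set GIT=
set VENV_DIR=
set COMMANDLINE_ARGS=--xformers

rem Load CUDA kernels lazily instead of all at once;
rem this can lower baseline VRAM usage during engine builds.
set CUDA_MODULE_LOADING=LAZY

call webui.bat
```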

Jonseed commented on July 30, 2024

I made sure xformers was installed, enabled, and set in the optimization settings, and mine still freezes/hangs at 4% timing graph nodes (step 19/472) on any preset other than "default." The GPU looks like it is still working in Task Manager, but it never progresses beyond this point. I am on a 3060 12GB.

alexbofa commented on July 30, 2024

Same problem with 1024x1024. Even if you change the preset from 512x512 to 1024x1024, it freezes at 4%.
Other resolutions work; I tried 512x512, 1088x1920, 576x960.

It's the same with these arguments:
SET CUDA_MODULE_LOADING=LAZY
--xformers

3060 12GB, NVIDIA Studio Driver 537.58

But the day before yesterday I created one at 1024x1024.

C0C0Barbet commented on July 30, 2024

I was just able to get past this issue. I launch the UI with webui-user.bat, and over time my file ended up looking like this:

```bat
@echo off

set PYTHON=
set GIT=
set VENV_DIR=
set COMMANDLINE_ARGS=--listen --opt-sdp-attention

set CUDA_VISIBLE_DEVICES=0
set CUDA_MODULE_LOADING=LAZY
set CUDA_LAUNCH_BLOCKING=1
set POLYGRAPHY_AUTOINSTALL_DEPS=1

call webui.bat
```

I had been troubleshooting for a good deal of time with no luck until I decided to edit some of the lines I had added. I don't recall now why I needed the set CUDA_LAUNCH_BLOCKING setter, but it seems to have been the issue. My webui-user.bat now looks like this:

```bat
@echo off

set PYTHON=
set GIT=
set VENV_DIR=
set COMMANDLINE_ARGS=--listen --xformers --opt-sdp-attention

set CUDA_VISIBLE_DEVICES=0
set CUDA_MODULE_LOADING=LAZY
set POLYGRAPHY_AUTOINSTALL_DEPS=1

call webui.bat
```

I was also able to run it successfully with the following and, in all honesty, it seemed quicker.

```bat
@echo off

set PYTHON=
set GIT=
set VENV_DIR=
set COMMANDLINE_ARGS=--listen --xformers

set CUDA_VISIBLE_DEVICES=0
set CUDA_MODULE_LOADING=LAZY

call webui.bat
```

I hope this helps some of you.

Jonseed commented on July 30, 2024

I tried exporting the default 1024x1024 static engine again today after upgrading to the latest NVIDIA drivers (546.01) and setting the "CUDA - Sysmem Fallback Policy" for python.exe to "Prefer No Sysmem Fallback," as detailed here. The export ran into multiple errors, but it didn't get stuck at 4%! The errors looked like this:

1: [defaultAllocator.cpp::nvinfer1::internal::DefaultAllocator::allocate::20] Error Code 1: Cuda Runtime (out of memory)
[E] 9: Skipping tactic0x0000000000000000 due to exception [::0] autotuning: User allocator error allocating 34360788477-byte

But it did finish the export successfully, and I am able to run the 1024 engine now. So I think it was a VRAM issue: during export it was exceeding the memory capacity of the 3060 12GB card, and perhaps trying to fall back to system memory, and that was causing the freeze. After updating the NVIDIA drivers and disallowing sysmem fallback, the export now completes. I'm not sure why it still gives many CUDA out-of-memory errors, but it seems able to recover from them.

Since I started this thread and it now seems resolved (at least the freezing part), I will close it.
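
If you want to check whether a build is really running out of VRAM rather than just hanging, one low-tech option (assuming nvidia-smi, which ships with the NVIDIA driver, is on your PATH) is to poll memory usage from a second terminal while the export runs:

```bat
@echo off
rem Print dedicated GPU memory usage (used, total) every 2 seconds.
:watch
nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader
timeout /t 2 >nul
goto watch
```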

Jonseed commented on July 30, 2024

I tried some other presets, and it freezes on those too; on one of them it gave me this error:

Building engine:  50%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ                                 | 3/6 [00:00<00:00,  6.99it/s][E] 2: [virtualMemoryBuffer.cpp::nvinfer1::StdVirtualMemoryBufferImpl::resizePhysical::140] Error Code 2: OutOfMemory (no further information)ts:   0%|                                                                   | 0/5 [00:00<?, ?it/s]
[E] 2: [virtualMemoryBuffer.cpp::nvinfer1::StdVirtualMemoryBufferImpl::resizePhysical::140] Error Code 2: OutOfMemory (no further information)4%|β–ˆβ–ˆβ–                                                          | 19/472 [00:23<09:30,  1.26s/it]
[W] Requested amount of GPU memory (11046748160 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
[W] UNSUPPORTED_STATESkipping tactic 0 due to insufficient memory on requested size of 11046748160 detected for tactic 0x0000000000000000.
[E] 9: Skipping tactic0x0000000000000000 due to exception [::0] autotuning: User allocator error allocating 188744702-byte buffer
[W] Unable to determine GPU memory usage: an illegal memory access was encountered
[E] 1: [defaultAllocator.cpp::nvinfer1::internal::DefaultAllocator::allocate::20] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[W] Requested amount of GPU memory (188744702 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
[E] 9: Skipping tactic0x0000000000000000 due to exception [::0] autotuning: User allocator error allocating 188744702-byte buffer
Building engine:  50%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ                                 | 3/6 [01:22<01:22, 27.65s/it][E] 10: Could not find any implementation for node {ForeignNode[onnx::LayerNormalization_10454 + ONNXTRT_Broadcast_808.../input_blocks.1/input_blocks.1.1/Reshape_2 + /input_blocks.1/input_blocks.1.1/Transpose_1 + /input_blocks.1/input_blocks.1.1/Reshape_3]}.
Building engine: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 6/6 [01:22<00:00, 13.83s/it]
[E] 1: [cudaResources.cpp::nvinfer1::ScopedCudaStream::~ScopedCudaStream::47] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[E] 10: [optimizer.cpp::nvinfer1::builder::cgraph::LeafCNode::computeCosts::4040] Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[onnx::LayerNormalization_10454 + ONNXTRT_Broadcast_808.../input_blocks.1/input_blocks.1.1/Reshape_2 + /input_blocks.1/input_blocks.1.1/Transpose_1 + /input_blocks.1/input_blocks.1.1/Reshape_3]}.)
[!] Invalid Engine. Please ensure the engine was built correctly
ERROR:root:Failed to build engine: Invalid Engine. Please ensure the engine was built correctly
Traceback (most recent call last):
  File "d:\repos\stable-diffusion-webui\venv\lib\site-packages\gradio\routes.py", line 488, in run_predict
    output = await app.get_blocks().process_api(
  File "d:\repos\stable-diffusion-webui\venv\lib\site-packages\gradio\blocks.py", line 1431, in process_api
    result = await self.call_function(
  File "d:\repos\stable-diffusion-webui\venv\lib\site-packages\gradio\blocks.py", line 1117, in call_function
    prediction = await utils.async_iteration(iterator)
  File "d:\repos\stable-diffusion-webui\venv\lib\site-packages\gradio\utils.py", line 350, in async_iteration
    return await iterator.__anext__()
  File "d:\repos\stable-diffusion-webui\venv\lib\site-packages\gradio\utils.py", line 343, in __anext__
    return await anyio.to_thread.run_sync(
  File "d:\repos\stable-diffusion-webui\venv\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "d:\repos\stable-diffusion-webui\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "d:\repos\stable-diffusion-webui\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "d:\repos\stable-diffusion-webui\venv\lib\site-packages\gradio\utils.py", line 326, in run_sync_iterator_async
    return next(iterator)
  File "d:\repos\stable-diffusion-webui\venv\lib\site-packages\gradio\utils.py", line 695, in gen_wrapper
    yield from f(*args, **kwargs)
  File "D:\repos\stable-diffusion-webui\extensions\Stable-Diffusion-WebUI-TensorRT\ui_trt.py", line 175, in export_unet_to_trt
    ret = export_trt(
  File "D:\repos\stable-diffusion-webui\extensions\Stable-Diffusion-WebUI-TensorRT\exporter.py", line 158, in export_trt
    shared.sd_model = model.cuda()
  File "d:\repos\stable-diffusion-webui\venv\lib\site-packages\lightning_fabric\utilities\device_dtype_mixin.py", line 73, in cuda
    return super().cuda(device=device)
  File "d:\repos\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 905, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "d:\repos\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 797, in _apply
    module._apply(fn)
  File "d:\repos\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 797, in _apply
    module._apply(fn)
  File "d:\repos\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 797, in _apply
    module._apply(fn)
  [Previous line repeated 1 more time]
  File "d:\repos\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 820, in _apply
    param_applied = fn(param)
  File "d:\repos\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 905, in <lambda>
    return self._apply(lambda t: t.cuda(device))
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Jonseed commented on July 30, 2024

It seems that every preset other than "default" (any of the other static or dynamic ones) causes it to freeze or hit an illegal memory access. I've tried three of the other presets now with this result.

I tried exporting another engine with preset "default" and it created the engine fine.

contentis commented on July 30, 2024

@Jonseed, what GPU are you using? This seems like an out-of-memory error.

rrweller commented on July 30, 2024

@Jonseed, what GPU are you using? This seems like an out-of-memory error.

I watched my memory usage while running the default preset; it never exceeded 3.5GB. Here is my cmd prompt output:

Loading weights [31e35c80fc] from F:\stable-diffusion-webui\models\Stable-diffusion\sd_xl_base_1.0.safetensors
Creating model from config: F:\stable-diffusion-webui\repositories\generative-models\configs\inference\sd_xl_base.yaml
Applying attention optimization: Doggettx... done.
Model loaded in 14.8s (load weights from disk: 1.5s, create model: 0.4s, apply weights to model: 12.4s, move model to device: 0.1s, calculate empty prompt: 0.2s).
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 66.4s (prepare environment: 28.6s, import torch: 7.2s, import gradio: 0.7s, setup paths: 0.8s, initialize shared: 0.1s, other imports: 0.6s, setup codeformer: 0.2s, load scripts: 12.8s, create ui: 15.1s, gradio launch: 0.3s).
{'sample': [(1, 4, 96, 96), (2, 4, 128, 128), (8, 4, 128, 128)], 'timesteps': [(1,), (2,), (8,)], 'encoder_hidden_states': [(1, 77, 2048), (2, 77, 2048), (8, 154, 2048)], 'y': [(1, 2816), (2, 2816), (8, 2816)]}
Building TensorRT engine for F:\stable-diffusion-webui\models\Unet-onnx\sd_xl_base_1.0_be9edd61.onnx: F:\stable-diffusion-webui\models\Unet-trt\sd_xl_base_1.0_be9edd61_cc86_sample=1x4x96x96+2x4x128x128+8x4x128x128-timesteps=1+2+8-encoder_hidden_states=1x77x2048+2x77x2048+8x154x2048-y=1x2816+2x2816+8x2816.trt
[W] 'colored' module is not installed, will not use colors when logging. To enable colors, please install the 'colored' module: python3 -m pip install colored
[I] Loading tactic timing cache from F:\stable-diffusion-webui\extensions\Stable-Diffusion-WebUI-TensorRT\timing_caches\timing_cache_win_cc86.cache
[I] Building engine with configuration:
    Flags                  | [FP16, REFIT, TF32]
    Engine Capability      | EngineCapability.DEFAULT
    Memory Pools           | [WORKSPACE: 12287.50 MiB, TACTIC_DRAM: 12287.50 MiB]
    Tactic Sources         | [CUBLAS, CUDNN, EDGE_MASK_CONVOLUTIONS, JIT_CONVOLUTIONS]
    Profiling Verbosity    | ProfilingVerbosity.LAYER_NAMES_ONLY
    Preview Features       | [FASTER_DYNAMIC_SHAPES_0805, DISABLE_EXTERNAL_TACTIC_SOURCES_FOR_CORE_0805]
Building engine:  50%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ                                 | 3/6 [00:01<00:00, 14.38it/s][E] 2: [virtualMemoryBuffer.cpp::nvinfer1::StdVirtualMemoryBufferImpl::resizePhysical::140] Error Code 2: OutOfMemory (no further information)ts:   0%|                                                                   | 0/5 [00:00<?, ?it/s]
[E] 2: [virtualMemoryBuffer.cpp::nvinfer1::StdVirtualMemoryBufferImpl::resizePhysical::140] Error Code 2: OutOfMemory (no further information)6%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š                                                   | 41/254 [00:39<04:02,  1.14s/it]
[W] Requested amount of GPU memory (12006195200 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
[W] UNSUPPORTED_STATESkipping tactic 0 due to insufficient memory on requested size of 12006195200 detected for tactic 0x0000000000000000.
[E] 9: Skipping tactic0x0000000000000000 due to exception [::0] autotuning: User allocator error allocating 167783933-byte buffer
[W] Unable to determine GPU memory usage: an illegal memory access was encountered
[E] 1: [defaultAllocator.cpp::nvinfer1::internal::DefaultAllocator::allocate::20] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[W] Requested amount of GPU memory (167783933 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
[E] 9: Skipping tactic0x0000000000000000 due to exception [::0] autotuning: User allocator error allocating 167783933-byte buffer
Building engine:  50%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ                                 | 3/6 [01:13<00:00, 14.38it/s][E] 10: Could not find any implementation for node {ForeignNode[/input_blocks.4/input_blocks.4.1/norm/Constant_1_output_0 + ONNXTRT_unsqueezeTensor_1789.../input_blocks.5/input_blocks.5.0/in_layers/in_layers.1/Mul]}.
Building engine: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 6/6 [01:13<00:00, 12.31s/it]
[E] 1: [cudaResources.cpp::nvinfer1::ScopedCudaStream::~ScopedCudaStream::47] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[E] 10: [optimizer.cpp::nvinfer1::builder::cgraph::LeafCNode::computeCosts::4040] Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[/input_blocks.4/input_blocks.4.1/norm/Constant_1_output_0 + ONNXTRT_unsqueezeTensor_1789.../input_blocks.5/input_blocks.5.0/in_layers/in_layers.1/Mul]}.)
[!] Invalid Engine. Please ensure the engine was built correctly
ERROR:root:Failed to build engine: Invalid Engine. Please ensure the engine was built correctly
Traceback (most recent call last):
  File "F:\stable-diffusion-webui\venv\lib\site-packages\gradio\routes.py", line 488, in run_predict
    output = await app.get_blocks().process_api(
  File "F:\stable-diffusion-webui\venv\lib\site-packages\gradio\blocks.py", line 1431, in process_api
    result = await self.call_function(
  File "F:\stable-diffusion-webui\venv\lib\site-packages\gradio\blocks.py", line 1117, in call_function
    prediction = await utils.async_iteration(iterator)
  File "F:\stable-diffusion-webui\venv\lib\site-packages\gradio\utils.py", line 350, in async_iteration
    return await iterator.__anext__()
  File "F:\stable-diffusion-webui\venv\lib\site-packages\gradio\utils.py", line 343, in __anext__
    return await anyio.to_thread.run_sync(
  File "F:\stable-diffusion-webui\venv\lib\site-packages\anyio\to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "F:\stable-diffusion-webui\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "F:\stable-diffusion-webui\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "F:\stable-diffusion-webui\venv\lib\site-packages\gradio\utils.py", line 326, in run_sync_iterator_async
    return next(iterator)
  File "F:\stable-diffusion-webui\venv\lib\site-packages\gradio\utils.py", line 695, in gen_wrapper
    yield from f(*args, **kwargs)
  File "F:\stable-diffusion-webui\extensions\Stable-Diffusion-WebUI-TensorRT\ui_trt.py", line 175, in export_unet_to_trt    ret = export_trt(
  File "F:\stable-diffusion-webui\extensions\Stable-Diffusion-WebUI-TensorRT\exporter.py", line 158, in export_trt
    shared.sd_model = model.cuda()
  File "F:\stable-diffusion-webui\venv\lib\site-packages\lightning_fabric\utilities\device_dtype_mixin.py", line 73, in cuda
    return super().cuda(device=device)
  File "F:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 905, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "F:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 797, in _apply
    module._apply(fn)
  File "F:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 797, in _apply
    module._apply(fn)
  File "F:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 797, in _apply
    module._apply(fn)
  [Previous line repeated 1 more time]
  File "F:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 820, in _apply
    param_applied = fn(param)
  File "F:\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 905, in <lambda>
    return self._apply(lambda t: t.cuda(device))
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Jonseed commented on July 30, 2024

@Jonseed, what GPU are you using? This seems like an out-of-memory error.

I am on a 3060 12GB GPU, like @rrweller. I was able to get one dynamic preset to finish exporting, but every other time it either freezes at 4% or hits an illegal memory access. I've had the most success with "default"; I don't think that one has ever frozen or had an illegal memory access for me. These are all exports of SD1.5 models. I haven't tried SDXL yet, since it is not yet supported in the main release of Auto1111.

Jonseed commented on July 30, 2024

The odd thing is that when it "freezes," the GPU still seems to be working according to Task Manager, and will seemingly continue working indefinitely. But the percentage progress on "timing graph nodes" doesn't change.

contentis commented on July 30, 2024

When generating the engines, I'd recommend keeping your system as idle as possible. I don't have a better idea at this point.

What profiles have you tried that failed?

Jonseed commented on July 30, 2024

I have 128GB of system RAM, so I know that isn't an issue...

contentis commented on July 30, 2024

Huh, 768-1024 dynamic is probably the most VRAM-consuming preset πŸ˜„ RAM should never be the limiting factor.

Jonseed commented on July 30, 2024

It's possible that I just need to let it keep running; it may only appear frozen and eventually come out of it. I'll try that too.

Jonseed commented on July 30, 2024

@rrweller when mine appears to freeze, the VRAM (dedicated GPU memory) in Task Manager sits at about 11.4GB, and the utilization meter jumps around between 0-100%. So it looks like it is working...

Jonseed commented on July 30, 2024

So here is something interesting: when I have the Stable Diffusion tab open and active on my screen, GPU utilization jumps to 90-100%. If I switch to another tab or program, utilization drops to 0-20%. I wonder if the progress meter (the glowing orange bar) on the Stable Diffusion page is interfering with the export. I know the standard progress meter is often disabled in Stable Diffusion when generating images because it slows down generation...

Jonseed commented on July 30, 2024

Before freezing, GPU memory usage is about 2.3GB... then when it hits 4% timing graph nodes the memory usage jumps to 11.4GB, and it hangs.

Jonseed commented on July 30, 2024

I tried setting the CUDA_MODULE_LOADING environment variable to LAZY to see if it helped with memory. It did reduce VRAM usage before freezing to about 1.2GB or so, but when it hits 4% timing graph nodes the memory still jumps to 11.3GB and hangs.

rrweller commented on July 30, 2024

The only thing that worked for me was being on the Auto1111 master branch and using the 1024x1024 preset for SDXL; everything else fails with that error. Trying to use the one preset it makes for inference doesn't work either. Switching to the dev branch sort of fixes it, but the inference speed stays the same as without the extension.

contentis commented on July 30, 2024

@rajeevsrao do you maybe have an idea what could be causing this?

algarci104 commented on July 30, 2024

I tried setting the CUDA_MODULE_LOADING environment variable to LAZY to see if it helped with memory. It did reduce VRAM usage before freezing to about 1.2GB or so, but when it hits 4% timing graph nodes the memory still jumps to 11.3GB and hangs.

How did you go about setting it to lazy?

ColinCee commented on July 30, 2024

Just want to say I had the exact same issue; once I enabled xformers to reduce VRAM usage, I was able to build an engine with a max size of 1536x1536.

As well as using the --xformers arg, make sure to also select xformers in the Optimizations tab in Settings.

FYI I'm using a 4090
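
Concretely, that means something like this in webui-user.bat (a sketch; keep any other arguments you already use), plus selecting xformers under Settings > Optimizations:

```bat
rem Enable the xformers attention optimization to reduce VRAM usage.
set COMMANDLINE_ARGS=--xformers
```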

Codyport commented on July 30, 2024

Hi all. I have a minor variation of this problem: my SD instance completely quits when it hits the magic 4% mark when building any TensorRT export over a certain size. With v1-5-pruned-emaonly.safetensors, I am able to export 512x512, 768x768, and 512x768, but 1024x768 and anything larger kills SD. I've tried SDXL and other checkpoints with the same exact result. I'm running Kubuntu 23.04 with 64GB RAM and a 4070 12GB card.

I'm running with --listen --enable-insecure-extension-access --xformers

When it flips out it says:
stable-diffusion-webui/webui.sh: line 255: 9425 Aborted (core dumped) "${python_cmd}" -u "${LAUNCH_SCRIPT}" "$@"

Codyport commented on July 30, 2024

Not sure we have that option on Linux; I'll probably live with it, I guess! But yes, it 'smells' suspiciously like an out-of-memory condition.
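
The sysmem fallback policy is indeed a Windows driver setting, but the environment variables from the webui-user.bat examples above can be set on Linux too, e.g. in webui-user.sh (a sketch, untested on Kubuntu; keep your existing arguments):

```bash
#!/bin/bash
# webui-user.sh -- Linux equivalent of the webui-user.bat tweaks above.
export COMMANDLINE_ARGS="--listen --enable-insecure-extension-access --xformers"
export CUDA_MODULE_LOADING=LAZY
```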

J-Cott commented on July 30, 2024

I had been troubleshooting for a good deal of time with no luck until I decided to edit some of the lines I had added. I don't recall now why I needed the set CUDA_LAUNCH_BLOCKING setter, but it seems to have been the issue. My webui-user.bat now looks like this:

Thank you! I have been trying to fix this for weeks. I can't remember why I added that CUDA_LAUNCH_BLOCKING parameter either, but I have taken it out, and building the UNet no longer gets stuck at 2%.
