Thank you for providing this, it is really exciting to have a simpler way to run SD us

Error "RuntimeError: Unexpected error from cudaGetDeviceCount()" about stable-diffusion-docker HOT 13 CLOSED

fboulnois commented on May 31, 2024 1

Error "RuntimeError: Unexpected error from cudaGetDeviceCount()"

from stable-diffusion-docker.

Comments (13)

xrd commented on May 31, 2024 2

After a bit of review, perhaps my CUDA version is mismatched? The requirements.txt has https://download.pytorch.org/whl/cu116, but nvidia-smi lists CUDA version 11.4 on my machine. I'll try adjusting this.

After adjusting it, the build fails. I'll investigate more later.

 sudo ./build.sh build
[sudo] password for xrdawson: 
Sending build context to Docker daemon  674.3kB
Step 1/10 : FROM tensorflow/tensorflow:2.10.0-gpu
 ---> c8d4e2940044
Step 2/10 : COPY requirements.txt /
 ---> Using cache
 ---> 3fd2a4dee917
Step 3/10 : RUN pip install -r requirements.txt   --extra-index-url https://download.pytorch.org/whl/cu114
 ---> Running in 61aa74af8679
Looking in indexes: https://pypi.org/simple, https://download.pytorch.org/whl/cu114
Collecting diffusers==0.4.1
  Downloading diffusers-0.4.1-py3-none-any.whl (229 kB)
ERROR: Could not find a version that satisfies the requirement torch==1.12.1+cu116 (from -r requirements.txt (line 2)) (from versions: 1.4.0, 1.5.0, 1.5.1, 1.6.0, 1.7.0, 1.7.1, 1.8.0, 1.8.1, 1.9.0, 1.9.1, 1.10.0, 1.10.1, 1.10.2, 1.11.0, 1.12.0, 1.12.1)
ERROR: No matching distribution found for torch==1.12.1+cu116 (from -r requirements.txt (line 2))
WARNING: You are using pip version 20.2.4; however, version 22.2.2 is available.
You should consider upgrading via the '/usr/bin/python3 -m pip install --upgrade pip' command.
The command '/bin/bash -c pip install -r requirements.txt   --extra-index-url https://download.pytorch.org/whl/cu114' returned a non-zero code: 1
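
One plausible reading of the log above: the index URL was switched to cu114 while requirements.txt still pinned torch==1.12.1+cu116, so pip was asked for a +cu116 build on an index named for a different CUDA tag. The sketch below checks a requirements file for that kind of mismatch. The function names are hypothetical, and the sample requirements text in the usage note is an assumption pieced together from snippets in this thread.

```python
import re
from typing import List, Optional


def cuda_tag_from_index_url(url: str) -> Optional[str]:
    """Return the trailing 'cuNNN' path segment of a wheel index URL, if any."""
    match = re.search(r"/(cu\d+)/?$", url)
    return match.group(1) if match else None


def cuda_tag_from_requirement(line: str) -> Optional[str]:
    """Return the 'cuNNN' local version tag of a pinned requirement, if any."""
    match = re.search(r"==[^+\s]+\+(cu\d+)", line)
    return match.group(1) if match else None


def mismatched_pins(requirements: str, index_url: str) -> List[str]:
    """List pinned requirements whose CUDA tag differs from the index URL's tag."""
    index_tag = cuda_tag_from_index_url(index_url)
    bad = []
    for line in requirements.splitlines():
        tag = cuda_tag_from_requirement(line)
        if tag is not None and tag != index_tag:
            bad.append(line.strip())
    return bad
```

For example, `mismatched_pins("diffusers==0.4.1\ntorch==1.12.1+cu116\n", "https://download.pytorch.org/whl/cu114")` flags the torch line, while the same text checked against the cu116 URL flags nothing. The check only compares tags; it does not confirm that a given index actually hosts a matching wheel.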

xrd commented on May 31, 2024 2

Hey, issue #13 fixed it!

I added this to the end of the Dockerfile, but before the ENTRYPOINT:

USER root
RUN rm -rf /usr/local/cuda/lib64/stubs
# make sure to go back to the correct user
USER huggingface

Then I ran ./build.sh build and ./build.sh dev, and then:

 ./build.sh dev
$ python -c "import torch; print(torch.cuda.is_available())"
True

xrd commented on May 31, 2024 1

I'm on nixos, and that version of nvidia-smi is what I could get working given my poor understanding of nix and nvidia drivers. :)

Unfortunately, that did not seem to fix things. I even rebooted to make sure the GPU wasn't in a weird state from other ML I'm playing with.

I'll look into upgrading some of the pieces of the system, usually nix has recipes for that, I just often find it hard to get them to work!

fboulnois commented on May 31, 2024

PyTorch doesn't seem to directly support CUDA 11.4, but perhaps downgrading the versions will work? In the Dockerfile, switch --extra-index-url to https://download.pytorch.org/whl/cu113, then change the pin in requirements.txt from torch==1.12.1+cu116 to torch==1.12.1+cu113 and see if that fixes things.
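
The requirements.txt half of that suggestion can be sketched as a small text rewrite. This is only an illustration: `retag_cuda` is a hypothetical helper, not part of the repository, and it assumes the pin has the exact `package==version+cuNNN` form shown above. The Dockerfile's --extra-index-url still has to be changed to the matching URL by hand.

```python
import re


def retag_cuda(requirements: str, package: str, new_tag: str) -> str:
    """Replace the '+cuNNN' suffix on one pinned package, leaving other lines as-is."""
    pattern = re.compile(rf"^({re.escape(package)}==[^+\s]+)\+cu\d+\s*$")
    out = []
    for line in requirements.splitlines():
        match = pattern.match(line.strip())
        # Only the matching pin is rewritten; every other line is copied through.
        out.append(f"{match.group(1)}+{new_tag}" if match else line)
    return "\n".join(out)
```

Calling `retag_cuda(text, "torch", "cu113")` on a file containing `torch==1.12.1+cu116` yields `torch==1.12.1+cu113` on that line. Note that the result is joined without a trailing newline.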

The other thing I see which might be causing an issue is that your installed driver version is a bit old (latest stable is 515.76). Is that the latest driver available for your GPU?

k1rb commented on May 31, 2024

I'm having the same issue.

$ nvidia-smi 
Tue Oct 11 23:50:12 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.76       Driver Version: 515.76       CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:0A:00.0  On |                  N/A |
|  0%   51C    P8    26W / 260W |    266MiB / 11264MiB |     16%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      8501      G   /usr/libexec/Xorg                 138MiB |
|    0   N/A  N/A      8735      G   /usr/bin/gnome-shell               62MiB |
|    0   N/A  N/A      9665      G   ...RendererForSitePerProcess       18MiB |
|    0   N/A  N/A     11977      G   /usr/lib64/firefox/firefox         23MiB |
|    0   N/A  N/A     28858      G   nvidia-settings                     0MiB |
+-----------------------------------------------------------------------------+
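
When comparing setups across reports like this one, the driver and CUDA versions can be pulled out of the nvidia-smi banner with a regular expression. The sketch below assumes the banner layout shown above, and `parse_smi_header` is a hypothetical name. Keep in mind that the "CUDA Version" in this banner is the highest CUDA version the installed driver supports, not the toolkit version inside a container.

```python
import re
from typing import Dict, Optional

# Matches the banner line, e.g.
# "| NVIDIA-SMI 515.76       Driver Version: 515.76       CUDA Version: 11.7     |"
HEADER = re.compile(
    r"NVIDIA-SMI\s+(?P<smi>[\d.]+)\s+"
    r"Driver Version:\s+(?P<driver>[\d.]+)\s+"
    r"CUDA Version:\s+(?P<cuda>[\d.]+)"
)


def parse_smi_header(output: str) -> Optional[Dict[str, str]]:
    """Return the smi, driver and cuda version strings, or None if no banner is found."""
    match = HEADER.search(output)
    return match.groupdict() if match else None
```

Applied to the output above, this returns `{"smi": "515.76", "driver": "515.76", "cuda": "11.7"}`.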

I've tried changing requirements.txt:

diffusers==0.4.1
torch==1.13.0.dev20220813+cu117
transformers==4.22.2

and Dockerfile:

RUN pip install -r requirements.txt \
  --extra-index-url https://download.pytorch.org/whl/nightly/cu117

And here's my error, almost exactly the same as OP's:

load pipeline start: 2022-10-12T06:49:11.902814
/usr/local/lib/python3.8/dist-packages/torch/cuda/__init__.py:83: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 34: CUDA driver is a stub library (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:109.)
  return torch._C._cuda_getDeviceCount() > 0
Fetching 16 files: 100%|██████████| 16/16 [00:00<00:00, 7390.03it/s]
ftfy or spacy is not installed using BERT BasicTokenizer instead of ftfy.
Traceback (most recent call last):
  File "/usr/local/bin/docker-entrypoint.py", line 174, in <module>
    main()
  File "/usr/local/bin/docker-entrypoint.py", line 157, in main
    stable_diffusion(
  File "/usr/local/bin/docker-entrypoint.py", line 37, in stable_diffusion
    pipe = StableDiffusionPipeline.from_pretrained(
  File "/usr/local/lib/python3.8/dist-packages/diffusers/pipeline_utils.py", line 179, in to
    module.to(torch_device)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 982, in to
    return self._apply(convert)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 635, in _apply
    module._apply(fn)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 635, in _apply
    module._apply(fn)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 658, in _apply
    param_applied = fn(param)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 980, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
  File "/usr/local/lib/python3.8/dist-packages/torch/cuda/__init__.py", line 222, in _lazy_init
    torch._C._cuda_init()
RuntimeError: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 34: CUDA driver is a stub library
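
The useful part of this traceback is the tail of the message, "Error 34: CUDA driver is a stub library", which points at a stub libcuda being loaded rather than at a wheel or driver version mismatch. A minimal sketch for pulling the code and reason out of such a message follows. The helper names are hypothetical, and the message format is assumed to be the one printed above.

```python
import re
from typing import Optional, Tuple


def parse_cuda_init_error(message: str) -> Optional[Tuple[int, str]]:
    """Extract (code, reason) from a message containing 'Error <n>: <reason>'."""
    match = re.search(r"Error (\d+): ([^(\n]+)", message)
    if match is None:
        return None
    return int(match.group(1)), match.group(2).strip()


def is_stub_library_error(message: str) -> bool:
    """True when the message reports CUDA error 34 (driver is a stub library)."""
    parsed = parse_cuda_init_error(message)
    return parsed is not None and parsed[0] == 34
```

Both the RuntimeError text and the earlier UserWarning (which appends "(Triggered internally at ...)") parse to `(34, "CUDA driver is a stub library")`.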

fboulnois commented on May 31, 2024

Are you also on NixOS? If you run ./build.sh dev it will drop you into a terminal, and then you can try python -c "import torch; print(torch.cuda.is_available())". Let me know what the results are.

xrd commented on May 31, 2024

My result from ./build.sh dev is:

$ python -c "import torch; print(torch.cuda.is_available())"
/usr/local/lib/python3.8/dist-packages/torch/cuda/__init__.py:83: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 34: CUDA driver is a stub library (Triggered internally at  ../c10/cuda/CUDAFunctions.cpp:109.)
  return torch._C._cuda_getDeviceCount() > 0
False
$

Update:

I'm also playing with InvokeAI. I have a conda environment set up (not sure how to share that, but basically on nix I run conda-shell; conda activate invokeai, and I was using the recent repo here: https://github.com/invoke-ai/InvokeAI/)

If I use that conda environment (obviously this isn't inside docker) then I see this:

$ python -c "import torch; print(torch.cuda.is_available())"
True

Can I provide other information? At least this gives me hope that my host system is not totally broken.

fboulnois commented on May 31, 2024

That's good information, thanks. What version of Docker are you on? Also, is this other closed issue useful in any way? #13

xrd commented on May 31, 2024

$ sudo docker version
Client:
 Version:           20.10.7
 API version:       1.41
 Go version:        go1.16.8
 Git commit:        v20.10.7
 Built:             Thu Jan  1 00:00:00 1970
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server:
 Engine:
  Version:          20.10.7
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.16.8
  Git commit:       v20.10.7
  Built:            Tue Jan  1 00:00:00 1980
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          v1.5.7
  GitCommit:        v1.5.7
 runc:
  Version:          1.0.0-rc95
  GitCommit:        
 docker-init:
  Version:          0.18.0
  GitCommit:

fboulnois commented on May 31, 2024

Good to know. Since the issue has been reported a couple of times, I think it's worth either changing the Dockerfile or adding something to the README.

xrd commented on May 31, 2024

Well, it is working, but I got a false positive from the NSFW check. That's weird:

./build.sh run --W 256 --H 256 --half --attention-slicing  --prompt 'abstract art'
load pipeline start: 2022-10-12T21:06:19.903981
Fetching 16 files: 100%|██████████| 16/16 [00:00<00:00, 35582.64it/s]
ftfy or spacy is not installed using BERT BasicTokenizer instead of ftfy.
loaded models after: 2022-10-12T21:06:36.461160
100%|██████████| 51/51 [00:13<00:00,  3.88it/s]
Potential NSFW content was detected in one or more images. A black image will be returned instead. Try again with a different prompt and/or seed.
completed pipeline: 2022-10-12T21:06:50.063386

The next run worked, so I think you can close this issue. Thank you.

fboulnois commented on May 31, 2024

The safety checker is pretty sensitive even for regular content. You can skip it with --skip.

fboulnois commented on May 31, 2024

I added a fix which removes the stubs before any code can run, so hopefully this should fix the issue long term!
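
To check whether an image still ships the stubs, one option is to walk the filesystem for directories named `stubs` that contain a libcuda file. The sketch below is an assumption-laden illustration and not the fix that was committed: `find_cuda_stub_dirs` is a hypothetical helper, and the only layout it is modeled on is the /usr/local/cuda/lib64/stubs path mentioned earlier in this thread.

```python
from pathlib import Path
from typing import List


def find_cuda_stub_dirs(root: str) -> List[Path]:
    """Return directories named 'stubs' under root that contain a libcuda.so* file."""
    hits = []
    for candidate in Path(root).rglob("stubs"):
        # Skip files named 'stubs' and stub directories without a libcuda library.
        if candidate.is_dir() and any(candidate.glob("libcuda.so*")):
            hits.append(candidate)
    return sorted(hits)


if __name__ == "__main__":
    # Scanning /usr/local keeps the walk short; adjust the root as needed.
    for path in find_cuda_stub_dirs("/usr/local"):
        print(path)
```

An empty result inside the container suggests the stub libcuda is gone, so the driver library injected by the NVIDIA container runtime should be the one PyTorch loads.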
