
Comments (13)

xrd commented on May 31, 2024

After a bit of review, perhaps my CUDA version is mismatched? The requirements.txt targets https://download.pytorch.org/whl/cu116, but nvidia-smi lists CUDA version 11.4. I'll try adjusting this.

After adjusting it, the build fails. I'll investigate more later.

 sudo ./build.sh build
[sudo] password for xrdawson: 
Sending build context to Docker daemon  674.3kB
Step 1/10 : FROM tensorflow/tensorflow:2.10.0-gpu
 ---> c8d4e2940044
Step 2/10 : COPY requirements.txt /
 ---> Using cache
 ---> 3fd2a4dee917
Step 3/10 : RUN pip install -r requirements.txt   --extra-index-url https://download.pytorch.org/whl/cu114
 ---> Running in 61aa74af8679
Looking in indexes: https://pypi.org/simple, https://download.pytorch.org/whl/cu114
Collecting diffusers==0.4.1
  Downloading diffusers-0.4.1-py3-none-any.whl (229 kB)
ERROR: Could not find a version that satisfies the requirement torch==1.12.1+cu116 (from -r requirements.txt (line 2)) (from versions: 1.4.0, 1.5.0, 1.5.1, 1.6.0, 1.7.0, 1.7.1, 1.8.0, 1.8.1, 1.9.0, 1.9.1, 1.10.0, 1.10.1, 1.10.2, 1.11.0, 1.12.0, 1.12.1)
ERROR: No matching distribution found for torch==1.12.1+cu116 (from -r requirements.txt (line 2))
WARNING: You are using pip version 20.2.4; however, version 22.2.2 is available.
You should consider upgrading via the '/usr/bin/python3 -m pip install --upgrade pip' command.
The command '/bin/bash -c pip install -r requirements.txt   --extra-index-url https://download.pytorch.org/whl/cu114' returned a non-zero code: 1
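The log suggests the pin and the index disagree: requirements.txt still asks for torch==1.12.1+cu116 while --extra-index-url points at .../whl/cu114, and the versions pip lists are plain PyPI releases with no +cu116 build among them. A minimal sketch of a pre-build consistency check, assuming the CUDA tag is always the trailing cuXYZ path segment of the index URL and the +cuXYZ suffix of the torch pin. The helper names are hypothetical and not part of stable-diffusion-docker.

```python
import re
from typing import List, Optional


def local_cuda_tag(requirement: str) -> Optional[str]:
    """Return the 'cuXYZ' local tag from a pin like 'torch==1.12.1+cu116'."""
    m = re.search(r"\+(cu\d+)\s*$", requirement.strip())
    return m.group(1) if m else None


def index_cuda_tag(index_url: str) -> Optional[str]:
    """Return the trailing 'cuXYZ' segment of a PyTorch wheel index URL."""
    m = re.search(r"/(cu\d+)/?$", index_url.strip())
    return m.group(1) if m else None


def check_torch_pin(requirements_text: str, index_url: str) -> List[str]:
    """List torch pins whose CUDA tag differs from the index URL's tag."""
    want = index_cuda_tag(index_url)
    problems = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line.startswith("torch=="):
            continue  # only the torch pin carries a CUDA local tag here
        have = local_cuda_tag(line)
        if have != want:
            problems.append(f"{line} pins {have} but the index URL serves {want}")
    return problems


if __name__ == "__main__":
    # Pins taken from the build log above.
    reqs = "diffusers==0.4.1\ntorch==1.12.1+cu116\n"
    for problem in check_torch_pin(reqs, "https://download.pytorch.org/whl/cu114"):
        print(problem)
```

Run against the pins from the log, this prints `torch==1.12.1+cu116 pins cu116 but the index URL serves cu114`. It reports nothing for a matching pair, including a nightly URL such as .../whl/nightly/cu117 with a +cu117 pin.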


from stable-diffusion-docker.

xrd commented on May 31, 2024

Hey, issue #13 fixed it!

I added this to the end of the Dockerfile, but before the ENTRYPOINT:

USER root
RUN rm -rf /usr/local/cuda/lib64/stubs
# make sure to switch back to the correct user
USER huggingface

Then I ran ./build.sh build followed by ./build.sh dev, and inside the container:

 ./build.sh dev
$ python -c "import torch; print(torch.cuda.is_available())"
True
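Deleting the stubs directory presumably works because a stub libcuda.so was being picked up instead of the host driver library, which is what produces "CUDA driver is a stub library". A rough way to check an image before deleting anything is to look for libcuda.so* inside any directory named stubs. This sketch only inspects LD_LIBRARY_PATH plus the path from the Dockerfile fix above. The dynamic linker can also find libraries through the ldconfig cache, which this does not read, so an empty result is not proof that no stub is reachable.

```python
import glob
import os
from typing import Iterable, List

# The stubs directory removed by the Dockerfile fix in this thread.
DEFAULT_STUB_DIR = "/usr/local/cuda/lib64/stubs"


def find_stub_libcuda(dirs: Iterable[str]) -> List[str]:
    """Return libcuda.so* files found in directories whose name contains 'stubs'."""
    hits: List[str] = []
    for d in dirs:
        if not d or "stubs" not in os.path.basename(os.path.normpath(d)):
            continue
        hits.extend(sorted(glob.glob(os.path.join(d, "libcuda.so*"))))
    return hits


def candidate_dirs() -> List[str]:
    """LD_LIBRARY_PATH entries plus the known stub directory, without duplicates."""
    dirs = os.environ.get("LD_LIBRARY_PATH", "").split(os.pathsep)
    return list(dict.fromkeys(d for d in dirs + [DEFAULT_STUB_DIR] if d))


if __name__ == "__main__":
    stubs = find_stub_libcuda(candidate_dirs())
    for path in stubs:
        print("stub driver library present:", path)
    if not stubs:
        print("no stub libcuda found in the checked directories")
```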

xrd commented on May 31, 2024

I'm on NixOS, and that version of nvidia-smi is what I could get working given my poor understanding of Nix and NVIDIA drivers. :)

Unfortunately, that did not seem to fix things. I even rebooted to make sure the GPU wasn't in a weird state from other ML experiments I'm running.

I'll look into upgrading some parts of the system. Nix usually has recipes for that; I just often find it hard to get them working!

fboulnois commented on May 31, 2024

PyTorch doesn't seem to support CUDA 11.4 directly, but perhaps downgrading will work? In the Dockerfile, switch --extra-index-url to https://download.pytorch.org/whl/cu113, then change the version in requirements.txt from torch==1.12.1+cu116 to torch==1.12.1+cu113, and see if that fixes things.

The other thing I see that might be causing an issue is that your installed driver version is a bit old (the latest stable is 515.76). Is that the latest driver available for your GPU?
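The suggestion above follows a simple rule: use the newest PyTorch CUDA build that is not newer than the CUDA version the driver reports. Here is a sketch of that rule. The list of tags published for torch 1.12.1 (cu102, cu113, cu116) is an assumption to verify against the index. CUDA 11 minor-version compatibility can let a newer 11.x build run on an older 11.x driver, so this is a conservative choice rather than a hard requirement.

```python
from typing import Iterable, Optional, Tuple


def parse_tag(tag: str) -> Tuple[int, int]:
    """Turn a PyTorch wheel tag such as 'cu113' into (11, 3).

    Assumes the naming seen in this thread: 'cu' + major + one minor digit.
    """
    digits = tag[len("cu"):]
    return int(digits[:-1]), int(digits[-1])


def best_cuda_tag(driver_cuda: str, available: Iterable[str]) -> Optional[str]:
    """Newest tag whose CUDA version does not exceed the driver's (e.g. '11.4')."""
    major, minor = (int(part) for part in driver_cuda.split(".")[:2])
    usable = [t for t in available if parse_tag(t) <= (major, minor)]
    return max(usable, key=parse_tag) if usable else None


def install_settings(torch_version: str, tag: str) -> Tuple[str, str]:
    """The requirements.txt pin and the --extra-index-url value for one tag."""
    return f"torch=={torch_version}+{tag}", f"https://download.pytorch.org/whl/{tag}"


# Assumption: CUDA builds published for torch 1.12.1; verify against the index.
TORCH_1_12_1_TAGS = ["cu102", "cu113", "cu116"]

if __name__ == "__main__":
    tag = best_cuda_tag("11.4", TORCH_1_12_1_TAGS)  # 11.4 as reported by nvidia-smi
    if tag is None:
        print("no published build is old enough for this driver")
    else:
        pin, index_url = install_settings("1.12.1", tag)
        print(pin)
        print(index_url)
```

For a driver reporting 11.4 this selects cu113 and prints the same pin and index URL proposed above.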

k1rb commented on May 31, 2024

I'm having the same issue.

$ nvidia-smi 
Tue Oct 11 23:50:12 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.76       Driver Version: 515.76       CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:0A:00.0  On |                  N/A |
|  0%   51C    P8    26W / 260W |    266MiB / 11264MiB |     16%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      8501      G   /usr/libexec/Xorg                 138MiB |
|    0   N/A  N/A      8735      G   /usr/bin/gnome-shell               62MiB |
|    0   N/A  N/A      9665      G   ...RendererForSitePerProcess       18MiB |
|    0   N/A  N/A     11977      G   /usr/lib64/firefox/firefox         23MiB |
|    0   N/A  N/A     28858      G   nvidia-settings                     0MiB |
+-----------------------------------------------------------------------------+

I've tried changing requirements.txt to:

diffusers==0.4.1
torch==1.13.0.dev20220813+cu117
transformers==4.22.2

and Dockerfile:

RUN pip install -r requirements.txt \
  --extra-index-url https://download.pytorch.org/whl/nightly/cu117

And here's my error, almost exactly the same as OP's:

load pipeline start: 2022-10-12T06:49:11.902814
/usr/local/lib/python3.8/dist-packages/torch/cuda/__init__.py:83: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 34: CUDA driver is a stub library (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:109.)
  return torch._C._cuda_getDeviceCount() > 0
Fetching 16 files: 100%|██████████| 16/16 [00:00<00:00, 7390.03it/s]
ftfy or spacy is not installed using BERT BasicTokenizer instead of ftfy.
Traceback (most recent call last):
  File "/usr/local/bin/docker-entrypoint.py", line 174, in <module>
    main()
  File "/usr/local/bin/docker-entrypoint.py", line 157, in main
    stable_diffusion(
  File "/usr/local/bin/docker-entrypoint.py", line 37, in stable_diffusion
    pipe = StableDiffusionPipeline.from_pretrained(
  File "/usr/local/lib/python3.8/dist-packages/diffusers/pipeline_utils.py", line 179, in to
    module.to(torch_device)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 982, in to
    return self._apply(convert)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 635, in _apply
    module._apply(fn)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 635, in _apply
    module._apply(fn)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 658, in _apply
    param_applied = fn(param)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 980, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
  File "/usr/local/lib/python3.8/dist-packages/torch/cuda/__init__.py", line 222, in _lazy_init
    torch._C._cuda_init()
RuntimeError: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 34: CUDA driver is a stub library
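Error 34 here is the CUDA runtime's stub-library error, and the driver API uses the same number for CUDA_ERROR_STUB_LIBRARY. One hedged way to confirm which libcuda the container resolves is to load it with ctypes and call cuInit(0) directly, bypassing PyTorch. Which file name the CUDA runtime actually opens depends on the image's linker setup, so this sketch tries both libcuda.so.1 and libcuda.so. It decodes only three CUresult codes.

```python
import ctypes
from typing import Any, Callable, Dict, Iterable

# A few CUresult values from the CUDA driver API header (cuda.h).
CU_RESULTS = {
    0: "CUDA_SUCCESS",
    34: "CUDA_ERROR_STUB_LIBRARY",
    100: "CUDA_ERROR_NO_DEVICE",
}


def describe(code: int) -> str:
    """Name a CUresult code, falling back to the raw number."""
    return CU_RESULTS.get(code, f"CUresult {code}")


def probe_driver(
    names: Iterable[str] = ("libcuda.so.1", "libcuda.so"),
    loader: Callable[[str], Any] = ctypes.CDLL,
) -> Dict[str, str]:
    """Load each candidate driver library and report what cuInit(0) returns."""
    results: Dict[str, str] = {}
    for name in names:
        try:
            lib = loader(name)
        except OSError as exc:
            results[name] = f"not loadable ({exc})"
            continue
        # ctypes assumes an int return type, which matches the CUresult enum.
        results[name] = describe(lib.cuInit(0))
    return results


if __name__ == "__main__":
    for name, status in probe_driver().items():
        print(f"{name}: {status}")
```

Inside an affected container, a CUDA_ERROR_STUB_LIBRARY result for one of the names would point at the same stub problem PyTorch reports.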

fboulnois commented on May 31, 2024

Are you also on NixOS? If you run ./build.sh dev it will drop you into a terminal, and then you can try python -c "import torch; print(torch.cuda.is_available())". Let me know what the results are.
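That one-liner only prints a boolean. The sketch below goes a little further using standard torch attributes (torch.__version__, torch.version.cuda, torch.cuda.is_available(), device_count(), get_device_name()). It also records the CUDA version the wheel was built against and captures the CUDA initialization warning, so a stub-library message appears in the output. The script name diagnose.py is hypothetical; run it inside the ./build.sh dev shell.

```python
import warnings
from typing import Any, Dict


def torch_cuda_report() -> Dict[str, Any]:
    """Collect basic PyTorch/CUDA facts without raising if CUDA is unusable."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}

    report: Dict[str, Any] = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "built_with_cuda": torch.version.cuda,  # None for CPU-only builds
    }
    # is_available() reports initialization problems as a UserWarning; keep it.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        report["cuda_available"] = torch.cuda.is_available()
    report["init_warnings"] = [str(w.message) for w in caught]

    if report["cuda_available"]:
        report["device_count"] = torch.cuda.device_count()
        report["device_name"] = torch.cuda.get_device_name(0)
    return report


if __name__ == "__main__":
    for key, value in torch_cuda_report().items():
        print(f"{key}: {value}")
```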

xrd commented on May 31, 2024

My result from ./build.sh dev is:

$ python -c "import torch; print(torch.cuda.is_available())"
/usr/local/lib/python3.8/dist-packages/torch/cuda/__init__.py:83: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 34: CUDA driver is a stub library (Triggered internally at  ../c10/cuda/CUDAFunctions.cpp:109.)
  return torch._C._cuda_getDeviceCount() > 0
False
$ 

Update:

I'm also playing with InvokeAI. I have a conda environment set up (I'm not sure how to share it, but on Nix I basically run conda-shell, then conda activate invokeai, using the recent repo here: https://github.com/invoke-ai/InvokeAI/).

If I use that conda environment (obviously this isn't inside Docker), then I see this:

$ python -c "import torch; print(torch.cuda.is_available())"
True

Is there any other information I can provide? At least this gives me hope that my host system isn't totally broken.

fboulnois commented on May 31, 2024

That's good information, thanks. What version of Docker are you on? Also, is this other closed issue useful in any way? #13

xrd commented on May 31, 2024

$ sudo docker version
Client:
 Version:           20.10.7
 API version:       1.41
 Go version:        go1.16.8
 Git commit:        v20.10.7
 Built:             Thu Jan  1 00:00:00 1970
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server:
 Engine:
  Version:          20.10.7
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.16.8
  Git commit:       v20.10.7
  Built:            Tue Jan  1 00:00:00 1980
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          v1.5.7
  GitCommit:        v1.5.7
 runc:
  Version:          1.0.0-rc95
  GitCommit:        
 docker-init:
  Version:          0.18.0
  GitCommit:        

fboulnois commented on May 31, 2024

Good to know. Since the issue has been reported a couple of times, I think it's worth either changing the Dockerfile or adding something to the README.

xrd commented on May 31, 2024

Well, it's working now, but I got a false positive from the NSFW filter. That's weird:

./build.sh run --W 256 --H 256 --half --attention-slicing  --prompt 'abstract art'
load pipeline start: 2022-10-12T21:06:19.903981
Fetching 16 files: 100%|██████████| 16/16 [00:00<00:00, 35582.64it/s]
ftfy or spacy is not installed using BERT BasicTokenizer instead of ftfy.
loaded models after: 2022-10-12T21:06:36.461160
100%|██████████| 51/51 [00:13<00:00,  3.88it/s]
Potential NSFW content was detected in one or more images. A black image will be returned instead. Try again with a different prompt and/or seed.
completed pipeline: 2022-10-12T21:06:50.063386

The next run worked, so I think you can close this issue. Thank you.

fboulnois commented on May 31, 2024

The safety checker is pretty sensitive even for regular content. You can skip it with --skip.

fboulnois commented on May 31, 2024

I added a fix which removes the stubs before any code can run, so hopefully this should fix the issue long term!
