Comments (13)
After a bit of review, perhaps my CUDA version is mismatched? The requirements.txt has https://download.pytorch.org/whl/cu116, but nvidia-smi lists CUDA version 11.4. I'll try adjusting this.
After adjusting it, the build fails. I'll investigate more later.
sudo ./build.sh build
[sudo] password for xrdawson:
Sending build context to Docker daemon 674.3kB
Step 1/10 : FROM tensorflow/tensorflow:2.10.0-gpu
---> c8d4e2940044
Step 2/10 : COPY requirements.txt /
---> Using cache
---> 3fd2a4dee917
Step 3/10 : RUN pip install -r requirements.txt --extra-index-url https://download.pytorch.org/whl/cu114
---> Running in 61aa74af8679
Looking in indexes: https://pypi.org/simple, https://download.pytorch.org/whl/cu114
Collecting diffusers==0.4.1
Downloading diffusers-0.4.1-py3-none-any.whl (229 kB)
ERROR: Could not find a version that satisfies the requirement torch==1.12.1+cu116 (from -r requirements.txt (line 2)) (from versions: 1.4.0, 1.5.0, 1.5.1, 1.6.0, 1.7.0, 1.7.1, 1.8.0, 1.8.1, 1.9.0, 1.9.1, 1.10.0, 1.10.1, 1.10.2, 1.11.0, 1.12.0, 1.12.1)
ERROR: No matching distribution found for torch==1.12.1+cu116 (from -r requirements.txt (line 2))
WARNING: You are using pip version 20.2.4; however, version 22.2.2 is available.
You should consider upgrading via the '/usr/bin/python3 -m pip install --upgrade pip' command.
The command '/bin/bash -c pip install -r requirements.txt --extra-index-url https://download.pytorch.org/whl/cu114' returned a non-zero code: 1
from stable-diffusion-docker.
Hey, issue #13 fixed it!
I added this to the end of the Dockerfile, but before the ENTRYPOINT:
USER root
RUN rm -rf /usr/local/cuda/lib64/stubs
# make sure to switch back to the correct user
USER huggingface
Then I ran ./build.sh build and ./build.sh dev, and then:
./build.sh dev
$ python -c "import torch; print(torch.cuda.is_available())"
True
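For anyone who wants to check whether they are hitting the same stub problem before editing the Dockerfile, here is a minimal Python sketch to run inside the container. It assumes the stub directory is being picked up through LD_LIBRARY_PATH, which this thread does not confirm; find_stub_dirs is a hypothetical helper and not part of this project.

```python
import os


def find_stub_dirs(ld_library_path):
    """Return LD_LIBRARY_PATH entries that look like CUDA stub directories.

    Assumption: the loader finds the stub libcuda through LD_LIBRARY_PATH.
    The thread only confirms that removing /usr/local/cuda/lib64/stubs helps.
    """
    entries = [p for p in ld_library_path.split(os.pathsep) if p]
    return [p for p in entries if os.path.basename(p.rstrip("/")) == "stubs"]


if __name__ == "__main__":
    stubs = find_stub_dirs(os.environ.get("LD_LIBRARY_PATH", ""))
    print("stub directories on LD_LIBRARY_PATH:", stubs or "none")
    # The path removed by the Dockerfile change in this comment.
    print("stubs dir exists:", os.path.isdir("/usr/local/cuda/lib64/stubs"))
```

If the directory exists and torch still reports "CUDA driver is a stub library", removing it as described in this comment is the change that worked here.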
I'm on NixOS, and that version of nvidia-smi is what I could get working given my poor understanding of Nix and NVIDIA drivers. :)
Unfortunately, that did not seem to fix things. I even rebooted to make sure the GPU wasn't in a weird state from other ML work I'm playing with.
I'll look into upgrading some pieces of the system. Nix usually has recipes for that; I just often find it hard to get them to work!
PyTorch doesn't seem to directly support CUDA 11.4, but perhaps downgrading the versions will work? In the Dockerfile, switch --extra-index-url to https://download.pytorch.org/whl/cu113, and then adjust the version in requirements.txt from torch==1.12.1+cu116 to torch==1.12.1+cu113 and see if that fixes things.
The other thing I see which might be causing an issue is that your installed driver version is a bit old (latest stable is 515.76). Is that the latest driver available for your GPU?
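To make the requirements.txt half of that change less error-prone, here is a rough Python sketch that swaps the CUDA tag on the pinned torch line. It assumes the pin has the form torch==<version>+cu<digits> on its own line; retag_torch_pin is a hypothetical helper, and the sample file contents are only illustrative. The --extra-index-url in the Dockerfile still has to be changed to the matching URL by hand.

```python
import re


def retag_torch_pin(requirements_text, new_tag):
    """Replace the CUDA tag on a pinned torch line, e.g. +cu116 -> +cu113.

    Assumption: the pin looks like 'torch==<version>+cu<digits>'.
    Other lines are left untouched.
    """
    pattern = re.compile(r"^(torch==[^\s+]+)\+cu\d+[ \t]*$", flags=re.MULTILINE)
    return pattern.sub(lambda m: f"{m.group(1)}+{new_tag}", requirements_text)


# Illustrative file contents, not necessarily the repository's exact file.
original = "diffusers==0.4.1\ntorch==1.12.1+cu116\ntransformers==4.22.2\n"
updated = retag_torch_pin(original, "cu113")
print(updated)
```

The tag passed in has to match a wheel that actually exists on the chosen index, otherwise pip fails with the same "No matching distribution found" error shown earlier in this thread.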
I'm having the same issue.
$ nvidia-smi
Tue Oct 11 23:50:12 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.76 Driver Version: 515.76 CUDA Version: 11.7 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:0A:00.0 On | N/A |
| 0% 51C P8 26W / 260W | 266MiB / 11264MiB | 16% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 8501 G /usr/libexec/Xorg 138MiB |
| 0 N/A N/A 8735 G /usr/bin/gnome-shell 62MiB |
| 0 N/A N/A 9665 G ...RendererForSitePerProcess 18MiB |
| 0 N/A N/A 11977 G /usr/lib64/firefox/firefox 23MiB |
| 0 N/A N/A 28858 G nvidia-settings 0MiB |
+-----------------------------------------------------------------------------+
I've tried changing requirements.txt:
diffusers==0.4.1
torch==1.13.0.dev20220813+cu117
transformers==4.22.2
and Dockerfile:
RUN pip install -r requirements.txt \
--extra-index-url https://download.pytorch.org/whl/nightly/cu117
And here's my error, almost exactly the same as OP's:
load pipeline start: 2022-10-12T06:49:11.902814
/usr/local/lib/python3.8/dist-packages/torch/cuda/__init__.py:83: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 34: CUDA driver is a stub library (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:109.)
return torch._C._cuda_getDeviceCount() > 0
Fetching 16 files: 100%|██████████| 16/16 [00:00<00:00, 7390.03it/s]
ftfy or spacy is not installed using BERT BasicTokenizer instead of ftfy.
Traceback (most recent call last):
File "/usr/local/bin/docker-entrypoint.py", line 174, in <module>
main()
File "/usr/local/bin/docker-entrypoint.py", line 157, in main
stable_diffusion(
File "/usr/local/bin/docker-entrypoint.py", line 37, in stable_diffusion
pipe = StableDiffusionPipeline.from_pretrained(
File "/usr/local/lib/python3.8/dist-packages/diffusers/pipeline_utils.py", line 179, in to
module.to(torch_device)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 982, in to
return self._apply(convert)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 635, in _apply
module._apply(fn)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 635, in _apply
module._apply(fn)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 658, in _apply
param_applied = fn(param)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 980, in convert
return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
File "/usr/local/lib/python3.8/dist-packages/torch/cuda/__init__.py", line 222, in _lazy_init
torch._C._cuda_init()
RuntimeError: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 34: CUDA driver is a stub library
Are you also on NixOS? If you run ./build.sh dev it will drop you into a terminal, and then you can try python -c "import torch; print(torch.cuda.is_available())". Let me know what the results are.
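If the one-liner prints False, a slightly longer check can narrow things down. This is only a sketch of what could be run in the same dev terminal; cuda_report is a hypothetical helper name, and it uses only torch.__version__, torch.version.cuda, torch.cuda.is_available() and torch.cuda.device_count().

```python
def cuda_report():
    """Collect a few CUDA facts from PyTorch without raising if torch is missing."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}

    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version the wheel was built against (None for CPU-only builds).
        "built_for_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
        "device_count": torch.cuda.device_count(),
    }


for key, value in cuda_report().items():
    print(f"{key}: {value}")
```

A CUDA build version together with cuda_available: False suggests the wheel itself is fine and the problem is on the driver or container side.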
My result from ./build.sh dev is:
$ python -c "import torch; print(torch.cuda.is_available())"
/usr/local/lib/python3.8/dist-packages/torch/cuda/__init__.py:83: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 34: CUDA driver is a stub library (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:109.)
return torch._C._cuda_getDeviceCount() > 0
False
$
Update:
I'm also playing with InvokeAI. I have a conda environment set up (not sure how to share that, but basically on Nix I run conda-shell; conda activate invokeai and was using the recent repo here: https://github.com/invoke-ai/InvokeAI/).
If I use that conda environment (obviously this isn't inside docker) then I see this:
$ python -c "import torch; print(torch.cuda.is_available())"
True
Can I provide other information? At least this gives me hope that my host system is not totally broken.
That's good information, thanks. What version of Docker are you on? Also, is this other closed issue useful in any way? #13
$ sudo docker version
Client:
Version: 20.10.7
API version: 1.41
Go version: go1.16.8
Git commit: v20.10.7
Built: Thu Jan 1 00:00:00 1970
OS/Arch: linux/amd64
Context: default
Experimental: true
Server:
Engine:
Version: 20.10.7
API version: 1.41 (minimum version 1.12)
Go version: go1.16.8
Git commit: v20.10.7
Built: Tue Jan 1 00:00:00 1980
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: v1.5.7
GitCommit: v1.5.7
runc:
Version: 1.0.0-rc95
GitCommit:
docker-init:
Version: 0.18.0
GitCommit:
Good to know. Since the issue has been reported a couple of times, I think it's worth either changing the Dockerfile or adding something to the README.
Well, it is working, but I got a false positive from the NSFW check. That's weird:
./build.sh run --W 256 --H 256 --half --attention-slicing --prompt 'abstract art'
load pipeline start: 2022-10-12T21:06:19.903981
Fetching 16 files: 100%|██████████| 16/16 [00:00<00:00, 35582.64it/s]
ftfy or spacy is not installed using BERT BasicTokenizer instead of ftfy.
loaded models after: 2022-10-12T21:06:36.461160
100%|██████████| 51/51 [00:13<00:00, 3.88it/s]
Potential NSFW content was detected in one or more images. A black image will be returned instead. Try again with a different prompt and/or seed.
completed pipeline: 2022-10-12T21:06:50.063386
The next run worked, so I think you can close this issue. Thank you.
The safety checker is pretty sensitive even for regular content. You can skip it with --skip.
I added a fix which removes the stubs before any code can run, so hopefully this should fix the issue long term!