projectnuwa / dragnuwa

dragnuwa's Introduction

DragNUWA

DragNUWA is being transferred to the Microsoft Branch. It will be available once applicable reviews are completed.

dragnuwa's Issues

conda environment errors

× Building wheel for tokenizers (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
Building wheel for tokenizers (pyproject.toml) ... error
ERROR: Failed building wheel for tokenizers
Building wheel for clip (setup.py) ... done
Created wheel for clip: filename=clip-1.0-py3-none-any.whl size=1369497 sha256=05c5750119d4467689ef336c12e760b554adee4d8d52fd76cb0bdae5e7a18487
Stored in directory: /tmp/pip-ephem-wheel-cache-bbnhb6qm/wheels/3f/7c/a4/9b490845988bf7a4db33674d52f709f088f64392063872eb9a
Successfully built clip
Failed to build tokenizers
ERROR: Could not build wheels for tokenizers, which is required to install pyproject.toml-based projects

Followed instructions, missing file.

I followed the instructions in the Readme.

When trying to run the download.sh file, I get:

Failed to retrieve file url:

        Too many users have viewed or downloaded this file recently. Please
        try accessing the file again later. If the file you are trying to
        access is particularly large or is shared with many people, it may
        take up to 24 hours to be able to view or download the file. If you
        still can't access a file after 24 hours, contact your domain
        administrator.

You may still be able to access the file from the browser:

        https://drive.google.com/uc?id=1Z4JOley0SJCb35kFF4PCc6N6P1ftfX4i

but Gdown can't. Please check connections and permissions.

So I downloaded it directly in the browser and copied the file to models/.

(DragNUWA) $ ls models
4799.pth  Download.sh

Then when I try to run the demo script, the following happens:

VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
Initialized embedder #0: FrozenOpenCLIPImagePredictionEmbedder with 683800065 params. Trainable: False
Initialized embedder #1: ConcatTimestepEmbedderND with 0 params. Trainable: False
Initialized embedder #2: ConcatTimestepEmbedderND with 0 params. Trainable: False
Initialized embedder #3: VideoPredictionEmbedderWithEncoder with 83653863 params. Trainable: False
Initialized embedder #4: ConcatTimestepEmbedderND with 0 params. Trainable: False
parameters of gaussian kernel: kernel_size: 199, sigma: 20, channels: 2
Lora successfully injected into Net.
Traceback (most recent call last):
  File "DragNUWA_demo.py", line 221, in <module>
    DragNUWA_net = Drag("cuda:0", 'models/drag_nuwa_svd.pth', 'DragNUWA_net.py', 320, 576, 14)
  File "DragNUWA_demo.py", line 72, in __init__
    state_dict = file2data(model_path, map_location='cpu')
  File "/home/arthur/dev/ai/DragNUWA/utils.py", line 161, in file2data
    data = torch.load(filename, map_location=kwargs.get('map_location'))
  File "/home/arthur/.anaconda3/envs/DragNUWA/lib/python3.8/site-packages/torch/serialization.py", line 791, in load
    with _open_file_like(f, 'rb') as opened_file:
  File "/home/arthur/.anaconda3/envs/DragNUWA/lib/python3.8/site-packages/torch/serialization.py", line 271, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "/home/arthur/.anaconda3/envs/DragNUWA/lib/python3.8/site-packages/torch/serialization.py", line 252, in __init__
    super().__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: 'models/drag_nuwa_svd.pth'

What is models/drag_nuwa_svd.pth and where do I get it?

Thanks.
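
A quick way to confirm what the demo is looking for versus what is actually in models/ is sketched below. The path comes from the traceback above; the helper name check_checkpoint is hypothetical, and nothing in this thread confirms whether 4799.pth is the same checkpoint under a different name.

```python
from pathlib import Path

# Path taken from the traceback above; the helper name is hypothetical.
EXPECTED = Path("models/drag_nuwa_svd.pth")


def check_checkpoint(expected: Path = EXPECTED) -> Path:
    """Return the checkpoint path, or raise with a listing of what is actually there."""
    if expected.is_file():
        return expected
    folder = expected.parent
    # List the folder contents so a differently named download is easy to spot.
    found = sorted(p.name for p in folder.iterdir()) if folder.is_dir() else []
    raise FileNotFoundError(
        f"{expected} not found; files in {folder}/: {found or 'none'}"
    )
```

Calling check_checkpoint() from the repository root before launching the demo would show at a glance whether only the file name differs.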

Request alternative model host

Can we have huggingface model host in addition to google drive on README?

Windows 10 CUDA 11.7 setup. Might also work with CUDA 11.8

I ran into several issues getting this to work and ended up using the setup below instead.
If you get an error saying torch._six cannot be found, you might want to try this setup:

conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia

You might still get errors saying a module is not found even though it is already installed. I hit this repeatedly; my workaround was to switch to an older version of the module, from just before December 2023, and that worked. It took me a whole day of experimenting, but I finally found a combination that worked for me. You can replace the module list in environment.txt with the one below.

absl-py==1.3.0
aiofiles==23.2.1
aiohttp==3.9.1
aiosignal==1.3.1
altair==5.2.0
annotated-types==0.6.0
antlr4-python3-runtime==4.9.3
anyio==4.2.0
appdirs==1.4.4
async-timeout==4.0.3
attrs==23.2.0
blosc2==2.5.1
braceexpand==0.1.7
Brotli @ file:///C:/Windows/Temp/abs_63l7912z0e/croots/recipe/brotli-split_1659616056886/work
cachetools==5.3.2
certifi @ file:///C:/b/abs_91u83siphd/croot/certifi_1700501720658/work/certifi
cffi @ file:///C:/b/abs_924gv1kxzj/croot/cffi_1700254355075/work
charset-normalizer @ file:///tmp/build/80754af9/charset-normalizer_1630003229654/work
click==8.1.7
colorama==0.4.6
colorlog==6.8.0
configobj==5.0.8
contourpy==1.0.6
cryptography @ file:///C:/b/abs_e8cnom_zw_/croot/cryptography_1702071486468/work
cycler==0.11.0
decorator==4.4.2
deepdish==0.3.7
deepspeed==0.3.16
docker-pycreds==0.4.0
einops==0.7.0
exceptiongroup==1.2.0
fastapi==0.109.0
ffmpy==0.3.1
filelock @ file:///C:/b/abs_f2gie28u58/croot/filelock_1700591233643/work
fonttools==4.38.0
frozenlist==1.4.1
fsspec==2023.12.2
ftfy==6.1.3
gitdb==4.0.11
GitPython==3.1.41
gmpy2 @ file:///C:/ci/gmpy2_1645438895476/work
google-auth==2.27.0
google-auth-oauthlib==1.2.0
gradio==3.50.2
gradio_client==0.8.1
grpcio==1.60.0
h11==0.14.0
h5py==3.10.0
httpcore==1.0.2
httpx==0.26.0
huggingface-hub==0.19.4
idna @ file:///C:/b/abs_bdhbebrioa/croot/idna_1666125572046/work
imageio==2.33.1
imageio-ffmpeg==0.4.9
importlib-resources==6.1.1
install==1.3.5
Jinja2 @ file:///C:/b/abs_7cdis66kl9/croot/jinja2_1666908141852/work
json-lines==0.5.0
jsonschema==4.21.1
jsonschema-specifications==2023.12.1
kiwisolver==1.4.4
kornia==0.7.1
lightning-utilities==0.10.1
Markdown==3.5.2
markdown-it-py==3.0.0
MarkupSafe @ file:///C:/b/abs_ecfdqh67b_/croot/markupsafe_1704206030535/work
matplotlib==3.6.2
mdurl==0.1.2
mkl-fft @ file:///C:/b/abs_19i1y8ykas/croot/mkl_fft_1695058226480/work
mkl-random @ file:///C:/b/abs_edwkj1_o69/croot/mkl_random_1695059866750/work
mkl-service==2.4.0
moviepy==1.0.3
mpmath @ file:///C:/b/abs_7833jrbiox/croot/mpmath_1690848321154/work
msgpack==1.0.7
multidict==6.0.4
mypy-extensions==1.0.0
ndindex==1.7
networkx @ file:///C:/b/abs_e6gi1go5op/croot/networkx_1690562046966/work
ninja==1.11.1.1
numexpr==2.8.8
numpy @ file:///C:/b/abs_16b2j7ad8n/croot/numpy_and_numpy_base_1704311752418/work/dist/numpy-1.26.3-cp310-cp310-win_amd64.whl#sha256=e84057072c37569bd0e11652dc2a75980d4d360f2391adf6a29a2fb1622d20ff
oauthlib==3.2.2
omegaconf==2.3.0
open-clip-torch==2.24.0
opencv-contrib-python==4.6.0.66
opencv-python==4.9.0.80
orjson==3.9.7
packaging==23.2
pandas==2.2.0
Pillow @ file:///C:/b/abs_20ztcm8lgk/croot/pillow_1696580089746/work
proglog==0.1.10
protobuf==3.19.0
psutil==5.9.8
py-cpuinfo==9.0.0
pyasn1==0.5.1
pyasn1-modules==0.3.0
pycparser @ file:///tmp/build/80754af9/pycparser_1636541352034/work
pydantic==2.5.3
pydantic_core==2.14.5
pydub==0.25.1
Pygments==2.17.2
pyOpenSSL @ file:///C:/b/abs_08f38zyck4/croot/pyopenssl_1690225407403/work
pyparsing==3.0.9
pyre-extensions==0.0.23
PySocks @ file:///C:/ci_310/pysocks_1642089375450/work
python-dateutil==2.8.2
python-multipart==0.0.6
pytorch-lightning==2.1.3
pytz==2023.3.post1
PyYAML==6.0.1
referencing==0.32.1
regex==2023.8.8
requests @ file:///C:/b/abs_316c2inijk/croot/requests_1690400295842/work
requests-oauthlib==1.3.1
rich==13.7.0
rpds-py==0.5.3
rsa==4.9
ruff==0.1.14
safetensors==0.4.0
scipy==1.12.0
semantic-version==2.10.0
sentencepiece==0.1.98
sentry-sdk==1.39.2
setproctitle==1.3.3
shellingham==1.5.4
six==1.16.0
smmap==5.0.1
sniffio==1.3.0
starlette==0.35.1
sympy @ file:///C:/b/abs_82njkonm7f/croot/sympy_1701397685028/work
tables==3.9.2
tensorboard==2.15.1
tensorboard-data-server==0.7.2
tensorboardX==1.8
timm==0.9.12
tokenizer==3.4.3
tokenizers==0.14.1
tomlkit==0.12.0
toolz==0.12.1
torch==1.13.1
torchaudio==0.13.1
torchmetrics==1.3.0.post0
torchvision==0.14.1
tqdm==4.66.1
transformers==4.37.1
typer==0.9.0
typing-inspect==0.9.0
typing_extensions @ file:///C:/b/abs_72cdotwc_6/croot/typing_extensions_1705599364138/work
tzdata==2023.4
urllib3 @ file:///C:/b/abs_9cmlsrm3ys/croot/urllib3_1698257595508/work
uvicorn==0.27.0
wandb==0.16.2
wcwidth==0.2.13
webdataset==0.2.86
websockets==11.0.3
Werkzeug==3.0.1
win-inet-pton @ file:///C:/ci_310/win_inet_pton_1642658466512/work
xformers==0.0.16
yarl==1.9.4
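
Note that the entries of the form "name @ file:///C:/..." in the list above point at build directories on the original machine and will probably not resolve anywhere else. A minimal sketch of one way to make such a list portable follows; the function name portable_requirements is hypothetical. It keeps only the package name for those entries, which also drops the exact version, so pip would be free to pick any compatible release.

```python
import re

# Matches "name @ file:///..." entries that pip freeze emits for locally built packages.
LOCAL_PIN = re.compile(r"^\s*([A-Za-z0-9_.\-]+)\s*@\s*file://")


def portable_requirements(lines):
    """Replace machine-local 'name @ file://...' pins with the bare package name."""
    cleaned = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        match = LOCAL_PIN.match(line)
        cleaned.append(match.group(1) if match else line)
    return cleaned
```

For example, "cffi @ file:///C:/b/abs_924gv1kxzj/croot/cffi_1700254355075/work" becomes "cffi", while "torch==1.13.1" is passed through unchanged.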

Error while adding drag

Every time I try to use the Gradio UI, it throws this error. It loads the photo at the incorrect size, and I cannot add drags.

IndexError: index 697 is out of bounds for dimension 0 with size 320
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/gradio/routes.py", line 534, in predict
    output = await route_utils.call_process_api(
  File "/usr/local/lib/python3.10/dist-packages/gradio/route_utils.py", line 226, in call_process_api
    output = await app.get_blocks().process_api(
  File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1550, in process_api
    result = await self.call_function(
  File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1185, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/usr/local/lib/python3.10/dist-packages/anyio/to_thread.py", line 33, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 2106, in run_sync_in_worker_thread
    return await future
  File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 833, in run
    result = context.run(func, *args)
  File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 661, in wrapper
    response = f(*args, **kwargs)
  File "/workspace/DragNUWA/DragNUWA_demo.py", line 159, in run
    input_drag[i][int(start_point[1])][int(start_point[0])][0] = end_point[0] - start_point[0]
IndexError: index 697 is out of bounds for dimension 0 with size 320
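
The index 697 suggests the click coordinates are in the pixel space of the displayed image, while the tensor being indexed has only 320 rows. A minimal sketch of rescaling and clamping a point before indexing is shown below. The function name to_model_coords is hypothetical, and the 576x320 default is an assumption read off the Drag(..., 320, 576, 14) call in an earlier traceback; this is not the repository's own fix.

```python
def to_model_coords(point, shown_size, model_size=(576, 320)):
    """Map an (x, y) click on the displayed image onto the model grid and clamp it.

    shown_size and model_size are (width, height). The 576x320 default is an
    assumption, not a value confirmed by the project.
    """
    x, y = point
    shown_w, shown_h = shown_size
    model_w, model_h = model_size
    # Rescale, then clamp so the result is always a valid index.
    mx = min(max(int(x * model_w / shown_w), 0), model_w - 1)
    my = min(max(int(y * model_h / shown_h), 0), model_h - 1)
    return mx, my
```

With a 1280x720 image on screen, a click at (100, 697) would map to (45, 309), which fits inside a 320-row tensor.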

How much VRAM is required?

How much GPU RAM does DragNUWA use?

Colab is not working

Is it just me, or is the Colab link not working?

Request for Enhancement: Consideration of Additional Features for DragNUWA

Dear DragNUWA Developers,

I hope this message finds you well. I have been thoroughly impressed with the capabilities of DragNUWA and the fine-grained control it offers in video generation, integrating text, image, and trajectory inputs. The potential applications of such a tool are indeed vast and varied.

Having experimented with the current functionalities of both DragNUWA 1.0 and the updated 1.5 versions, I find myself envisioning a few more features that could significantly enhance the user experience and expand the utility of the tool. I would like to propose the following features for your consideration:

Multi-object Manipulation: The ability to manipulate multiple objects independently within a single image would allow for more complex scene dynamics and interactions.
Path Prediction: Implementing a feature that suggests possible trajectories based on the context of the scene and the objects within it could simplify the user's task and inspire creative uses of the tool.
Audio Integration: Adding a feature to synchronise generated videos with audio input could open up possibilities for creating music videos or short films with dynamic visual effects that respond to sound.
Custom Object Import: Allowing users to import custom objects into the scene would greatly personalise the creative process, enabling the creation of unique and bespoke content.
Real-time Collaboration: A feature that supports real-time collaboration could be invaluable for team projects, allowing multiple users to contribute to the video generation process simultaneously.
Advanced Editing Toolkit: Providing an advanced set of editing tools for post-generation tweaks could help refine the final output without the need for external software.

I trust that these suggestions align with the innovative spirit of your project and could be of interest to the community. I would be most grateful if you could consider these potential enhancements for future updates.

Thank you for your dedication to this project and for the impressive work you have already accomplished. I am looking forward to any thoughts you might have on these suggestions.

Best regards,
yihong1120

If I want to change the image resolution, what should I change?

Text prompt and motion bucket questions

Hello, and thank you for sharing your work—very cool!

I've been trying the demo and, from what I've learned from your paper, your model can control the generated video using a single image, a text prompt, and a specified trajectory. However, I noticed that there's no option to input a text prompt in your demo. Where do you condition on the prompt?

I also came across the term "motion bucket" in the demo. Could you clarify what that refers to?

IndexError: index 342 is out of bounds for dimension 0 with size 320

With updated Gradio, a 2080 Ti with 22 GB of VRAM, and updated ComfyUI, running workflow.json in ComfyUI produced the following error:

execution.py :180 2024-02-09 16:25:08,272 !!! Exception during processing !!!
execution.py :181 2024-02-09 16:25:08,291 Traceback (most recent call last):
File "D:\github\ComfyUI\ComfyUI\execution.py", line 152, in recursive_execute
output_data, output_ui = get_output_data(obj, input_data_all)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\github\ComfyUI\ComfyUI\execution.py", line 82, in get_output_data
return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\github\ComfyUI\ComfyUI\execution.py", line 75, in map_node_over_list
results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\github\ComfyUI\ComfyUI\custom_nodes\ComfyUI-DragNUWA\nodes.py", line 873, in run_inference
return model.run_2(image_pil, tracking_points, inference_batch_size, motion_bucket_id, use_optical_flow, directory)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\github\ComfyUI\ComfyUI\custom_nodes\ComfyUI-DragNUWA\nodes.py", line 300, in run_2
input_drag[i][int(start_point[1])][int(start_point[0])][0] = end_point[0] - start_point[0]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^
IndexError: index 342 is out of bounds for dimension 0 with size 320

AttributeError: 'list' object has no attribute 'constructor_args'

Info:
Operating System: Debian
Shell: zsh
Conda install: Anaconda

All steps were followed exactly as is in the README

Steps to reproduce:

Route 1:
Step 1: Upload an image to the demo gradio page
Step 2: Click and drag on the image in the top left panel.

Route 2:
Step 1: Upload an image to the demo gradio page
Step 2: Click on the "Add Drag" button

Expected results:
Ability to draw a red arrow on the image like the example images show

Actual Results:

Top left panel greys out and is replaced with a textbox that says "Error"
CLI outputs the following error stack:

AttributeError: 'list' object has no attribute 'constructor_args'
Traceback (most recent call last):
  File "/home/sblake/anaconda3/envs/DragNUWA/lib/python3.8/site-packages/gradio/routes.py", line 534, in predict
    output = await route_utils.call_process_api(
  File "/home/sblake/anaconda3/envs/DragNUWA/lib/python3.8/site-packages/gradio/route_utils.py", line 226, in call_process_api
    output = await app.get_blocks().process_api(
  File "/home/sblake/anaconda3/envs/DragNUWA/lib/python3.8/site-packages/gradio/blocks.py", line 1550, in process_api
    result = await self.call_function(
  File "/home/sblake/anaconda3/envs/DragNUWA/lib/python3.8/site-packages/gradio/blocks.py", line 1185, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/home/sblake/anaconda3/envs/DragNUWA/lib/python3.8/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/home/sblake/anaconda3/envs/DragNUWA/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 2134, in run_sync_in_worker_thread
    return await future
  File "/home/sblake/anaconda3/envs/DragNUWA/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 851, in run
    result = context.run(func, *args)
  File "/home/sblake/anaconda3/envs/DragNUWA/lib/python3.8/site-packages/gradio/utils.py", line 661, in wrapper
    response = f(*args, **kwargs)
  File "DragNUWA_demo.py", line 243, in add_drag
    tracking_points.constructor_args['value'].append([])
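
The error says the handler received a plain list where the code expected an object with a constructor_args attribute. One defensive workaround, as a sketch only, is to accept both forms. The helper state_value is hypothetical, and the assumption that the form depends on the installed Gradio version is mine, not something confirmed in this thread.

```python
def state_value(state):
    """Return the list behind a Gradio state, whichever form the handler receives.

    Assumption: the handler gets either an object exposing
    constructor_args['value'] or the plain list itself.
    """
    args = getattr(state, "constructor_args", None)
    if isinstance(args, dict) and "value" in args:
        return args["value"]
    return state


def add_drag(tracking_points):
    """Start a new, empty drag track and return the updated list."""
    points = state_value(tracking_points)
    points.append([])
    return points
```

Pinning Gradio to the version the demo was written against may be the simpler route, if that version is known.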

TypeError: make_grid() got an unexpected keyword argument 'range'

reproduction

Launch DragNUWA_demo.py.
After drawing some arrows, press "RUN".
At the end of the process, the following error is raised: TypeError: make_grid() got an unexpected keyword argument 'range'.

quick fix

In DragNUWA_demo.py, line 198, rename the range keyword argument to value_range.
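
If the same script has to run against both older and newer torchvision releases, the keyword can be chosen at call time instead of being hard-coded. The sketch below assumes only that the function accepts one of the two keyword names; call_with_value_range is a hypothetical helper, and grid_fn stands in for torchvision.utils.make_grid.

```python
import inspect


def call_with_value_range(grid_fn, tensor, value_range, **kwargs):
    """Call a make_grid-style function with whichever range keyword it accepts."""
    params = inspect.signature(grid_fn).parameters
    if "value_range" in params:
        return grid_fn(tensor, value_range=value_range, **kwargs)
    if "range" in params:
        # Older spelling of the same argument.
        return grid_fn(tensor, range=value_range, **kwargs)
    raise TypeError(f"{grid_fn.__name__} accepts neither 'value_range' nor 'range'")
```

For a single pinned environment, the one-word rename in the quick fix is simpler.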

Exit code 137 on Mac M1

I followed all the instructions but failed to run it on the Mac M1. Here is the log:
"""
python -m DragNUWA_demo
[2024-01-13 19:15:42,619] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to mps (auto detect)
[2024-01-13 19:15:42,865] torch.distributed.elastic.multiprocessing.redirects: [WARNING] NOTE: Redirects are currently not supported in Windows or MacOs.
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
Initialized embedder #0: FrozenOpenCLIPImagePredictionEmbedder with 683800065 params. Trainable: False
Initialized embedder #1: ConcatTimestepEmbedderND with 0 params. Trainable: False
Initialized embedder #2: ConcatTimestepEmbedderND with 0 params. Trainable: False
Initialized embedder #3: VideoPredictionEmbedderWithEncoder with 83653863 params. Trainable: False
Initialized embedder #4: ConcatTimestepEmbedderND with 0 params. Trainable: False
parameters of gaussian kernel: kernel_size: 199, sigma: 20, channels: 2
Lora successfully injected into Net.
[2024-01-13 19:16:15 utils.py:176 file2data] Loaded data from /Users//Projects/DragNUWA/models/drag_nuwa_svd.pth

Process finished with exit code 137 (interrupted by signal 9:SIGKILL)
"""
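
For reference, exit code 137 is 128 + 9, the shell convention for a process ended by signal 9, which matches the SIGKILL in the last log line. The short sketch below decodes such codes; describe_exit_code is a hypothetical helper. Running out of memory is a common reason for a process to be killed this way, but the log above does not confirm that this is what happened here.

```python
import signal


def describe_exit_code(code):
    """Translate a shell-style exit status into a signal name where applicable."""
    if code > 128:
        try:
            # Statuses above 128 conventionally mean "terminated by signal (code - 128)".
            return f"killed by {signal.Signals(code - 128).name}"
        except ValueError:
            return f"exit status {code} (unknown signal {code - 128})"
    return f"exit status {code}"
```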

Please make an extension for AUTOMATIC1111

Please offer an install option other than conda, either AUTOMATIC1111-compatible or just standalone. That would be great, thanks.

How do you train svd?

Thanks for your great work!
DragNUWA 1.5 is based on SVD. Can you share some details about training SVD?

Unexpected token

When I click Run after uploading an image and adding drags, the animation doesn't show up, and after about 30 seconds I see the error message Unexpected token '<', " <!DOCTYPE "... is not valid JSON. The terminal shows a progress bar, but it never updates:
0%| | 0/1 [01:14<?, ?it/s]

Does anyone see similar issues?

WSL Could not load library libcudnn_cnn_infer.so.8.

Hi,

Thank you so much for your work on this, really appreciated!
I was, as expected, unable to run this on Windows, so after being kindly advised to do so, I set up a WSL environment using Ubuntu and worked through the dependencies. However, I am stuck on the following error:

Traceback (most recent call last):
  File "/home/lylo/miniconda3/envs/DragNUWA/lib/python3.8/site-packages/gradio/routes.py", line 534, in predict
    output = await route_utils.call_process_api(
  File "/home/lylo/miniconda3/envs/DragNUWA/lib/python3.8/site-packages/gradio/route_utils.py", line 226, in call_process_api
    output = await app.get_blocks().process_api(
  File "/home/lylo/miniconda3/envs/DragNUWA/lib/python3.8/site-packages/gradio/blocks.py", line 1550, in process_api
    result = await self.call_function(
  File "/home/lylo/miniconda3/envs/DragNUWA/lib/python3.8/site-packages/gradio/blocks.py", line 1185, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/home/lylo/miniconda3/envs/DragNUWA/lib/python3.8/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/home/lylo/miniconda3/envs/DragNUWA/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 2134, in run_sync_in_worker_thread
    return await future
  File "/home/lylo/miniconda3/envs/DragNUWA/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 851, in run
    result = context.run(func, *args)
  File "/home/lylo/miniconda3/envs/DragNUWA/lib/python3.8/site-packages/gradio/utils.py", line 661, in wrapper
    response = f(*args, **kwargs)
  File "DragNUWA_demo.py", line 296, in add_tracking_points
    tracking_points.constructor_args['value'][-1].append(evt.index)
IndexError: list index out of range
You selected None at [110, 84] from image
Traceback (most recent call last):
  File "/home/lylo/miniconda3/envs/DragNUWA/lib/python3.8/site-packages/gradio/routes.py", line 534, in predict
    output = await route_utils.call_process_api(
  File "/home/lylo/miniconda3/envs/DragNUWA/lib/python3.8/site-packages/gradio/route_utils.py", line 226, in call_process_api
    output = await app.get_blocks().process_api(
  File "/home/lylo/miniconda3/envs/DragNUWA/lib/python3.8/site-packages/gradio/blocks.py", line 1550, in process_api
    result = await self.call_function(
  File "/home/lylo/miniconda3/envs/DragNUWA/lib/python3.8/site-packages/gradio/blocks.py", line 1185, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/home/lylo/miniconda3/envs/DragNUWA/lib/python3.8/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/home/lylo/miniconda3/envs/DragNUWA/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 2134, in run_sync_in_worker_thread
    return await future
  File "/home/lylo/miniconda3/envs/DragNUWA/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 851, in run
    result = context.run(func, *args)
  File "/home/lylo/miniconda3/envs/DragNUWA/lib/python3.8/site-packages/gradio/utils.py", line 661, in wrapper
    response = f(*args, **kwargs)
  File "DragNUWA_demo.py", line 296, in add_tracking_points
    tracking_points.constructor_args['value'][-1].append(evt.index)
IndexError: list index out of range
You selected None at [58, 208] from image
You selected None at [71, 59] from image
You selected None at [524, 184] from image
You selected None at [528, 64] from image
  0%|                                                                                             | 0/1 [00:00<?, ?it/s]Could not load library libcudnn_cnn_infer.so.8. Error: libcuda.so: cannot open shared object file: No such file or directory
Aborted

I can confirm that the demo project is being run and that I can connect to it from Windows via a browser but this error message is triggered whenever I try to add the motion arrows and run.
I have tried manually installing cuDNN and upgrading Gradio, and Stack Overflow did not help either. Any insight into this would be greatly appreciated.
Thank you!
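
The final message in the log is about libcuda.so (the NVIDIA driver library) rather than cuDNN itself, so it may help to check whether that library is visible inside WSL at all. The sketch below is one way to look for it; find_libcuda is a hypothetical helper, and the /usr/lib/wsl/lib default is an assumption about where WSL usually exposes the Windows driver, so adjust it for your system.

```python
import ctypes.util
import os


def find_libcuda(extra_dirs=("/usr/lib/wsl/lib",)):
    """Return a path or library name for the CUDA driver library, or None if not found."""
    for folder in extra_dirs:
        for name in ("libcuda.so", "libcuda.so.1"):
            candidate = os.path.join(folder, name)
            if os.path.exists(candidate):
                return candidate
    # Fall back to the system's normal library lookup.
    return ctypes.util.find_library("cuda")
```

If this returns a path under a directory that is not on the loader's search path, adding that directory to LD_LIBRARY_PATH before starting the demo is something to try.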

Environment Execution error inquiry

Is it okay to create a conda environment and then use pip inside it? Doesn't that conflict with existing packages?
-> I managed to get the environment set up somehow.

File upload -> OK

But the moment I drag on the image:

File "DragNUWA_demo.py", line 314, in add_tracking_points
cv2.circle(transparent_layer, tuple(track[0]), 5, (255, 0, 0, 255), -1)
IndexError: list index out of range

So many errors appear.

Please fix the environment setup.

Happy Halloween

Hardware requirements.

Thanks a lot for sharing this and the amazing work.

How much VRAM does this expect? I have a 12 GB card; would that work? If not, is there some way to switch to a CPU backend and use system RAM? (Even if that requires a bit of coding, that's fine; I just want to know whether it can be done and how.)

Thanks!

About training the model

Thanks for your inspiring work! The performance is really impressive. I noticed that the entire set of SVD model weights has been replaced, so I am curious about the training details of DragNUWA on SVD, such as which parts of the model are fine-tuned, how many samples are required to train the model, and how these samples were pre-processed.
It is OK if some details can not be published. Thanks in advance.

ERROR: Could not build wheels for deepspeed, which is required to install pyproject.toml-based projects

The system is CentOS Linux release 7.6.1810 (Core).

facial distortion

I want to use DragNUWA to generate a hand-clasping motion, but the face gets distorted.

is there any update on this project?

Python3.8 version packages

Hello, first of all compliments for the project, that's sick :)

I tried to install the repo locally but I got some issues with the installation of the packages.

I created an environment with conda using python 3.8.18

Now when I run pip install -r environment.txt I get the error below. It seems to be about compatibility between the Python version used and the package versions.

Which Python version are you using?

Log:

(sdxl_drag) PS C:\Users\gabri\Documents\Codes\Projects\DragNUWA> pip install -r .\environment.txt
Collecting clip@ git+https://github.com/openai/CLIP.git (from -r .\environment.txt (line 3))
  Cloning https://github.com/openai/CLIP.git to c:\users\gabri\appdata\local\temp\pip-install-ancddpef\clip_cd12034f0e004f2989a6517763714a75
  Running command git clone --filter=blob:none --quiet https://github.com/openai/CLIP.git 'C:\Users\gabri\AppData\Local\Temp\pip-install-ancddpef\clip_cd12034f0e004f2989a6517763714a75'
  Resolved https://github.com/openai/CLIP.git to commit a1d071733d7111c9c014f024669f959182114e33
  Preparing metadata (setup.py) ... done
Collecting black (from -r .\environment.txt (line 1))
  Using cached black-23.12.1-cp38-cp38-win_amd64.whl.metadata (68 kB)
Collecting chardet (from -r .\environment.txt (line 2))
  Using cached chardet-5.2.0-py3-none-any.whl.metadata (3.4 kB)
Collecting einops>=0.6.1 (from -r .\environment.txt (line 4))
  Using cached einops-0.7.0-py3-none-any.whl.metadata (13 kB)
Collecting fairscale>=0.4.13 (from -r .\environment.txt (line 5))
  Using cached fairscale-0.4.13.tar.gz (266 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
  Preparing metadata (pyproject.toml) ... done
Collecting fire>=0.5.0 (from -r .\environment.txt (line 6))
  Using cached fire-0.5.0.tar.gz (88 kB)
  Preparing metadata (setup.py) ... done
Collecting fsspec>=2023.6.0 (from -r .\environment.txt (line 7))
  Using cached fsspec-2023.12.2-py3-none-any.whl.metadata (6.8 kB)
Collecting invisible-watermark>=0.2.0 (from -r .\environment.txt (line 8))
  Using cached invisible_watermark-0.2.0-py3-none-any.whl.metadata (8.2 kB)
Collecting kornia==0.6.9 (from -r .\environment.txt (line 9))
  Using cached kornia-0.6.9-py2.py3-none-any.whl (569 kB)
Collecting matplotlib>=3.7.2 (from -r .\environment.txt (line 10))
  Using cached matplotlib-3.7.4-cp38-cp38-win_amd64.whl.metadata (5.8 kB)
Collecting natsort>=8.4.0 (from -r .\environment.txt (line 11))
  Using cached natsort-8.4.0-py3-none-any.whl.metadata (21 kB)
Collecting ninja>=1.11.1 (from -r .\environment.txt (line 12))
  Using cached ninja-1.11.1.1-py2.py3-none-win_amd64.whl.metadata (5.4 kB)
Collecting numpy>=1.24.4 (from -r .\environment.txt (line 13))
  Using cached numpy-1.24.4-cp38-cp38-win_amd64.whl.metadata (5.6 kB)
Collecting omegaconf>=2.3.0 (from -r .\environment.txt (line 14))
  Using cached omegaconf-2.3.0-py3-none-any.whl (79 kB)
Collecting open-clip-torch>=2.20.0 (from -r .\environment.txt (line 15))
  Using cached open_clip_torch-2.24.0-py3-none-any.whl.metadata (30 kB)
Collecting opencv-python==4.6.0.66 (from -r .\environment.txt (line 16))
  Using cached opencv_python-4.6.0.66-cp36-abi3-win_amd64.whl (35.6 MB)
Collecting pandas>=2.0.3 (from -r .\environment.txt (line 17))
  Using cached pandas-2.0.3-cp38-cp38-win_amd64.whl.metadata (18 kB)
Collecting pillow>=9.5.0 (from -r .\environment.txt (line 18))
  Using cached pillow-10.2.0-cp38-cp38-win_amd64.whl.metadata (9.9 kB)
Collecting pudb>=2022.1.3 (from -r .\environment.txt (line 19))
  Using cached pudb-2023.1.tar.gz (224 kB)
  Preparing metadata (setup.py) ... done
Collecting pytorch-lightning==2.0.1 (from -r .\environment.txt (line 20))
  Using cached pytorch_lightning-2.0.1-py3-none-any.whl (716 kB)
Collecting pyyaml>=6.0.1 (from -r .\environment.txt (line 21))
  Using cached PyYAML-6.0.1-cp38-cp38-win_amd64.whl.metadata (2.1 kB)
Collecting scipy>=1.10.1 (from -r .\environment.txt (line 22))
  Using cached scipy-1.10.1-cp38-cp38-win_amd64.whl (42.2 MB)
Collecting streamlit>=0.73.1 (from -r .\environment.txt (line 23))
  Using cached streamlit-1.29.0-py2.py3-none-any.whl.metadata (8.2 kB)
Collecting tensorboardx==2.6 (from -r .\environment.txt (line 24))
  Using cached tensorboardX-2.6-py2.py3-none-any.whl (114 kB)
Collecting timm>=0.9.2 (from -r .\environment.txt (line 25))
  Using cached timm-0.9.12-py3-none-any.whl.metadata (60 kB)
Collecting tokenizers==0.12.1 (from -r .\environment.txt (line 26))
  Using cached tokenizers-0.12.1-cp38-cp38-win_amd64.whl (3.3 MB)
Collecting torch>=2.0.1 (from -r .\environment.txt (line 27))
  Using cached torch-2.1.2-cp38-cp38-win_amd64.whl.metadata (26 kB)
Collecting torchaudio>=2.0.2 (from -r .\environment.txt (line 28))
  Using cached torchaudio-2.1.2-cp38-cp38-win_amd64.whl.metadata (6.4 kB)
Collecting torchdata==0.6.1 (from -r .\environment.txt (line 29))
  Using cached torchdata-0.6.1-cp38-cp38-win_amd64.whl (1.3 MB)
Collecting torchmetrics>=1.0.1 (from -r .\environment.txt (line 30))
  Using cached torchmetrics-1.2.1-py3-none-any.whl.metadata (20 kB)
Collecting torchvision>=0.15.2 (from -r .\environment.txt (line 31))
  Using cached torchvision-0.16.2-cp38-cp38-win_amd64.whl.metadata (6.6 kB)
Requirement already satisfied: tqdm>=4.65.0 in c:\users\gabri\.conda\envs\sdxl_drag\lib\site-packages (from -r .\environment.txt (line 32)) (4.66.1)
Collecting transformers==4.19.1 (from -r .\environment.txt (line 33))
  Using cached transformers-4.19.1-py3-none-any.whl (4.2 MB)
ERROR: Ignored the following versions that require a different python version: 0.55.2 Requires-Python <3.5; 1.11.0 Requires-Python <3.13,>=3.9; 1.11.0rc1 Requires-Python <3.13,>=3.9; 1.11.0rc2 Requires-Python <3.13,>=3.9; 1.11.1 Requires-Python <3.13,>=3.9; 1.11.2 Requires-Python <3.13,>=3.9; 1.11.3 Requires-Python <3.13,>=3.9; 1.11.4 Requires-Python >=3.9; 1.12.0rc1 Requires-Python >=3.9; 1.25.0 Requires-Python >=3.9; 1.25.0rc1 Requires-Python >=3.9; 1.25.1 Requires-Python >=3.9; 1.25.2 Requires-Python >=3.9; 1.26.0 Requires-Python <3.13,>=3.9; 1.26.0b1 Requires-Python <3.13,>=3.9; 1.26.0rc1 Requires-Python <3.13,>=3.9; 1.26.1 Requires-Python <3.13,>=3.9; 1.26.2 Requires-Python >=3.9; 1.26.3 Requires-Python >=3.9; 2.1.0 Requires-Python >=3.9; 2.1.0rc0 Requires-Python >=3.9; 2.1.1 Requires-Python >=3.9; 2.1.2 Requires-Python >=3.9; 2.1.3 Requires-Python >=3.9; 2.1.4 Requires-Python >=3.9; 2.2.0rc0 Requires-Python >=3.9; 3.8.0 Requires-Python >=3.9; 3.8.0rc1 Requires-Python >=3.9; 3.8.1 Requires-Python >=3.9; 3.8.2 Requires-Python >=3.9
ERROR: Could not find a version that satisfies the requirement triton==2.0.0 (from versions: none)
ERROR: No matching distribution found for triton==2.0.0
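
The install stops at triton==2.0.0 with "from versions: none", which on Windows most likely means no build of that package is published for the platform (an assumption on my part). One workaround to experiment with is adding a PEP 508 environment marker so pip skips the package outside Linux. The sketch below rewrites requirement lines that way; restrict_to_linux is a hypothetical helper, and whether the demo runs without triton is not established in this thread.

```python
import re


def restrict_to_linux(lines, packages=("triton",)):
    """Append a PEP 508 marker so the named packages are only installed on Linux."""
    marked = []
    for line in lines:
        # The package name is everything before the first version/marker character.
        name = re.split(r"[=<>!~;\[\s@]", line.strip(), maxsplit=1)[0].lower()
        if name in packages and ";" not in line:
            line = f'{line.strip()}; platform_system == "Linux"'
        marked.append(line)
    return marked
```

Applied to the requirement in the log, "triton==2.0.0" becomes 'triton==2.0.0; platform_system == "Linux"', and all other lines are left as they are.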

ValueError: `x` must contain at least 2 elements.

Ubuntu 22. Followed README instructions, created environment, downloaded the model, Gradio interface launches. I add some arrows, click run, and get this error:

Running on local URL:  http://0.0.0.0:7860

To create a public link, set `share=True` in `launch()`.
You selected None at [425, 108] from image
You selected None at [385, 93] from image
You selected None at [345, 87] from image
You selected None at [219, 239] from image
You selected None at [264, 242] from image
You selected None at [303, 239] from image
You selected None at [183, 122] from image
You selected None at [143, 123] from image
You selected None at [105, 128] from image
Traceback (most recent call last):
  File "/home/nathan/anaconda3/envs/DragNUWA/lib/python3.8/site-packages/gradio/queueing.py", line 495, in call_prediction
    output = await route_utils.call_process_api(
  File "/home/nathan/anaconda3/envs/DragNUWA/lib/python3.8/site-packages/gradio/route_utils.py", line 232, in call_process_api
    output = await app.get_blocks().process_api(
  File "/home/nathan/anaconda3/envs/DragNUWA/lib/python3.8/site-packages/gradio/blocks.py", line 1561, in process_api
    result = await self.call_function(
  File "/home/nathan/anaconda3/envs/DragNUWA/lib/python3.8/site-packages/gradio/blocks.py", line 1179, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/home/nathan/anaconda3/envs/DragNUWA/lib/python3.8/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/home/nathan/anaconda3/envs/DragNUWA/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 2134, in run_sync_in_worker_thread
    return await future
  File "/home/nathan/anaconda3/envs/DragNUWA/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 851, in run
    result = context.run(func, *args)
  File "/home/nathan/anaconda3/envs/DragNUWA/lib/python3.8/site-packages/gradio/utils.py", line 678, in wrapper
    response = f(*args, **kwargs)
  File "DragNUWA_demo.py", line 152, in run
    splited_track = interpolate_trajectory(splited_track, self.model_length)
  File "DragNUWA_demo.py", line 21, in interpolate_trajectory
    fx = PchipInterpolator(t, x)
  File "/home/nathan/anaconda3/envs/DragNUWA/lib/python3.8/site-packages/scipy/interpolate/_cubic.py", line 234, in __init__
    x, _, y, axis, _ = prepare_input(x, y, axis)
  File "/home/nathan/anaconda3/envs/DragNUWA/lib/python3.8/site-packages/scipy/interpolate/_cubic.py", line 46, in prepare_input
    raise ValueError("`x` must contain at least 2 elements.")
ValueError: `x` must contain at least 2 elements.

Any ideas?
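
The scipy message means the interpolator was handed fewer than two points, so presumably at least one drag track contained a single click or was empty. As a minimal sketch, the tracks could be filtered before interpolation; split_valid_tracks is a hypothetical helper, not part of the demo, and it is meant to run before the interpolate_trajectory call shown in the traceback.

```python
def split_valid_tracks(tracks, min_points=2):
    """Separate tracks that can be interpolated from those that are too short."""
    valid = [track for track in tracks if len(track) >= min_points]
    skipped = [track for track in tracks if len(track) < min_points]
    return valid, skipped
```

Returning the skipped tracks alongside the valid ones makes it easy to warn the user about them instead of failing the whole run.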