Refacing operation does not seem to fully use GPU (refacer issue, closed)

xaviviro commented on July 19, 2024

Refacing operation does not seem to fully use GPU

from refacer.

Comments (6)

xaviviro commented on July 19, 2024

First off, I want to thank you for the detailed information you've provided. I've initially focused on the functionality, and now I'm moving towards performance-related issues, like enhancing GPU utilization and parallel processing. I'm also aiming to address GPU usage on OSX CoreML. I apologize for any inconvenience. As of now, I'm the sole contributor and working on this in my spare time. Thanks for your understanding and patience!

In addition, I should point out that it's unlikely Refacer will be able to match Roop's speed. Roop doesn't do face comparisons, while Refacer does, which is why Refacer allows for the selection of which face to replace, one or many. Furthermore, please keep in mind that the processing time increases as the number of faces that need to be compared increases.
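To illustrate why per-face comparison adds cost, here is a minimal sketch, assuming faces are compared through embedding vectors and cosine similarity. The function names, the threshold value, and the vector representation are hypothetical and are not taken from Refacer's code.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def pick_replacement(detected, targets, threshold=0.5):
    """Return the index of the best-matching target face, or None.

    `detected` is the embedding of a face found in the frame and
    `targets` is the list of embeddings the user chose to replace.
    The threshold of 0.5 is an illustrative assumption.
    """
    best_index, best_score = None, threshold
    for i, target in enumerate(targets):
        score = cosine_similarity(detected, target)
        if score > best_score:
            best_index, best_score = i, score
    return best_index
```

With this shape, each frame needs one comparison per (detected face, target face) pair, which is consistent with processing time growing as more faces have to be compared.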

xaviviro commented on July 19, 2024

The only thing left for me to add is NVIDIA acceleration to the final ffmpeg process. Stay tuned for updates on that. Thank you for your patience and feedback!
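As a rough idea of what NVIDIA acceleration in the final ffmpeg step could look like, the sketch below builds an ffmpeg command line that selects the `h264_nvenc` encoder, with `libx264` as a CPU fallback. The helper name, its arguments, and the overall pipeline layout are assumptions for illustration; the thread does not show how Refacer actually invokes ffmpeg.

```python
def build_encode_command(frames_pattern, audio_source, output_path, fps, use_nvenc=True):
    """Build (but do not run) an ffmpeg command that encodes frames plus audio.

    Hypothetical helper: the file layout and arguments are assumptions.
    h264_nvenc requires an ffmpeg build with NVENC support and an NVIDIA GPU.
    """
    video_codec = "h264_nvenc" if use_nvenc else "libx264"
    return [
        "ffmpeg", "-y",
        "-framerate", str(fps),
        "-i", frames_pattern,   # e.g. an image sequence such as frames/%06d.png
        "-i", audio_source,     # original video, used only for its audio track
        "-map", "0:v:0",        # video from the first input
        "-map", "1:a:0?",       # audio from the second input, if it has any
        "-c:v", video_codec,
        "-pix_fmt", "yuv420p",
        "-c:a", "aac",
        "-shortest",
        output_path,
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. Falling back to `libx264` when NVENC is unavailable keeps the same command usable on machines without an NVIDIA GPU.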

ooofest commented on July 19, 2024

> The only thing left for me to add is NVIDIA acceleration to the final ffmpeg process. Stay tuned for updates on that. Thank you for your patience and feedback!

It is much faster in overall processing now! The speed increase is tremendous . . . here is a quick example:

To create a public link, set share=True in launch().
Total frames: 13966
Extracting frames: 100%|██████████████████████████████████████████████████████▉| 13965/13966 [00:04<00:00, 2984.45it/s]
Processing frames: 100%|█████████████████████████████████████████████████████████| 13965/13965 [08:04<00:00, 28.85it/s]
Merging audio with the refaced video...
The process has finished.

However, I did notice that when the video file is rather large (e.g., > 700MB), Gradio usually reports a timeout error after the frame extraction step.

Also, some videos trigger a memory allocation error. I am still experimenting to find out what kind of input causes it:

To create a public link, set share=True in launch().
Total frames: 4096
Extracting frames: 100%|██████████████████████████████████████████████████████████| 4096/4096 [00:14<00:00, 280.75it/s]
Processing frames: 0%| | 8/4096 [00:00<04:38, 14.67it/s]
Traceback (most recent call last):
  File "D:\refacer-main\venv\lib\site-packages\gradio\routes.py", line 427, in run_predict
    output = await app.get_blocks().process_api(
  File "D:\refacer-main\venv\lib\site-packages\gradio\blocks.py", line 1323, in process_api
    result = await self.call_function(
  File "D:\refacer-main\venv\lib\site-packages\gradio\blocks.py", line 1051, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "D:\refacer-main\venv\lib\site-packages\anyio\to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "D:\refacer-main\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "D:\refacer-main\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "D:\refacer-main\app.py", line 30, in run
    return refacer.reface(video_path,faces)
  File "D:\refacer-main\refacer.py", line 184, in reface
    results = list(tqdm(executor.map(self.__process_faces, frames), total=len(frames),desc="Processing frames"))
  File "D:\refacer-main\venv\lib\site-packages\tqdm\std.py", line 1178, in __iter__
    for obj in iterable:
  File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\concurrent\futures\_base.py", line 621, in result_iterator
    yield _result_or_cancel(fs.pop())
  File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\concurrent\futures\_base.py", line 319, in _result_or_cancel
    return fut.result(timeout)
  File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\concurrent\futures\_base.py", line 458, in result
    return self.__get_result()
  File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\concurrent\futures\_base.py", line 403, in __get_result
    raise self._exception
  File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\concurrent\futures\thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "D:\refacer-main\refacer.py", line 144, in __process_faces
    frame = self.face_swapper.get(frame, face, rep_face[1], paste_back=True)
  File "D:\refacer-main\venv\lib\site-packages\insightface\model_zoo\inswapper.py", line 53, in get
    pred = self.session.run(self.output_names, {self.input_names[0]: blob, self.input_names[1]: latent})[0]
  File "D:\refacer-main\venv\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 217, in run
    return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running Conv node. Name:'Conv_42' Status Message: D:\a\_work\1\s\onnxruntime\core\providers\cuda\cuda_call.cc:121 onnxruntime::CudaCall D:\a\_work\1\s\onnxruntime\core\providers\cuda\cuda_call.cc:114 onnxruntime::CudaCall CUDA failure 2: out of memory ; GPU=0 ; hostname=HOMEPC ; file=D:\a\_work\1\s\onnxruntime\core\providers\cuda\cuda_allocator.cc ; line=48 ; expr=cudaMalloc((void**)&p, size);
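The traceback shows `executor.map` being called over the full list of frames from a thread pool. One possible way to bound how much work is in flight at once is to cap the worker count and feed the pool in small batches, sketched below. The helper name and the default values are hypothetical, and the thread does not confirm that this resolves the `cudaMalloc` failure, which could also come from other GPU memory pressure.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def process_in_batches(process_frame, frames, max_workers=2, batch_size=16):
    """Apply process_frame to every frame with bounded concurrency.

    At most `max_workers` calls run at the same time and at most
    `batch_size` tasks are queued at once. Results keep input order.
    The defaults are illustrative, not tuned values.
    """
    results = []
    iterator = iter(frames)
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        while True:
            batch = list(islice(iterator, batch_size))
            if not batch:
                break
            # executor.map yields results in the order of the inputs
            results.extend(executor.map(process_frame, batch))
    return results
```

Lowering `max_workers` trades throughput for a smaller number of simultaneous inference calls, so it is a knob to experiment with rather than a guaranteed fix.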

suphamster commented on July 19, 2024

I got about a 4x speedup (from 5-6 it/s to 20-24 it/s) with this tweak: main...suphamster:refacer:patch-1. However, I don't know why GPU usage is still low on the current version of refacer; the tweak only raises CPU load. I have an RTX 4070 GPU on Win10 22H2.

ooofest commented on July 19, 2024

Thanks for your helpful reply!

Yes, I figured that Refacer's unique logic to detect faces - which has been working very well for me, thus far - could add cycles to the processing.

With that in mind, perhaps GPU multithreading might be an avenue to consider?

Thanks for this repo. The ability to specify a particular face for swapping, while keeping the resulting file size reasonable, makes it valuable and a good complement to Roop.

xaviviro commented on July 19, 2024

Wow, @ooofest, I just tried it on Google Colab with the latest update and it gives me over 8 it/s. If you want to try on Colab:
