
sinner's Introduction


sinner: sinner is not exactly roop

Deepfakes and more.

What is it?

This is a rework of s0md3v/roop that I'm working on for entertainment and educational purposes. It doesn't aim to be popular; it's just a fork made the way I want it to be. The tasks that I aim to accomplish here are:

  • ✅ Rewriting the code using object-oriented programming (OOP).
  • ✅ Providing a clear and user-friendly API for creating processing modules.
  • ✅ Supporting different input and output data types through frame handlers.
  • ✅ Implementing strict typing and static analysis for improved code quality.
  • ✅ Enabling in-memory processing without the need for temporary frames.
  • ✅ Allowing processing to be resumed after a stop.
  • ✅ Implementing a continuous frame-processing chain.
  • ✅ Implementing memory control code.
  • ✅ Providing code coverage through comprehensive tests.

How do I install it?

The basic installation instructions are, for now, the same as those in s0md3v/roop; check them out. In short, you need to install Python 3.10 or a later version, the VC runtimes (on Windows), and the desired Execution Provider kit (depending on your hardware and OS). Then you need to install the required Python packages, and there are some differences in the process:

I have a Windows/Linux PC without a CUDA GPU

Run pip install -r requirements.txt. It will install just enough packages to run the magic on the CPU only.

I have a Windows/Linux PC with a CUDA-capable GPU

Run pip install -r requirements-pc-cuda.txt. It will install packages with CUDA support. Remember that you also have to install the CUDA drivers.

I have an x86 Mac

Run pip install -r requirements-mac-x86.txt. It will use the CPU only, but it should work.

I have Apple Silicon Mac

Run pip install -r requirements-mac-arm64.txt. There is no CUDA, obviously, but some hardware acceleration is available too.

In most cases, the packages will install successfully. Otherwise, take a look at the command output: you can usually fix minor issues (such as changed version requirements) by yourself.

If nothing helps, feel free to create an issue with your problem, we will try to figure it out together.

How do I use it?

Go to the application folder and run python sin.py with the desired set of command-line parameters (or just pick one of the examples and adjust it to suit your needs).

You can get the list of all available command-line parameters by running the program with the -h or --help key. This will list all configurable modules and their supported parameters.

Some modules may have the same parameters. That is okay: those parameters (and their values) are shared. It is also okay if modules expect different values for the same parameter: usually they will be harmonized at runtime. But if something goes wrong, you will get an explicit error message.

Also, you can read about module parameters here

Built-in frame processors

There are modules named frame processors, and each processor can perform its own type of magic. You need to choose which frame processor (or processors) you want to use, and provide them with some sources to work on. Here is the list of built-in frame processors:

  • FaceSwapper: performs face-swapping deepfake magic. It substitutes a face from the source onto a face (or faces) in the target. The processor is based on the insightface project example code. FaceSwapper demo
  • FaceEnhancer: performs face restoration and enhances the quality of the target. The processor is based on the libraries of the ARC Lab GFPGAN project. FaceEnhancer demo
  • FrameExtractor: use this processor in the processing chain when using a video file as the target, to force sinner to extract frames to a temporary folder as a sequence of PNG files. If it is not used, every frame will be extracted into memory on a processor module's request. The first way requires some disk space for temporary frames; the second way might be a little slower in some cases.
  • FrameResizer: resizes frames to a certain size.
  • DummyProcessor: literally does nothing; it is just a test tool.


Command line usage examples

python sin.py --source="d:\pictures\cool_photo.jpg" --target="d:\pictures\other_cool_photo.jpg" --frame-processor=FaceSwapper

Swap one face in the d:\pictures\other_cool_photo.jpg picture with the face from the d:\pictures\cool_photo.jpg picture and save the resulting image to d:\pictures\cool_photo_other_cool_photo.png (an autogenerated name).

python sin.py --source="d:\pictures\cool_photo.jpg" --target="d:\videos\not_a_porn.mp4" --frame-processor FaceSwapper FaceEnhancer --output="d:\results\result.mp4" --many-faces --execution-provider=cuda

Swap all faces in the d:\videos\not_a_porn.mp4 video file with the face from d:\pictures\cool_photo.jpg and enhance the quality of all faces; both steps will be done using the cuda provider, and the result will be saved to d:\results\result.mp4.

python sin.py --source="d:\pictures\any_picture.jpg" --target="d:\pictures\pngs_dir" --output="d:\pictures\pngs_dir\enhanced" --frame-processor=FaceEnhancer --many-faces --max-memory=24 --execution-provider=cuda --execution-threads=8

Enhance all faces in every PNG file in the d:\pictures\pngs_dir directory using the cuda provider and 8 simultaneous execution threads, with a limit of 24 GB of RAM, and save every enhanced image to the d:\pictures\pngs_dir\enhanced directory.

Real-time player

This feature is still in the alpha stage, so things may change. There's not much to document yet; it's better to try it yourself by running:

python sin.py --gui

FaceSwapper demo

Virtual camera feature

You can use sinner to create a virtual real-time face-swapped camera, and use it with your software. This feature is managed by the WebCam module. You can refer to the module's live help (python sin.py -h) to find information about currently supported features. Here is a description of the common setup process:

First, you will need a real camera for the actual input. If you have only one camera, it will be used by default. Otherwise, you should pass the camera index as the --input parameter. You can also pass a path to an image or a video file to use it as the input source, although that may not be as enjoyable.

Second, you will need to create a virtual camera. The currently supported devices are listed in the help output for the --device key. If there are no supported options, install OBS Studio to obtain the obs virtual camera device. Other virtual camera software may be used, but OBS is recommended.

Third, you will need to set up the processing chain as usual. This involves selecting an image with the source face, configuring processing modules, and so on. Here are some examples:

python sin.py --camera --device=obs --source="path\to\source\face.jpg" --frame-processor=FaceSwapper --many-faces --execution-provider=cuda --preview

Capture a video stream from a webcam, apply face swapping to replace all faces with a provided source face using a CUDA device (e.g., an NVIDIA GPU), and generate a new video stream using the OBS virtual camera. Additionally, display the stream in the preview window.

Since real-time processing can be slow, depending on hardware capabilities, it might be a good idea to reduce the resolution of processed frames to achieve a higher frame rate:

python sin.py --camera --device=obs --source="path\to\source\face.jpg" --frame-processor=FaceSwapper --many-faces --execution-provider=cuda --width=480 --height=320

There is no need to use the FrameResizer module in this case; the --width and --height keys are handled by the WebCam module itself.

You can also use the camera feature to obtain a real-time preview of a video file:

python sin.py --camera --device=no --input="path\to\video.mp4" --source="path\to\source\face.jpg" --frame-processor FaceSwapper FaceEnhancer --many-faces --execution-provider=cuda --preview

Configuration file

You can store commonly used options in the configuration file to apply them on every run by default. Just edit the sinner.ini file in the application directory and add the desired parameters inside the [sinner] section as key-value pairs.

Example:

[sinner]
keep-frames = 1
many-faces = 1
execution-provider = cuda
execution-threads = 2
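For reference, loading such defaults could be done with Python's standard configparser, roughly like this (a minimal sketch for illustration; sinner's actual config loader may differ):

```python
import configparser

def load_defaults(path: str = "sinner.ini") -> dict:
    """Read key-value defaults from the [sinner] section of an ini file.

    Returns an empty dict when the file or the section is missing.
    """
    parser = configparser.ConfigParser()
    parser.read(path)  # silently skips missing files
    if not parser.has_section("sinner"):
        return {}
    return dict(parser.items("sinner"))
```

Note that configparser returns all values as strings; converting them to the types the modules expect would be a separate step.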

It is also possible to configure modules separately this way. Just create or modify a config section named after the module, and all key-value pairs from that section will be applied only to that module.

Example:

[sinner]
execution-threads = 2

[FaceSwapper]
execution-threads = 4

In the example above, FaceSwapper will run with four execution threads, while other modules will run with two threads (if they support this parameter). Module configurations have priority over global parameters (even those passed directly on the command line).

Any parameter set on the command line will override the corresponding global (not module-specific) parameter from the ini file.

You can also pass a path to a custom configuration file as a command-line parameter:

python sin.py --ini="d:\path\custom.ini"

How do I control output video quality/encoding speed/etc.?

In brief, sinner relies on ffmpeg almost every time video processing is required, and it's possible to utilize all the incredible power of ffmpeg. Use the --ffmpeg_resulting_parameters key to control how ffmpeg encodes the output video: simply pass the usual ffmpeg parameters as the value for this key (remember to enclose the value string in quotes). Here are some examples:

  • --ffmpeg_resulting_parameters="-c:v libx264 -preset medium -crf 20 -pix_fmt yuv420p": use the software x264 encoder (-c:v libx264) with medium quality (-preset medium and -crf 20) and the yuv420p pixel format. This is the default parameter value.
  • --ffmpeg_resulting_parameters="-c:v h264_nvenc -preset slow -qp 20 -pix_fmt yuv420p": use the NVIDIA GPU-accelerated H.264 encoder (-c:v h264_nvenc) with good encoding quality (-preset slow and -qp 20). This encoder is worth using if it is supported by your GPU.
  • --ffmpeg_resulting_parameters="-c:v hevc_nvenc -preset slow -qp 20 -pix_fmt yuv420p": the same as above, but with H.265 (HEVC) encoding.
  • --ffmpeg_resulting_parameters="-c:v h264_amf -b:v 2M -pix_fmt yuv420p": the AMD hardware-accelerated H.264 encoder (-c:v h264_amf) with a 2 Mbps resulting video bitrate (-b:v 2M). This should be good for AMD GPUs.

And so on. As you can see, there are a lot of different presets and options for every ffmpeg encoder, and you can rely on the ffmpeg documentation to achieve the desired results.

In case ffmpeg is not available on your system, sinner will gracefully degrade to the capabilities of the CV2 library. In that case all video processing features should still work, but in a very basic way: only with the software x264 encoder, which is slow and wasteful.

FAQ

❓ What are the differences between sinner and roop?
❗ As said before, sinner started as a fork of roop. They share similar ideas, but differ in how those ideas should be implemented. sinner uses the same ML libraries to perform its magic, but handles them in its own way. From a developer's perspective, it has a better architecture (OOP instead of a functional approach), stricter type handling, and more comprehensive tests. From a user's point of view, sinner offers additional features that roop currently lacks.

Also, roop is dead, baby, roop is dead.

❓ Is there an NSFW filter?
❗ Nope. I don't care if you do nasty things with sinner; that's your responsibility. sinner is just a neutral tool, like a hammer or a knife.

❓ Can I use several execution providers simultaneously?
❗ You can try. Seriously: you can set --execution-provider cuda cpu and see what happens. Maybe it will work faster, maybe it won't work at all. It is a large space for experiments.

Credits

  • s0md3v: the original author of roop
  • ffmpeg: for making video related operations easy
  • deepinsight: for their insightface project which provided a well-made library and models.
  • ARC Lab, Tencent PCG: for their GFPGAN project which provided a face restoration library and models.
  • and all developers behind libraries used in this project.

License

GNU GPL 3.0


sinner's Issues

Feature: better replace for rotated faces

When a face in the target is rotated, the FaceSwapper result on it is bad. So, it should be possible to counter-rotate the frame before swapping and rotate it back afterwards, to achieve a better swap.
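The proposed approach could be sketched like this (for illustration only: it uses 90° steps via numpy, while real faces would need arbitrary-angle rotation, e.g. with cv2.warpAffine, and `swap` is a stand-in for the actual FaceSwapper step):

```python
import numpy as np

def swap_on_rotated(frame: np.ndarray, swap, k: int) -> np.ndarray:
    """Counter-rotate the frame by k*90 degrees, apply the swap, rotate back.

    `swap` is any frame-in/frame-out callable (a stand-in for the swapper).
    """
    upright = np.rot90(frame, k)    # counter-rotate so the face is upright
    processed = swap(upright)       # run the swapper on the upright frame
    return np.rot90(processed, -k)  # rotate the result back into place
```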

Infinite benchmarking

If all benchmark runs have a small delta, the benchmarking process may never stop. It would be better to measure the delta between the fastest and the slowest runs.
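The proposed stop criterion could be sketched as follows (hypothetical names and thresholds, not sinner's actual benchmark code):

```python
def should_stop(run_times: list, threshold: float = 0.05, min_runs: int = 3) -> bool:
    """Stop benchmarking once the spread between the fastest and the slowest
    run is within `threshold`, relative to the fastest run."""
    if len(run_times) < min_runs:
        return False  # not enough samples to judge stability
    fastest, slowest = min(run_times), max(run_times)
    return (slowest - fastest) / fastest <= threshold
```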

Feature: GUI

It would be nice to have some graphical interface.

Lookahead for existing processed frames in the processing chain

This is about the case when a task was stopped and already-processed frames were deleted by the user (for example, extracted frames that have already been swapped while enhancement is still in progress). Sinner will reprocess those frames on the next task run, although they are not required anymore.
Some lookahead procedure is required to determine which processor is currently running. It is simple: find the last state folder that contains files, check which frames are still unprocessed, and prepare only those (if they do not exist).
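The lookahead could be sketched like this (the '<index>.png' file layout and the function name are assumptions for illustration, not sinner's actual state code):

```python
import os

def unprocessed_frames(state_dirs: list, total: int) -> list:
    """Return frame indices missing from the last state folder that contains
    any files; earlier (fully consumed) stages are skipped.

    `state_dirs` is ordered by processing stage; frames are assumed to be
    stored as '<index>.png'.
    """
    for path in reversed(state_dirs):
        names = os.listdir(path) if os.path.isdir(path) else []
        if names:
            done = {int(os.path.splitext(n)[0]) for n in names if n.endswith(".png")}
            return [i for i in range(total) if i not in done]
    return list(range(total))  # no state found: everything is unprocessed
```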

State can create an unnecessary directory

State always creates the path directory inside the attribute getter, even if the path will be changed later. It is not a big deal, but I want to find a cleverer solution.

Improve help output

  • Add help strings to every loadable attribute.
  • Add -h to show the help output. This requires collecting all available help data somehow; as a first approach, find all descendants of the AttributeLoader class.
  • Fix spacing in the error output.
  • Do not show attributes without help, or modules without attributes, in the help output.
  • Fix doubled help strings in the help output.

BUG: Wrong frames count determination (again)?

It seems that the frame count of some videos is still determined incorrectly sometimes, leading to unpredictable errors.
In one case, the frame count of the video is 24750 (according to both ffprobe and cv2), but only 24745 frames were extracted for swapping. FaceEnhancer somehow sees 27844 frames, which is wrong either way; there may be a bug in the state module.
Also, for this video the resulting frame count is determined as 360.0, which isn't right either.

Feature: realtime player

In GUI mode, implement a realtime player: get frames and sound from the target, do an in-memory frame swap (maybe with some buffering) or any other possible operation, and play the result. Also add quality presets: to improve processing speed, frames can be shrunk before processing and restored to size afterwards.
It may also be possible to enable frame dropping: render as many frames as possible, and simply drop the rest.

  • when a wrong path is passed as the source, the GUI produces an unhandled error
  • the GUI won't start without a provided target
  • add quality control to change the processing image scale on the fly. Also show the last/median frame processing time
  • use pygame.display to show the rendered frame as fast as possible
  • implement pygame display resize handling
  • do not resize the display on input frame change; resize the frame instead
  • combine with the WebCam module, there's no need to multiply entities
  • fix/check behavior with an image target
  • add a pause button to stop the player only (continue background buffering); not needed due to the current design
  • add frame rotation controls
  • fix the issue with the long processing restart on position change
  • move bootstrap_frames() to a background thread, to avoid blocking the program. Get frames right from the target if they are not bootstrapped.
  • remember and restore previous window positions on the screen
  • add a "stay on top" control/option
  • move frame extraction to a separate thread (make another queue). The experiment was unsuccessful: there's no profit in buffering all frames, because the next required frame index is unknown while playing. The feature is postponed.
  • fix the issue with broken frame navigation while playing
  • try to implement a new control to edit the frame processor set and order
  • show a GUI progress bar on long operations (buffering, extracting, etc.)
  • sound
  • documentation

Known issues:

  • The PygameFramePlayer may not refresh when showing a frame while playing (e.g. it shows two frames at the same time)

More tests

  • State class needs tests
  • FaceAnalyser class needs tests

Check frame validity after processing

Before creating the resulting video and deleting the processed frames, frame validity needs to be checked: whether all frames in the sequence are present and none of them is zero-sized.
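Such a check could be sketched as follows (assuming frames are stored as '<index>.png'; the function name is hypothetical):

```python
import os

def find_invalid_frames(frames_dir: str, expected: int) -> list:
    """Return indices of frames that are missing or zero-sized in a
    directory of '<index>.png' files."""
    bad = []
    for i in range(expected):
        path = os.path.join(frames_dir, "%d.png" % i)
        # a frame is invalid if the file is absent or has zero bytes
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            bad.append(i)
    return bad
```

An empty result means the sequence is complete and the temporary frames can be safely deleted.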

Feature: gradio

tkinter sucks so everybody is migrating to gradio (original roop & roop unleashed)

your fork is the most OOP so migration should be better

Benchmark mode

Add a benchmark mode, which runs every frame processor with every allowed execution provider, varying the execution thread count. This mode should provide the user with the best combination of provider and thread count for future usage.

Feature: intervals support

Add something like --intervals=(start1, stop1) (start2, stop2), where start and stop are integer markers of frame intervals to process.
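Parsing such a value could be sketched like this (a hypothetical helper, not existing sinner code):

```python
import re

def parse_intervals(value: str) -> list:
    """Parse a string like '(10, 100) (250, 300)' into frame ranges,
    validating that every start precedes its stop."""
    intervals = [(int(a), int(b))
                 for a, b in re.findall(r"\((\d+)\s*,\s*(\d+)\)", value)]
    if any(start >= stop for start, stop in intervals):
        raise ValueError("malformed intervals: %s" % value)
    return intervals
```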

Fixme: get output path from processing modules

  1. BatchProcessingCore.suggest_output_path() is not applicable when a path to a directory is passed in --output-path.
  2. If no --output-path is provided, the output path is generated without a source path reference.

BUG: temp folders names collisions

There is a collision when two different videos have the same source file: it produces the same temporary directory name for the FaceEnhancer processor. The solution: the target (not the source) file name should somehow be used in that case.
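One possible collision-free naming scheme, derived from the target path, could be sketched like this (a hypothetical layout, not sinner's actual temp directory code):

```python
import hashlib
import os

def temp_dir_name(target_path: str, processor: str) -> str:
    """Build a temp directory name from the *target* path, so two targets
    sharing a source file never collide; a short hash of the absolute path
    disambiguates targets with identical file names."""
    digest = hashlib.sha256(os.path.abspath(target_path).encode()).hexdigest()[:16]
    stem = os.path.splitext(os.path.basename(target_path))[0]
    return "%s_%s_%s" % (processor, stem, digest)
```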

Originally posted by @pozitronik in #28 (comment)

Multithreading processing chain

In the current processing logic, all frame processors run in turn, one after another. The advantages of this logic are obvious: it is simple, processing can be stopped and resumed at any time, etc.
But there are disadvantages too, and they seem to be critical at this point. There is no way to implement any realtime preview with this logic, and there is no room for speed improvements, because the processing task cannot be parallelized.
So, there is a big (really big) task to implement multithreading here.

There will be named buffers, each linked to a processor. Each processor will run in a separate thread: it will read a frame from its input buffer, process it, and write it to its output buffer (which is the input buffer for the next processor).

There are subtasks for this task:

  • Implement the buffered processing chain.
  • Make *FrameBuffer classes for the different buffer types (memory, file, etc.).
  • Implement a thread manager. Some processors may work faster than others, so their buffers will fill quickly; it makes sense to pause those processors' threads to redistribute computing power to the other threads.
  • Implement states in this approach. Saving and loading buffers as files can be useful: this logic partially coincides with the current state logic.
  • #68
  • Implement memory control code. This is important, because several loaded models may require a lot of memory, and so do the frame buffers.
  • #70
  • Show the overall progress bar, plus a per-processor progress bar and statistics (memory usage, buffer occupancy, etc.).
  • Properly handle in-thread exceptions.
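The buffered chain idea can be sketched with standard queues and threads (a toy model only: no state saving, pausing, or error handling, and the feeder runs inline for simplicity, so very long inputs could stall it; bounded queues provide the backpressure between stages):

```python
import queue
import threading

STOP = object()  # sentinel marking the end of the frame stream

def run_chain(frames, processors, buffer_size: int = 8):
    """Run each processor in its own thread, connected by bounded queues
    (the 'named buffers' of the design)."""
    buffers = [queue.Queue(maxsize=buffer_size) for _ in range(len(processors) + 1)]

    def worker(proc, inp, out):
        # consume frames until the sentinel, then pass the sentinel on
        while (frame := inp.get()) is not STOP:
            out.put(proc(frame))
        out.put(STOP)

    threads = [threading.Thread(target=worker, args=(p, buffers[i], buffers[i + 1]))
               for i, p in enumerate(processors)]
    for t in threads:
        t.start()
    for frame in frames:          # feed the first buffer
        buffers[0].put(frame)
    buffers[0].put(STOP)
    results = []
    while (frame := buffers[-1].get()) is not STOP:
        results.append(frame)     # drain the last buffer
    for t in threads:
        t.join()
    return results
```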

ffprobe frame count detection can be really slow on some files

It seems that the current implementation really counts all frames one by one (maybe only when there is no appropriate value in the video header). It may take minutes.
I need to find a faster way to get the frame count (maybe a binary search approach, as in C2VideoHandler).
Or I can just implement a workaround in VideoHandler: use detect_fc() from C2VideoHandler and result() from FFmpegVideoHandler.
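The binary search idea can be sketched over a generic `frame_exists(index)` predicate (e.g. seeking with cv2 and checking whether a frame decodes); this is an illustration, not C2VideoHandler's actual code:

```python
def detect_frame_count(frame_exists, upper_hint: int = 1) -> int:
    """Find the frame count via binary search over a
    `frame_exists(index) -> bool` predicate, instead of decoding
    every frame: O(log n) probes instead of O(n)."""
    # grow an upper bound exponentially until it is past the last frame
    hi = max(upper_hint, 1)
    while frame_exists(hi):
        hi *= 2
    lo = hi // 2
    # invariant: frames exist at lo (once found), none at hi
    while lo + 1 < hi:
        mid = (lo + hi) // 2
        if frame_exists(mid):
            lo = mid
        else:
            hi = mid
    return lo + 1 if frame_exists(lo) else 0  # handles empty videos
```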

Feature: config with defaults

Add support for a config that stores default parameters for every run, if those parameters are not overridden on the command line.

Refactor parameters handler

Every module should be able to handle its own set of command-line parameters. Those parameters should be gathered before the program starts. Every module should also be able to validate its parameters by itself.

Move frame extraction routines to a processor

Instead of using a frame handler when frame extraction is required, it is possible to create a FrameExtractor processor and use it in the processing chain.
This allows making the code cleaner.

Feature: multiple sources

Allow passing multiple sources for processing, which should result in processing the target repeatedly for every source.
In GUI mode, multiple sources may be displayed as a list of images, and a preview result should be rendered for every one of them, resulting in a list of previews.

Stop processing on errors

In some cases processing must be stopped to prevent wasting power, for example when no space is left on the device.

Refactoring: allow stateless processing

There are different usage scenarios, and most of them are stateless. In fact, states are required only within the batch processing chain, but the state object is heavily integrated into the BaseFrameProcessor class. The same applies to data entries (source/input/output files/directories/etc.).
So, I have to:

  1. Move states out of BaseFrameProcessor into some kind of top-level handler (I see it in a rewritten Core).
  2. Make FrameProcessors more general, working only with frames, not files.

Different modules settings via config file

Implement support for reading module settings from the current config file, like

[FaceSwapper]
execution-provider=cpu
threads=16

[FaceEnhancer]
execution-provider=cuda
threads=2

in addition to the current global configuration (config file and command line)

Feature: different preview processors

It is possible to use a different set of processors to make a quick preview (for example: a halved image and no enhancing). The full processing can then be done on request.

Feature: multiple targets

Check if it is possible to use multiple sources for swapping. Every source holds a face at a certain position, and if a target face position is near that face position, this source face should be used.

Rounded FPS

It seems that when the source FPS is fractional (e.g. 29.97004) and the target FPS is the same, h264/h265-encoded videos have a problem with broken rewinding near the end of the target video.
The solution is to round up FPS values by default.
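To illustrate the scale of the timestamp mismatch that rounding introduces (a toy calculation, not sinner code): the timestamp of the last frame computed from a rounded rate drifts by several seconds on a long video.

```python
def timestamp_drift(frames: int, real_fps: float, rounded_fps: float) -> float:
    """Difference in seconds between the last-frame timestamp computed
    with the rounded frame rate and with the real one."""
    return frames / rounded_fps - frames / real_fps
```

For 100000 frames, rounding 29.97004 fps up to 30 shifts the final timestamp by roughly 3.3 seconds, which is the order of mismatch that can confuse seeking near the end of a video.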

GPU + CPU usage

It is possible to run some processing instances on CUDAExecutionProvider and some on CPUExecutionProvider. In theory, this can utilize both the GPU and the CPU and increase overall speed.
