dfdc_deepfake_challenge's Introduction

DeepFake Detection (DFDC) Solution by @selimsef

Challenge details:

Kaggle Challenge Page

Fake detection articles

Solution description

In general, the solution is based on a frame-by-frame classification approach. More complex approaches did not work as well on the public leaderboard.

Face-Detector

The MTCNN detector was chosen due to kernel time limits. It would be better to use the S3FD detector, which is more precise and robust, but the open-source PyTorch implementations don't have a license.

Input size for face detector was calculated for each video depending on video resolution.

  • 2x scale for videos with wider side less than 300 pixels
  • no rescale for videos with wider side between 300 and 1000 pixels
  • 0.5x scale for videos with wider side > 1000 pixels
  • 0.33x scale for videos with wider side > 1900 pixels
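
The rules above can be sketched as a small helper. This is only an illustration: the function name `detector_scale` is hypothetical, and the treatment of the exact boundary values (300, 1000, 1900) is an assumption, since the list does not say which bucket they belong to.

```python
def detector_scale(width: int, height: int) -> float:
    """Return the rescale factor applied to frames before face detection.

    Hypothetical helper: boundary handling (<=) is an assumption.
    """
    wider_side = max(width, height)
    if wider_side < 300:
        return 2.0
    if wider_side <= 1000:
        return 1.0
    if wider_side <= 1900:
        return 0.5
    return 0.33


# A 1920x1080 video falls in the last bucket, so the detector sees
# frames at roughly a third of the original resolution.
scale = detector_scale(1920, 1080)
detector_input = (int(1920 * scale), int(1080 * scale))
```

Downscaling large frames keeps detection time bounded, while upscaling tiny frames gives the detector enough pixels to find small faces.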

Input size

Once I discovered that EfficientNets significantly outperform other encoders, I used only them in my solution. As I started with B4, I decided to use the "native" size for that network (380x380). Due to memory constraints I did not increase the input size even for the B7 encoder.

Margin

When I generated crops for training, I added 30% of the face crop size on each side and used only this setting during the competition. See extract_crops.py for the details.
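
As a rough illustration of the 30% margin, the sketch below expands a face box on every side. The function name `add_margin` and the clipping to the frame borders are assumptions made for this example; extract_crops.py remains the authoritative version.

```python
def add_margin(xmin, ymin, xmax, ymax, frame_w, frame_h, margin=0.3):
    """Expand a face box by `margin` of its size on each side.

    Hypothetical helper: clipping to the frame is an assumption.
    """
    pad_w = int((xmax - xmin) * margin)
    pad_h = int((ymax - ymin) * margin)
    return (
        max(xmin - pad_w, 0),
        max(ymin - pad_h, 0),
        min(xmax + pad_w, frame_w),
        min(ymax + pad_h, frame_h),
    )


# A 100x200 face box well inside a 1920x1080 frame grows to 160x320.
box = add_margin(500, 300, 600, 500, frame_w=1920, frame_h=1080)
```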

Encoders

The winning encoder is the current state-of-the-art model (EfficientNet B7), pretrained on ImageNet with Noisy Student training; see "Self-training with Noisy Student improves ImageNet classification".

Averaging predictions

I used 32 frames for each video. For each model's output, instead of simple averaging I used the following heuristic, which worked quite well on the public leaderboard (0.25 -> 0.22 for a solo B5).

import numpy as np

def confident_strategy(pred, t=0.8):
    pred = np.array(pred)
    sz = len(pred)
    fakes = np.count_nonzero(pred > t)
    # 11 frames are detected as fakes with high probability
    if fakes > sz // 2.5 and fakes > 11:
        return np.mean(pred[pred > t])
    elif np.count_nonzero(pred < 0.2) > 0.9 * sz:
        return np.mean(pred[pred < 0.2])
    else:
        return np.mean(pred)

Augmentations

I used heavy augmentations by default. The Albumentations library supports most of the augmentations out of the box; I only needed to add the IsotropicResize augmentation.


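import cv2
from albumentations import (
    Compose, FancyPCA, GaussianBlur, GaussNoise, HorizontalFlip, HueSaturationValue,
    ImageCompression, OneOf, PadIfNeeded, RandomBrightnessContrast, ShiftScaleRotate, ToGray,
)
# IsotropicResize is the custom augmentation added in this repository; import it from there.


def create_train_transforms(size=300):
    return Compose([
        # Compression, noise and blur mimic re-encoded, low-quality videos
        ImageCompression(quality_lower=60, quality_upper=100, p=0.5),
        GaussNoise(p=0.1),
        GaussianBlur(blur_limit=3, p=0.05),
        HorizontalFlip(),
        # Resize so the longer side equals `size`, with a random choice of interpolation
        OneOf([
            IsotropicResize(max_side=size, interpolation_down=cv2.INTER_AREA, interpolation_up=cv2.INTER_CUBIC),
            IsotropicResize(max_side=size, interpolation_down=cv2.INTER_AREA, interpolation_up=cv2.INTER_LINEAR),
            IsotropicResize(max_side=size, interpolation_down=cv2.INTER_LINEAR, interpolation_up=cv2.INTER_LINEAR),
        ], p=1),
        # Pad the shorter side so every crop becomes size x size
        PadIfNeeded(min_height=size, min_width=size, border_mode=cv2.BORDER_CONSTANT),
        OneOf([RandomBrightnessContrast(), FancyPCA(), HueSaturationValue()], p=0.7),
        ToGray(p=0.2),
        ShiftScaleRotate(shift_limit=0.1, scale_limit=0.2, rotate_limit=10, border_mode=cv2.BORDER_CONSTANT, p=0.5),
    ]
    )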
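
The geometry of the resize-then-pad step can be sketched without any image library. The helper names below are hypothetical and the rounding is an assumption; the repository's IsotropicResize is the reference. The idea is that the longer side is scaled to `max_side` with the aspect ratio preserved, and PadIfNeeded then fills the shorter side.

```python
def isotropic_size(height, width, max_side):
    """Scale (height, width) so the longer side equals max_side, keeping aspect ratio."""
    scale = max_side / max(height, width)
    return round(height * scale), round(width * scale)


def padding_to_square(height, width, size):
    """Total padding per axis needed to reach a size x size image."""
    return max(size - height, 0), max(size - width, 0)


# A tall 400x200 face crop is shrunk to 300x150, then padded to 300x300.
new_h, new_w = isotropic_size(400, 200, max_side=300)
pad_h, pad_w = padding_to_square(new_h, new_w, size=300)
```

Crops smaller than `max_side` are upscaled by the same rule, which is presumably why the transform takes separate interpolation modes for upscaling and downscaling.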

In addition to these augmentations I wanted to achieve better generalization with

augmentations

Building docker image

All libraries and the environment are already configured in the Dockerfile. It requires docker engine https://docs.docker.com/engine/install/ubuntu/ and nvidia docker https://github.com/NVIDIA/nvidia-docker on your system.

To build a docker image run:

docker build -t df .

Running docker

docker run --runtime=nvidia --ipc=host --rm --volume <DATA_ROOT>:/dataset -it df

Data preparation

Once the DFDC dataset is downloaded, all the scripts expect to find the dfdc_train_xxx folders under the data root directory.

Preprocessing is done by a single script, preprocess_data.sh, which requires the dataset directory as its first argument. It executes the steps below:

1. Find face bboxes

To extract face bboxes I used the facenet library, basically only MTCNN.
python preprocessing/detect_original_faces.py --root-dir DATA_ROOT
This script detects faces in real videos and stores them as JSON files in the DATA_ROOT/bboxes directory.

2. Extract crops from videos

To extract image crops I used the bboxes saved before. The bounding boxes from the original videos are used for the fake videos as well.
python preprocessing/extract_crops.py --root-dir DATA_ROOT --crops-dir crops
This script extracts face crops from videos and saves them in the DATA_ROOT/crops directory.

3. Generate landmarks

With the crops saved, it is quite fast to run MTCNN on them and extract landmarks.
python preprocessing/generate_landmarks.py --root-dir DATA_ROOT
This script extracts landmarks and saves them in the DATA_ROOT/landmarks directory.

4. Generate diff SSIM masks

python preprocessing/generate_diffs.py --root-dir DATA_ROOT
This script extracts SSIM difference masks between real and fake images and saves them in the DATA_ROOT/diffs directory.

5. Generate folds

python preprocessing/generate_folds.py --root-dir DATA_ROOT --out folds.csv
By default it uses 16 splits, so that folders 0-2 form a holdout set, though as few as 400 videos can also be used for validation.
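
To see why 16 splits put folders 0-2 in the holdout fold, here is a minimal sketch of a folder-to-fold mapping. The function `folder_folds` is hypothetical, the total of 50 dataset folders is an assumption, and so is the choice to put leftover folders into the last fold; generate_folds.py is the reference.

```python
def folder_folds(num_folders=50, n_splits=16):
    """Map each dataset folder index to a fold index.

    Assumptions: 50 folders in total, consecutive folders share a fold,
    and leftover folders at the end join the last fold.
    """
    per_fold = num_folders // n_splits  # 3 folders per fold by default
    return {
        folder: min(folder // per_fold, n_splits - 1)
        for folder in range(num_folders)
    }


folds = folder_folds()
# Fold 0 serves as the holdout set.
holdout = [folder for folder, fold in folds.items() if fold == 0]
```

Splitting by folder rather than by individual video keeps whole folders out of training, which avoids leaking near-identical clips into the validation set.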

Training

Training of 5 B7 models with different seeds is done in the train.sh script.

During training, checkpoints are saved after every epoch.

Hardware requirements

Mostly trained on a devbox configuration with 4x Titan V, thanks to Nvidia and the DSB2018 competition where I got these GPUs https://www.kaggle.com/c/data-science-bowl-2018/

Overall, training requires 4 GPUs with 12 GB+ memory. Batch size needs to be adjusted for standard 1080Ti or 2080Ti graphics cards.

As I computed fake loss and real loss separately inside each batch, results might be better with a larger batch size, for example on V100 GPUs. Even though SyncBN is used, a larger batch on each GPU will lead to less noise, as the DFDC dataset has some fakes where the face detector failed and the face crops are not really fakes.
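
The effect of computing the two losses separately can be illustrated with plain Python. The helper names are hypothetical, and combining the two terms with equal weight is an assumption made for this sketch, not something the text above confirms.

```python
import math


def mean_log_loss(pairs, eps=1e-7):
    """Mean binary cross-entropy over (label, probability) pairs."""
    if not pairs:
        return 0.0  # assumption: a class absent from the batch contributes nothing
    total = 0.0
    for label, prob in pairs:
        prob = min(max(prob, eps), 1 - eps)  # avoid log(0)
        total -= label * math.log(prob) + (1 - label) * math.log(1 - prob)
    return total / len(pairs)


def fake_real_loss(labels, probs):
    """Return (fake_loss, real_loss, equally weighted combination)."""
    pairs = list(zip(labels, probs))
    fake_loss = mean_log_loss([p for p in pairs if p[0] == 1])
    real_loss = mean_log_loss([p for p in pairs if p[0] == 0])
    return fake_loss, real_loss, (fake_loss + real_loss) / 2


# Three fakes and a single real sample in one small batch.
labels = [1, 1, 1, 0]
probs = [0.9, 0.8, 0.7, 0.4]
fake_loss, real_loss, combined = fake_real_loss(labels, probs)
```

In this small batch the real term rests on a single prediction, so one mislabeled or badly cropped sample can dominate it; with more samples per class each term becomes a steadier average.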

Plotting losses to select checkpoints

python plot_loss.py --log-file logs/<log file>

Inference

The kernel is reproduced with the predict_folder.py script.

Pretrained models

The download_weights.sh script downloads the trained models to the weights/ folder. They should be downloaded before building the docker image.

Ensemble inference is already preconfigured in the predict_submission.sh bash script. It expects a directory with videos as the first argument and an output csv file as the second argument.

For example:
./predict_submission.sh /mnt/datasets/deepfake/test_videos submission.csv

dfdc_deepfake_challenge's Issues

CUDNN_STATUS_NOT_INITIALIZED

I am using Win11 with WSL2 + Docker and have the same problem: "CUDNN_STATUS_NOT_INITIALIZED".
Is something wrong with that?

Google Colab

Hello! I need to ask: is it possible to run this code on Google Colab?

Does this detect face filters?

Hi! Thank you for your work. Does this detect face filters like you would see on instagram, for example how people use for beauty purposes, like to add makeup to the face, or change the face in general? Or does it strictly detect deepfakes?

MTCNN thresholds

In face_detector.py:
self.detector = MTCNN(margin=0,thresholds=[0.85, 0.95, 0.95], device=device)

but in kernel_utils.py:
self.detector = MTCNN(margin=0, thresholds=[0.7, 0.8, 0.8], device="cuda")

Why is this?

Thank you

I encountered continuous target data is not supported with label binarization

I encountered this issue during validation

  File "finetune_xy.py", line 446, in <module>
    main()
  File "finetune_xy.py", line 303, in main
    summary_writer=summary_writer)
  File "finetune_xy.py", line 311, in evaluate_val
    bce, probs, targets = validate(model, data_loader=data_val)
  File "finetune_xy.py", line 366, in validate
    fake_loss = log_loss(y[fake_idx], x[fake_idx], labels=[0, 1])
  File "/opt/conda/lib/python3.7/site-packages/sklearn/utils/validation.py", line 73, in inner_f
    return f(**kwargs)
  File "/opt/conda/lib/python3.7/site-packages/sklearn/metrics/_classification.py", line 2206, in log_loss
    transformed_labels = lb.transform(y_true)
  File "/opt/conda/lib/python3.7/site-packages/sklearn/preprocessing/_label.py", line 491, in transform
    sparse_output=self.sparse_output)
  File "/opt/conda/lib/python3.7/site-packages/sklearn/utils/validation.py", line 73, in inner_f
    return f(**kwargs)
  File "/opt/conda/lib/python3.7/site-packages/sklearn/preprocessing/_label.py", line 680, in label_binarize
    "binarization" % y_type)
ValueError: continuous target data is not supported with label binarization
[1]+  Exit 1                  nohup python -u finetune_xy.py --config configs/b7.json > log.out

Could you explain a little bit the

    data_x = []
    data_y = []
    for vid, score in probs.items():
        score = np.array(score)
        lbl = targets[vid]

        score = np.mean(score)
        lbl = np.mean(lbl)
        data_x.append(score)
        data_y.append(lbl)
    y = np.array(data_y)
    x = np.array(data_x)
    fake_idx = y > 0.1
    real_idx = y < 0.1
    fake_loss = log_loss(y[fake_idx], x[fake_idx], labels=[0, 1])
    real_loss = log_loss(y[real_idx], x[real_idx], labels=[0, 1])
    print("{}fake_loss".format(prefix), fake_loss)
    print("{}real_loss".format(prefix), real_loss)

in your code? Thank you

submission.csv all predictions are below 0.5

Hi Selim,

Thank you for sharing your great work here. I tried to use your predict_submission.sh to reproduce submission.csv using 7 efficientnet-b7 models on the test_videos folder, but the prediction scores for all 400 videos are smaller than 0.5, most of them around 0.3-0.4. I guess I did something wrong, but I cannot figure out what the possible reasons could be. Can you help?

Xu

generate_folds.py Line 106 KeyError

Hi, there.
I'm trying to run the preprocessing for your model, but I ran into an issue when running generate_folds.py.
Running it multiple times shows a different video file name each time.
I looked at metadata.json, and I guess the error comes from training videos that have no original video.
That means the assert that videofold[video] equals videofold[ori_vid] will fail.
I am not sure how to fix this.
Hope to hear from you soon.

Thanks,
Silion

could i download pretrained model about dfdc dataset? Now I get an error.

--2020-11-02 14:27:42-- https://github.com/selimsef/dfdc_deepfake_challenge/releases/download//final_999_DeepFakeClassifier_tf_efficientnet_b7_ns_0_23
Resolving github.com (github.com)... 15.164.81.167
Connecting to github.com (github.com)|15.164.81.167|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2020-11-02 14:27:42 ERROR 404: Not Found.

I encountered this issue. What can be done to solve this problem?

ValueError: Found array with 0 sample(s) (shape=(0,)) while a minimum of 1 is required.

./train.sh path/to/data num-of-gpus
100%██████████████████████████████████████████████████████████████████████████████████| 2765/2765 [06:11<00:00, 7.45it/s]
Traceback (most recent call last):
  File "training/pipelines/train_classifier.py", line 363, in <module>
    main()
  File "training/pipelines/train_classifier.py", line 227, in main
    summary_writer=summary_writer)
  File "training/pipelines/train_classifier.py", line 235, in evaluate_val
    bce, probs, targets = validate(model, data_loader=data_val)
  File "training/pipelines/train_classifier.py", line 290, in validate
    real_loss = log_loss(y[real_idx], x[real_idx], labels=[0, 1])
  File "/opt/conda/lib/python3.7/site-packages/sklearn/utils/validation.py", line 72, in inner_f
    return f(**kwargs)
  File "/opt/conda/lib/python3.7/site-packages/sklearn/metrics/_classification.py", line 2186, in log_loss
    y_pred = check_array(y_pred, ensure_2d=False)
  File "/opt/conda/lib/python3.7/site-packages/sklearn/utils/validation.py", line 72, in inner_f
    return f(**kwargs)
  File "/opt/conda/lib/python3.7/site-packages/sklearn/utils/validation.py", line 653, in check_array
    context))
ValueError: Found array with 0 sample(s) (shape=(0,)) while a minimum of 1 is required.

Hi.

How about improving deepfakes instead of detecting them? :D

Resize the videos

Hi,

I just have a quick question. You mention that you resize the videos before the face detector.
Do we need to resize the videos before we run preprocess_data.sh?
Or does preprocess_data.sh also handle the resizing of the videos when it runs the face detector?

I cannot find the code where you resize the image. This is the closest thing I found in your code.

frame = frame.resize(size=[s // 2 for s in frame.size])

Thank you!

Inference on CPU

Is there a way of using the trained weights & do inference using CPU only? My GPU can't handle inference with the current settings...

No module named 'VideoDataset'

When I ran python preprocessing/detect_original_faces.py --root-dir DATA_ROOT, I encountered an error:
Traceback (most recent call last):
  File "preprocessing/detect_original_faces.py", line 14, in <module>
    import face_detector, VideoDataset
ModuleNotFoundError: No module named 'VideoDataset'
How can I solve this problem?

training dataset?

Thank you for your work. What training data did you use? All of DFDC? Your model basically fails to recognize FOMM videos. I want to add this batch of data to train your model.

Could you explain what does the argument "fold" mean

When I run generate_folds.py, I find that "fold" is always "0".

                for k, v in metadata.items():
                    fold = None
                    for i, fold_dirs in enumerate(folds):
                        #if part in fold_dirs:
                            fold = i
                            break
                    assert fold is not None
                    video_id = k[:-4]
                    video_fold[video_id] = fold

I debugged this part and found that "fold" is always "0" and then the loop breaks.
Since I don't understand what "fold" means, I don't know how to fix it or what the correct behavior is.
Could you explain it in detail? Thank you!

metadata.json

I have downloaded the DFDC dataset, but metadata.json wasn't found. Can you share the download link for this file? Thank you very much.
