
fast-artistic-videos's Introduction

fast-artistic-videos

This is the source code for fast video style transfer described in

Artistic style transfer for videos and spherical images
Manuel Ruder, Alexey Dosovitskiy, Thomas Brox

The paper builds on A Neural Algorithm of Artistic Style by Gatys et al. and Perceptual Losses for Real-Time Style Transfer and Super-Resolution by Johnson et al. and our code is based on Johnson's implementation fast-neural-style.

It is the successor of our previous work Artistic style transfer for videos and runs several orders of magnitude faster.

Example videos:

Comparison between the optimization-based and feed-forward approach:

Artistic style transfer for videos and spherical images

360° video:

If you find this code useful for your research, please cite

@Article{RDB18,
  author       = "M. Ruder and A. Dosovitskiy and T. Brox",
  title        = "Artistic style transfer for videos and spherical images",
  journal      = "International Journal of Computer Vision",
  month        = " ",
  year         = "2018",
  note         = "online first",
  url          = "http://lmb.informatik.uni-freiburg.de/Publications/2018/RDB18"
}

Table of contents

  • Setup
  • Optical flow estimator
  • Pretrained Models
  • Running on new videos
  • Running on new spherical videos
  • Training new models
  • Contact
  • License

Setup

Disclaimer: Please note that this repository is no longer actively developed. Furthermore, the framework it uses, Torch, is no longer maintained and is probably incompatible with most recent software environments. I have collected possible workarounds at the bottom of this section; however, there is neither a guarantee that they will work nor will there be any support from my side to get this code running on recent environments.

First install Torch, then update / install the following packages:

luarocks install torch
luarocks install nn
luarocks install image
luarocks install lua-cjson
luarocks install hdf5

(Optional) GPU Acceleration

If you have an NVIDIA GPU, you can accelerate all operations with CUDA.

First install CUDA, then update / install the following packages:

luarocks install cutorch
luarocks install cunn

Also install stnbhwd (GPU accelerated warping) included in this repository:

cd stnbhwd
luarocks make stnbhwd-scm-1.rockspec

For CUDA version 9.0 and later, you must adapt the arch flag in CMakeLists.txt at line 55 to your GPU and CUDA version.

If you cannot get stnbhwd to run but still want to use GPU acceleration at least for the stylization, remove all instances of require 'stn' from the code and edit the warp_image function in utilities.lua, removing everything in that function except line 147.
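
A rough sketch of the first part (removing the require lines) using standard shell tools; back up your checkout first, and note that the warp_image edit in utilities.lua still has to be done by hand:

grep -rl --include='*.lua' "require 'stn'" . | xargs -r sed -i "/require 'stn'/d"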

(Optional) cuDNN

When using CUDA, you can use cuDNN to accelerate convolutions and reduce memory footprint.

First download cuDNN and copy the libraries to /usr/local/cuda/lib64/. Then install the Torch bindings for cuDNN:

luarocks install cudnn
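
The copy step above usually amounts to the following, assuming the cuDNN archive was extracted into a local folder named cuda (the usual tarball layout; adjust to your download and CUDA location):

sudo cp cuda/include/cudnn*.h /usr/local/cuda/include/
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64/
sudo chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn*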

Workarounds for installing with recent Ubuntu / CUDA / cuDNN version

Some were able to fix erroneous results by downgrading torch, others by downgrading CUDA.

But what worked for me and also fixes some incompatibilities (Ubuntu 18.04, CUDA10, cuDNN7) was this fix.

Also for CUDA10, you need this fix.

And for Ubuntu 18.04 in order to install and use hdf5, you need this fix.

Optical flow estimator

Our algorithm needs a utility which estimates the optical flow between two images. Since our new stylization algorithm needs only a fraction of the time of the optimization-based approach, the optical flow estimator can become the bottleneck. Hence the choice of a fast optical flow estimator is crucial for near real-time execution.

There are example scripts in our repository for either DeepFlow or FlowNet 2.0. DeepFlow is slower but comes as a standalone executable and is therefore very easy to install. Faster execution times can be reached with FlowNet 2.0, which also runs on a GPU, given you have a sufficiently fast GPU. FlowNet 2.0 was used for the experiments in our paper.

DeepFlow setup instructions

Just download both DeepFlow and DeepMatching (CPU version) and place the static binaries (deepmatching-static and deepflow2-static) in the root directory of this repository.
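
For example, assuming the downloaded binaries ended up in ~/Downloads (a hypothetical location):

cp ~/Downloads/deepmatching-static ~/Downloads/deepflow2-static .
chmod +x deepmatching-static deepflow2-static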

FlowNet 2.0 setup instructions

Go to flownet2 (GitHub) and follow the instructions there on how to download, compile and use the source code and pretrained models. Since FlowNet is built upon Caffe, you may also want to read Caffe | Installation for a list of dependencies. There is also a Dockerfile for easy installation of the complete code in one step: flownet2-docker (GitHub).

Then edit run-flownet-multiple.sh and set the paths to the FlowNet executable, model definition and pretrained weights.

If you have trouble installing Caffe, there is also a TensorFlow implementation: FlowNet2 (TensorFlow). However, you will have to adapt the scripts in this repository accordingly.

Please don't ask me for support installing FlowNet 2.0. Ask the original authors or use DeepFlow.

Pretrained Models

Download all pretrained video style transfer models by running the script

bash models/download_models.sh

This will download 6 video model and 6 image model files (~300MB) to the folder models/.

You can download pretrained spherical video models with download_models_vr.sh; it will download 2 models (~340MB). These models are larger because they have more filters. We later found that fewer filters can achieve similar performance, but did not retrain the spherical video models.
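
Assuming the script lives next to download_models.sh in models/:

bash models/download_models_vr.sh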

Running on new videos

Example script

You can use the scripts stylizeVideo_*.sh <path_to_video> <path_to_video_model> [<path_to_image_model>] to easily stylize videos with pretrained models. Choose one of the optical flow methods and specify one of the models we provide (see above). If no image model is specified, the video model will be used to generate the first frame (by marking everything as occluded). The script performs all preprocessing steps for you. For longer videos, make sure to have enough disk space available, since the script extracts the video into uncompressed image files.
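
A typical call could look like this (the input video is a placeholder; any matching pair of the downloaded video and image models works):

bash stylizeVideo_deepflow.sh myvideo.mp4 models/checkpoint-mosaic-video.t7 models/checkpoint-mosaic-image.t7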

Advanced usage

For advanced users, videos can be stylized with fast_artistic_videos.lua.

You must specify the following options:

  • model_vid Path to a pretrained video model.
  • model_img Path to a separate pretrained image model which will be used to stylize the first frame, or 'self', if the video model should be used for this (the whole scene will be marked as uncertain in this case)
  • input_pattern File path pattern of the input frames, e.g. video/frame_%04d.png
  • flow_pattern A file path pattern for files that store the backward flow between the frames. The placeholder in square brackets refers to the frame position where the optical flow starts and the placeholder in braces refers to the frame index where the optical flow points to. For example flow_[%02d]_{%02d}.flo means the flow files are named flow_02_01.flo, flow_03_02.flo, etc. If you use the script included in this repository (makeOptFlow.sh), the filename pattern will be backward_[%d]_{%d}.flo.
  • occlusions_pattern A file path pattern for the occlusion maps between two frames. These files should be grey-scale images where a white pixel indicates a high flow weight and a black pixel a low weight, respectively. Same format as above. If you use the script, the filename pattern will be reliable_[%d]_{%d}.pgm.
  • output_prefix File path prefix of the output, e.g. stylized/out. Files will then be named stylized/out-00001.png etc.

By default, this script runs on CPU; to run on GPU, add the flag -gpu specifying the GPU on which to run.
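
Put together, an invocation could look roughly like the following sketch (all paths are placeholders for your own frames, flow files and models):

th fast_artistic_videos.lua \
  -model_vid models/checkpoint-mosaic-video.t7 \
  -model_img models/checkpoint-mosaic-image.t7 \
  -input_pattern video/frame_%04d.png \
  -flow_pattern video/flow/backward_[%d]_{%d}.flo \
  -occlusions_pattern video/flow/reliable_[%d]_{%d}.pgm \
  -output_prefix stylized/out \
  -gpu 0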

Other useful options:

  • occlusions_min_filter: Width of a min filter applied to the occlusion map, can help to remove artifacts around occlusions. (Default: 7)
  • median_filter: Width of a median filter applied to the output. Can reduce network noise. (Default: 3)
  • continue_with: Continue with the given frame index, if the previous frames are already stylized and available at the output location.
  • num_frames: Maximum number of frames to process. (Default: 9999)
  • backward: Process in backward direction, from the last frame to the first one.

To use this script for evaluation, specify -evaluate and give the following options (a sketch follows the list):

  • flow_pattern_eval: Ground truth optical flow.
  • occlusions_pattern_eval: Ground truth occlusion maps.
  • backward_eval: Perform evaluation in backward direction. Useful if only forward flow is available, as is the case for the Sintel dataset.
  • evaluation_file: File to write the results in.
  • loss_network: Path to a pretrained network used to compute style and content similarity, e.g. VGG-16.
  • content_layers: Content layer indices to compute content similarity.
  • style_layers: Style layer indices.
  • style_image: Style image to be used for evaluation.
  • style_image_size
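
As a sketch, the evaluation options could be appended to the regular invocation like this (all file names and patterns below are placeholders):

th fast_artistic_videos.lua <regular options as above> -evaluate \
  -flow_pattern_eval gt/flow_[%d]_{%d}.flo \
  -occlusions_pattern_eval gt/occlusions_[%d]_{%d}.pgm \
  -evaluation_file eval_results.txt \
  -loss_network models/vgg16.t7 \
  -style_image styles/candy.jpg \
  -style_image_size 384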

Running on new spherical videos

To stylize spherical videos, frames must be present as cube map projections with overlapping borders. Most commonly, however, spherical videos are encoded as an equirectangular projection. Therefore, a reprojection becomes necessary.

Reprojection software

Transform360 can do the necessary transformations. To install, follow the instructions in their repository.

Example script

Given a successful Transform360 compilation and a VR video in equirectangular projection (the most common format), you can use the script stylizeVRVideo_[deepflow|flownet].sh <path_to_equirectangular_projection_video> <path_to_pretrained_vr_model>. Make sure to place the ffmpeg binary produced by the Transform360 build in the root directory of this repository. As above, also make sure to have enough disk space available for longer videos.

Advanced usage

See the example scripts above for a preprocessing pipeline. Each cube face must be stored in a separate file.

fast_artistic_videos_vr.lua has similar options to the video script, with the following differences (see the sketch after this list):

  • The arguments given for input_pattern, flow_pattern and occlusions_pattern must have another placeholder for the cube face id, for example frame_%05d-%d.ppm and backward_[%d]_{%d}-%d.flo.
  • overlap_pixel_w: Horizontal overlapping region.
  • overlap_pixel_h: Vertical overlapping region.
  • out_cubemap: Whether the individual cube faces should be combined into one file.
  • out_equi: Whether an additional equirectangular projection should be created from the output. Increases processing time. If this option is present, the size of the projection can be specified with out_equi_w and out_equi_h.
  • create_inconsistent_border: No border consistency (for benchmarking purposes).
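
A minimal sketch, assuming the model options carry over from the video script and using placeholder paths and overlap sizes:

th fast_artistic_videos_vr.lua \
  -model_vid <path_to_pretrained_vr_model> \
  -model_img self \
  -input_pattern vr/frame_%05d-%d.ppm \
  -flow_pattern vr/flow/backward_[%d]_{%d}-%d.flo \
  -occlusions_pattern vr/flow/reliable_[%d]_{%d}-%d.pgm \
  -output_prefix vr/out \
  -overlap_pixel_w 16 -overlap_pixel_h 16 \
  -gpu 0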

Training new models

Training a new model is complicated and requires a lot of preparation steps. Only recommended for advanced users.

Prerequisites

Note that you can omit some of these steps depending on the training parameters (see below). If you aim to reproduce the results in our paper, all steps are necessary though.

First, you need to prepare a video dataset consisting of videos from the Hollywood2 dataset; this requires a lot of free hard drive capacity (>200 GB). A condensed command sketch follows the list below.

  • Prepare a python environment with h5py, numpy, Pillow, scipy, six installed (see this requirements.txt)
  • Visit HOLLYWOOD2, download Scene samples (25Gb) and extract the files to your hard drive.
  • Run video_dataset/make_flow_list.py <folder_to_extracted_dataset> <output_folder> [<num_tuples_per_scene> [<num_frames_per_tuple>]]. This script will extract <num_tuples_per_scene> tuples, each consisting of <num_frames_per_tuple> consecutive frames, from each scene of the Hollywood2 dataset, selected by the amount of motion in the scene, and create a file called flowlist.txt in the output folder. The defaults are num_tuples_per_scene=5 (I recommend reducing this if you just want good results and don't aim to exactly reproduce the results in our paper; a lower number reduces the dataset size and saves some optical flow computations, which can take quite long on an older computer) and num_frames_per_tuple=5 (needed for multi-frame training; otherwise set it to 2).
  • Compute optical flow for all frame pairs listed in flowlist.txt. This file also contains the output paths and is directly compatible to the flownet2 script run-flownet-many.py which expects a listfile as input.
  • Compute occlusions from the forward and backward flow using the script bash video_dataset/make_occlusions.sh <output_folder>, where <output_folder> should be identical to <output_folder> in step 3.
  • Run video_dataset/make_video_dataset.py --input_dir <path> --sequence_length <n>, where <path> should be identical to <output_folder> and <n> to <num_frames_per_tuple> in step 3.
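
Condensed into commands, the steps above might look roughly like this (all paths are placeholders, and the FlowNet invocation follows the flownet2 documentation and may differ for your setup):

python video_dataset/make_flow_list.py ~/hollywood2_scenes ~/video_dataset_out 5 5
python ~/flownet2/scripts/run-flownet-many.py <caffemodel> <deploy_prototxt> ~/video_dataset_out/flowlist.txt
bash video_dataset/make_occlusions.sh ~/video_dataset_out
python video_dataset/make_video_dataset.py --input_dir ~/video_dataset_out --sequence_length 5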

Secondly, to make use of the mixed training strategy, the spherical video training, or the additional training data from simulated camera movement on single images, you also need to prepare a single image dataset as described by Johnson et al. You may want to change the image size to 384x384, since the algorithm takes multiple smaller crops per image and resizes them to 256x256.

Thirdly, you have to download the loss network from here and place it somewhere.

Fourthly, create a single image style transfer model as described by Johnson et al. (you can also use a pre-trained model if it has the same style). Please note the settings for style and content weight, style image size and other parameters that change the appearance of the stylized image, and use the same parameters for the video net. Different parameters may cause unwanted results.

Training parameters

Now you can start training with train_video.lua using the following main arguments (a minimal example invocation follows the list):

  • -h5_file: Path to a file containing single images (e.g. from MS COCO, created as described in fast-neural-style). Required if you use any data source other than video in the data_mix argument.
  • -h5_file_video: Path to a file containing video frames and optical flow. (As created above). Required if you use the video data source in the data_mix argument.
  • -image_model: Path to a pretrained image model used to generate the first frame (or self, if the first frame should be generated by the video network given an all-occluded, empty prior image; you may also want to set single_image_until in that case).
  • -style_image: Path to the style image.
  • -loss_network: Path to a .t7 file containing a pretrained CNN to be used as a loss network. The default is VGG-16, but the code should support many models such as VGG-19 and ResNets.
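
A minimal invocation with only these required arguments could look like this (all paths are placeholders; the remaining options keep their defaults):

th train_video.lua \
  -h5_file data/ms-coco-256.h5 \
  -h5_file_video video_dataset/video_dataset.h5 \
  -image_model models/checkpoint-candy-image.t7 \
  -style_image styles/candy.jpg \
  -loss_network models/vgg16.t7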

Besides that, the following optional arguments can be modified to customize the result:

Training data options:

  • -data_mix: What kind of training samples the network should use and how they are weighted. Weights are integer numbers and specify the probability of sampling during training (each number is divided by the sum of all numbers to get a proper probability distribution). Can be any of the following; multiple values are separated by a comma, and the weight of each source is appended after a colon.
    • single_image: Present an empty all-occluded prior image as the first frame and let the network stylize the image from scratch (corresponding to the mixed training strategy in the paper)
    • shift: Take a single image and shift image to simulate camera movement.
    • zoom_out: Same as above but with a zoom-out effect.
    • video: Actual video frames from the video dataset.
    • vr: Pseudo-warped images to simulate spherical video input (in the cube face format). Example: shift:1,zoom_out:1,video:3. Then shift, zoom_out and video are sampled with probability 1/5, 1/5 and 3/5, respectively.
  • -num_frame_steps: How many successive frames to use per sample in a pseudo-recursive manner (the multi-frame training described in our paper), as a function of the iteration. Multiple data points are separated by a comma. E.g. 0:1,50000:2 means that one successive frame is used at the beginning and two successive frames are used starting with iteration 50000. (Default: 0:1)
  • -single_image_until: Use only the single_image data source until the given iteration. (default: 0)
  • -reliable_map_min_filter: Width of minimum filter applied to the reliable map such that artefacts near motion boundaries are removed. (Default: 7)

Model options:

  • -arch: String specifying the architecture to use. Architectures are specified as comma-separated strings. The architecture used in the original paper by Johnson et al. is c9s1-32,d64,d128,R128,R128,R128,R128,R128,u64,u32,c9s1-3. However, we achieved better results with c9s1-32,d64,d128,R128,R128,R128,R128,R128,U2,c3s1-64,U2,c9s1-3. All internal convolutional layers are followed by a ReLU and either batch normalization or instance normalization.
    • cXsY-Z: A convolutional layer with a kernel size of X, a stride of Y, and Z filters.
    • dX: A downsampling convolutional layer with X filters, 3x3 kernels, and stride 2.
    • RX: A residual block with two convolutional layers and X filters per layer.
    • uX: An upsampling convolutional layer with X filters, 3x3 kernels, and stride 1/2.
    • UX: A nearest neighbor upsampling layer with an upsampling factor of X. Avoids checkerboard pattern compared to upsampling conv as described here.
  • -use_instance_norm: 1 to use instance normalization or 0 to use batch normalization. Default is 1.
  • -padding_type: What type of padding to use for convolutions in residual blocks. The following choices are available:
    • zero: Normal zero padding everywhere.
    • reflect: Spatial reflection padding for all convolutions in residual blocks.
    • replicate: Spatial replication padding for all convolutions in residual blocks.
    • reflect-start (default): Spatial reflection padding at the beginning of the model and no padding for convolutions in residual blocks.
  • -tanh_constant: There is a tanh nonlinearity after the final convolutional layer; this puts outputs in the range [-1, 1]. Outputs are then multiplied by the -tanh_constant so the outputs are in a more standard image range.
  • -preprocessing: What type of preprocessing and deprocessing to use; either vgg or resnet. Default is vgg. If you want to use a ResNet as loss network you should set this to resnet.
  • -resume_from_checkpoint: Path to a .t7 checkpoint created by train_video.lua to initialize the model from. If you use this option then all other model architecture options will be ignored. Note that this will not restore the optimizer state, so that this option is mainly useful for finetuning with different input data.

Optimization options:

  • -pixel_loss_weight: Weight to use for the temporal consistency loss. Note: if you use the mixed training strategy, this weight must be increased proportional to the number of single_image samples. (Default: 50)
  • -content_weights: Weight to use for each content reconstruction loss. (Default: 1)
  • -content_layers: Which layers of the loss network to use for the content reconstruction loss. This will usually be a comma-separated list of integers, but for complicated loss networks like ResNets it can be a list of layer strings.
  • -style_weights: Weight to use for the style reconstruction loss. Reasonable values are between 5 and 20, depending on the style image and your preference. (Default: 10)
  • -style_image_size: Before computing the style loss targets, the style image will be resized so its smaller side is this many pixels long. This can have a big effect on the types of features transferred from the style image.
  • -style_layers: Which layers of the loss network to use for the style reconstruction loss. This is a comma-separated list of the same format as -content_layers.
  • -tv_strength: Strength for total variation regularization on the output of the transformation network. Default is 1e-6; higher values encourage the network to produce outputs that are spatially smooth.
  • -num_iterations: Total number of iterations. (default: 60000)
  • -batch_size: Batch size. Since we use instance normalization, smaller batch sizes can be used without substantial degradation. (default: 4)
  • -learning_rate (default: 1e-3)

Checkpointing:

  • -checkpoint_every: Every checkpoint_every iterations, check performance on the validation set and save both a .t7 model checkpoint and a .json checkpoint with loss history.
  • -checkpoint_name: Path where checkpoints are saved. Default is checkpoint, meaning that every -checkpoint_every iterations we will write files checkpoint.t7 and checkpoint.json.
  • -images_every: Save current input images, occlusion mask and network output every images_every iterations in a folder named debug. Useful to see what the network has already learned and to detect errors that lead to degeneration. (default: 100)

Backend:

  • -gpu: Which GPU to use; default is 0. Set this to -1 to train in CPU mode.
  • -backend: Which backend to use for GPU, either cuda or opencl.
  • -use_cudnn: Whether to use cuDNN when using CUDA; 0 for no and 1 for yes.

Training parameters for the results in our paper

Simple training (baseline):

th train_video.lua -data_mix video:3,shift:1,zoom_out:1 -num_frame_steps 0:1 -num_iterations 60000 -pixel_loss_weight 50 -arch c9s1-32,d64,d128,R128,R128,R128,R128,R128,U2,c3s1-64,U2,c9s1-3

Mixed training:

th train_video.lua -data_mix video:3,shift:1,zoom_out:1,single_image:5 -num_frame_steps 0:1 -num_iterations 60000 -pixel_loss_weight 100 -arch c9s1-32,d64,d128,R128,R128,R128,R128,R128,U2,c3s1-64,U2,c9s1-3

Multi-frame, mixed training:

th train_video.lua -data_mix video:3,shift:1,zoom_out:1,single_image:5 -num_frame_steps 0:1,50000:2,60000:4 -num_iterations 90000 -pixel_loss_weight 100 -arch c9s1-32,d64,d128,R128,R128,R128,R128,R128,U2,c3s1-64,U2,c9s1-3

Spherical videos:

First, train a video model of any kind.

Then, finetune on spherical images:

th train_video.lua -resume_from_checkpoint <checkpoint_path> -data_mix ...,vr:<n> -num_iterations <iter>+30000 -checkpoint_name ..._vr

where <n> has to be chosen such that vr is sampled exactly half of the time (e.g. 5 for simple training, 10 for multi-frame), and <iter>+30000 is the number of iterations of the previous model plus 30000 (i.e. we fine-tune for 30000 iterations); otherwise use the same parameters as for the video model. However, to avoid overwriting the video model, change the checkpoint_name parameter.
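
For example, continuing from the multi-frame, mixed model above (checkpoint paths are placeholders), the fine-tuning call could look like this:

th train_video.lua -resume_from_checkpoint checkpoint.t7 -data_mix video:3,shift:1,zoom_out:1,single_image:5,vr:10 -num_frame_steps 0:1,50000:2,60000:4 -num_iterations 120000 -pixel_loss_weight 100 -arch c9s1-32,d64,d128,R128,R128,R128,R128,R128,U2,c3s1-64,U2,c9s1-3 -checkpoint_name checkpoint_vr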

Contact

For issues or questions related to this implementation, please use the issue tracker. For everything else, including licensing issues, please email us. Our contact details can be found in our paper.

License

Free for personal or research use; for commercial use please contact us. Since our algorithm is based on Johnson's implementation, see also fast-neural-style #License.

fast-artistic-videos's People

Contributors

manuelruder


fast-artistic-videos's Issues

Can't launch it... [Windows subsystem for Linux]

Hi!
First of all, I'm sorry to ask help for something as basic as this!
I'm on Windows, and I've installed all the dependencies needed to make this project work, but my lack of knowledge of the shell makes it really hard for me to understand what goes wrong.

I tried all sorts of command variations in my command prompt, but each time the result is:

styleVideo_*.sh: ${filename//[%]/x}: bad substitution

Usually, the command I enter is something like this:

sh stylizeVideo_deepflow.sh ./resources/input.mp4 ./models/checkpoint-mosaic-video.t7

This syntax is really confusing to me; I don't understand what I did wrong, and my internet research didn't help me...
I tried to edit the code and assign the value of filename myself, but other errors come up. I really have problems with the shell :/

I hope you can help me, really want to test your amazing work!
Thank you! :)

Spherical Video Training

Hi Manuel, I ran into an issue regarding training for spherical videos. Your instructions say: first train a video model and then finetune it. I attempted this with several trained video models that I have but I kept getting this error (relating to the data_mix parameter):

[screenshot of the error message about the data_mix parameter]

I tried different variations of the data_mix parameter, but no matter what, that "vr:" was giving me issues. Any thoughts on the matter?

Thank you for your help in advance!

./consistencyChecker/consistencyChecker: No such file or directory

Can anybody help me here? Would be really happy, as I have been desperately trying to get this great tool running...
I'm on Ubuntu 16.04 LTS.

Trying

bash stylizeVideo_deepflow.sh ../videos/short.mp4 ./models/checkpoint-mosaic-image.t7

gave me the only error message that keeps coming every 10s:
makeOptFlow_deepflow.sh: line 59: ./consistencyChecker/consistencyChecker: No such file or directory
makeOptFlow_deepflow.sh: line 60: ./consistencyChecker/consistencyChecker: No such file or directory
...

Why is this happening although the consistencyChecker does exist at that path, and how can I stop the recurring error message (CTRL+C doesn't help)?

Basic help :) custom style image?

Hi, I would love to use your system to transfer a custom style image we have onto a video. However, I do not see any clear way (sorry, newbie) to take the video and style and convert them. Do I have to train something first? Could anyone please walk me through the steps? I see them in the readme but am not sure how to apply my custom style.

Training new model error with make_video_dataset.py

I was able to run everything successfully on my Ubuntu 16.04 machine using CUDA 8 and CuDNN 5. However, I ran into a strange error when training a new model.

I followed your steps and successfully finished steps 1-5. When I reached the sixth step:

• Run video_dataset/make_video_dataset.py --input_dir <path> --sequence_length <n>, where <path> should be identical to <output_folder> and <n> to <num_tuples_per_scene> in step 3.

I get the following error:

(env) ➜ fast-artistic-videos git:(master) ✗ python video_dataset/make_video_dataset.py --input_dir /home/paperspace/fast-artistic-videos/video_dataset/output_hollywood --sequence_length 5 

Found 41023 images 
Exception in thread Thread-2: Traceback (most recent call last): 
File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner 
  self.run() 
File "/usr/lib/python2.7/threading.py", line 754, in run 
  self.__target(*self.__args, **self.__kwargs) 
File "video_dataset/make_video_dataset.py", line 99, in read_worker 
  certs.append(imread(cert_path)) 
File "/home/paperspace/fast-artistic-videos/env/local/lib/python2.7/site-packages/scipy/misc/pilutil.py", line 154, in imread 
  im = Image.open(name) 
File "/home/paperspace/fast-artistic-videos/env/local/lib/python2.7/site-packages/PIL/Image.py", line 2280, in open 
  fp = builtins.open(filename, "rb") 
IOError: [Errno 2] No such file or directory: '/home/paperspace/fast-artistic-videos/video_dataset/output_hollywood/scenecliptest00086/flow/reliable_4495_4494.pgm' 

Exception in thread Thread-1: Traceback (most recent call last): 
File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner 
  self.run() 
File "/usr/lib/python2.7/threading.py", line 754, in run
  self.__target(*self.__args, **self.__kwargs) 
File "video_dataset/make_video_dataset.py", line 99, in read_worker 
  certs.append(imread(cert_path)) 
File "/home/paperspace/fast-artistic-videos/env/local/lib/python2.7/site-packages/scipy/misc/pilutil.py", line 154, in imread 
  im = Image.open(name) 
File "/home/paperspace/fast-artistic-videos/env/local/lib/python2.7/site-packages/PIL/Image.py", line 2280, in open 
  fp = builtins.open(filename, "rb") 
IOError: [Errno 2] No such file or directory: '/home/paperspace/fast-artistic-videos/video_dataset/output_hollywood/scenecliptest00579/flow/reliable_3900_3899.pgm'

"Exception in Threading... No such file or directory...etc" every time I run this command. Each time, it points to a different folder with missing files.

When I check the folders specified for these missing files, I find that they are truly missing, and I'm not sure where I went wrong. I followed your steps exactly as you describe and received no errors when computing optical flow and occlusions. For the first step, I used the default <num_tuples_per_scene> and <num_frames_per_tuple>, which is why in step 6 I set the sequence_length to 5.

Any ideas/thoughts? I would incredibly appreciate it.

stylizeVideo_deepflow.sh crashes calling BilinearSamplerBDHW field

On Ubuntu 16.04 LTS, trying to stylize a video called "01rudy.mp4" by:

bash stylizeVideo_deepflow.sh ../videos/01rudy.mp4 ./models/checkpoint-mosaic-video.t7

This works for generating the frame*.ppm files and the flow_default directory with the forward_*.flo, backward_*.flo and reliable_*.flo files.
However, the script crashes when generating the first png file, named out-00001.png.
Why?

Here's the stack output:

...
Starting optical flow computation as a background task...
Starting video stylization...
Model loaded.
Elapsed time for stylizing frame independently:0.524191
Writing output image to 01rudy/out-00001.png
Waiting for file "01rudy/flow_default/reliable_2_1.pgm"

/home/rstudio/torch/install/bin/luajit: ./fast_artistic_video/utils.lua:143: attempt to call field 'BilinearSamplerBDHW' (a nil value)
stack traceback:
	./fast_artistic_video/utils.lua:143: in function 'warp_image'
	fast_artistic_video.lua:156: in function 'func_make_last_frame_warped'
	./fast_artistic_video_core.lua:162: in function 'run_next_image'
	./fast_artistic_video_core.lua:208: in function 'run_fast_neural_video'
	fast_artistic_video.lua:187: in function 'main'
	fast_artistic_video.lua:192: in main chunk
	[C]: in function 'dofile'
	...udio/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
	[C]: at 0x00405d50

make_video_dataset.py Exception in thread Thread-3

hi,

I've successfully cut up the dataset, computed optical flow + occlusions,
and am currently trying to make the h5 file.

I started running :

python video_dataset/make_video_dataset.py --input_dir /mnt/08E0B3C06558415B/neural/hollywood_flowlist/ --sequence_length 1

and got a few issues with the "queue" module that couldn't be found, so I added this to the code:

try: 
    import queue as Queue
    import queue as queue
except ImportError:
    import Queue as Queue
    import Queue as queue

and then got TypeErrors of the type :

Found 9291 images
Traceback (most recent call last):
  File "video_dataset/make_video_dataset.py", line 160, in <module>
    add_data(f, args.input_dir, 'train', args)
  File "video_dataset/make_video_dataset.py", line 87, in add_data
    input_queue = Queue()
TypeError: 'module' object is not callable

that I tried fixing by adding "queue." to every call of Queue()

  # input_queue stores (idx, filename) tuples,
  # output_queue stores (idx, resized_img) tuples
  input_queue = queue.Queue()
  output_queue = queue.Queue()

But now, whether using Python 2.7 or Python 3.5 to run the script,
I get the following output, which is similar to another issue #12 whose solution did not work for me:

Found 9291 images
Exception in thread Thread-3:
Traceback (most recent call last):
  File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.5/threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "video_dataset/make_video_dataset.py", line 121, in write_worker
    flow_dset[idx,i] = flow.transpose(2, 0, 1)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/home/kaspar/.local/lib/python3.5/site-packages/h5py/_hl/dataset.py", line 609, in __setitem__
    selection = sel.select(self.shape, args, dsid=self.id)
  File "/home/kaspar/.local/lib/python3.5/site-packages/h5py/_hl/selections.py", line 94, in select
    sel[args]
  File "/home/kaspar/.local/lib/python3.5/site-packages/h5py/_hl/selections.py", line 261, in __getitem__
    start, count, step, scalar = _handle_simple(self.shape,args)
  File "/home/kaspar/.local/lib/python3.5/site-packages/h5py/_hl/selections.py", line 451, in _handle_simple
    x,y,z = _translate_int(int(arg), length)
  File "/home/kaspar/.local/lib/python3.5/site-packages/h5py/_hl/selections.py", line 471, in _translate_int
    raise ValueError("Index (%s) out of range (0-%s)" % (exp, length-1))
ValueError: Index (0) out of range (0--1)

Thank you for your help :)

finetuning on spherical vid - inconsistent tensor sizes

Hi Manuel,
I'm running into a recurring issue while fine-tuning for spherical videos:
after a bit of training, during the run on the validation set,
I get an inconsistent tensor sizes error.
Here is the traceback:

Iteration 94000 / 120000, loss = 575304.800131
Running on validation set ...
/home/kaspar/torch/install/bin/luajit: inconsistent tensor sizes at /tmp/luarocks_cutorch-scm-1-6847/cutorch/lib/THC/generic/THCTensorMath.cu:157
stack traceback:
	[C]: at 0x7f6350def6e0
	[C]: in function 'cat'
	train_video.lua:466: in function 'main'
	train_video.lua:557: in main chunk
	[C]: in function 'dofile'
	...spar/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
	[C]: at 0x00405d50

and here is my command :

th train_video.lua \
  -resume_from_checkpoint ~/fast-artistic-videos/models/monark.t7 \
  -h5_file /media/kaspar/neural/coco2014/image_dataset.h5 \
  -h5_file_video /media/kaspar/neural/video_dataset.h5 \
  -image_model self \
  -style_image $1 \
  -loss_network /media/kaspar/neural/loss/vgg16.t7 \
  -data_mix video:3,shift:1,zoom_out:1,single_image:5,vr:10 \
  -num_frame_steps 0:1,50000:2,60000:4 \
  -num_iterations 120000 \
  -pixel_loss_weight 100 \
  -style_image_size 900 \
  -arch c9s1-32,d64,d128,R128,R128,R128,R128,R128,U2,c3s1-64,U2,c9s1-3 \
  -gpu 0 \
  -backend cuda \
  -use_cudnn 1 \
  -checkpoint_every 2000 \
  -checkpoint_name ~/fast-artistic-videos/models/monark_vr

I think I followed everything right, but still can't quite grasp the issue
thanks for your help :)))

stylizeVideo_flownet.sh cannot find FlowNet2_deploy.prototxt

I got Caffe for flownet2 running on Ubuntu 16.04.
Running
bash stylizeVideo_flownet.sh zoo1.mp4 ./models/checkpoint-mosaic-video.t7
stops because the deploy prototxt is not found in the flownet2 folder:

Starting optical flow computation...
Traceback (most recent call last):
    if(not os.path.exists(args.deployproto)): raise BaseException('deploy-proto does not exist: '+args.deployproto)
BaseException: deploy-proto does not exist: /home/rstudio/flownet2/Flownet2/FlowNet2_deploy.prototxt

Looking in that directory, it turns out there is no FlowNet2_deploy.prototxt but a FlowNet2_deploy.prototxt.template file. Copying FlowNet2_deploy.prototxt.template to FlowNet2_deploy.prototxt still returns the same error message:

Starting optical flow computation...
Traceback (most recent call last):
  File "/home/rstudio/flownet2/scripts/run-flownet-many.py", line 22, in <module>
    if(not os.path.exists(args.deployproto)): raise BaseException('deploy-proto does not exist: '+args.deployproto)
BaseException: deploy-proto does not exist: /home/rstudio/flownet2/Flownet2/FlowNet2_deploy.prototxt

As before, the file does exist at precisely the path (see last line):

/flownet2/FlowNet2$ ls -al
total 638820
drwxr-xr-x  2 rstudio rstudio      4096 Apr  5 14:05 .
drwxrwxr-x 33 rstudio rstudio      4096 Apr  5 14:05 ..
-rw-rw-r--  1 rstudio rstudio     62798 Apr  5 13:58 FlowNet2_deploy.prototxt
-rw-r--r--  1 rstudio rstudio     62798 Apr 25  2017 FlowNet2_deploy.prototxt.template
-rw-rw-r--  1 rstudio rstudio     69448 Apr  5 14:05 FlowNet2_train.prototxt
-rw-r--r--  1 rstudio rstudio     69448 Apr 25  2017 FlowNet2_train.prototxt.template
-rw-r--r--  1 rstudio rstudio 653868648 Apr 25  2017 FlowNet2_weights.caffemodel.h5
rstudio@demo3:~/flownet2/FlowNet2$ pwd
/home/rstudio/flownet2/FlowNet2

So the questions are:

  1. Why does the script run-flownet-multiple.sh expect a FlowNet2_deploy.prototxt and not the FlowNet2_deploy.prototxt.template? Is there anything expected from that file that could be made explicit?
  2. Why is the file (in both cases, .prototxt and .prototxt.template) not found even though it does exist?

stylizeVideo_deepflow.sh produces video which is not stylized

stylizeVideo_deepflow.sh runs smoothly without error messages, i.e. it produces the out*.png files and the .mp4 file.
However, the video is not stylized and is mainly a single color (here red), although the original video showed people. What is wrong? Is it due to DeepFlow?

Here are the first three frames for an impression:
[output frames out-00001 to out-00003 not shown]

Here's the stack:

bash stylizeVideo_deepflow.sh input/dance1.mov ./models/checkpoint-mosaic-video.t7

In case of multiple GPUs, enter the zero-indexed ID of the GPU to use here, or enter -1 for CPU mode (slow!). [0]
 >

Which backend do you want to use?   For Nvidia GPUs it is recommended to use cudnn if installed. If not, use nn.   For non-Nvidia GPU, use opencl (not tested). Note: You have to have the given backend installed in order to use it. [cudnn]
 >

Please enter a resolution at which the video should be processed, in the format w:h, or leave blank to use the original resolution. If you run out of memory, reduce the resolution.
 > 77:128

Please enter a downsampling factor (on a log scale, integer) for the matching algorithm used by DeepFlow. If you run out of main memory or optical flow estimation is too slow, slightly increase this value, otherwise the default value will be fine. [2]
 >
ffmpeg version 3.4.2-1+b1 Copyright (c) 2000-2018 the FFmpeg developers
  built with gcc 7 (Debian 7.3.0-4)
  configuration: --prefix=/usr --extra-version=1+b1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --enable-gpl --disable-stripping --enable-avresample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librubberband --enable-librsvg --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-omx --enable-openal --enable-opengl --enable-sdl2 --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libopencv --enable-libx264 --enable-shared
  libavutil      55. 78.100 / 55. 78.100
  libavcodec     57.107.100 / 57.107.100
  libavformat    57. 83.100 / 57. 83.100
  libavdevice    57. 10.100 / 57. 10.100
  libavfilter     6.107.100 /  6.107.100
  libavresample   3.  7.  0 /  3.  7.  0
  libswscale      4.  8.100 /  4.  8.100
  libswresample   2.  9.100 /  2.  9.100
  libpostproc    54.  7.100 / 54.  7.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'input/dance1.mov':
  Metadata:
    major_brand     : qt
    minor_version   : 0
    compatible_brands: qt
    creation_time   : 2016-02-14T23:45:54.000000Z
    com.apple.quicktime.location.ISO6709: +38.7090-009.1443+005.006/
    com.apple.quicktime.make: Apple
    com.apple.quicktime.model: iPhone 6 Plus
    com.apple.quicktime.software: 9.3
    com.apple.quicktime.creationdate: 2016-02-14T23:45:53+0000
  Duration: 00:00:11.59, start: 0.000000, bitrate: 10879 kb/s
    Stream #0:0(und): Video: h264 (Baseline) (avc1 / 0x31637661), yuv420p(tv, bt709), 1280x720, 10752 kb/s, 30.03 fps, 30 tbr, 600 tbn, 1200 tbc (default)
    Metadata:
      rotate          : 90
      creation_time   : 2016-02-14T23:45:54.000000Z
      handler_name    : Core Media Data Handler
      encoder         : H.264
    Side data:
      displaymatrix: rotation of -90.00 degrees
    Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, mono, fltp, 85 kb/s (default)
    Metadata:
      creation_time   : 2016-02-14T23:45:54.000000Z
      handler_name    : Core Media Data Handler
    Stream #0:2(und): Data: none (mebx / 0x7862656D), 32 kb/s (default)
    Metadata:
      creation_time   : 2016-02-14T23:45:54.000000Z
      handler_name    : Core Media Data Handler
    Stream #0:3(und): Data: none (mebx / 0x7862656D), 0 kb/s (default)
    Metadata:
      creation_time   : 2016-02-14T23:45:54.000000Z
      handler_name    : Core Media Data Handler
Stream mapping:
  Stream #0:0 -> #0:0 (h264 (native) -> ppm (native))
Press [q] to stop, [?] for help
Output #0, image2, to 'dance1/frame_%05d.ppm':
  Metadata:
    major_brand     : qt
    minor_version   : 0
    compatible_brands: qt
    com.apple.quicktime.creationdate: 2016-02-14T23:45:53+0000
    com.apple.quicktime.location.ISO6709: +38.7090-009.1443+005.006/
    com.apple.quicktime.make: Apple
    com.apple.quicktime.model: iPhone 6 Plus
    com.apple.quicktime.software: 9.3
    encoder         : Lavf57.83.100
    Stream #0:0(und): Video: ppm, rgb24, 77x128, q=2-31, 200 kb/s, 30 fps, 30 tbn, 30 tbc (default)
    Metadata:
      encoder         : Lavc57.107.100 ppm
      creation_time   : 2016-02-14T23:45:54.000000Z
      handler_name    : Core Media Data Handler
    Side data:
      displaymatrix: rotation of -0.00 degrees
frame=  348 fps= 81 q=-0.0 Lsize=N/A time=00:00:11.60 bitrate=N/A speed= 2.7x
video:10053kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown

Starting optical flow computation as a background task...
Starting video stylization...
Model loaded.
Elapsed time for stylizing frame independently:0.333889
Writing output image to dance1/out-00001.png
Waiting for file "dance1/flow_77:128/reliable_2_1.pgm"
Elapsed time for stylizing frame:0.023393
Writing output image to dance1/out-00002.png
Waiting for file "dance1/flow_77:128/reliable_3_2.pgm"
Elapsed time for stylizing frame:0.019732
Writing output image to dance1/out-00003.png
Waiting for file "dance1/flow_77:128/reliable_4_3.pgm"
Elapsed time for stylizing frame:0.023397
Writing output image to dance1/out-00004.png
Waiting for file "dance1/flow_77:128/reliable_5_4.pgm"
Elapsed time for stylizing frame:0.023525999999999
Writing output image to dance1/out-00005.png
Waiting for file "dance1/flow_77:128/reliable_6_5.pgm"
Elapsed time for stylizing frame:0.015667
Writing output image to dance1/out-00006.png
Waiting for file "dance1/flow_77:128/reliable_7_6.pgm"
Elapsed time for stylizing frame:0.026106
Writing output image to dance1/out-00007.png
Waiting for file "dance1/flow_77:128/reliable_8_7.pgm"
Elapsed time for stylizing frame:0.013540000000001
Writing output image to dance1/out-00008.png
Waiting for file "dance1/flow_77:128/reliable_9_8.pgm"
Elapsed time for stylizing frame:0.023158
Writing output image to dance1/out-00009.png
Waiting for file "dance1/flow_77:128/reliable_10_9.pgm"
Elapsed time for stylizing frame:0.024933
Writing output image to dance1/out-00010.png
Waiting for file "dance1/flow_77:128/reliable_11_10.pgm"
Elapsed time for stylizing frame:0.0214
Writing output image to dance1/out-00011.png
Waiting for file "dance1/flow_77:128/reliable_12_11.pgm"
Elapsed time for stylizing frame:0.023577
Writing output image to dance1/out-00012.png
Waiting for file "dance1/flow_77:128/reliable_13_12.pgm"
Elapsed time for stylizing frame:0.025696
Writing output image to dance1/out-00013.png
...

error in BilinearSampler.updateOutput: no kernel image is available for execution on the device

Excuse me:
Thanks for your good work. I can now stylize a video using ./stylizeVideo_deepflow successfully. But when training a new model with train_video.lua, I get an error like this: "error in BilinearSampler.updateOutput: no kernel image is available for execution on the device".
Here is my training command: "th train_video.lua -h5_file data/ms-coco-256.h5 -h5_file_video video_dataset/video-364.h5 -image_model models/checkpoint-candy-image.t7 -style_image styles/candy.jpg".
I'd like to know where the problem is.

stylizeVideo_flownet.sh cannot find caffemodel.h5

I got Caffe for flownet2 running on Ubuntu 16.04.
Running
bash stylizeVideo_flownet.sh zoo1.mp4 ./models/checkpoint-mosaic-video.t7
stops because the caffemodel is not found in the flownet2 folder:

Starting optical flow computation...
Traceback (most recent call last):
  File "/home/rstudio/flownet2/scripts/run-flownet-many.py", line 21, in <module>
    if(not os.path.exists(args.caffemodel)): raise BaseException('caffemodel does not exist: '+args.caffemodel)
BaseException: caffemodel does not exist: /home/rstudio/flownet2/FlowNet2/FlowNet2_weights.caffemodel.h5/

However - and this makes it really hard to understand - the file does exist at that path:

rstudio@demo3:~/fast-artistic-videos$ cd /home/rstudio/flownet2/FlowNet2
rstudio@demo3:~/flownet2/FlowNet2$ ls -al
total 638688
drwxr-xr-x  2 rstudio rstudio      4096 Apr 25  2017 .
drwxrwxr-x 33 rstudio rstudio      4096 Apr  5 11:23 ..
-rw-r--r--  1 rstudio rstudio     62798 Apr 25  2017 FlowNet2_deploy.prototxt.template
-rw-r--r--  1 rstudio rstudio     69448 Apr 25  2017 FlowNet2_train.prototxt.template
-rw-r--r--  1 rstudio rstudio 653868648 Apr 25  2017 FlowNet2_weights.caffemodel.h5

Why is that???
So I guess the question is how args.caffemodel gets set by the fast-artistic-videos script that calls it.

Youtube examples

Thanks for this great research and implementation!

I am curious whether there are any examples out there of video style transfer involving people speaking to a camera that you know of. Also, in a nutshell, how long would it take to process a video on some example hardware? I'm looking into using this implementation for videos of people filmed speaking to a camera, which is a bit different from most of the examples I could spot out there.

Thanks in advance for your comments!

Consistency Checker Motion Boundaries

Hi, I wanted to know more about the decision to have motion boundaries be described as "certain" (MOTION_BOUNDARIE_VALUE = 255) in the consistency checker, as in the associated paper as well as your previous work from 2016, you describe masking out motion boundaries as uncertain.

I ran consistencyChecker with MOTION_BOUNDARIE_VALUE = 0 and got very different results on a sample operation; naturally, the cert file was darker, having more regions masked out. I haven't tested it on a video yet, as clearly the current version works as-is. I'm more interested to know the reasons behind the discrepancy, or if you do something like set MOTION_BOUNDARIE_VALUE to 0 during train time and 255 for test time—or if I'm perhaps misunderstanding how the certs are calculated.

Thanks!

Hard coded -arch flag

In the stnbdhw module, the arch is hard coded. It might make sense to either add this pointer to the tutorial or to put something like CUDA_ARCH there.

Error running make_flow_list.py

I feel like I'm overlooking something, but I'm completely stumped as to what.
make_flow_list.py <folder_to_extracted_dataset> <output_folder> [<num_tuples_per_scene> [<num_frames_per_tuple>]]

python3 video_dataset/make_flow_list.py video_dataset/extracted/Hollywood 2 video_dataset/extracted/out [num_tuples_per_scene=5 [num_frames_per_tuple=5]

returns the error:

Traceback (most recent call last):
  File "video_dataset/make_flow_list.py", line 21, in <module>
    n_tuples = int(sys.argv[3])
ValueError: invalid literal for int() with base 10: 'video_dataset/extracted/out'

I've tried taking away the brackets, changing the variable for num_tuples_per_scene, changing up my file path, and a bunch of other things that didn't do the trick.

additionally, I've been unable to run bash models/download_models.sh, and stnbdhw is misspelled in "(optional) GPU Acceleration" for "cd", but is correct for the luarocks command.
And, just to ask, would I need to bother with hollywood 2 if I used deepflow?

Model compatibility

Is it possible to use models trained from Justin Johnson's fast-neural-style code?

Skip pre-computed *flo ?

Hi, Thank you for this amazing code.

I ran it without any problems. I'm just wondering if it's possible to skip re-computing the .flo files if they are already present. I think DeepFlow can skip them, but when running with FlowNet2 I have to re-compute all the flow files.

Flownet Docker steps

Hello,

I'm aware you explicitly state not to ask about setting up FlowNet, but I saw a step in your setup that I'm hoping you can clarify.

"There is also a Dockerfile for easy installation of the complete code in one step: flownet2-docker (GitHub)"

Golden, got it cooching with Nvidia-Docker 2, I can run this script long hand for two files at a time. I call it with "./run-network.sh -n FlowNet2 -v data/0000000-imgL.png data/0000001-imgL.png flow.flo"

"Then edit run-flownet-multiple.sh and set the paths to the FlowNet executable, model definition and pretrained weights."

Opening that file, it appears that it is looking for a link to the stuff that is inside the Docker container. I've tried a local installation about 15 times, so I'm thinking the Docker container isn't just the easiest, but maybe the ONLY way to get it working.

So, how did you find the paths inside the Docker container for the executable, models defs, and weights?

Does anyone already computed flow for training ?

I'm wondering if anyone with good computer specs has finished the optical flow process and wouldn't mind sharing it with me. I see that it's impossible for me to compute the flow of the Hollywood2 dataset on my GTX 1060. I can survive the training itself, but computing the flow on my computer is a piece of craziness. So, if anyone could share their .flo output files, it would be the greatest Christmas gift I could ever get.
my email is [email protected]

Is `make_occlusions.sh` meant to be this slow?

Hi all,

I have been running make_occlusions.sh for over 40 hours on my i7 + RTX 2080 Ti personal Linux machine. The terminal output is not consistent: the folder number after "sceneclipautoautotrain" or "scenecliptest" does not advance in alphabetical order. It always prints a whole chunk of directories, as in the following example:

/home/username/Github/flownet2-docker-master/video_dataset/makeFlowList/sceneclipautoautotrain00122/flow/reliable_0512_0511.pgm
/home/username/Github/flownet2-docker-master/video_dataset/makeFlowList/sceneclipautoautotrain00122/flow/reliable_0511_0512.pgm

Is this normal? Is this occlusion script meant to run for such a long time? Or is there something wrong with it?

Thanks!!

stylizeVideo_flownet.sh: line 85: th: command not found

Hi,

I am using FlowNet2 and I have completed all setups. I am running my first test for the Example Script. I am running into this problem:
Starting occlusion estimator as a background task...
Starting video stylization...
stylizeVideo_flownet.sh: line 85: th: command not found

I looked into the file stylizeVideo_flownet.sh and I find the line 85:
th fast_artistic_video.lua \
  -input_pattern ${filename}/frame_%05d.ppm \
  -flow_pattern ${filename}/flow_${resolution}/backward_[%d]_{%d}.flo \
  -occlusions_pattern ${filename}/flow_${resolution}/reliable_[%d]_{%d}.pgm \
  -output_prefix ${filename}/out \
  -backend $backend \
  -use_cudnn $use_cudnn \
  -gpu $gpu \
  -model_vid $model_vid_path \
  -model_img $model_img_path

I tried to run this line by itself in terminal. It returns: invalid type for option -gpu (should be number)

I looked into the .sh file to check the -gpu definition. I believe it is assigned here: [0] $cr > " gpu gpu=${gpu:-0}

Now I am a bit confused. How should I fix this? Am I on a wrong track? A lot of thanks.

High Resolution Stylized Videos

Hello again!

So I've produced many stylized videos with your model but I would love to take it to the next level and have super high quality resolution, say around 4000 pixels in width. I've been getting strange errors when I attempt to do this:

[screenshot of the error message]

Have you attempted to produce a video of this high resolution? If so, how did you attempt to do so?
And any ideas what this error might mean? It does not seem to be a memory issue...

Machine Specs: Linux (Ubuntu 16.04), Quadro P6000, GPU 24 GB, RAM 30 GB, CPUS 8

Problem installing stnbdhw

I changed the FIND_PACKAGE to CUDA 9.1 and bumped "-arch=sm_20" to sm_30. I get errors saying it is removing non-existent packages for the THC modules, but they are in /home/jim/torch/install/include/THC/generic/. It looks like things are compiling and installing, but the test.lua and init.lua for stn are nowhere to be found. I think the error message when starting the video stylization is "No LuaRocks module found for stn.test".

Everything else seems to be installed and working. Any help would be appreciated.

th fast_artistic_video.lua crashes on reading backward_*.flo file

On Ubuntu 16.04 LTS, trying to stylize a video called "01rudy.mp4" by:

th fast_artistic_video.lua -gpu 0 -model_vid ./models/checkpoint-mosaic-video.t7 -model_img ./models/checkpoint-mosaic-image.t7 -input_pattern 01rudy/frame_%05d.ppm -flow_pattern 01rudy/flow_default/backward_[%d]_{%d}-%d.flo -occlusions_pattern 01rudy/flow_default/reliable_[%d]_{%d}.pgm -output_prefix 01rudy

The output video directory contains the frame*.ppm files and the flow_default directory with the forward_*.flo, backward_*.flo, reliable_*.flo files.
However, the script crashes when reading the first backward_*.flo file.
Why?

Here's the stack output:


Model loaded.
Model loaded.
Elapsed time for stylizing frame independently:0.369105
Writing output image to 01rudy/out-00001.png
/home/rstudio/torch/install/bin/luajit: cannot open <01rudy/flow_default/backward_2_1-1496952976.flo> in mode r  at /tmp/luarocks_torch-scm-1-4820/torch7/lib/TH/THDiskFile.c:673
stack traceback:
	[C]: at 0x7fdd59363440
	[C]: in function 'DiskFile'
	./flowFileLoader.lua:15: in function 'load'
	fast_artistic_video.lua:155: in function 'func_make_last_frame_warped'
	./fast_artistic_video_core.lua:162: in function 'run_next_image'
	./fast_artistic_video_core.lua:208: in function 'run_fast_neural_video'
	fast_artistic_video.lua:187: in function 'main'
	fast_artistic_video.lua:192: in main chunk
	[C]: in function 'dofile'
	...udio/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
	[C]: at 0x00405d50

stylizeVideo_deepflow.sh computes all .flo and .pgm but does not output images

Hi, thanks for the amazing code first of all!
However, I'm running into some problems with the usage. When I start stylizeVideo_deepflow.sh like this (everything set to default):

bash ./stylizeVideo_deepflow.sh fra_converted.m4v models/checkpoint-schlief-video.t7 models/checkpoint-schlief-image.t7

It outputs the following:

Starting optical flow computation as a background task...
Starting video stylization...
./fra_converted/flow_default/reliable_2_1.pgm./fra_converted/flow_default/reliable_1_2.pgm
Model loaded.
Model loaded.
Elapsed time for stylizing frame independently:0.825891
Writing output image to fra_converted/out-00001.png
/home/yannick/torch/install/bin/luajit: ./fast_artistic_video/utils.lua:143: attempt to call field 'BilinearSamplerBDHW' (a nil value)
stack traceback:
	./fast_artistic_video/utils.lua:143: in function 'warp_image'
	fast_artistic_video.lua:156: in function 'func_make_last_frame_warped'
	./fast_artistic_video_core.lua:162: in function 'run_next_image'
	./fast_artistic_video_core.lua:208: in function 'run_fast_neural_video'
	fast_artistic_video.lua:187: in function 'main'
	fast_artistic_video.lua:192: in main chunk
	[C]: in function 'dofile'
	...nick/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
	[C]: at 0x55eda0826570

Then it starts to compute the optical flow files in "/flow_default" but only outputs one image. After computing all optical flow files, it does nothing.

I am running Ubuntu 18.04, CUDA10.1, cudnn and the modified torch version by nagadomi.
Does anyone have an idea why this happens? Thank you a lot in advance!

Sorry if I forgot any details.
