
CPCStoryVisualization-Pytorch (ECCV 2020)


Authors: @yunzhusong, @theblackcat102, @redman0226, Huiao-Han Lu, Hong-Han Shuai

Paper Link (ECCV 2020)


Code implementation for Character-Preserving Coherent Story Visualization

Objects in pictures should so be arranged as by their very position to tell their own story.
           - Johann Wolfgang von Goethe (1749-1832)

In this paper, we propose a new framework named Character-Preserving Coherent Story Visualization (CP-CSV) to tackle the challenges of story visualization: generating a sequence of images while preserving the global consistency of characters and scenes across the different story pictures.

CP-CSV learns to visualize the story through three critical modules: the story and context encoder (story and sentence representation learning), figure-ground segmentation (an auxiliary task that provides information for preserving character and story consistency), and figure-ground aware generation (image sequence generation that incorporates figure-ground information). Moreover, we propose a metric named Frechet Story Distance (FSD) to evaluate the performance of story visualization. Extensive experiments demonstrate that CP-CSV maintains the details of character information and achieves high consistency among different frames, while FSD better measures the performance of story visualization. The FVD evaluation metric is from here.
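
FSD, like FID and FVD, compares real and generated feature distributions with a Frechet distance. Below is a minimal sketch of that distance on pre-extracted feature arrays; the story-level feature extractor (the part specific to FSD) is not shown, and the function name is illustrative rather than taken from this repository.

    # Sketch: Frechet distance between two Gaussian-fitted feature sets,
    # the same closed form used by FID/FVD. The feature extractor that
    # produces feats_real / feats_fake is assumed, not shown.
    import numpy as np
    from scipy import linalg

    def frechet_distance(feats_real, feats_fake):
        """feats_*: (N, D) arrays of extracted features."""
        mu1, mu2 = feats_real.mean(axis=0), feats_fake.mean(axis=0)
        sigma1 = np.cov(feats_real, rowvar=False)
        sigma2 = np.cov(feats_fake, rowvar=False)
        diff = mu1 - mu2
        covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)
        if np.iscomplexobj(covmean):
            covmean = covmean.real
        return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))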

Datasets

  1. PORORO images and segmentation images can be downloaded here. This is the original Pororo dataset with self-labeled segmentation masks of the characters.

  2. CLEVR with segmentation masks: 13,755 image sequences, generated using Clevr-for-StoryGAN. The expected directory layout is shown below (a small pairing sketch follows).

images/
    CLEVR_new_013754_1.png
    CLEVR_new_013754_1_mask.png
    CLEVR_new_013754_2.png
    CLEVR_new_013754_2_mask.png
    CLEVR_new_013754_3.png
    CLEVR_new_013754_3_mask.png
    CLEVR_new_013754_4.png
    CLEVR_new_013754_4_mask.png

Download link
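
Given the naming convention above (each frame CLEVR_new_<id>_<t>.png has a matching CLEVR_new_<id>_<t>_mask.png), a minimal, hypothetical helper for pairing frames with their masks could look like this; the function name is illustrative and not part of the repository.

    # Hypothetical helper: pair each CLEVR frame with its segmentation mask
    # using the file-name convention shown above.
    import glob
    import os

    def collect_clevr_pairs(image_dir):
        pairs = []
        for img_path in sorted(glob.glob(os.path.join(image_dir, "CLEVR_new_*.png"))):
            if img_path.endswith("_mask.png"):
                continue  # skip masks; they are matched below
            mask_path = img_path[:-len(".png")] + "_mask.png"
            if os.path.exists(mask_path):
                pairs.append((img_path, mask_path))
        return pairs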

Setup environment

    virtualenv -p python3 env
    source env/bin/activate
    pip install -r requirements.txt

Train CPCSV

Steps

  1. Download the Pororo dataset and place it at DATA_DIR. The dataset should contain SceneDialogues/ (where the GIF files reside) and the *.npy files.

  2. Modify DATA_DIR in ./cfg/final.yml (a small config sanity check is sketched after this list).

  3. The default hyper-parameters in ./cfg/final.yml are set to reproduce the paper results. To train from scratch:

./script.sh

  4. To run the evaluation, specify --cfg as ./output/yourmodelname/setting.yml, e.g.:

./script_inference.sh
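
Before launching training, it can help to confirm the config points at a valid dataset directory. A minimal sketch, assuming ./cfg/final.yml is plain YAML with a top-level DATA_DIR key (the key name comes from the steps above; the exact config structure is an assumption):

    # Sketch: sanity-check DATA_DIR in ./cfg/final.yml before training.
    # Assumes a top-level DATA_DIR key; adjust if the config nests it.
    import os
    import yaml

    with open("./cfg/final.yml") as f:
        cfg = yaml.safe_load(f)

    data_dir = cfg["DATA_DIR"]
    assert os.path.isdir(os.path.join(data_dir, "SceneDialogues")), \
        "DATA_DIR should contain SceneDialogues/ and the *.npy files"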

Evaluate CPCSV

The pretrained model can be downloaded here.

Steps

  1. Download the Pororo dataset and place it at DATA_DIR. The dataset should contain SceneDialogues/ (where the GIF files reside) and the *.npy files.

  2. Modify the DATA_DIR in ./cfg/final.yml

  3. To evaluate the FID and FSD of the pretrained model:

./script_inference.sh

Tensorboard

Use TensorBoard to check the results.

    tensorboard --logdir output/ --host 0.0.0.0 --port 6009

Slides and presentation video

The slides and the presentation video can be found in slides.

Cite

@inproceedings{song2020CPCSV, 
    title={Character-Preserving Coherent Story Visualization},  
    author={Song, Yun-Zhu and Tam, Zhi-Rui and Chen, Hung-Jen and Lu, Huiao-Han and Shuai, Hong-Han},  
    booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},  
    year={2020} 
}


Issues

About the different usage of CA net for image and video sampling

The outputs of CANet(content) are content_code, content_mu, and content_logvar. However, video sampling feeds content_code to motion_content_rnn, while image sampling uses content_mu.

Why are they used differently?

Video sampling:

crnn_code = self.motion_content_rnn(motion_input, r_code) ## i_t = GRU(s_t)

Image sampling:

crnn_code = self.motion_content_rnn(motion_input, c_mu) ## GRU
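
For context, a conditioning-augmentation (CA) module of the kind referred to above typically predicts a mean and log-variance and draws a reparameterized sample; the sampled code is stochastic, while mu is the deterministic mean of the same distribution. The following is a minimal sketch of such a module, not the repository's exact implementation; layer sizes and names are assumptions.

    # Sketch of a conditioning-augmentation (CA) module: it returns a
    # reparameterized sample (code), the mean (mu), and the log-variance
    # (logvar). Names and sizes are illustrative, not the repo's code.
    import torch
    import torch.nn as nn

    class CANet(nn.Module):
        def __init__(self, in_dim, code_dim):
            super().__init__()
            self.fc = nn.Linear(in_dim, code_dim * 2)

        def forward(self, content):
            mu, logvar = self.fc(content).chunk(2, dim=-1)
            std = torch.exp(0.5 * logvar)
            code = mu + std * torch.randn_like(std)  # stochastic sample
            return code, mu, logvar

In this sketch, passing code to a downstream RNN injects sampling noise on every call, whereas passing mu keeps the conditioning deterministic; the snippet only illustrates where the two tensors come from, not why the repository chose one or the other in each path.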
