arthurdjn / img2poem-pytorch

PyTorch implementation of the paper ‟Beyond Narrative Description: Generating Poetry from Images” by B. Liu et al., 2018.

License: MIT License

img2poem-pytorch 🖼️ 📃

Currently in progress! 💻

Feel free to star the project or open an issue!

1. Overview

This project generates poems from images. The implementation is inspired by the research paper “Beyond Narrative Description: Generating Poetry from Images by Multi-Adversarial Training” by Liu, Bei et al., from Microsoft, published at ACM Multimedia 2018.

An official TensorFlow implementation is available in the Microsoft repository. This repository adapts the implementation of “Neural Poetry Generation with Visual Inspiration” by Li, Zhaoyang et al. to build, in PyTorch, a model architecture similar to that of Liu, Bei et al.

1.1. Get Started

To use this project, clone the repository from the command line with:

$ git clone https://github.com/arthurdjn/img2poem-pytorch

Then, navigate to the project root:

$ cd img2poem-pytorch

2. Datasets

To train the models, you will need to download the datasets used in this project.

The datasets used are:

  • PoemUniMDatasetMasked: a dataset of poems only,
  • PoemMuliMDatasetMasked: a dataset of paired poems and images,
  • PoeticEmbeddedDataset: a dataset to align poems and images,
  • ImageSentimentDataset: a dataset of images and polarities.

2.1. Downloads

To download a dataset, use its download() method, which is defined for every dataset class. It downloads the poems and images into a root folder.

For example, you can use:

from img2poem.datasets import ImageSentimentDataset

dataset = ImageSentimentDataset.download(root='.data')

3. Architecture

The architecture is decomposed into two parts:

  • Encoder, used to extract poeticness from an image,
  • Decoder, used to generate a poem from a poetic space.

The encoder is made of three CNNs, which extract scene, object, and sentiment information. To project these features into a poetic space, the encoder is combined with a BERT model that embeds poems, so that visual features are aligned with their paired poems.

The decoder is then paired with a discriminator, which evaluates the poeticness of a generated poem.

3.1. Image

The visual encoder is made of three CNNs.

3.1.1. Object

The object classifier is the vanilla ResNet50 from TorchVision (see the TorchVision models documentation).

3.1.2. Scenes

The scene classifier is a ResNet50 model fine-tuned on the Places365 dataset. The weights are available from the Places365 project website at MIT.

3.1.3. Sentiment

To train the visual sentiment classifier, use the ImageSentimentDataset with the ResNet50Sentiment model.

You can use the script scripts/train_resnet50.py to fine-tune the model:

$ python scripts/train_resnet50.py
0. Hyper params...
------------------------
Batch size:       64
Learning Rate:    5e-05
Split ratio:      0.9
------------------------

1. Loading the dataset...
Loading: 100%|█████████████████████████████████| 15613/15613 [01:16<00:00, 203.41it/s]

2. Building the model...
done

3. Training...
Epoch 1/100
  Training: 100%|██████████| 199/199 [01:18<00:00,  2.55it/s, train loss=0.030669]
Evaluation: 100%|██████████| 199/199 [00:24<00:00,  8.26it/s, eval loss=0.030008]
	Training:   loss=0.025023
	Evaluation: loss=0.024733
Eval loss decreased (inf --> 0.024733).
→ Saving model...

Epoch 2/100
  Training: 100%|██████████| 199/199 [01:17<00:00,  2.57it/s, train loss=0.030093]
Evaluation: 100%|██████████| 199/199 [00:24<00:00,  8.27it/s, eval loss=0.027973]
	Training:   loss=0.024398
	Evaluation: loss=0.024037
Eval loss decreased (0.024733 --> 0.024037).
→ Saving model...

Epoch 3/100
  Training: 100%|██████████| 199/199 [01:17<00:00,  2.57it/s, train loss=0.029633]
Evaluation: 100%|██████████| 199/199 [00:24<00:00,  8.28it/s, eval loss=0.029494]
	Training:   loss=0.023714
	Evaluation: loss=0.023400
Eval loss decreased (0.024037 --> 0.023400).
→ Saving model...

...
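The log above suggests a loop that splits the data 90/10 into training and evaluation sets and saves the model whenever the evaluation loss improves. The sketch below reproduces that behaviour under stated assumptions; it is not the script's actual code. The `fit` function, the Adam optimizer, and the checkpoint path are hypothetical, while the batch size, learning rate, split ratio, and epoch count mirror the logged hyperparameters.

```python
import math

import torch
from torch.utils.data import DataLoader, random_split


def fit(model, dataset, loss_fn, epochs=100, batch_size=64, lr=5e-5,
        split_ratio=0.9, checkpoint_path="model.pt"):
    """Train on a random split and keep the checkpoint with the lowest eval loss."""
    n_train = int(len(dataset) * split_ratio)
    if n_train == len(dataset):
        raise ValueError("split_ratio leaves no samples for evaluation")
    train_set, eval_set = random_split(dataset, [n_train, len(dataset) - n_train])
    train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
    eval_loader = DataLoader(eval_set, batch_size=batch_size)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    best = math.inf

    for epoch in range(1, epochs + 1):
        model.train()
        for inputs, targets in train_loader:
            optimizer.zero_grad()
            loss = loss_fn(model(inputs), targets)
            loss.backward()
            optimizer.step()

        # Sample-weighted mean loss over the held-out split.
        model.eval()
        total, count = 0.0, 0
        with torch.no_grad():
            for inputs, targets in eval_loader:
                total += loss_fn(model(inputs), targets).item() * len(targets)
                count += len(targets)
        eval_loss = total / count
        print(f"Epoch {epoch}/{epochs}  eval loss={eval_loss:.6f}")

        if eval_loss < best:
            print(f"Eval loss decreased ({best:.6f} --> {eval_loss:.6f}).")
            best = eval_loss
            torch.save(model.state_dict(), checkpoint_path)
    return best
```

Saving only on improvement means the file on disk always holds the best model seen so far, even if later epochs overfit.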

3.2. Poetic Alignment

To align visual features to a poetic space, the paired poem and image dataset (multim_poem.json) is used.

Images and poems are both embedded:

  • the poems are embedded through a BERT model into a fixed-size feature vector,
  • and the images are embedded by concatenating the outputs of the visual models (objects, scenes and sentiment) into a fixed-size feature vector.
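A minimal sketch of the two embedding paths, assuming each visual model contributes a 2048-dimensional ResNet50 feature, the poem is represented by a 768-dimensional BERT-base vector (for instance the `[CLS]` output, computed elsewhere), and both are linearly projected into a shared 512-dimensional poetic space. The class name and all three dimensions are assumptions for illustration.

```python
import torch
from torch import nn
import torch.nn.functional as F


class PoeticEmbedding(nn.Module):
    """Hypothetical projection of image and poem features into one poetic space."""

    def __init__(self, visual_dim=3 * 2048, text_dim=768, poetic_dim=512):
        super().__init__()
        self.image_proj = nn.Linear(visual_dim, poetic_dim)
        self.poem_proj = nn.Linear(text_dim, poetic_dim)

    def embed_image(self, objects, scenes, sentiments):
        # Concatenate the three CNN feature vectors along the feature axis.
        visual = torch.cat([objects, scenes, sentiments], dim=1)
        # L2-normalise so that dot products are cosine similarities.
        return F.normalize(self.image_proj(visual), dim=1)

    def embed_poem(self, bert_features):
        # bert_features: (B, text_dim) sentence vectors from a BERT encoder.
        return F.normalize(self.poem_proj(bert_features), dim=1)
```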

To measure the loss between the poem and image feature tensors, I used the ranking loss described in the original paper by Liu, Bei et al. and used in the implementation of Li, Zhaoyang et al.
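A minimal sketch of a bidirectional hinge ranking loss over a batch, assuming cosine similarity, in-batch negatives, and a margin of 0.2; these choices are illustrative and may differ from the repository's exact formulation. For each pair (v_i, p_i), every other poem p_j should score at least `margin` below p_i against v_i, and every other image v_j should score at least `margin` below v_i against p_i.

```python
import torch
import torch.nn.functional as F


def ranking_loss(images, poems, margin=0.2):
    """Hinge ranking loss; row i of `images` is paired with row i of `poems`."""
    images = F.normalize(images, dim=1)
    poems = F.normalize(poems, dim=1)
    scores = images @ poems.t()            # scores[i, j] = cos(image_i, poem_j)
    positives = scores.diag().view(-1, 1)  # scores[i, i], shape (B, 1)

    # Image i as anchor: compare poem j against its own poem i (row-wise).
    cost_poems = (margin + scores - positives).clamp(min=0)
    # Poem j as anchor: compare image i against its own image j (column-wise).
    cost_images = (margin + scores - positives.t()).clamp(min=0)

    # The diagonal holds the positive pairs themselves, so it carries no cost.
    mask = torch.eye(scores.size(0), dtype=torch.bool, device=scores.device)
    cost_poems = cost_poems.masked_fill(mask, 0.0)
    cost_images = cost_images.masked_fill(mask, 0.0)
    return (cost_poems.sum() + cost_images.sum()) / scores.size(0)
```

When matched pairs are orthogonal to all mismatched ones the loss is zero; when all embeddings coincide, each of the B(B-1) mismatched pairs costs `margin` in both directions.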

3.3. Generator

The generator is a recurrent decoder. As explained in the original paper, I used GRU cells to generate a sentence from a feature tensor in the poetic space.
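A minimal sketch of a GRU decoder conditioned on a poetic vector, assuming the vector initialises the GRU hidden state (feeding it as the first input would be an equally common choice). Training uses teacher forcing, and `generate` decodes greedily without end-of-sequence handling. The class name, dimensions, and `bos_id` parameter are hypothetical.

```python
import torch
from torch import nn


class PoemDecoder(nn.Module):
    """Hypothetical GRU decoder conditioned on a poetic feature vector."""

    def __init__(self, vocab_size, poetic_dim=512, embed_dim=256, hidden_dim=512):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.init_hidden = nn.Linear(poetic_dim, hidden_dim)
        self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, poetic, tokens):
        # Teacher forcing: tokens (B, T) are the ground-truth inputs shifted right.
        h0 = torch.tanh(self.init_hidden(poetic)).unsqueeze(0)  # (1, B, H)
        outputs, _ = self.gru(self.embedding(tokens), h0)
        return self.out(outputs)  # (B, T, vocab_size) logits

    @torch.no_grad()
    def generate(self, poetic, bos_id, max_len=20):
        hidden = torch.tanh(self.init_hidden(poetic)).unsqueeze(0)
        token = torch.full((poetic.size(0), 1), bos_id,
                           dtype=torch.long, device=poetic.device)
        generated = []
        for _ in range(max_len):
            output, hidden = self.gru(self.embedding(token), hidden)
            token = self.out(output).argmax(dim=-1)  # greedy pick, shape (B, 1)
            generated.append(token)
        return torch.cat(generated, dim=1)  # (B, max_len) token ids
```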

The discriminator is a module that classifies a sequence as real, unpaired, or generated (cf. the original paper).
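A minimal sketch of such a three-way discriminator, assuming it encodes the poem tokens with a GRU, concatenates the final hidden state with the poetic vector of the image, and outputs logits over real, unpaired, and generated. The class name and dimensions are hypothetical. Because generated poems are discrete token ids, training a generator against a discriminator like this needs a workaround such as policy-gradient rewards to pass a learning signal back.

```python
import torch
from torch import nn


class PoemDiscriminator(nn.Module):
    """Hypothetical classifier of (image, poem) pairs into three classes."""

    LABELS = ("real", "unpaired", "generated")

    def __init__(self, vocab_size, poetic_dim=512, embed_dim=256, hidden_dim=512):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim + poetic_dim, len(self.LABELS))

    def forward(self, poetic, tokens):
        # Encode the poem; h_n is the last hidden state, shape (1, B, H).
        _, h_n = self.gru(self.embedding(tokens))
        # Join the poem summary with the image's poetic vector.
        features = torch.cat([h_n.squeeze(0), poetic], dim=1)
        return self.classifier(features)  # (B, 3) logits
```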

4. Notebooks

Work in progress.

5. References

  • [1] Liu, Bei et al. “Beyond Narrative Description: Generating Poetry from Images by Multi-Adversarial Training.” ACM Multimedia Conference (ACM MM), 2018.

  • [2] Li, Zhaoyang et al. “Neural Poetry Generation with Visual Inspiration.” 2018.
