Non-Generative Textual Style Transfer

This repository contains the implementation for my dissertation, written in partial fulfilment of my bachelor's degree at the University of York, UK. See report.pdf for the write-up. In short, I propose a simple, novel, non-generative mechanism based on doc2vec that performs textual style transfer. The code published here covers only the content of section 2.4 of report.pdf; other content is not included.

See the table below, extracted from the report, for a demonstration of the capabilities of the developed method:

Dependencies

All project dependencies are listed below:

The code may run correctly with more recent versions of these packages, but this has not been tested.

Running

First, download the political corpus from RtGender and place it in the corpora/original folder. At the time of writing, this data can be downloaded from the following link. If the data has been placed correctly, the file structure of the corpora folder should match the one below:

Then run the scripts in the following order:
  1. corpora_resplit.py
  2. corpora_sanitise.py
  3. classifier_baseline.py
  4. doc2vec.py
  5. style_transfer.py
  6. style_transfer_appendix.py (optional)
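
The order above can also be automated. The following is a minimal sketch, not part of the repository: it assumes the scripts sit in the repository root and take no command-line arguments, and the helper names build_commands and run_pipeline are hypothetical.

```python
import subprocess
import sys
from pathlib import Path

# Script order as listed in this README; the appendix script is optional.
PIPELINE = [
    "corpora_resplit.py",
    "corpora_sanitise.py",
    "classifier_baseline.py",
    "doc2vec.py",
    "style_transfer.py",
]
OPTIONAL = ["style_transfer_appendix.py"]


def build_commands(include_optional=False, python=sys.executable):
    """Return one [interpreter, script] command per step, in order."""
    scripts = PIPELINE + (OPTIONAL if include_optional else [])
    return [[python, script] for script in scripts]


def run_pipeline(repo_root=".", include_optional=False):
    """Run each script in turn (assumption: scripts need no arguments)."""
    root = Path(repo_root)
    for command in build_commands(include_optional):
        script = root / command[1]
        if not script.is_file():
            raise FileNotFoundError(f"Missing script: {script}")
        print("Running", command[1])
        # check=True stops the pipeline at the first failing script.
        subprocess.run(command, cwd=root, check=True)
```

Calling run_pipeline() from the repository root would run the five main scripts; passing include_optional=True would also run the appendix script. Stopping at the first failure matters here because each step presumably depends on the outputs of the previous one.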

This should reproduce the main style transfer experiment, saving outputs to the out folder. Please raise an issue if you encounter any problems while running the scripts.

Bibliography

S. Prabhumoye et al., “Style Transfer Through Back-Translation,” in Association for Computational Linguistics, 2018, pp. 866–876.

T. Shen et al., “Style Transfer from Non-Parallel Text by Cross-Alignment,” in Neural Information Processing Systems, 2017.

Q. Le and T. Mikolov, “Distributed Representations of Sentences and Documents,” in International Conference on Machine Learning, 2014.

R. Voigt et al., “RtGender: A Corpus for Studying Differential Responses to Gender,” in Language Resources and Evaluation Conference, 2018, pp. 2814–2820.

Y. Kim, “Convolutional Neural Networks for Sentence Classification,” in Empirical Methods in Natural Language Processing, 2014, pp. 1746–1751.

M. Rikters, “Impact of Corpora Quality on Neural Machine Translation,” in Baltic Human Language Technologies, 2018.

Citation

P. H. M. Wigderowitz, “Towards a More Accessible Style Transfer Mechanism,” BSci Thesis, Dept. of Comp. Science, Univ. of York, York, UK, 2019.

License

This project is licensed under the MIT License. See the LICENSE file for more details.
