style_transfer_sirius2021summer's Introduction

Style transfer in NLP: a framework and multilingual analysis with Friends TV series

This is the repository for the paper "Style transfer in NLP: a framework and multilingual analysis with Friends TV series".

Style transfer is an important and rapidly developing area of Natural Language Processing. These days, more and more methods and models are proposed that allow us to generate text in a predefined style. In this paper, we propose a framework for style transfer on the "Friends" TV series. The trained models are able to mimic any one of the six main characters of this famous TV series in English and Russian. We also present a dialogue dataset of "Friends" subtitles in English together with its automatic Russian translation. In addition, we perform a multilingual comparison of "Friends" style transfer in the two languages.

Contents

bot

This folder contains the data for the Telegram bot:

  • data - database storing the state of each chat and the rating given to each message; paths to the models and the log
  • models - template folder for the pre-trained models
  • ui - utilities for enhancing the UI
  • utils - database control, model uploader, and rating
  • main.py - the main file that starts the bot itself

data

This folder contains all of our output datasets:

  • bigram_pics - pictures of the friends without background
  • data_for_tone_analysis - tone analysis statistics based on positive and negative words
  • generated - phrases generated by GPT3-Large
  • questions - questions in English and Russian for manual assessment of the generated phrases
  • scripts - all scripts with speaker annotations, and the phrases of all friends in English and Russian
  • train_data - training data for two-step fine-tuning of the GPT3-Large models, split in a 9 to 1 ratio (monologues and cleaned utterances), in English and Russian
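
As an illustration of the 9 to 1 split mentioned for train_data, here is a minimal sketch. It assumes the ratio refers to a train/validation split over a list of phrases; the function name `split_train_valid`, the seed, and the toy phrases are hypothetical and are not taken from the repository.

```python
import random


def split_train_valid(samples, valid_fraction=0.1, seed=0):
    """Shuffle the samples and split them into train and validation parts.

    Hypothetical helper: the repository's own splitting code may differ.
    """
    shuffled = list(samples)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_valid = int(round(len(shuffled) * valid_fraction))
    return shuffled[n_valid:], shuffled[:n_valid]


# Toy stand-ins for the real phrases (assumption, not repository data).
lines = [f"phrase {i}" for i in range(100)]
train, valid = split_train_valid(lines)
print(len(train), len(valid))  # prints "90 10"
```

Shuffling before slicing avoids putting a single episode or speaker entirely into the validation part, which could happen if the phrases are stored in script order.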

utils

This folder contains all Jupyter notebooks:

  • bigrams_trigrams - a notebook that creates bigrams and trigrams for each friend
  • binary_classifier - notebooks for the binary classifiers (training and evaluation)
  • multilabel_classifier - a notebook for the multilabel classifiers
  • Other files:
    • Parser.ipynb - parses the website with the series' scripts
    • Data_preparation.ipynb - cleans irrelevant symbols and words from the parsed scripts
    • Statistics.ipynb - computes statistics of the most frequently used words and visualizes them
    • Phrases_Preprocessing.ipynb - extracts phrases in English that are common to all friends and hard for a classifier to attribute
    • Ru_Phrases_Preprocessing.ipynb - extracts phrases in Russian that are common to all friends and hard for a classifier to attribute
    • Text_Analysis.ipynb - brief analysis of the most frequently used words
    • Metrics.ipynb - preprocessing and further tone analysis
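
To give an idea of what per-character bigram and trigram counting can look like, here is a minimal sketch using only the standard library. It is not the code from the bigrams_trigrams notebook: the helpers `ngrams` and `top_ngrams`, the regex tokenizer, and the toy phrases are all assumptions made for illustration.

```python
import re
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return list(zip(*(tokens[i:] for i in range(n))))


def top_ngrams(phrases, n, k=3):
    """Count n-grams over all phrases and return the k most frequent."""
    counts = Counter()
    for phrase in phrases:
        # \w+ matches Unicode word characters, so Russian text works too.
        tokens = re.findall(r"\w+", phrase.lower())
        counts.update(ngrams(tokens, n))
    return counts.most_common(k)


# Toy phrases keyed by speaker (made up, not taken from the dataset).
phrases_by_speaker = {
    "Speaker A": ["I am so hungry", "I am so tired today"],
    "Speaker B": ["Could this be any better"],
}

for speaker, phrases in phrases_by_speaker.items():
    print(speaker, top_ngrams(phrases, 2), top_ngrams(phrases, 3))
```

Counting n-grams within each phrase, rather than over one concatenated text, prevents n-grams from spanning the boundary between two unrelated utterances.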

The checkpoints of the trained models are stored here.

style_transfer_sirius2021summer's People

Contributors

mr-s-mirzoev, gungnirap, alenush, elinated
