

CompiledComputerTales

This is a (necessarily incomplete) corpus of stories created by computational storytelling algorithms, compiled by Leonid Berov and Kai Standvoss. To reflect the breadth of the field of computational storytelling, it collects at most three stories from each of as many systems as were available to us. Because it is infeasible to manually deploy every individual system and generate stories for this purpose, we instead used stories that have been reported in scientific publications. Note that this risks biasing the corpus towards high-quality exemplars. The sources from which the stories of each storytelling system were extracted are listed in the file "references.txt".

If you use this corpus in your research, please cite the associated publication:
Berov, L., & Standvoss, K. (2018). Discourse Embellishment Using a Deep Encoder-Decoder Network.

Format

The corpus is located in the file "story_corpus.txt".

Storytelling systems are separated from each other by a header line of the form "==== name of system ====".
Individual stories are separated by a line containing only "==".
Paragraphs inside a story are separated by a newline character (\n), so each line contains exactly one paragraph of story text.
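The layout described above can be read with a small parser. The following is a minimal Python sketch, not the provided "story_corpus_processer.py" script; the function name parse_corpus, the skipping of blank lines, and the dropping of any text before the first system header are assumptions made for illustration.

```python
import re
from typing import Dict, List, Optional

# A system header line looks like "==== name of system ====".
SYSTEM_RE = re.compile(r"^====\s*(.+?)\s*====$")
# A story separator line contains only "==".
STORY_SEPARATOR = "=="


def parse_corpus(text: str) -> Dict[str, List[List[str]]]:
    """Map each system name to its stories; each story is a list of paragraphs."""
    corpus: Dict[str, List[List[str]]] = {}
    current_system: Optional[str] = None
    current_story: List[str] = []

    def flush() -> None:
        # Store the story collected so far, ignoring empty ones
        # (e.g. a separator directly after a system header).
        nonlocal current_story
        if current_system is not None and current_story:
            corpus[current_system].append(current_story)
        current_story = []

    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # assumption: blank lines carry no story text
        header = SYSTEM_RE.match(line)
        if header:
            flush()
            current_system = header.group(1)
            corpus.setdefault(current_system, [])
        elif line == STORY_SEPARATOR:
            flush()
        elif current_system is not None:
            current_story.append(line)  # one line is one paragraph
    flush()
    return corpus
```

Assuming the file is UTF-8 encoded, the corpus could then be loaded by reading "story_corpus.txt" with open(..., encoding="utf-8") and passing its contents to parse_corpus; the result maps, for example, each system name to a list of stories, each of which is a list of paragraph strings.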

Preprocessing

To preprocess the corpus for common recurrent neural network frameworks such as TensorFlow, a Python script is provided in the file "story_corpus_processer.py". At the moment it supports parsing the corpus, word and sentence tokenization, cleaning up special symbols, named-entity anonymization, and sentence-pair generation (as employed in section 3.4 of the associated publication). For post-processing the output of a neural network that performed inference on this corpus, the script also provides a naive method for handling out-of-vocabulary tokens.

THIS SCRIPT IS PROVIDED “AS IS” AND THE DEVELOPERS MAKE NO OTHER WARRANTIES, EXPRESSED OR IMPLIED, AND HEREBY DISCLAIM ALL IMPLIED WARRANTIES.

Content

version   date        content
v1        05.09.18    8 storytellers, 14 stories, 45 paragraphs, 290 sentences
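Counts like those in the table above can be re-derived directly from the separator lines. The following is a minimal Python sketch; the function name corpus_stats is hypothetical, and its sentence count uses a naive regex (a sentence ends at ".", "!" or "?" followed by whitespace), so it may not reproduce the reported sentence figure, which may have been computed with the tokenizer in "story_corpus_processer.py".

```python
import re
from typing import Dict

# A system header line looks like "==== name of system ====".
SYSTEM_RE = re.compile(r"^====\s*(.+?)\s*====$")
# Naive sentence boundary (assumption): ".", "!" or "?" followed by whitespace.
SENTENCE_END_RE = re.compile(r"(?<=[.!?])\s+")


def corpus_stats(text: str) -> Dict[str, int]:
    """Count storytellers, stories, paragraphs and (naively) sentences."""
    stats = {"storytellers": 0, "stories": 0, "paragraphs": 0, "sentences": 0}
    story_open = False  # True once the current story has at least one paragraph
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # assumption: blank lines carry no story text
        if SYSTEM_RE.match(line):
            stats["storytellers"] += 1
            story_open = False
        elif line == "==":
            story_open = False  # the next paragraph starts a new story
        else:
            if not story_open:
                stats["stories"] += 1
                story_open = True
            stats["paragraphs"] += 1
            stats["sentences"] += len(SENTENCE_END_RE.split(line))
    return stats
```

Counting stories at their first paragraph, rather than counting "==" lines, avoids off-by-one errors regardless of whether the first story of a system is preceded by a separator.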
