Language modelling and text generation using RNNs

Assignment 3 for language analytics (S2023). This repository contains code for training a text generation model on comments on articles from The New York Times using recurrent neural networks (RNNs). Furthermore, a script is provided for generating text from a user-supplied prompt.

Description of the data

The model is trained on comments on articles from The New York Times. The data is available on Kaggle (see the usage section below).

Note: The model in this repo was trained on 50,000 randomly chosen comments to avoid memory issues.
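
The subsampling step itself is small; the sketch below shows one way the comments could be read and randomly subsampled. The glob pattern, the commentBody column name, and the helper name are assumptions about the Kaggle CSVs and are not necessarily what train_model.py does.

```python
# Hypothetical helper: read all comment CSVs and draw a random subsample.
# Assumes the Kaggle files match "Comments*.csv" and store the text in "commentBody".
from pathlib import Path

import pandas as pd


def load_comments(data_dir: str = "data", n_comments: int = 50_000, seed: int = 2023) -> list[str]:
    frames = [
        pd.read_csv(path, usecols=["commentBody"])
        for path in Path(data_dir).glob("Comments*.csv")
    ]
    comments = pd.concat(frames, ignore_index=True)["commentBody"].dropna()
    # Sample at most n_comments comments to keep memory usage manageable.
    return comments.sample(n=min(n_comments, len(comments)), random_state=seed).tolist()
```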

Usage and reproducibility

To reproduce the results, follow the steps below. All terminal commands should be executed from the root of the repository.

  1. Clone the repository.
  2. Download the data from Kaggle and place it in the data directory (see the repository structure section below if in doubt).
  3. Create a virtual environment and install the requirements:
     bash setup.sh
  4. Activate the environment and train the model (a sketch of the training recipe is given after these steps):
     source ./env/bin/activate
     python src/train_model.py --n_comments 50000
  5. Generate text from a prompt:
     python src/generate_text.py --prompt "Donald Trump wins"
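
Training follows the standard Keras recipe for word-level next-word prediction: tokenize the comments, build cumulative n-gram sequences, and fit an embedding + LSTM classifier over the vocabulary. The sketch below is illustrative only; the layer sizes, epoch count, and optimizer are assumptions, not the exact configuration of train_model.py.

```python
# Illustrative sketch of word-level next-word training with Keras
# (not the exact configuration used by train_model.py).
import pickle

import numpy as np
from tensorflow.keras.layers import Dense, Embedding, Input, LSTM
from tensorflow.keras.models import Sequential
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.preprocessing.text import Tokenizer


def train(comments: list[str]) -> None:
    tokenizer = Tokenizer()
    tokenizer.fit_on_texts(comments)
    vocab_size = len(tokenizer.word_index) + 1

    # Build cumulative n-gram sequences from every comment:
    # [w1, w2], [w1, w2, w3], ... so the last token is always the target word.
    sequences = []
    for encoded in tokenizer.texts_to_sequences(comments):
        for i in range(2, len(encoded) + 1):
            sequences.append(encoded[:i])

    max_len = max(len(s) for s in sequences)  # the saved model's file name suggests 291 here
    padded = np.array(pad_sequences(sequences, maxlen=max_len, padding="pre"))
    X, y = padded[:, :-1], padded[:, -1]      # context words -> next word

    model = Sequential([
        Input(shape=(max_len - 1,)),
        Embedding(vocab_size, 100),
        LSTM(128),
        Dense(vocab_size, activation="softmax"),
    ])
    model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
    model.fit(X, y, epochs=10, batch_size=128)

    # Save both the model and the tokenizer, tagged with the sequence length,
    # matching the file names in the mdl directory.
    model.save(f"mdl/model_seq_{max_len}.h5")
    with open(f"mdl/tokenizer_seq_{max_len}.pickle", "wb") as f:
        pickle.dump(tokenizer, f)
```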

The pipeline was developed and tested on uCloud (Ubuntu v22.10, Coder Python v1.77.3, Python v3.10.7).

If the repository contains several trained models, it is possible to specify which one to use for text generation.
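
Generation works by repeatedly asking the model for the most likely next word and appending it to the prompt. The sketch below shows that loop and how a different model/tokenizer pair could be passed in; the parameter names, file paths, and number of generated words are illustrative and may differ from what generate_text.py actually exposes.

```python
# Illustrative sketch of greedy word-by-word generation from a saved model and
# tokenizer; paths and parameter names are assumptions, not generate_text.py's CLI.
import pickle

import numpy as np
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing.sequence import pad_sequences


def generate(prompt: str,
             n_words: int = 5,
             model_path: str = "mdl/model_seq_291.h5",
             tokenizer_path: str = "mdl/tokenizer_seq_291.pickle") -> str:
    model = load_model(model_path)
    with open(tokenizer_path, "rb") as f:
        tokenizer = pickle.load(f)

    context_len = model.input_shape[1]                  # length of the padded context
    index_to_word = {i: w for w, i in tokenizer.word_index.items()}

    text = prompt
    for _ in range(n_words):
        encoded = tokenizer.texts_to_sequences([text])[0]
        encoded = pad_sequences([encoded], maxlen=context_len, padding="pre")
        probs = model.predict(encoded, verbose=0)[0]
        next_index = int(np.argmax(probs))              # greedy: pick the most likely word
        text += " " + index_to_word.get(next_index, "")
    return text.strip()


if __name__ == "__main__":
    print(generate("Donald Trump wins"))
```

If the script decodes greedily like this, the repetition of high-frequency words such as "it" in the results section below is expected behaviour.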

Repository structure

├── data                        <- data folder (not included in the GitHub repo)
├── mdl
│   ├── model_seq_291.h5
│   └── tokenizer_seq_291.pickle 
├── src
│   ├── generate_text.py
│   └── train_model.py
├── assignment_description.md
├── README.md
├── requirements.txt
└── setup.sh

Results

To evaluate the text-generating capabilities of the trained model, it was given several prompts. The generated outputs are presented in the table below:

Prompt                          | Generated text
--------------------------------|-------------------------
Donald Trump wins               | again is are said it
Barack Obama wins               | right again is are true
Flooding in Alabama             | president it up it else
Great article, I hope           | it it it it
The future of renewable energy  | said on it it it

Based on these examples, it is evident that the model struggles to produce sentences that are grammatically and semantically coherent. Training on a larger number of comments, or for more epochs, might improve the quality of the generated text.
