
Language-Modelling---Movie Reviews Generation

Generate movie reviews for the torchtext IMDB dataset using an LSTM-based language model.

1) Built a Markov (n-gram) language model

Built a Markov trigram model. torchtext was used to retrieve the IMDB dataset, and a custom function that generates trigrams is passed to the preprocessing argument of the data Field.
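As an illustration, a trigram preprocessing function might look like the sketch below. The README does not show the original function, so the name generate_trigrams and the choice to join each trigram into one space-separated string (so it can be counted like an ordinary token) are assumptions. In the legacy torchtext Field API, a callable passed as preprocessing is applied to each example after tokenization.

```python
from typing import List


def generate_trigrams(tokens: List[str]) -> List[str]:
    """Turn a tokenized review into a list of space-joined trigrams.

    Hypothetical reconstruction: the original function is not shown in the README.
    """
    # Slide a window of three tokens across the review; reviews shorter
    # than three tokens produce no trigrams.
    return [" ".join(tokens[i:i + 3]) for i in range(len(tokens) - 2)]


if __name__ == "__main__":
    review = ["this", "is", "my", "favorite", "movie"]
    print(generate_trigrams(review))
    # ['this is my', 'is my favorite', 'my favorite movie']
```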

Used Field to split each review into trigrams and freqs to get the count of each trigram. These counts are then converted into probabilities, and predictions are made by sampling from these probabilities.
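A minimal, stdlib-only sketch of the counting, probability, and sampling steps follows. It estimates P(w3 | w1, w2) = count(w1, w2, w3) / count(w1, w2, *) from raw trigram counts, using collections.Counter directly (torchtext's vocab.freqs is also a Counter). The README does not say how sampling was done, so drawing each next word in proportion to its probability, with no smoothing and a stop at unseen contexts, is an assumption, as are the helper names train_trigram_model and generate.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple

Context = Tuple[str, str]
Model = Dict[Context, Dict[str, float]]


def train_trigram_model(reviews: List[List[str]]) -> Model:
    """Estimate P(w3 | w1, w2) from raw trigram counts (hypothetical helper)."""
    counts: Counter = Counter()
    for tokens in reviews:
        for i in range(len(tokens) - 2):
            counts[(tokens[i], tokens[i + 1], tokens[i + 2])] += 1

    # Total occurrences of each two-word context, summed over every next word.
    context_totals: Counter = Counter()
    for (w1, w2, _), c in counts.items():
        context_totals[(w1, w2)] += c

    # Normalize each count by its context total to get a conditional probability.
    model: Dict[Context, Dict[str, float]] = defaultdict(dict)
    for (w1, w2, w3), c in counts.items():
        model[(w1, w2)][w3] = c / context_totals[(w1, w2)]
    return dict(model)


def generate(model: Model, w1: str, w2: str, max_len: int = 20,
             seed: Optional[int] = None) -> List[str]:
    """Extend a two-word seed by sampling each next word from P(. | last two words)."""
    rng = random.Random(seed)
    words = [w1, w2]
    for _ in range(max_len):
        dist = model.get((words[-2], words[-1]))
        if not dist:  # unseen context and no smoothing: stop generating
            break
        words.append(rng.choices(list(dist), weights=list(dist.values()))[0])
    return words


if __name__ == "__main__":
    reviews = [["this", "movie", "is", "great"], ["this", "movie", "is", "boring"]]
    model = train_trigram_model(reviews)
    print(model[("movie", "is")])  # {'great': 0.5, 'boring': 0.5}
    print(" ".join(generate(model, "this", "movie", seed=0)))
```

With the two toy reviews in the demo, the context (movie, is) splits evenly between "great" and "boring", so generation from "this movie" ends with one of those two words and then stops, because neither (is, great) nor (is, boring) was ever seen as a context.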

2) Built an LSTM-based language model

For the trigram model, calculating the probabilities and making predictions posed no difficulty.

The LSTM model consists of an input layer, followed by an embedding layer and then an LSTM layer. The input is the tokenized text, and the target is the same tokenized text shifted by one position.

For example, if the input is [This, is, my, favorite, movie], the target is [is, my, favorite, movie, <>]: given the word "This", the model should predict "is", and so on. Because of the size of the input, training the model on a reasonably large training set was a struggle.
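The input/target shift can be sketched as below. The helper name make_lm_pair is hypothetical, and the end token "<>" is copied from the README's example; in practice it would usually be a dedicated end-of-sequence or padding token mapped to its own index.

```python
from typing import List, Tuple

PAD = "<>"  # end token taken from the README's example


def make_lm_pair(tokens: List[str], pad: str = PAD) -> Tuple[List[str], List[str]]:
    """Return (input, target) where target[i] is the word that follows input[i]."""
    if not tokens:
        return [], []
    # The target is the input shifted left by one, padded at the end so both
    # sequences have the same length.
    return list(tokens), list(tokens[1:]) + [pad]


if __name__ == "__main__":
    inp, tgt = make_lm_pair(["This", "is", "my", "favorite", "movie"])
    for x, y in zip(inp, tgt):
        print(f"{x!r} -> {y!r}")
```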

Issues faced

Trained the LSTM model on 25% of the training data, with a maximum of 1000 words and a batch size of 10. Increasing the number of words meant the model no longer fit in GPU memory. Increasing the training data to a higher percentage, such as 50% or 80%, caused the notebook to disconnect from the server at random, and the problem persisted when the code was run as a standalone .py file instead of a Python notebook. Reducing the batch size to 1 made training much slower, while increasing it to 64 brought back the memory error, so a batch size of 10 was used.
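The README does not say exactly what "max 1000 words" limits. One plausible reading, assumed here, is a cap on vocabulary size, which matters for GPU memory because the embedding layer (and any output layer over the vocabulary) grows with the number of words. The plain-Python sketch below, with hypothetical helper names and not the author's torchtext code, shows the three settings described: keeping 25% of the reviews, keeping only the 1000 most frequent words (others map to <unk>), and grouping 10 reviews per batch.

```python
import random
from collections import Counter
from typing import Dict, Iterator, List


def subsample(reviews: List[List[str]], fraction: float, seed: int = 0) -> List[List[str]]:
    """Randomly keep `fraction` of the reviews (e.g. 0.25 for 25%)."""
    rng = random.Random(seed)
    return rng.sample(reviews, int(len(reviews) * fraction))


def build_capped_vocab(reviews: List[List[str]], max_size: int = 1000,
                       unk: str = "<unk>") -> Dict[str, int]:
    """Map the `max_size` most frequent words to ids; id 0 is reserved for unknowns.

    Assumption: "max 1000 words" is read as a vocabulary cap.
    """
    counts = Counter(tok for review in reviews for tok in review if tok != unk)
    itos = [unk] + [word for word, _ in counts.most_common(max_size)]
    return {word: idx for idx, word in enumerate(itos)}


def numericalize(review: List[str], stoi: Dict[str, int], unk: str = "<unk>") -> List[int]:
    """Replace each word with its id, sending out-of-vocabulary words to <unk>."""
    return [stoi.get(tok, stoi[unk]) for tok in review]


def batches(items: List, batch_size: int = 10) -> Iterator[List]:
    """Yield consecutive chunks of `batch_size` items (the last may be shorter)."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


if __name__ == "__main__":
    reviews = [["a", "great", "movie"], ["a", "dull", "movie"], ["great", "cast"]] * 10
    train = subsample(reviews, 0.25)
    stoi = build_capped_vocab(train, max_size=1000)
    for batch in batches([numericalize(r, stoi) for r in train], batch_size=10):
        print(len(batch), batch[0])
```

Raising max_size enlarges every vocabulary-sized layer, while raising batch_size increases activation memory per step, which is consistent with the out-of-memory errors described above.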

Because of the limited training data and number of words, the quality of the generated sentences suffered considerably: the predictions were not as coherent as we would have liked.

Generated predictions for both Markov (n-gram) language model and LSTM based language model.

You can download the code and run it in a Jupyter notebook.

Contributors

nimishaasati