rohetoric / text-vector-visualisation

Website: https://rohetoric.github.io/text-vector-visualisation/

License: Apache License 2.0


Exploration & Visualisation of FastText Word Vectors Using TensorFlow 1 and 2

Requirements and Dependencies

To run the code, the following libraries must be installed:

Serial No.   Library to Install
1.           FastText
2.           TensorFlow
3.           spaCy
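
A quick way to confirm the three libraries are available before opening the notebooks is sketched below. The import names `fasttext`, `tensorflow` and `spacy` are assumptions based on how these packages are usually imported; adjust them if your installation differs.

```python
import importlib.util

# Assumed import names for the three libraries listed above.
REQUIRED = ["fasttext", "tensorflow", "spacy"]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing libraries:", ", ".join(missing))
    else:
        print("All required libraries are installed.")
```

The check only looks the modules up and does not import them, so it runs quickly even when TensorFlow is installed.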

Steps to Execute

  1. Download the bbc-text.csv dataset from here. Alternatively, if gcloud is already set up, download it from the terminal with the command: gsutil cp gs://dataset-uploader/bbc/bbc-text.csv [path to notebook directory]

  2. Make sure all the libraries are present/updated according to the requirements and dependencies mentioned above.

  3. To train the model on the complete dataset using FastText, run the notebook fasttextmodeltrain.ipynb in the _notebooks folder. Alternatively, a pre-trained model (2.4 GB) based on the dataset can be downloaded from here.

According to the FastText documentation:

"The most important parameters of the model are its dimension and the range of size for the subwords. The dimension (dim) controls the size of the vectors, the larger they are the more information they can capture but requires more data to be learned."

Since any value in the 100-300 range is popular, the notebook uses a dimension of 300.
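
As a rough illustration of the training step, the sketch below turns the CSV into a one-document-per-line text file and trains unsupervised vectors with `dim=300`. It is not taken from fasttextmodeltrain.ipynb: the column name `text`, the helper names, the skipgram model choice and the `minn`/`maxn` values are all assumptions made for the example.

```python
import csv


def csv_to_corpus(csv_path, corpus_path, text_column="text"):
    """Write one document per line, the plain-text input fastText trains on.

    The column name "text" is an assumption about the CSV layout.
    """
    count = 0
    with open(csv_path, newline="", encoding="utf-8") as src, \
            open(corpus_path, "w", encoding="utf-8") as dst:
        for row in csv.DictReader(src):
            # Collapse all whitespace so each document stays on a single line.
            text = " ".join(row[text_column].split())
            if text:
                dst.write(text + "\n")
                count += 1
    return count


def train_vectors(corpus_path, model_path, dim=300):
    """Train unsupervised fastText vectors (requires the fasttext package)."""
    import fasttext  # imported lazily so csv_to_corpus works without it

    model = fasttext.train_unsupervised(
        corpus_path,
        model="skipgram",  # assumed model type for this sketch
        dim=dim,           # vector size, 300 as in the notebook
        minn=3,            # assumed smallest subword length
        maxn=6,            # assumed largest subword length
    )
    model.save_model(model_path)
    return model
```

A typical call would be `csv_to_corpus("bbc-text.csv", "bbc-corpus.txt")` followed by `train_vectors("bbc-corpus.txt", "bbc-model.bin")`, where both output file names are hypothetical.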

Steps 4, 5 and 6 differ for TF1 and TF2. After that, the steps are the same.


To Visualise Embeddings Using TF1 [NOT ADVISED]

  4. Create an empty folder called tb1files in the same directory as the notebooks. It will store the TensorFlow log files once step 5 has been run.

  5. Run the notebook tb1vis.ipynb in the _notebooks folder.

  6. In the terminal, change to the directory that contains the tb1files folder and run the command: tensorboard --logdir tb1files/

The above command produces output like this:

TB1 Command


To Visualise Embeddings Using TF2 [ADVISED]

  4. Create an empty folder called tb2files in the same directory as the notebooks. It will store the TensorFlow log files once step 5 has been run.

  5. Run the notebook tb2vis.ipynb in the _notebooks folder.

  6. In the terminal, change to the directory that contains the tb2files folder and run the command: tensorboard --logdir tb2files/

The above command produces output like this:

TB2 Command
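
For readers curious about what ends up in tb2files, here is a minimal sketch of one common TF2 way to prepare embeddings for the TensorBoard projector: a checkpoint holding the embedding matrix, a metadata file with one word per row, and a projector config. It is an assumption that tb2vis.ipynb works along these lines. The helper names are hypothetical, and `vectors` is assumed to be a list (or array) with one row per word.

```python
import os


def write_metadata(words, log_dir, filename="metadata.tsv"):
    """Write one word per line; line i labels row i of the embedding matrix."""
    os.makedirs(log_dir, exist_ok=True)
    path = os.path.join(log_dir, filename)
    with open(path, "w", encoding="utf-8") as f:
        for word in words:
            f.write(word + "\n")
    return path


def save_for_projector(vectors, words, log_dir="tb2files"):
    """Save a TF2 checkpoint and projector config into log_dir."""
    import tensorflow as tf  # lazy imports: only this step needs them
    from tensorboard.plugins import projector

    write_metadata(words, log_dir)

    # Store the embedding matrix in a checkpoint inside the log directory.
    weights = tf.Variable(vectors)
    checkpoint = tf.train.Checkpoint(embedding=weights)
    checkpoint.save(os.path.join(log_dir, "embedding.ckpt"))

    # Tell the projector which tensor to load and where its labels are.
    config = projector.ProjectorConfig()
    embedding = config.embeddings.add()
    embedding.tensor_name = "embedding/.ATTRIBUTES/VARIABLE_VALUE"
    embedding.metadata_path = "metadata.tsv"
    projector.visualize_embeddings(log_dir, config)
```

Because the metadata file has a single column, it is written without a header row, so the first word labels the first vector.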

  7. Open the localhost URL shown in the last line of the output, for example http://localhost:6008/ (as in the TB1 Command image).

  8. The TensorBoard page shown below will open. From the drop-down that reads Inactive, select Projector, as indicated by the arrow in the image below.

Projector

  9. TensorBoard will plot the words in a 3D graph according to their embedding values. To find the nearest neighbours of a word, type it in the search bar, as done for the example 'plea' shown below.

TB Visualisation
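
The nearest-neighbour lookup can also be reproduced outside TensorBoard. The sketch below ranks words by cosine similarity, which is one of the distance options the projector offers. The two-dimensional toy vectors are invented purely for illustration and are not real FastText output.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def nearest_neighbours(word, vectors, k=3):
    """Return the k words most similar to `word`, best match first."""
    query = vectors[word]
    scored = [
        (other, cosine_similarity(query, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Made-up 2D vectors, for illustration only.
toy = {
    "plea": [1.0, 0.0],
    "appeal": [0.9, 0.1],
    "court": [0.5, 0.5],
    "goal": [0.0, 1.0],
}
print([w for w, _ in nearest_neighbours("plea", toy, k=2)])
# prints ['appeal', 'court']
```

With real vectors, the dictionary would map each word in the trained model to its 300-dimensional embedding.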

That's it, folks!
