
#Topic Cloud Generator

This is a simple method of generating topic clouds from Mallet word-topic-counts data based on the implementation at https://de.dariah.eu/tatom/topic_model_visualization.html#visualizing-topic-word-associations. It uses d3.js and Jason Davies's Word Cloud Layout to generate the clouds.

##Requirements:

  • You must have generated a word-topic-counts file from Mallet using the --word-topic-counts-file option.
  • To convert the Mallet data to JSON format: Python with the numpy module installed (the os module is part of the standard library).
  • To display the topic clouds: An active internet connection (the webpage downloads jQuery and jQuery UI).

##Instructions:

  1. Open topicClouds.py in an editor and configure it at the top. Supply the path to the folder containing your word-topic-counts file, the name of the file itself, and the number of top words you wish to display. Save the file.

  2. In Python, run topicClouds.py. It should print the proportions for each of the top words in each topic. It will also write the data to a JavaScript file called dataset.js in the same folder as your input file.

If you encounter an error in the zip function in line 50, you are probably running Python 2.7. Change it to "izip", and then re-run the script.
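The conversion in step 2 can be sketched roughly as follows. This is an illustration only, not the code in topicClouds.py: the function name `top_words_by_topic` and the sample data are hypothetical, it uses only the standard library rather than numpy, and it assumes each line of the word-topic-counts file has the form `index word topic:count topic:count ...`. The layout of dataset.js is not reproduced here.

```python
from collections import defaultdict


def top_words_by_topic(lines, num_top_words=3):
    """Return {topic: [(word, proportion), ...]} from word-topic-counts lines.

    Hypothetical helper: assumes lines look like "0 whale 0:6 1:1".
    """
    counts = defaultdict(dict)  # topic id -> {word: count}
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        word = fields[1]  # fields[0] is the word's index
        for pair in fields[2:]:
            topic, count = pair.split(":")
            counts[int(topic)][word] = int(count)

    result = {}
    for topic, word_counts in counts.items():
        total = sum(word_counts.values())
        # Highest counts first; ties are broken alphabetically.
        ranked = sorted(word_counts.items(), key=lambda kv: (-kv[1], kv[0]))
        result[topic] = [(w, float(c) / total) for w, c in ranked[:num_top_words]]
    return result


# Made-up sample data, for illustration only.
sample = [
    "0 whale 0:6 1:1",
    "1 ship 0:3",
    "2 love 1:4",
    "3 sea 0:1 1:5",
]
top = top_words_by_topic(sample, num_top_words=2)
for topic in sorted(top):
    print(topic, top[topic])
```

Each proportion here is a word's count divided by the total count of all words assigned to that topic, so the top words of a topic do not necessarily sum to 1.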

  3. Go to your input folder and find the dataset.js file. Copy it into the topicClouds folder (replacing the sample dataset file provided).

  4. Open topicClouds.html in a browser; the topic clouds should be displayed. Note that the page requires jQuery, so you must have an active internet connection.

It is possible to modify the size of the word cloud and the appearance of the characters. Open the topicClouds.html file in an editor and modify the configuration in lines 24-29 if necessary. Here are the defaults:

  var width = 300;
  var height = 300;
  var magnify = 1000;
  var wordRotation = "on"; // Allow words to rotate
  var fill = d3.scale.category20(); // 20 colours

The default fill, d3.scale.category20(), renders the cloud in colour using a 20-colour palette.

Save the topicClouds.html file and refresh the browser.

  5. You can change any of the above settings from within the web page by clicking on the Settings button (with the gear icon). The colour scale menu offers a number of standard options in d3. Click the "?" button to see more information on the d3 website. Saving the settings will automatically regenerate the topic cloud with the new settings. However, if you wish to save these settings permanently, you must hard-code them in the configuration section as described in step 4 above.

  6. Layouts are generated on the fly and will therefore be different each time the page is reloaded (even if the settings are the same). If you are happy with a particular layout, it may be worthwhile to take a screenshot to preserve its exact appearance. If you would rather see a new layout, clicking the Refresh button will generate a new topic cloud without reloading the page.

##Bonus:

Some users will not have generated word-topic-counts files when they ran Mallet, or they may have used the GUI Topic Modeling Tool, which does not output the file. However, the data can be extracted from the output_state file that is generated in both types of Mallet runs. The convert_output_state.py script will produce a word-topic-counts file from the data in the output_state file. The script requires input and output file path configurations at the beginning. Caveat emptor: This script is still under development and may be buggy.
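The idea behind this conversion can be sketched as below. This is not the code in convert_output_state.py: the function name `state_to_word_topic_counts` and the sample rows are hypothetical. The sketch assumes that each data row of the state file ends with three fields (type index, word, topic), that header lines start with `#`, and that topics in the output should be listed from highest count to lowest.

```python
from collections import Counter, defaultdict


def state_to_word_topic_counts(state_lines):
    """Aggregate per-token topic assignments into word-topic-counts lines.

    Hypothetical helper: assumes rows of the form
    "doc source pos typeindex type topic".
    """
    counts = defaultdict(Counter)  # (type index, word) -> Counter of topics
    for line in state_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        # Take the last three fields so a source path containing
        # spaces does not break the parse.
        type_index, word, topic = line.split()[-3:]
        counts[(int(type_index), word)][int(topic)] += 1

    out = []
    for (type_index, word), topics in sorted(counts.items()):
        # Highest count first; ties are broken by topic number.
        pairs = sorted(topics.items(), key=lambda kv: (-kv[1], kv[0]))
        joined = " ".join("%d:%d" % pair for pair in pairs)
        out.append("%d %s %s" % (type_index, word, joined))
    return out


# Made-up sample rows, for illustration only.
sample_state = [
    "#doc source pos typeindex type topic",
    "#alpha : 0.5 0.5",
    "#beta : 0.01",
    "0 a.txt 0 0 whale 1",
    "0 a.txt 1 1 ship 0",
    "0 a.txt 2 0 whale 1",
    "1 b.txt 0 0 whale 0",
]
converted = state_to_word_topic_counts(sample_state)
print("\n".join(converted))
```

If your state file is gzip-compressed, as Mallet state files usually are, you could pass the function an open file handle from Python 3's `gzip.open(path, "rt")` instead of a list of strings.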
