topic-modeling-tool's Issues

Stop word list

We are using this tool and wish to add words to the stop word list. Is there 
anywhere that we can download the stop word list/file so that we can add some 
words and use the updated list as our stop word list?   
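
For readers with the same question, here is a minimal sketch of how an extended list could be built. It assumes the tool accepts a plain-text stop word file with one word per line, which is the layout Mallet's bundled stoplists use; whether and where the GUI lets you select such a file is not confirmed by this thread. The class name, method names and sample words are all hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collection;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

// Hypothetical helper, not part of the Topic Modeling Tool or Mallet.
final class StopwordMerger {

    // Combine two word lists: trimmed, lower-cased, de-duplicated and sorted.
    static TreeSet<String> merge(Collection<String> base, Collection<String> extra) {
        TreeSet<String> merged = new TreeSet<>();
        for (Collection<String> source : List.of(base, extra)) {
            for (String word : source) {
                String cleaned = word.trim().toLowerCase(Locale.ROOT);
                if (!cleaned.isEmpty()) {
                    merged.add(cleaned);
                }
            }
        }
        return merged;
    }

    // Write one word per line in UTF-8 (assumed stop word file layout).
    static void write(Path target, Collection<String> words) throws IOException {
        Files.write(target, words, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        List<String> base = List.of("the", "and", "of");          // placeholder for an existing list
        List<String> extra = List.of("Enron", "subject", "the");  // placeholder additions
        Path out = Files.createTempFile("stopwords", ".txt");
        write(out, merge(base, extra));
        System.out.println("Wrote merged list to " + out);
    }
}
```

Lower-casing is a design choice here; drop it if the tokenizer in use is case-sensitive.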


Original issue reported on code.google.com by [email protected] on 3 Mar 2014 at 8:51

error running topic model gui


Total time: 0 seconds
java.lang.IndexOutOfBoundsException: Index: 284, Size: 10
    at java.util.ArrayList.RangeCheck(Unknown Source)
    at java.util.ArrayList.get(Unknown Source)
    at cc.mallet.topics.gui.HtmlBuilder.buildHtml2(HtmlBuilder.java:194)
    at cc.mallet.topics.gui.HtmlBuilder.createHtmlFiles(HtmlBuilder.java:293)
    at cc.mallet.topics.gui.TopicModelingTool$TrainButtonListener.outputCsvFiles(TopicModelingTool.java:629)
    at cc.mallet.topics.gui.TopicModelingTool$TrainButtonListener.runMallet(TopicModelingTool.java:581)
    at cc.mallet.topics.gui.TopicModelingTool$TrainButtonListener$1.run(TopicModelingTool.java:446)
Mallet Output files written in C:\results ---> C:\results\output_state.gz , C:\results\output_topic_keys


Csv Output files written in C:\results\output_csv
Html Output files written in C:\results\output_html
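
The trace shows `ArrayList.get` being called with index 284 on a list that holds only 10 elements. The sketch below reproduces that failure mode in isolation and shows a guarded lookup. It is not the code from `HtmlBuilder.java`, which is not shown in this thread; the class, the helper `getOrDefault` and the list contents are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the failure mode, not the tool's source.
final class BoundsDemo {

    // Return the element at index, or fallback when the index is outside the list.
    static <T> T getOrDefault(List<T> items, int index, T fallback) {
        return (index >= 0 && index < items.size()) ? items.get(index) : fallback;
    }

    public static void main(String[] args) {
        List<String> items = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            items.add("item-" + i);           // a list of size 10, as in the trace
        }
        try {
            items.get(284);                   // same index as in the report
        } catch (IndexOutOfBoundsException e) {
            System.out.println("caught: " + e.getMessage());
        }
        System.out.println(getOrDefault(items, 284, "unknown"));
    }
}
```

A guard like this only hides the symptom. The underlying question is why an index in the hundreds is used against a 10-element list.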

Original issue reported on code.google.com by [email protected] on 29 Nov 2011 at 5:11

Exception error when running topic model on folder containing subfolders

What steps will reproduce the problem?
1. Running "Learn Topics" on a file folder that contains multiple folders, each 
with its own multiple folders (basically trying to look at data two folders 
beneath the overall folder)
2.
3.

What is the expected output? What do you see instead?
When I run it on a single folder within the main folder, I get actual results. 
When I try to run it on the main folder that contains the subfolders and their 
corresponding subfolders, I get the exception errors that others have gotten 
(I'm not posting them here because they read basically the same as the others 
who have posted).

An explanation - I am using the Enron email database that was generated and 
made publicly available after the Enron scandal. Within the overall "maildir" 
folder are folders for 150 users, each of those users having multiple folders 
within their emails (inbox, sent, etc.). Running the program on the folder of a 
single user (e.g., lay-k for Ken Lay's username) produces results. Running it 
using "maildir" as the input file produces the error. I would like to generate 
a list of topics based on the overall database without having to flatten the 
existing folder structure.
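
As a possible workaround outside the tool, the nested files can be enumerated recursively without flattening the folder tree. This is a minimal sketch using the standard `java.nio.file.Files.walk` API; the class and method names are hypothetical, and how the resulting file list would be fed to the tool is left open.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of the Topic Modeling Tool.
final class CorpusFiles {

    // Collect every regular file below root, at any depth, in sorted order.
    static List<Path> listRecursively(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                        .sorted()
                        .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Pass a folder such as "maildir" as the first argument; defaults to the current folder.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        List<Path> files = listRecursively(root);
        System.out.println(files.size() + " files found under " + root);
    }
}
```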

What version of the product are you using? On what operating system?
I can't tell what version it is - I just downloaded it from this site a couple 
of days ago. I am running Windows 7.

Please provide any additional information below.


Original issue reported on code.google.com by [email protected] on 6 May 2013 at 4:39

Documents requiring mallet's token-regex option cannot be read due to lack of token-regex parameter in TMT

What steps will reproduce the problem?
1. Input documents in a non-English script, e.g. Greek.
2. Run TMT

What is the expected output? What do you see instead?

Mallet doesn't understand where a token starts or stops, so the output is just 
gibberish. I expect the words to be recognised as they are.

What version of the product are you using? On what operating system?

TMT 1.0 on Mac OS 10.9

Please provide any additional information below.

This is easily fixed by adding a token-regex input field in "Advanced options" 
which is handed down to mallet.
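
To illustrate why the token pattern matters, the sketch below tokenizes a Greek phrase with two regular expressions. `[\p{L}\p{M}]+` is one Unicode-aware pattern that could be passed to Mallet's `--token-regex` option; the ASCII-only pattern stands in for a tokenizer that ignores non-Latin scripts, and is an assumption for illustration, not the pattern TMT is known to use. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; neither pattern is taken from the TMT source.
final class TokenRegexDemo {

    // Matches runs of unaccented Latin letters only.
    static final Pattern ASCII_LETTERS = Pattern.compile("[A-Za-z]+");

    // Matches runs of letters and combining marks in any script.
    static final Pattern UNICODE_LETTERS = Pattern.compile("[\\p{L}\\p{M}]+");

    // Return every substring of text that matches tokenPattern, in order.
    static List<String> tokenize(String text, Pattern tokenPattern) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = tokenPattern.matcher(text);
        while (matcher.find()) {
            tokens.add(matcher.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String greek = "θέμα και λέξεις";
        System.out.println("ASCII only:    " + tokenize(greek, ASCII_LETTERS));
        System.out.println("Unicode aware: " + tokenize(greek, UNICODE_LETTERS));
    }
}
```

With the ASCII-only pattern the Greek phrase yields no tokens at all, while the Unicode-aware pattern returns the three words.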

Original issue reported on code.google.com by [email protected] on 10 Dec 2013 at 11:10

IndexOutOfBoundsException, no DocX.html files

What steps will reproduce the problem?
1. Press 'Learn Topics' with file testdata_news_fuel_845docs.txt
2. standard options
3.

What is the expected output? What do you see instead?
The OutputHTML\Docs\DocX.html files are not created, and in the Topicindocs.csv 
file the value for filename is null-source.

Error msg:
java.lang.IndexOutOfBoundsException: Index: 369, Size: 10
    at java.util.ArrayList.rangeCheck(Unknown Source)
    at java.util.ArrayList.get(Unknown Source)
    at cc.mallet.topics.gui.HtmlBuilder.buildHtml2(HtmlBuilder.java:194)
    at cc.mallet.topics.gui.HtmlBuilder.createHtmlFiles(HtmlBuilder.java:293)
    at cc.mallet.topics.gui.TopicModelingTool$TrainButtonListener.outputCsvFiles(TopicModelingTool.java:629)
    at cc.mallet.topics.gui.TopicModelingTool$TrainButtonListener.runMallet(TopicModelingTool.java:581)
    at cc.mallet.topics.gui.TopicModelingTool$TrainButtonListener$1.run(TopicModelingTool.java:446)
Mallet Output files written in C:\TopicModelingTool ---> C:\TopicModelingTool\output_state.gz , C:\TopicModelingTool\output_topic_keys

What version of the product are you using? On what operating system?
Topic modeling tool: Releasedate 3 oct. Windows 7 Home Premium SP1

Please provide any additional information below.


Original issue reported on code.google.com by [email protected] on 22 Nov 2011 at 12:18

Interfacing with topic-modeling-tool through the command line

What steps will reproduce the problem?
1.
2.
3.

What is the expected output? What do you see instead?
I cannot see any written output apart from the one on the GUI

What version of the product are you using? On what operating system?
The current version. I just downloaded it now

Please provide any additional information below.
I want to use the tool from the command line and be able to save the outputs 
with the same file names as the original files in the folder.

Original issue reported on code.google.com by [email protected] on 16 Mar 2015 at 9:47

Training Error

What steps will reproduce the problem?
1. input the database expected to detect the latent feature
2. input english directory word
3.

What is the expected output? What do you see instead?

in the attached file

What version of the product are you using? On what operating system?
current version

Please provide any additional information below.



Original issue reported on code.google.com by [email protected] on 7 Apr 2013 at 4:53

Only English words in the topics

What steps will reproduce the problem?
1. Russian-language texts with several English words in UTF-8 *.txt format as 
input.
2. There are only English words in the topics, without any Russian ones.

What version of the product are you using? On what operating system?
I used the latest version from the site on 32-bit Windows XP.

I've attached the text files in the archive.


Original issue reported on code.google.com by [email protected] on 25 Nov 2011 at 4:59

exception


<200> LL/token: -8,42566

Total time: 0 seconds
java.lang.IndexOutOfBoundsException: Index: 302, Size: 10
    at java.util.ArrayList.RangeCheck(Unknown Source)
    at java.util.ArrayList.get(Unknown Source)
    at cc.mallet.topics.gui.HtmlBuilder.buildHtml2(HtmlBuilder.java:194)
    at cc.mallet.topics.gui.HtmlBuilder.createHtmlFiles(HtmlBuilder.java:293)
    at cc.mallet.topics.gui.TopicModelingTool$TrainButtonListener.outputCsvFiles(TopicModelingTool.java:629)
    at cc.mallet.topics.gui.TopicModelingTool$TrainButtonListener.runMallet(TopicModelingTool.java:581)
    at cc.mallet.topics.gui.TopicModelingTool$TrainButtonListener$1.run(TopicModelingTool.java:446)
Mallet Output files written in C:\Users\fr\Desktop ---> C:\Users\fr\Desktop\output_state.gz , C:\Users\fr\Desktop\output_topic_keys

Csv Output files written in C:\Users\fr\Desktop\output_csv
Html Output files written in C:\Users\fr\Desktop\output_html

Original issue reported on code.google.com by [email protected] on 25 Nov 2011 at 4:07

Error with character encoding for UTF-8 files

What steps will reproduce the problem?
1. Run TMT with texts in UTF-8 which have words that have characters with 
accents, like "é" or "à". For example texts in French.

What is the expected output? What do you see instead?
- I would expect the topic words to include words that have an accented letter. 
Instead, the topic words will not include these, but include words cut off at 
those characters with accents instead, so "privé" becomes "priv" or "était" 
becomes "tait" or "prêt" becomes "pr" (without the final "t").  

What version of the product are you using? On what operating system?
- I'm using the latest version of TMT on Ubuntu 13.10. 
- Note that the procedure works just fine when I use Mallet directly. 
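
The reported truncations match what a tokenizer produces when it treats only unaccented Latin letters as word characters. The sketch below reproduces them under that assumption; the thread does not confirm that this is what TMT does internally, and the class, method and pattern are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical reproduction of the symptom, not the tool's source.
final class AccentSplitDemo {

    // ASCII-only word pattern: an assumption made for illustration.
    static final Pattern ASCII_WORD = Pattern.compile("[A-Za-z]+");

    // Return the runs of unaccented Latin letters found in text.
    static List<String> asciiTokens(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = ASCII_WORD.matcher(text);
        while (matcher.find()) {
            tokens.add(matcher.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Unicode escapes keep the example independent of the source file encoding.
        System.out.println(asciiTokens("priv\u00e9"));   // privé  -> [priv]
        System.out.println(asciiTokens("\u00e9tait"));   // était  -> [tait]
        System.out.println(asciiTokens("pr\u00eat"));    // prêt   -> [pr, t]
    }
}
```

The leftover single-letter token "t" from "prêt" would presumably be discarded by a later filtering step, which would explain why only "pr" shows up in the topics.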

Original issue reported on code.google.com by [email protected] on 9 Dec 2013 at 4:53

test data not available

What steps will reproduce the problem?
1. open page in Chrome.
2. Go to line with "here" that is supposed to link to test data download
3. click "here" - nothing happens

What is the expected output? What do you see instead?
I expect to see a hot link on the word "here"; I see only text.


What version of the product are you using? On what operating system?
tried both Chrome and Firefox, Windows 7


Please provide any additional information below.
I used "view page source" to see if the html was bad, but no link appeared.


Original issue reported on code.google.com by [email protected] on 15 Mar 2014 at 5:32

How to fix issue with memory size

What steps will reproduce the problem?
1. Using the Yelp dataset with more than 1 million documents 
2. Heap memory size is

What is the expected output? What do you see instead?
It should write the result to the output file

What version of the product are you using? On what operating system?
Recent one

Please provide any additional information below.

Is there any way to fix this memory size/limit issue when testing with large 
data?
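
A general note for Java applications: the maximum heap size is fixed when the JVM starts and can be raised with the standard `-Xmx` option, for example `java -Xmx4g -jar TopicModelingTool.jar`. The jar name and the 4 GB value are placeholders, and whether a larger heap is enough for a corpus of this size is not something this thread establishes. The small sketch below, with a hypothetical class name, prints the limit the running JVM actually has, which helps confirm that the option took effect.

```java
// Hypothetical diagnostic, not part of the Topic Modeling Tool.
final class HeapInfo {

    // Maximum heap the JVM will attempt to use, in mebibytes.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```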

Original issue reported on code.google.com by [email protected] on 29 Mar 2015 at 10:49
