jamesknox / topic-modeling-tool
Automatically exported from code.google.com/p/topic-modeling-tool
We are using this tool and wish to add words to the stop word list. Is there
anywhere that we can download the stop word list/file so that we can add some
words and use the updated list as our stop word list?
Original issue reported on code.google.com by [email protected]
on 3 Mar 2014 at 8:51
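A minimal sketch of how a custom list could be prepared, assuming the stop word list is a plain UTF-8 text file with one word per line. The thread does not say where the bundled list lives or whether the tool accepts an external file, so the file names below are hypothetical. The sketch merges an existing list with extra words, lowercases them, and drops blanks and duplicates.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class StopwordListMerger {

    // Combine a base stop word list with extra words, keeping first-seen order.
    // Words are trimmed and lowercased; blank lines and duplicates are dropped.
    static Set<String> merge(List<String> baseWords, List<String> extraWords) {
        Set<String> merged = new LinkedHashSet<>();
        for (List<String> source : Arrays.asList(baseWords, extraWords)) {
            for (String word : source) {
                String cleaned = word.trim().toLowerCase(Locale.ROOT);
                if (!cleaned.isEmpty()) {
                    merged.add(cleaned);
                }
            }
        }
        return merged;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical file names (assumption); pass real paths as arguments.
        Path base = Paths.get(args.length > 0 ? args[0] : "stopwords.txt");
        Path extra = Paths.get(args.length > 1 ? args[1] : "extra_stopwords.txt");
        Path out = Paths.get(args.length > 2 ? args[2] : "stopwords_merged.txt");

        Set<String> merged = merge(
                Files.readAllLines(base, StandardCharsets.UTF_8),
                Files.readAllLines(extra, StandardCharsets.UTF_8));

        // Write one word per line.
        Files.write(out, merged, StandardCharsets.UTF_8);
    }
}
```

The merged file is only useful if the tool, or MALLET run directly, can be pointed at it; that step is not covered by this thread.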
Total time: 0 seconds
java.lang.IndexOutOfBoundsException: Index: 284, Size: 10
at java.util.ArrayList.RangeCheck(Unknown Source)
at java.util.ArrayList.get(Unknown Source)
at cc.mallet.topics.gui.HtmlBuilder.buildHtml2(HtmlBuilder.java:194)
at cc.mallet.topics.gui.HtmlBuilder.createHtmlFiles(HtmlBuilder.java:293)
at cc.mallet.topics.gui.TopicModelingTool$TrainButtonListener.outputCsvFiles(TopicModelingTool.java:629)
at cc.mallet.topics.gui.TopicModelingTool$TrainButtonListener.runMallet(TopicModelingTool.java:581)
at cc.mallet.topics.gui.TopicModelingTool$TrainButtonListener$1.run(TopicModelingTool.java:446)
Mallet Output files written in C:\results ---> C:\results\output_state.gz ,
C:\results\output_topic_keys
Csv Output files written in C:\results\output_csv
Html Output files written in C:\results\output_html
Original issue reported on code.google.com by [email protected]
on 29 Nov 2011 at 5:11
What steps will reproduce the problem?
1. Running "Learn Topics" on a file folder that contains multiple folders, each
with its own multiple folders (basically trying to look at data two folders
beneath the overall folder)
What is the expected output? What do you see instead?
When I run it on a single folder within the main folder, I get actual results.
When I try to run it on the main folder that contains the subfolders and their
corresponding subfolders, I get the exception errors that others have gotten
(I'm not posting them here because they read basically the same as the others
who have posted).
An explanation - I am using the Enron email database that was generated and
made publicly available after the Enron scandal. Within the overall "maildir"
folder are folders for 150 users, each of those users having multiple folders
within their emails (inbox, sent, etc.). Running the program on the folder of a
single user (e.g., lay-k for Ken Lay's username) produces results. Running it
using "maildir" as the input file produces the error. I would like to generate
a list of topics based on the overall database without having to flatten the
existing folder structure.
What version of the product are you using? On what operating system?
I can't tell what version it is - I just downloaded it from this site a couple
of days ago. I am running Windows 7.
Please provide any additional information below.
Original issue reported on code.google.com by [email protected]
on 6 May 2013 at 4:39
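A minimal sketch of one way to check which documents a nested corpus contains without flattening it, assuming the goal is simply to enumerate every regular file below the top-level folder. It uses the standard `java.nio.file.Files.walk` call; it is not the tool's own loading code, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class NestedCorpusLister {

    // Return every regular file below root, at any depth, in sorted order.
    static List<Path> listDocuments(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                        .sorted()
                        .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // "maildir" is the folder named in the report; pass another path as needed.
        Path root = Paths.get(args.length > 0 ? args[0] : "maildir");
        for (Path doc : listDocuments(root)) {
            // Print paths relative to the root so the folder structure stays visible.
            System.out.println(root.relativize(doc));
        }
    }
}
```

A listing like this can confirm that the files two levels down are readable before deciding whether the error comes from the folder depth or from something else.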
What steps will reproduce the problem?
1. Input documents in a non-English script, e.g. Greek.
2. Run TMT
What is the expected output? What do you see instead?
Mallet doesn't understand where a token starts or stops, so the output is just
gibberish. I expect the words to be recognised as they are.
What version of the product are you using? On what operating system?
TMT 1.0 on Mac OS 10.9
Please provide any additional information below.
This is easily fixed by adding a token-regex input field in "Advanced options"
which is handed down to mallet.
Original issue reported on code.google.com by [email protected]
on 10 Dec 2013 at 11:10
What steps will reproduce the problem?
1. Press 'Learn Topics' with file testdata_news_fuel_845docs.txt
2. standard options
What is the expected output? What do you see instead?
The OutputHTML\Docs\DocX.html files are not created, and in the Topicindocs.csv
file the value for filename is null-source.
Error msg:
java.lang.IndexOutOfBoundsException: Index: 369, Size: 10
at java.util.ArrayList.rangeCheck(Unknown Source)
at java.util.ArrayList.get(Unknown Source)
at cc.mallet.topics.gui.HtmlBuilder.buildHtml2(HtmlBuilder.java:194)
at cc.mallet.topics.gui.HtmlBuilder.createHtmlFiles(HtmlBuilder.java:293)
at cc.mallet.topics.gui.TopicModelingTool$TrainButtonListener.outputCsvFiles(TopicModelingTool.java:629)
at cc.mallet.topics.gui.TopicModelingTool$TrainButtonListener.runMallet(TopicModelingTool.java:581)
at cc.mallet.topics.gui.TopicModelingTool$TrainButtonListener$1.run(TopicModelingTool.java:446)
Mallet Output files written in C:\TopicModelingTool --->
C:\TopicModelingTool\output_state.gz , C:\TopicModelingTool\output_topic_keys
What version of the product are you using? On what operating system?
Topic Modeling Tool, release date 3 Oct. Windows 7 Home Premium SP1
Please provide any additional information below.
Original issue reported on code.google.com by [email protected]
on 22 Nov 2011 at 12:18
Hello,
How can I train and obtain the same results as the GUI from the command line,
saving the results with the original file names? Thanks
Original issue reported on code.google.com by [email protected]
on 16 Mar 2015 at 9:39
What steps will reproduce the problem?
What is the expected output? What do you see instead?
I cannot see any written output apart from the one on the GUI
What version of the product are you using? On what operating system?
The current version. I just downloaded it now
Please provide any additional information below.
I want to use the GUI tool from the command line and be able to save the outputs
with the same file names as the original files in the folder.
Original issue reported on code.google.com by [email protected]
on 16 Mar 2015 at 9:47
What steps will reproduce the problem?
1. input the database expected to detect the latent feature
2. input english directory word
What is the expected output? What do you see instead?
in the attached file
What version of the product are you using? On what operating system?
current version
Please provide any additional information below.
Original issue reported on code.google.com by [email protected]
on 7 Apr 2013 at 4:53
What steps will reproduce the problem?
1. Use Russian-language texts containing several English words, in UTF-8 *.txt
format, as input.
2. The topics contain only English words, without any Russian ones.
What version of the product are you using? On what operating system?
I used the latest version from the site on 32-bit Windows XP.
I've attached the text files in the archive.
Original issue reported on code.google.com by [email protected]
on 25 Nov 2011 at 4:59
<200> LL/token: -8,42566
Total time: 0 seconds
java.lang.IndexOutOfBoundsException: Index: 302, Size: 10
at java.util.ArrayList.RangeCheck(Unknown Source)
at java.util.ArrayList.get(Unknown Source)
at cc.mallet.topics.gui.HtmlBuilder.buildHtml2(HtmlBuilder.java:194)
at cc.mallet.topics.gui.HtmlBuilder.createHtmlFiles(HtmlBuilder.java:293)
at cc.mallet.topics.gui.TopicModelingTool$TrainButtonListener.outputCsvFiles(TopicModelingTool.java:629)
at cc.mallet.topics.gui.TopicModelingTool$TrainButtonListener.runMallet(TopicModelingTool.java:581)
at cc.mallet.topics.gui.TopicModelingTool$TrainButtonListener$1.run(TopicModelingTool.java:446)
Mallet Output files written in C:\Users\fr\Desktop --->
C:\Users\fr\Desktop\output_state.gz , C:\Users\fr\Desktop\output_topic_keys
Csv Output files written in C:\Users\fr\Desktop\output_csv
Html Output files written in C:\Users\fr\Desktop\output_html
Original issue reported on code.google.com by [email protected]
on 25 Nov 2011 at 4:07
What steps will reproduce the problem?
1. Run TMT with texts in UTF-8 which have words that have characters with
accents, like "é" or "à". For example texts in French.
What is the expected output? What do you see instead?
- I would expect the topic words to include words that have an accented letter.
Instead, the topic words will not include these, but include words cut off at
those characters with accents instead, so "privé" becomes "priv" or "était"
becomes "tait" or "prêt" becomes "pr" (without the final "t").
What version of the product are you using? On what operating system?
- I'm using the latest version of TMT on Ubuntu 13.10.
- Note that the procedure works just fine when I use Mallet directly.
Original issue reported on code.google.com by [email protected]
on 9 Dec 2013 at 4:53
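A minimal sketch of how the truncation described above can arise, assuming (the thread does not confirm this) that the tool tokenizes with an ASCII-only letter pattern. It compares that pattern with a Unicode-aware one built from the standard Java regex classes `\p{L}` (letters in any script) and `\p{M}` (combining marks). The class and method names are hypothetical, and the sample text is "privé était prêt" written with Unicode escapes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenRegexDemo {

    // ASCII-only pattern (assumed cause): accented letters do not match,
    // so each one ends the current token.
    static final Pattern ASCII_LETTERS = Pattern.compile("[A-Za-z]+");

    // Unicode-aware pattern: letters in any script plus combining marks.
    static final Pattern UNICODE_LETTERS = Pattern.compile("[\\p{L}\\p{M}]+");

    // Collect every match of the pattern in the text as a token.
    static List<String> tokenize(Pattern pattern, String text) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            tokens.add(matcher.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The three French words from the report, with escaped accented letters.
        String text = "priv\u00e9 \u00e9tait pr\u00eat";
        System.out.println(tokenize(ASCII_LETTERS, text));
        System.out.println(tokenize(UNICODE_LETTERS, text));
    }
}
```

Under these assumptions the ASCII pattern yields the fragments "priv", "tait", "pr" and "t", while the Unicode-aware pattern keeps all three words whole. The same Unicode-aware pattern also matches non-Latin scripts such as Greek or Cyrillic.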
What steps will reproduce the problem?
1. open page in Chrome.
2. Go to line with "here" that is supposed to link to test data download
3. click "here" - nothing happens
What is the expected output? What do you see instead?
I expect to see a hot link on the word "here"; I see only text.
What version of the product are you using? On what operating system?
tried both Chrome and Firefox, Windows 7
Please provide any additional information below.
I used "view page source" to see if the html was bad, but no link appeared.
Original issue reported on code.google.com by [email protected]
on 15 Mar 2014 at 5:32
What steps will reproduce the problem?
1. Use the Yelp dataset with 1+ million documents
2. Heap memory size is
What is the expected output? What do you see instead?
It should write the result to the output file
What version of the product are you using? On what operating system?
Recent one
Please provide any additional information below.
Is there any way to fix this issue with the memory size limit when testing with
large data?
Original issue reported on code.google.com by [email protected]
on 29 Mar 2015 at 10:49
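A minimal sketch for checking how much heap the JVM is allowed to use, assuming the failure is caused by heap exhaustion (the report does not include the error text). The heap ceiling of a Java program is normally raised with the standard `-Xmx` launcher option, for example `java -Xmx4g -jar TopicModelingTool.jar`, where both the jar name and the 4g value are hypothetical. The class below only reports the current limit.

```java
class HeapLimitReport {

    // Maximum heap the JVM will attempt to use, converted to mebibytes.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // Heap currently in use: allocated heap minus the free part of it.
        long usedMiB = (rt.totalMemory() - rt.freeMemory()) / (1024L * 1024L);
        System.out.println("Max heap (MiB): " + maxHeapMiB());
        System.out.println("Used heap (MiB): " + usedMiB);
    }
}
```

Running it with and without an `-Xmx` value shows whether the option is actually reaching the JVM that runs the tool.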