ptnplanet / java-naive-bayes-classifier
A java classifier based on the naive Bayes approach complete with Maven support and a runnable example.
I tried to serialize a classifier I trained on my data set, but it turns out that it uses a non-serializable object. The "Classifier" class itself implements Serializable, but I am not quite sure which of the classes it uses does not.
Thank you and great work with the naive Bayes classifier.
From the Classifier code, I assume featureProbability should never be greater than 1.0, but I found that if "I" appears two or more times in the training material, featureProbability exceeds 1.0. The following code snippet reproduces it.
In the test, change the training material to:
final String[] positiveText = "I happy now, I like you".split("\\s");
Print some info in featureProbability:
float featureCount = (float) this.featureCount(feature, category);
float categoryCount = (float) this.categoryCount(category);
System.out.println("raw p(f|c) " + String.valueOf(feature) + featureCount + "->" + String.valueOf(category) + categoryCount + "->" + featureCount / categoryCount);
return featureCount / categoryCount;
Then the output is:
raw p(f|c) I2.0->positive1.0->2.0
Is this result correct? Or can you clarify which mathematical model is used to calculate p(f|c)?
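A ratio above 1.0 is what one would expect when feature occurrences are counted per token while the category is counted once per learned document: "I" appears twice in a single positive document, which gives 2/1. The sketch below is not the library's code; the names FeatureProbabilitySketch, learn and featureProbability are hypothetical. It assumes that each distinct feature should count at most once per learned document, which keeps the ratio between 0 and 1.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not the library's implementation.
public class FeatureProbabilitySketch {
    // category -> number of learned documents
    private final Map<String, Integer> categoryCounts = new HashMap<>();
    // category -> (feature -> number of documents containing that feature)
    private final Map<String, Map<String, Integer>> featureCounts = new HashMap<>();

    public void learn(String category, List<String> features) {
        categoryCounts.merge(category, 1, Integer::sum);
        Map<String, Integer> perCategory =
                featureCounts.computeIfAbsent(category, k -> new HashMap<>());
        // Assumption: deduplicate so a repeated token counts once per document.
        Set<String> distinct = new HashSet<>(features);
        for (String feature : distinct) {
            perCategory.merge(feature, 1, Integer::sum);
        }
    }

    // Estimate of p(f|c): documents in c containing f, divided by documents in c.
    public float featureProbability(String feature, String category) {
        int documents = categoryCounts.getOrDefault(category, 0);
        if (documents == 0) {
            return 0.0f; // nothing learned for this category yet
        }
        Map<String, Integer> perCategory = featureCounts.get(category);
        int withFeature = perCategory.getOrDefault(feature, 0);
        return (float) withFeature / documents;
    }

    public static void main(String[] args) {
        FeatureProbabilitySketch sketch = new FeatureProbabilitySketch();
        sketch.learn("positive", Arrays.asList("I happy now, I like you".split("\\s")));
        // "I" occurs twice in the text but is counted once for the document.
        System.out.println(sketch.featureProbability("I", "positive"));
    }
}
```

With per-document counting, the numerator can never exceed the number of learned documents in the category, so the estimate stays at or below 1.0.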
Hello,
I know you haven't worked on this in a while but was wondering if you had any idea why I keep seeing this issue. I have added about 25 categories to the model with lots of data in each category. For the majority of the categories no matter what I feed in when I classify a chunk of text most of the categories return a probability of infinity.
ex.
Classification[
category=friends_gatherings,
probability=Infinity,
featureset=[
after,
school,
soccerabout,
this,
...
--
]
]
I am using double[] as the feature type instead of String in the runnable example, but features of the negative class are still classified as positive. Also, featureCount does not return anything other than null.
Hello.
Is there any way to store results of the learning as experience data in my database?
Forgive me if I am wrong, but on this line
would not
return this.getFeatureCount(feature, category) / (float) this.getFeatureCount(feature);
return P(category | feature) rather than P(feature | category)?
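The two quantities share a numerator and differ only in the denominator, as the small sketch below illustrates. It is not taken from the library; the class name, method names and counts are made up for illustration.

```java
// Hypothetical illustration of the two conditional probabilities.
public class ConditionalProbabilitySketch {
    // P(feature | category): share of the category's documents that contain the feature.
    public static double featureGivenCategory(int featureInCategory, int categoryTotal) {
        return (double) featureInCategory / categoryTotal;
    }

    // P(category | feature): share of the feature's documents that belong to the category.
    public static double categoryGivenFeature(int featureInCategory, int featureTotal) {
        return (double) featureInCategory / featureTotal;
    }

    public static void main(String[] args) {
        int featureInCategory = 3; // documents in the category that contain the feature
        int categoryTotal = 10;    // all documents in the category
        int featureTotal = 4;      // all documents (any category) that contain the feature
        System.out.println(featureGivenCategory(featureInCategory, categoryTotal)); // 0.3
        System.out.println(categoryGivenFeature(featureInCategory, featureTotal));  // 0.75
    }
}
```

Dividing by the overall count of the feature, as in the quoted line, corresponds to the second method; dividing by the count of the category corresponds to the first.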
For each learn call, the probability that a category occurs is set to 1.0. This in turn makes every classify call return whatever category the last learn call used.
Please could you add a (MIT?) license?
Thanks
Hi, thanks for the simple and well-working code.
I have a question about the featureWeighedAverage method.
What do its parameters do? Do they act like a Laplace smoothing term?
It would be helpful if there were a paper I could refer to for the weighted average formula used in the featureWeighedAverage method.
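One common form of such a weighted average blends the observed probability with an assumed prior, weighting the prior as if it were a fixed number of extra observations. Whether the library's featureWeighedAverage matches this form exactly is an assumption; the class, method and parameter names below are hypothetical.

```java
// Hypothetical sketch of a weighted probability estimate, not the library's code.
public class WeightedAverageSketch {
    /**
     * @param basicProbability   observed estimate, e.g. p(f|c) from counts
     * @param observations       how often the feature has been seen overall
     * @param weight             how many observations the prior is worth
     * @param assumedProbability prior used when there is little or no data
     */
    public static double weightedAverage(double basicProbability, int observations,
                                         double weight, double assumedProbability) {
        return (weight * assumedProbability + observations * basicProbability)
                / (weight + observations);
    }

    public static void main(String[] args) {
        // Unseen feature: the result is the prior itself.
        System.out.println(weightedAverage(0.0, 0, 1.0, 0.5));   // 0.5
        // Seen once: pulled halfway between the prior and the observation.
        System.out.println(weightedAverage(1.0, 1, 1.0, 0.5));   // 0.75
        // Seen often: the observation dominates.
        System.out.println(weightedAverage(1.0, 100, 1.0, 0.5));
    }
}
```

In this form the parameters do behave like additive (Laplace-style) smoothing: the prior acts as `weight` pseudo-observations at `assumedProbability`, so features with few observations are not pushed to 0 or 1.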
It would be better to have save and load functions so that a model can be reused later.
The documentation is referenced in the README file, but there is no link to it. Please help, thanks. (I am also fairly new to Java, so this might be a stupid question; any help is much appreciated.)
getprobability() returns a value greater than 1 (not between 0 and 1).
What should I do?
Thanks.
Is it possible to save/load (maybe some kind of serialization/deserialization) the trained classifier?
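Standard Java serialization is one way to do this, provided the object and every non-transient field it references implement Serializable. The sketch below uses a hypothetical TinyModel class as a stand-in for a trained classifier; it does not show the library's own classes, and whether they serialize cleanly is not confirmed here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.HashMap;

public class SerializationSketch {
    // Hypothetical stand-in for a trained model; all of its fields are serializable.
    public static class TinyModel implements Serializable {
        private static final long serialVersionUID = 1L;
        public final HashMap<String, Integer> counts = new HashMap<>();
    }

    // Writes the object graph into a byte array.
    public static byte[] save(Serializable model) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(model);
        } catch (IOException e) {
            // A NotSerializableException (an IOException) names the offending class.
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads an object graph back from a byte array.
    public static Object load(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        TinyModel model = new TinyModel();
        model.counts.put("happy", 2);
        byte[] stored = save(model);
        TinyModel restored = (TinyModel) load(stored);
        System.out.println(restored.counts);
    }
}
```

The resulting byte array could be written to a file or to a binary column in a database. If serialization fails, the message of the NotSerializableException identifies which class is missing the Serializable marker.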