Comments (7)
Hi Jake,
Great question! Let me think for a second, as I haven't looked at the bayesian code for a while.
Jake
On Tue, Nov 4, 2014 at 2:13 PM, JakeAustwick wrote:
Apologies in advance as my knowledge of Go is still somewhat limited, so this may be a naive question.
I want to expose the naive Bayes classifier as an HTTP web service, with both train and classify endpoints. I have no trouble with that, but I want the train endpoint to accept new labels (labels that aren't currently in the classifier). Right now the labels are simply specified as consts and passed into the constructor. Can you think of the best way to add labels at run-time?
from bayesian.
I believe all you would have to do is add a method to the Classifier struct that would look something like this (not tested):
// AddClass registers a new class on an existing classifier (not tested).
func (c *Classifier) AddClass(class Class) {
	c.Classes = append(c.Classes, class)
	c.datas[class] = newClassData()
}
This adds the new class, and subsequent calculations will take it into account. Note that this is not thread-safe if you add classes while other goroutines are using the classifier, because the number of classes is a parameter to the calculations.
If you submit a pull request + unit test, I would be happy to merge the change. There's also some cleanup that can be done on the NewClassifier method once this method is added.
Thanks,
Jake
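For readers who want to try the suggestion outside the library, here is a minimal, self-contained sketch. The `Classifier`, `classData`, and `NewClassifier` definitions below are reduced stand-ins written for this example, not the library's actual types, and the duplicate-class guard is an added assumption that the thread does not mention. It also shows one possible form of the `NewClassifier` cleanup: the constructor simply reuses `AddClass`.

```go
package main

import "fmt"

// Class names a label. A string-based type is assumed here.
type Class string

// classData is a hypothetical stand-in for the per-class word counts.
type classData struct {
	freqs map[string]int
	total int
}

func newClassData() *classData {
	return &classData{freqs: make(map[string]int)}
}

// Classifier is a reduced stand-in holding only what AddClass touches.
type Classifier struct {
	Classes []Class
	datas   map[Class]*classData
}

// NewClassifier builds a classifier by adding each class in turn.
func NewClassifier(classes ...Class) *Classifier {
	c := &Classifier{datas: make(map[Class]*classData)}
	for _, class := range classes {
		c.AddClass(class)
	}
	return c
}

// AddClass registers a new class at run-time.
// Assumption: adding a class that already exists is a no-op,
// so existing training data is never overwritten.
func (c *Classifier) AddClass(class Class) {
	if _, ok := c.datas[class]; ok {
		return
	}
	c.Classes = append(c.Classes, class)
	c.datas[class] = newClassData()
}

func main() {
	c := NewClassifier("Good", "Bad")
	c.AddClass("Neutral")
	c.AddClass("Good") // already present, ignored
	fmt.Println(len(c.Classes), c.Classes) // prints: 3 [Good Bad Neutral]
}
```

Like the original suggestion, this sketch is not safe for concurrent use: `append` on the slice and the map write both race with concurrent readers.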
from bayesian.
The above code works; I tested it on our codebase. Thanks for that. Are you still open to a patch even though, as you noted, it would not be thread-safe?
In our application I simply wrapped all of our interactions with the classifier with a mutex, but I'm guessing that would be undesirable to do inside the library itself?
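The mutex approach described above might look roughly like the following. This is only a sketch: `labelCounts` is a hypothetical stand-in that counts training documents per label, not the real classifier, and the names `SafeClassifier`, `Train`, and `Seen` are made up for this example.

```go
package main

import (
	"fmt"
	"sync"
)

// labelCounts is a hypothetical stand-in for the classifier.
// It only counts how many documents were learned per label.
type labelCounts struct {
	counts map[string]int
}

// addClass registers a label if it is not known yet.
func (l *labelCounts) addClass(label string) {
	if _, ok := l.counts[label]; !ok {
		l.counts[label] = 0
	}
}

// learn records one training document under the label.
func (l *labelCounts) learn(label string) {
	l.counts[label]++
}

// SafeClassifier serializes every interaction with the wrapped value.
type SafeClassifier struct {
	mu    sync.Mutex
	inner *labelCounts
}

func NewSafeClassifier() *SafeClassifier {
	return &SafeClassifier{inner: &labelCounts{counts: make(map[string]int)}}
}

// Train adds the label if needed and learns one document, all under
// the same lock so no other goroutine sees a half-added class.
func (s *SafeClassifier) Train(label string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.inner.addClass(label)
	s.inner.learn(label)
}

// Seen returns how many documents were learned for the label.
func (s *SafeClassifier) Seen(label string) int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.inner.counts[label]
}

func main() {
	s := NewSafeClassifier()
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			s.Train("shoes")
		}()
	}
	wg.Wait()
	fmt.Println(s.Seen("shoes")) // prints: 100
}
```

Keeping the lock in a wrapper like this, rather than in the library, means single-threaded users pay no locking cost. A `sync.RWMutex` could be swapped in if classify calls greatly outnumber train calls.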
from bayesian.
IMO it would be best to leave thread-safety to the user, as it adds performance overhead, etc.
Just out of curiosity, what are you using it for? And how big is your data set? :)
from bayesian.
We're using it to identify categories for uncategorized products based on their keywords / title.
As of right now, our dataset is fewer than 10k training items, with approximately 250 different labels.
I'll look at contributing the above patch today/tomorrow along with tests, as I haven't looked at testing in Go yet.
from bayesian.
Cool!
I'm curious about the performance on that much data, as the library does
everything in memory. It would be interesting to think about how to support
larger sets. Also, does this approach to categorization work well?
As for testing, check out this page for basic Golang testing:
https://golang.org/doc/code.html#Testing. I've also recently been using
Ginkgo (http://onsi.github.io/ginkgo/) on other projects, which is similar
to RSpec and works great -- highly recommend.
Thanks for your interest!
Jake
from bayesian.
In my opinion, that's not a terrible amount of data to load into memory. I've loaded 2.7 million items into a struct for an in-memory geocoding package, and it only used about 500MB of RAM. I've also done something similar for gender detection with a much smaller list. In-memory lookups are fast, and unless you're storing millions of items, the memory footprint should be quite small.
That said, 250 different classifications is quite a lot and I too would be curious about a benchmark in this case.
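One way to get a rough number is `testing.Benchmark` from the standard library, which runs a benchmark function outside of `go test`. The sketch below times scoring one short document against 250 classes. Everything else in it is an assumption made for illustration: the `model` type is a simplified naive Bayes stand-in rather than the library's classifier, the word counts are synthetic, and the vocabulary size of 1000 is arbitrary. Timings from it say nothing about the library itself or about real product data.

```go
package main

import (
	"fmt"
	"math"
	"testing"
)

// model is a hypothetical, simplified naive Bayes model.
type model struct {
	freqs  []map[string]int // one word-count table per class
	totals []int            // total word count per class
}

// newSyntheticModel fills every class with made-up word counts.
func newSyntheticModel(classes, vocab int) *model {
	m := &model{
		freqs:  make([]map[string]int, classes),
		totals: make([]int, classes),
	}
	for c := range m.freqs {
		m.freqs[c] = make(map[string]int, vocab)
		for w := 0; w < vocab; w++ {
			n := (c+w)%7 + 1 // arbitrary synthetic count
			m.freqs[c][fmt.Sprintf("w%d", w)] = n
			m.totals[c] += n
		}
	}
	return m
}

// logScores returns one unnormalized log score per class, using
// add-one smoothing and ignoring class priors for brevity.
func (m *model) logScores(doc []string, vocab int) []float64 {
	scores := make([]float64, len(m.freqs))
	for c, freq := range m.freqs {
		denom := float64(m.totals[c] + vocab)
		for _, w := range doc {
			scores[c] += math.Log(float64(freq[w]+1) / denom)
		}
	}
	return scores
}

func main() {
	const classes, vocab = 250, 1000
	m := newSyntheticModel(classes, vocab)
	doc := []string{"w1", "w42", "w999", "unseen"}

	// testing.Benchmark picks b.N automatically and reports ns/op.
	res := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			m.logScores(doc, vocab)
		}
	})
	fmt.Println(res)
}
```

Because the work per document is one map lookup per word per class, scoring time in this sketch grows linearly with the number of classes.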
from bayesian.