Comments (31)
@danielpclark I'll take a look as soon as I can, but it probably won't be until early Summer. Thanks for giving me the heads up!
from classifier-reborn.
@Ch4s3 Hey, I love that you brought Rust in on this. I'd like to provide some resources that may be helpful in getting this done. I'm the author of faster_path, which rewrites Ruby's Pathname library in Rust and improves the performance of Rails by 30%+.
I see you're using Thermite. That's next on my agenda for faster_path as it will allow binary builds of the Rust build of the dynamic library to be served from a host and thereby not require the Ruby users to have Rust installed. My current focus on my project is to have my test suite prove cross-platform Rust compilation works since Mac OS & a few Linux distros skip the Rust library build process.
But as for the helpful resources, the code base for faster_path has plenty of working Rust code running under Ruby. The next two are an article I wrote, "Coming to Rust from Ruby", and my documentation from my first week learning Rust, "Getting started in Rust".
Also when looking up Rust methods the internet is actually a more difficult way to find answers; the best answers to be found are directly from documentation provided in the Dash app. Searching for methods in Dash has been the quickest and most accurate tool for the job. An alternative to Dash is Zeal which is the open source version of it.
from classifier-reborn.
@parkr The more I read about this, the more I think a pure Ruby implementation is always going to give us poor performance. It might make sense to bundle a C solution into the gem, since there are a lot of great existing solutions. Then we just check to make sure it builds correctly on various platforms.
I'm not married to this approach, but it could be the right way forward. Thoughts?
from classifier-reborn.
That seems like a good plan.
from classifier-reborn.
Ok. I'll devise a plan tonight and scout out implementations. It'll be good to wrap this one up.
from classifier-reborn.
Sounds terrific. Thanks, Chase!
from classifier-reborn.
Not a problem. I want to wrap up a few things in a row and get a release out in the next week(ish).
from classifier-reborn.
I figured out how to build and compile C-extensions, and just need to settle on an implementation that has a good license. @parkr do you have a strong OSS license preference?
*edit: this implementation looks nice, and I've seen the author's name on some papers. The license is GNU.
from classifier-reborn.
As long as the license is compatible and GitHub can use it for commercial purposes, it's fine. I usually lean toward MIT. Can MIT projects contain GNU code? I'm not sure.
from classifier-reborn.
I'll read up
from classifier-reborn.
Ok, I've tried a few things and have it sort of working, but it segfaults on rare occasions. Still a WIP.
from classifier-reborn.
Sweet! When is it segfaulting? Is the C still optional?
from classifier-reborn.
It segfaults while transforming the matrix for some inputs. It could be optional.
from classifier-reborn.
Any news here? :)
from classifier-reborn.
@jayniz I had to take a break from this for a bit, but I'll try to get back to it soon. Basically I found 1 or 2 promising implementations, but they both segfault for some input. I'm not sure why yet so I haven't gotten past that. I'll do my best to figure it out soon, but I'm open to people helping out.
from classifier-reborn.
Hey @Ch4s3 no problemo :-) The bayes classifier is working and goes into production today.
I played around with LSI locally, and I got the above errors for my inputs - not with the GSL lib and gem though, that worked. But where training the bayes classifier with 6k inputs took ~8 sec, LSI took a couple of minutes for 600 inputs (and with 6k inputs it's still running after 30h) on a recent 13" MBP with 16 GB RAM.
In my benchmarks the bayes classifier performed quite well at detecting comment spam though - trained with 6k comments, I had it classify ~80k comments, and it classified correctly in 94.5% of the cases, with 4.75% false negatives and 0.75% false positives.
Let's see how it behaves in production!
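For anyone reproducing a benchmark like the one above, here is a minimal sketch of turning raw outcome counts into the same kind of percentages. The helper name `classification_rates` and the counts passed to it are assumptions for illustration only: the thread gives just the rounded rates and an approximate total of ~80k, not the actual counts.

```ruby
# Hypothetical helper (not part of classifier-reborn): converts raw outcome
# counts into percentages of all classified items.
def classification_rates(correct:, false_negatives:, false_positives:)
  total = correct + false_negatives + false_positives
  raise ArgumentError, 'no classified items' if total.zero?

  {
    accuracy: (100.0 * correct / total).round(2),
    false_negative_share: (100.0 * false_negatives / total).round(2),
    false_positive_share: (100.0 * false_positives / total).round(2)
  }
end

# Illustrative counts only, chosen so that an 80_000-item run matches the
# rates quoted in the thread; these are not the real benchmark counts.
rates = classification_rates(correct: 75_600,
                             false_negatives: 3_800,
                             false_positives: 600)
puts rates
```

Note that the shares are computed against all classified items, which is how the thread reports them, rather than against only the spam or only the non-spam comments.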
from classifier-reborn.
@jayniz Awesome to hear that the bayes classifier is working! I'll keep working on the SVD.
from classifier-reborn.
@jayniz If you can provide me with sample data that worked with the GSL but broke with the Ruby implementation, that would be super helpful!
from classifier-reborn.
@Ch4s3 not straight away, I'd have to run it on some data to see if it crashes first :-) if you let me know which implementation I should try to crash, I can try to find some time this week to make it crash and then send you the data that crashed it?
from classifier-reborn.
@jayniz Awesome, see if you can blow up the native Ruby implementation. I found an LGPL implementation of the SVD in C and I'm writing a wrapper for a C extension, but it won't be done straight away. Any input that crashes other implementations will be good for testing.
from classifier-reborn.
ahh sorry, the LSI feature uses the SVD under the hood, and the ruby svd is what makes it slow vs using GSL. See this line.
from classifier-reborn.
No worries. I probably won't have the c extension done by the weekend anyway.
from classifier-reborn.
So after a few tries, I can't get the C extension right. If anyone else wants to try, I can push up what I tried. Maybe it would make sense to use Helix to wrap a Rust SVD lib. That way we could distribute the binary and have some minimal guarantees about safety.
from classifier-reborn.
May have found a candidate Rust lib here and some workable demo code here... It might actually work.
This is the direction I'm thinking of going
from classifier-reborn.
@danielpclark Have you moved to thermite yet? I'm looking to pick this back up.
from classifier-reborn.
@Ch4s3 No… The author updated his PR to be current with the code base, but the CI test results are intermittent/flaky.
from classifier-reborn.
Yeah, I saw that. I need to get back to the Rust book and try this again.
from classifier-reborn.
Hello,
Can you check my test case?
I have strings = [] with 100 elements:
lsi = ClassifierReborn::LSI.new
strings.each do |x|
lsi.add_item x[0], x[1]
end
But after a certain number of added items I get "comparison of Float with NaN failed".
If I remove the last element from the classifier and run the process again, I get the same error on another string.
I am not sure why this error is raised for one string rather than another.
For some strings I get this error instead: Math::DomainError: Numerical argument is out of domain - "sqrt"
Any solution?
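Both messages can be reproduced in plain Ruby without the gem, which may help when narrowing down the failing input. The sketch below only shows how such errors arise in general; whether this is what actually happens inside classifier-reborn for these strings is an assumption. The helpers `sort_scores_desc` and `safe_sqrt` are hypothetical names, not library API.

```ruby
# NaN is unordered: comparing against it yields nil rather than -1, 0, or 1,
# so any sort that encounters a NaN cannot order its elements.
nan = 0.0 / 0.0
puts((1.0 <=> nan).inspect)

begin
  [0.2, nan, 0.9].sort
rescue ArgumentError => e
  puts "sort failed: #{e.message}"
end

# Math.sqrt rejects negative input, even a tiny negative rounding error.
begin
  Math.sqrt(-1.0e-12)
rescue Math::DomainError => e
  puts "sqrt failed: #{e.message}"
end

# Hypothetical guard: drop NaN scores before sorting, highest first.
def sort_scores_desc(scores)
  scores.reject { |s| s.respond_to?(:nan?) && s.nan? }.sort.reverse
end

# Hypothetical guard: treat negative values within the tolerance as zero,
# and let genuinely negative input still raise Math::DomainError.
def safe_sqrt(x, tolerance: 1e-9)
  return 0.0 if x.negative? && x.abs <= tolerance

  Math.sqrt(x)
end
```

Guards like these hide the symptom rather than explain it, so they are mainly useful for finding which input first produces a NaN or a negative value.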
from classifier-reborn.
@Ch4s3 I've got Thermite integrated now. I've submitted a bunch of PRs to ruru that give you most of Ruby's native features (they've been sitting unnoticed for a while) and I've merged them all into my own fork https://github.com/danielpclark/ruru/tree/playground if you want to try them out. I'm building FasterPath directly from it so that repo branch is here to stay.
I have pretty much mastered Rust to Ruby integration. Ruru doesn't support splat operators for parameters yet so I wrote code that's a little more bare metal here if you'd like to see it working with any number of parameters of input. Other than that the rest of how to do it should be somewhat easy to see from FasterPath.
from classifier-reborn.