Comments (31)

Ch4s3 avatar Ch4s3 commented on May 31, 2024 2

@danielpclark I'll take a look as soon as I can, but it probably won't be until early Summer. Thanks for giving me the heads up!

from classifier-reborn.

danielpclark avatar danielpclark commented on May 31, 2024 1

@Ch4s3 Hey, I love that you brought Rust in on this. I'd like to share some resources that may help in getting this done. I'm the author of faster_path, which rewrites Ruby's Pathname library in Rust, improving the performance of Rails by 30%+.

I see you're using Thermite. That's next on my agenda for faster_path, as it will allow prebuilt binaries of the Rust dynamic library to be served from a host, so Ruby users won't need to have Rust installed. My current focus is having my test suite prove that cross-platform Rust compilation works, since Mac OS and a few Linux distros skip the Rust library build process.

As for the helpful resources, the code base for faster_path has plenty of working Rust code running under Ruby. The next two are an article I wrote, "Coming to Rust from Ruby", and the notes from my first week learning Rust, "Getting started in Rust".

Also, when looking up Rust methods, searching the internet is actually the harder way to find answers; the best answers come directly from the documentation provided in the Dash app. Searching for methods in Dash has been the quickest and most accurate tool for the job. An alternative to Dash is Zeal, which is an open source equivalent.

Ch4s3 avatar Ch4s3 commented on May 31, 2024

@parkr The more I read about this, the more I think a pure Ruby implementation is always going to give us poor performance. It might make sense to bundle a C solution into the gem, since there are a lot of great existing solutions, and then just check that it builds correctly on various platforms.

I'm not married to this approach, but it could be the right way forward. Thoughts?

parkr avatar parkr commented on May 31, 2024

That seems like a good plan.

Ch4s3 avatar Ch4s3 commented on May 31, 2024

Ok. I'll devise a plan tonight and scout out implementations. It'll be good to wrap this one up.

parkr avatar parkr commented on May 31, 2024

Sounds terrific. Thanks, Chase!

Ch4s3 avatar Ch4s3 commented on May 31, 2024

Not a problem. I want to wrap up a few things in a row and get a release out in the next week(ish).

Ch4s3 avatar Ch4s3 commented on May 31, 2024

I figured out how to build and compile C extensions, and just need to settle on an implementation that has a good license. @parkr do you have a strong OSS license preference?

*edit: this implementation looks nice, and I've seen the author's name on some papers. The license is GNU.

parkr avatar parkr commented on May 31, 2024

As long as the license is compatible and GitHub can use it for commercial purposes, it's fine. I usually lean toward MIT. Can MIT projects contain GNU code? I'm not sure.

Ch4s3 avatar Ch4s3 commented on May 31, 2024

I'll read up

Ch4s3 avatar Ch4s3 commented on May 31, 2024

Ok, I've tried a few things and have it sort of working, but it segfaults on rare occasions. Still a WIP.

parkr avatar parkr commented on May 31, 2024

> Ok, I've tried a few things and have it sort of working, but it segfaults on rare occasions. Still a WIP.

Sweet! When is it segfaulting? Is the C still optional?

Ch4s3 avatar Ch4s3 commented on May 31, 2024

It segfaults while transforming the matrix for some inputs. It could be optional.

jayniz avatar jayniz commented on May 31, 2024

Any news here? :)

Ch4s3 avatar Ch4s3 commented on May 31, 2024

@jayniz I had to take a break from this for a bit, but I'll try to get back to it soon. Basically I found 1 or 2 promising implementations, but they both segfault for some input. I'm not sure why yet so I haven't gotten past that. I'll do my best to figure it out soon, but I'm open to people helping out.

jayniz avatar jayniz commented on May 31, 2024

Hey @Ch4s3 no problemo :-) The bayes classifier is working and goes into production today.

I played around with LSI locally, and I got the above errors for my inputs, though not with the GSL lib and gem; that worked. But where training the Bayes classifier with 6k inputs took ~8 sec, LSI took a couple of minutes for 600 inputs (and with 6k inputs it's still running after 30h) on a recent 13" MacBook Pro with 16 GB of RAM.

In my benchmarks the Bayes classifier performed quite well at detecting comment spam, though: trained with 6k comments, I had it classify ~80k comments and it classified correctly in 94.5% of the cases, with 4.75% false negatives and 0.75% false positives.

Let's see how it behaves in production!
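
For a sense of scale, the rates quoted above can be turned into rough absolute counts. This is only an arithmetic sketch: it assumes exactly 80,000 classified comments (the comment says "~80k"), and the constant and method names are made up for illustration.

```ruby
# Rough counts implied by the quoted rates, assuming exactly 80,000
# classified comments (the thread only says "~80k", so these are approximate).
TOTAL = 80_000

# Rates in hundredths of a percent, so the arithmetic stays in integers.
RATES = {
  correct:         9450, # 94.5%
  false_negatives:  475, # 4.75%
  false_positives:   75  # 0.75%
}.freeze

# Multiplies the total by each rate and scales back down to a count.
def implied_counts(total, rates)
  rates.transform_values { |rate| total * rate / 10_000 }
end

implied_counts(TOTAL, RATES).each do |label, count|
  puts "#{label}: #{count}"
end
```

The three rates sum to 100%, so the implied counts should add back up to the assumed total.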

Ch4s3 avatar Ch4s3 commented on May 31, 2024

@jayniz Awesome to hear that the bayes classifier is working! I'll keep working on the SVD.

Ch4s3 avatar Ch4s3 commented on May 31, 2024

@jayniz If you can provide me with sample data that worked with GSL but broke with the Ruby implementation, that would be super helpful!

jayniz avatar jayniz commented on May 31, 2024

@Ch4s3 not straight away, I'd have to run it on some data to see if it crashes first :-) if you let me know which implementation I should try to crash, I can try to find some time this week to make it crash and then send you the data that crashed it?

Ch4s3 avatar Ch4s3 commented on May 31, 2024

@jayniz Awesome, see if you can blow up the native Ruby implementation. I found an LGPL implementation of the SVD in C and I'm writing a wrapper for a C extension, but it won't be done straight away. Any input that crashes other implementations will be good for testing.

jayniz avatar jayniz commented on May 31, 2024

Ch4s3 avatar Ch4s3 commented on May 31, 2024

Ahh sorry, the LSI feature uses the SVD under the hood, and the Ruby SVD is what makes it slow compared with using GSL. See this line.

jayniz avatar jayniz commented on May 31, 2024

Ch4s3 avatar Ch4s3 commented on May 31, 2024

No worries. I probably won't have the C extension done by the weekend anyway.

Ch4s3 avatar Ch4s3 commented on May 31, 2024

So after a few tries, I can't get the C extension right. If anyone else wants to try, I can push up what I tried. Maybe it would make sense to use Helix to wrap a Rust SVD lib. That way we could distribute the binary and have some minimal guarantees about safety.

Ch4s3 avatar Ch4s3 commented on May 31, 2024

I may have found a candidate Rust lib here and some workable demo code here... It might actually work.

This is the direction I'm thinking of going.

Ch4s3 avatar Ch4s3 commented on May 31, 2024

@danielpclark Have you moved to Thermite yet? I'm looking to pick this back up.

danielpclark avatar danielpclark commented on May 31, 2024

@Ch4s3 No… The author updated his PR to be current with the code base, but the CI test results are intermittent/flaky.

Ch4s3 avatar Ch4s3 commented on May 31, 2024

Yeah, I saw that. I need to get back to the Rust book and try this again.

eftikharEmad avatar eftikharEmad commented on May 31, 2024

Hello,
Can you check my test case?
I have strings = [] with 100 elements:

lsi = ClassifierReborn::LSI.new
strings.each do |x| 
  lsi.add_item x[0], x[1]
end

But after a certain number of items have been added, I get: comparison of Float with NaN failed
When I remove the last element from the classifier and run the process again, I get the same error on another string.

I am not sure why this error is raised for some strings and not for others.
For some strings I get this error instead: Math::DomainError: Numerical argument is out of domain - "sqrt"

Is there any solution?
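
One way to narrow down a report like this is to record exactly which inputs raise, so they can be shared as sample data. The sketch below is a generic, hypothetical helper (`failing_items` is not part of classifier-reborn); it rescues the two error classes mentioned above and uses `Math.sqrt` on a negative number as a stand-in for the real call.

```ruby
# Hypothetical helper, not part of classifier-reborn: yields each item to the
# given block and records the ones that raise either of the reported errors.
def failing_items(items)
  failures = []
  items.each_with_index do |item, index|
    begin
      yield item
    rescue Math::DomainError, ArgumentError => e
      failures << { index: index, item: item, error: e.class, message: e.message }
    end
  end
  failures
end

# Stand-in for the real work: Math.sqrt raises Math::DomainError for negatives.
sample = [4.0, -1.0, 9.0]
failing_items(sample) { |x| Math.sqrt(x) }.each do |failure|
  puts "item #{failure[:index]} (#{failure[:item].inspect}) raised " \
       "#{failure[:error]}: #{failure[:message]}"
end
```

With the loop from the comment above, the block body would presumably be `lsi.add_item x[0], x[1]`. Whether the classifier is left in a usable state after an item raises is not something this sketch checks.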

danielpclark avatar danielpclark commented on May 31, 2024

@Ch4s3 I've got Thermite integrated now. I've submitted a bunch of PRs to ruru that give you most of Ruby's native features (they've been sitting unnoticed for a while) and I've merged them all into my own fork https://github.com/danielpclark/ruru/tree/playground if you want to try them out. I'm building FasterPath directly from it so that repo branch is here to stay.

I have pretty much mastered Rust-to-Ruby integration. Ruru doesn't support splat operators for parameters yet, so I wrote code that's a little more bare metal here, if you'd like to see it working with any number of input parameters. Other than that, the rest of how to do it should be fairly easy to see from FasterPath.
