Comments (14)
@chris357 Really small inputs without meaningful words (i.e. inputs made up only of stop words) are known to break LSI. I'm looking into how to handle this more gracefully.
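Until that is handled inside the gem, one possible workaround is to screen inputs before indexing them. The following is a minimal sketch, not the gem's own logic: `SKETCH_STOPWORDS`, `meaningful_words`, and `indexable?` are hypothetical names, the stop-word list is deliberately tiny, and the rule of dropping words of two characters or fewer is an assumption about how short words are treated.

```ruby
require 'set'

# Hypothetical, deliberately tiny stop-word list for illustration only;
# the gem ships its own, much larger lists.
SKETCH_STOPWORDS = Set.new(%w[a an and the this that is are with too also about])

# Returns the words that would plausibly survive filtering.
# Assumption: words of two characters or fewer and stop words are dropped.
def meaningful_words(text)
  text.downcase.scan(/[a-z]+/).reject do |word|
    word.length <= 2 || SKETCH_STOPWORDS.include?(word)
  end
end

# True when at least one meaningful word is left after filtering.
def indexable?(text)
  !meaningful_words(text).empty?
end
```

With a guard like this, a call such as `lsi.add_item(text, category) if indexable?(text)` would skip inputs that reduce to nothing, at the cost of silently dropping those documents.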
from classifier-reborn.
Just started playing with this today, and I won't pretend to understand "Vector::ZeroVectorError: Zero vectors can not be normalized" in the context of this gem. But I was able to replicate the issue by editing the README example.
No issue:
require 'classifier-reborn'
lsi = ClassifierReborn::LSI.new
strings = [["This text deals with dogs. Dogs.", :dog],
           ["This text involves dogs too. Dogs! ", :dog],
           ["This text revolves around cats. Cats.", :cat],
           ["This text also involves cats. Cats!", :cat],
           ["This text involves birds. Birds.", :bird]]
strings.each { |x| lsi.add_item x.first, x.last }
p lsi.classify "This text is also about dogs!"
Note I'm going to change the first string.
No issue:
["This text deals with dogs.", :dog]
Still no issue:
["This text deals.", :dog]
Issue:
["This te.", :dog]
=> Vector::ZeroVectorError: Zero vectors can not be normalized
Not sure if that's helpful.
(Running this as a rake task in a Rails app)
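For context, `Vector::ZeroVectorError` is raised by `Vector#normalize` in Ruby's standard `matrix` library when the vector has zero magnitude. The sketch below reproduces the error without the gem, on the assumption that a document whose words are all filtered out ends up as an all-zero term vector. `safe_normalize` is a hypothetical helper, not part of classifier-reborn.

```ruby
require 'matrix'

# Hypothetical helper: returns the unit-length vector, or nil when the
# input has zero magnitude and therefore cannot be normalized.
def safe_normalize(vector)
  vector.normalize
rescue Vector::ZeroVectorError
  nil
end

# A vector with at least one non-zero entry normalizes fine.
p safe_normalize(Vector[3, 4])

# An all-zero vector raises when normalized directly.
begin
  Vector[0, 0, 0].normalize
rescue Vector::ZeroVectorError => e
  puts "#{e.class}: #{e.message}"
end
```

Returning nil is only one way to signal the problem; skipping the document before it reaches the index would be another.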
from classifier-reborn.
Not sure! Do you have any posts which aren't like the others? Perhaps one without any content?
from classifier-reborn.
This is one of those tricky Matrix building errors, I think. Any empty post might cause this.
from classifier-reborn.
+1 same error here
from classifier-reborn.
fixed by #77
from classifier-reborn.
Time for some reanimation of zombie threads... I ran into this issue again.
Classifier Reborn 2.1.0, GSL 2.1.0.3. I'm taking data from a Rails app and trying to feed it into an LSI classifier:
Post.where.not(body: nil).each do |p|
  # Strip newlines, then index the cleaned body rather than the raw one
  body = p.body.tr "\n", ''
  if p.is_tp
    lsi.add_item body, :spam
  elsif p.is_fp
    lsi.add_item body, :ham
  end
end
The data being fed in looks something like this:
<p>I am looking for either a web app or installable program that will track the movies I have watched.</p><p>I have found a few online but I would like one that also tracks how many times I have watched each movie, as I like to rewatch many of my movies.</p><p>I would also like to be able to easily sort the data, for example sorting by "last watched" or sorting by number of views.</p><p>Other requirements</p><ul><li>if installable program, must work with Windows</li><li>no answers saying "use a spreadsheet"</li><li>no answers saying "make your own"</li><li>no Windows Media Player (does not support MKV)</li><li>no Banshee (Windows version is out of date)</li></ul>
Getting the same old "ZeroVectorError: Zero vectors can not be normalized". Stripping the HTML out doesn't help.
from classifier-reborn.
@ArtOfCode- can you try our master branch? There's been a lot of work done since 2.1.0 was released.
from classifier-reborn.
Sure thing. I'm not at a dev machine right now, but I'll give it a shot when I get back later on.
from classifier-reborn.
I just cut a new version, 2.2.0. Let me know if it works.
from classifier-reborn.
So I've given both a shot. Neither 2.2.0 from RubyGems nor the master branch solves the problem; I'm still getting the same error.
from classifier-reborn.
More info: unknowingly, I actually wasn't using GSL (was installed but not loaded... bah). Using GSL, the problem seems to have disappeared. That seems to indicate the issue is somewhere in CR's own implementation of vectors.
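A quick way to catch the "installed but not loaded" situation is to check explicitly that the library can be required before building the index. This is a minimal sketch: `library_loadable?` is a hypothetical helper, and whether classifier-reborn picks up GSL on its own once it is loadable depends on the gem version, so treat that part as an assumption.

```ruby
# Hypothetical helper: true when `require name` succeeds, false when the
# library cannot be found on the load path.
def library_loadable?(name)
  require name
  true
rescue LoadError
  false
end

if library_loadable?('gsl')
  puts 'gsl is loadable'
else
  puts 'gsl is not loadable; a pure-Ruby fallback would be used instead'
end
```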
from classifier-reborn.
That is in fact consistent with what I'd expect. We intend to fix our implementation, but it's a tricky algorithm to implement correctly in Ruby.
from classifier-reborn.
This may be fixed (ish) by #173
from classifier-reborn.