Weird behavior when one category is empty about classifier-reborn HOT 17 CLOSED

jekyll commented on May 30, 2024

Weird behavior when one category is empty

from classifier-reborn.

Comments (17)

parkr commented on May 30, 2024

Very interesting! The fact that the classifier doesn't have a competing classification may be the root cause. When not_a_number is empty, the classifier only has one classification that it could possibly pick – null is not an option, and even if the match is some infinitesimal number or 0 itself, it's the only classification we know about so match it. Solution would be to say match coefficient must be > 0 – even if infinitesimal – and if the match coefficient is 0, then return null. It's a breaking change of the API, though, so maybe we could just return "".

[13] pry(main)> number_finder.classify("5")
=> "A number"
[14] pry(main)> number_finder.classify("15 and more numbers")
=> "A number"
[15] pry(main)> number_finder.classify("numbers")
=> "A number"
[16] pry(main)> number_finder.classify("lol wut ?")
=> "A number"
[17] pry(main)> number_finder.classify("Is this a Bug ? ")
=> "A number"
[18] pry(main)> number_finder.classify("")
=> "A number"
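To make the suggestion above concrete, here is a minimal sketch of "return nothing when the match coefficient is 0". The `best_match` helper and the score hashes are hypothetical and only illustrate the idea; this is not the gem's code.

```ruby
# Hypothetical helper, not part of classifier-reborn: pick the best-scoring
# category, but return nil when even the best score is not above zero.
def best_match(scores)
  category, score = scores.max_by { |_name, value| value }
  return nil if category.nil? || score <= 0

  category
end

# With a single known category, a plain "take the max" always returns that
# category, even when the match coefficient is 0.
only_one = { "A number" => 0.0 }
naive = only_one.max_by { |_name, value| value }.first

p naive                                 # the only category wins by default
p best_match(only_one)                  # nil, because the score is not > 0
p best_match({ "A number" => 0.3 })     # a positive score still matches
```

Whether nil or "" is the better "no match" value is the API question raised above; the sketch uses nil only because it is easy to test for.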

bararchy commented on May 30, 2024

@parkr Will it be too bad to say something like: "if value is not classified in category 'a', then, even though category 'b' is empty, it belongs there" ?

bararchy commented on May 30, 2024

Maybe if I give an example of my usage it will be easier to understand my need to leave one category empty.

I'm using this gem to learn HTTP traffic. I set it to "Training Mode" and show it what "normal traffic" looks like. Then I want it to check traffic, and if the classification isn't "normal traffic", it is "suspicious traffic" and I drop the packet.

Right now it only works if I show it what "suspicious traffic" looks like, but this creates a kind of 'black list' situation, and I want more of a 'white list' approach.

Ch4s3 commented on May 30, 2024

I'm just getting back from vacation I'll take a look soon

bararchy commented on May 30, 2024

@Ch4s3 Hi, did you manage to see what's going on here ?

Ch4s3 commented on May 30, 2024

Sorry, I got bogged down catching up at work. It's on my radar though.

MadBomber commented on May 30, 2024

This is akin to a "none of the above" kind of classification where given a set of categories if the best fit is less than some threshold then a result indicating "none of the above" or :unknown is returned.
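A rough sketch of that "none of the above" rule, assuming scores are log-likelihood-style numbers where higher means a better fit. The `classify_or_unknown` name and the sample scores are made up for illustration.

```ruby
# Hypothetical sketch: return :unknown when the best fit is below a threshold.
def classify_or_unknown(scores, threshold)
  category, score = scores.max_by { |_name, value| value }
  return :unknown if category.nil? || score < threshold

  category
end

scores = { "Normal" => -12.5, "Suspicious" => -30.0 }

p classify_or_unknown(scores, -20.0) # best fit clears the threshold
p classify_or_unknown(scores, -5.0)  # best fit is too weak, so :unknown
```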

bararchy commented on May 30, 2024

@MadBomber Good point, this is exactly the answer I was looking for from the classifier :)
I hope it could be implemented.

MadBomber commented on May 30, 2024

Take a look at my fork to see if this is what you had in mind.

MadBomber@5096334

I am not 100% sure this is a good solution to what you want to do. I'm thinking that there will be a large number of false positives. You may find yourself spending more time adjusting the threshold value.

I will submit a pull request after you play with it for a while.

Dewayne
o-*

MadBomber commented on May 30, 2024

In your pry session, I think that if you had used #classify_with_score you would have seen that the score was being returned as Float::INFINITY for text that was not classified as 'a_number'.
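One plausible way an untrained category can end up with an infinite score is shown below. This is a toy model, not the gem's implementation: it assumes a score is the sum of log(word count / total words in the category), with a stand-in count of 0.1 for unseen words. The `toy_score` name is hypothetical.

```ruby
# Toy model only. Assumption: score = sum of log(count / total), where unseen
# words get a stand-in count of 0.1. An empty category has total == 0.0, so
# 0.1 / 0.0 is Infinity and Math.log(Infinity) is Infinity as well.
def toy_score(category_words, text_words)
  total = category_words.values.sum.to_f
  text_words.reduce(0.0) do |acc, word|
    count = category_words.fetch(word, 0.1)
    acc + Math.log(count / total)
  end
end

trained = { "get" => 1000, "post" => 1000, "/" => 1000 }
words   = %w[get / post]

scores = {
  "Normal activity"     => toy_score(trained, words),
  "Suspicious activity" => toy_score({}, words)
}

# The empty category wins the max with an infinite score.
p scores.max_by { |_name, value| value }
```

Under that assumption, the trained category gets a finite negative score, so the empty one always comes out on top.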

bararchy commented on May 30, 2024

@MadBomber I just tried your version, again, only training the "normal activity" category.
This is what I do:

ai_overlord = ClassifierReborn::Bayes.new 'normal_activity', 'suspicious_activity', {:enable_threshold => true}
=> #<ClassifierReborn::Bayes:0x0000000210ec38
 @auto_categorize=false,
 @categories={:"Normal activity"=>{}, :"Suspicious activity"=>{}},
 @category_counts={},
 @category_word_count={},
 @enable_threshold=true,
 @language="en",
 @threshold=0.0,
 @total_words=0>

### Training the classifier 
ai_overlord.train_normal_activity("Firefox chrome mozzila GET POST / http 1.1 1.0 1.2 Accept" * 1000)

[35] pry(main)> ai_overlord.classify("GET / POST")
=> nil
[36] pry(main)> ai_overlord.classify_with_score("GET / POST")
=> ["Suspicious activity", Infinity]

## Trying to play around with threshold

[23] pry(main)> ai_overlord.threshold = 0.5
=> 0.5
[24] pry(main)> ai_overlord.classify("GET / ")
=> nil
[25] pry(main)> ai_overlord.threshold = 10.0
=> 10.0
[26] pry(main)> ai_overlord.classify("GET / ")
=> nil
[27] pry(main)> ai_overlord.classify("GET / POST")
=> nil
[28] pry(main)> ai_overlord.threshold = 50.0
=> 50.0
[29] pry(main)> ai_overlord.classify("GET / POST")
=> nil

### Making sure the Threshold is changed inside the class
[37] pry(main)> ai_overlord
=> #<ClassifierReborn::Bayes:0x0000000210ec38
 @auto_categorize=false,
 @categories={:"Normal activity"=>{:firefox=>1, :chrome=>1000, :mozzila=>1000, :get=>1000, :post=>1000, :http=>1000, :acceptfirefox=>999, :accept=>1, :/=>1000, :"."=>3000}, :"Suspicious activity"=>{}},
 @category_counts={:"Normal activity"=>1},
 @category_word_count={:"Normal activity"=>10001},
 @enable_threshold=true,
 @language="en",
 @threshold=50.0,
 @total_words=10001>

It seems that, again, when one category is empty it always classifies to the empty one. Would the threshold feature help in this case?

Thanks :)

MadBomber commented on May 30, 2024

Given your examples, that is proper behavior. You trained the classifier with only one example - a very long string. You asked it to classify a very short string. It rejected the string showing a score of Infinity which means that there is no matching category.

Try this pattern:

# Notice there is only one category: Normal

ai_overlord = ClassifierReborn::Bayes.new(
  'Normal',
  enable_threshold: true
)

normal_request = "Firefox chrome mozzila GET POST / http 1.1 1.0 1.2 Accept"

10.times { ai_overlord.train_normal(normal_request) }

# Dynamically set the threshold to less than a known sample

ai_overlord.threshold = ai_overlord.classify_with_score(normal_request).last - 0.5

ai_overlord.classify(normal_request)

# Now try to classify a counter-example

abnormal_request = "Safari opera webkit GET POST / http 1.1 1.0 1.2 Accept"
ai_overlord.classify(abnormal_request)
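The same calibration pattern can be tried without the gem. The sketch below is a stand-alone toy scorer, with hypothetical helper names (`tokenize`, `train!`, `score`, `classify`) and an assumed scoring rule (sum of log relative word frequencies, 0.1 as a stand-in count for unseen words). It only illustrates why setting the threshold just under a known-good score rejects the counter-example.

```ruby
# Toy one-category scorer, for illustration only (not the gem's code).
def tokenize(text)
  text.downcase.split
end

def train!(counts, text)
  tokenize(text).each { |word| counts[word] += 1 }
end

# Assumed rule: sum of log(count / total); unseen words count as 0.1.
def score(counts, text)
  total = counts.values.sum.to_f
  tokenize(text).reduce(0.0) do |acc, word|
    acc + Math.log(counts.fetch(word, 0.1) / total)
  end
end

# Returns the single category name, or nil for "none of the above".
def classify(counts, text, threshold)
  score(counts, text) >= threshold ? "Normal" : nil
end

normal_request   = "Firefox chrome mozzila GET POST / http 1.1 1.0 1.2 Accept"
abnormal_request = "Safari opera webkit GET POST / http 1.1 1.0 1.2 Accept"

counts = Hash.new(0)
10.times { train!(counts, normal_request) }

# Set the threshold slightly below the score of a known-good sample.
threshold = score(counts, normal_request) - 0.5

p classify(counts, normal_request, threshold)   # the known sample passes
p classify(counts, abnormal_request, threshold) # the unseen words push it below
```

In this toy model, the three unseen words each cost far more than the 0.5 margin, which is why the counter-example falls under the threshold.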

o-*

bararchy commented on May 30, 2024

So, @MadBomber just wanted to ask, is there a way to add a "none of the above" option ? this way I can have a default "none of the above" value, and then I can only train one classification.

MadBomber commented on May 30, 2024

The feature has been merged with the master but a new version of the gem has not yet been published. You can look at the code to see the details:

https://github.com/jekyll/classifier-reborn/blob/master/lib/classifier-reborn/bayes.rb

Here is the gist:

It only works with the classify method; all other methods behave as before. If the result falls below the threshold score, or the score is Infinity, the result returned will be nil. So to see if it was "none of the above", just check for result.nil?
You can enable the threshold at initialization time with the option 'enable_threshold' set to true. You can also enable or disable threshold processing at any time using the methods enable_threshold and disable_threshold.
The default threshold is 0.0; any score below this will return a nil result. However, the threshold that you should use is one that makes sense for your application. You can set your own threshold at initialization time with the option 'threshold', which expects a floating point number. You can reset the threshold or get its value using the methods 'threshold=' or just 'threshold'.
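As a rough illustration of the result.nil? check in a white-list setup, here is a small sketch. The `verdict` helper and the stub object are hypothetical; the stub only stands in for a trained classifier whose classify method returns nil for "none of the above".

```ruby
# Hypothetical wrapper: anything the classifier cannot place is dropped.
def verdict(classifier, request)
  classifier.classify(request).nil? ? :drop : :allow
end

# Stand-in for a trained classifier, used only to keep the sketch runnable.
stub = Object.new
def stub.classify(text)
  text.start_with?("GET ") ? "Normal" : nil
end

p verdict(stub, "GET / HTTP/1.1") # classified, so allowed
p verdict(stub, "' OR 1=1 --")    # nil result, so dropped
```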

Check out the unit tests:

https://github.com/jekyll/classifier-reborn/blob/master/test/bayes/bayesian_test.rb

The test at line 82 'test_classification_with_threshold_again' is your specific scenario as I understood it.

Let us know if you catch any bad guys using this technique.

Dewayne
o-*

bararchy commented on May 30, 2024

@MadBomber Thanks for the great explanation and the example in the tests.

Right now I used a -200.0 threshold to stop a SQL Injection attack from SQLMAP.

I need to play around with letting the classifier learn more, then test a few attacks.
Anyhow, this is fine for my use case, many thanks (it's also stable, so I would push a new gem version ;) ).
I'll update if necessary. Issue closed :)

Ch4s3 commented on May 30, 2024

@MadBomber and @bararchy I'm going to try to release a new version soon. I basically just need to get some stuff into the README about the new features.

MadBomber commented on May 30, 2024

I will add a section to the README on the threshold features. Should have a pull request in by tonight.

o-*
