
Comments (19)

ibnesayeed commented on May 31, 2024

I usually test in isolated Docker containers and I didn't notice anything unusual, except that it is taking longer than the memory backend. Additionally, the integration test takes noticeably longer because it needs to train about 7K instances, classify 1K instances, untrain 2K instances, and classify 1K instances again, for both the Memory and Redis backends. If Redis is not running or cannot be connected to, those tests are skipped. I think the lag is coming from the periodic save operations, when Redis logs the write operations into the dump.rdb file. However, since we don't need persistence on disk, we can disable this behavior. I will give it a try and see if the speed improves.
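
A minimal sketch of how the periodic snapshots could be disabled from the test setup using the redis-rb client (the exact call may vary by gem version, and it only changes the configuration of the running server):

  require 'redis'

  redis = Redis.new
  # Turn off periodic RDB snapshots so test writes never trigger a dump.rdb save.
  # This is the client-side equivalent of running `CONFIG SET save ""` in redis-cli.
  redis.config(:set, 'save', '')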

Ch4s3 commented on May 31, 2024

Let's try disabling that. If the slower tests are mostly due to the larger test case, I'm fine with that. We aren't running tons of builds back to back, so I think better test quality beats speed here.

ibnesayeed commented on May 31, 2024

Data size does affect the testing time. I have refactored the code to make the sizes configurable as constants rather than hard-coding numbers in the test itself.

TRAINING_SIZE = 4000, TESTING_SIZE = 1000 => About 17 sec.
TRAINING_SIZE = 1000, TESTING_SIZE = 100 => About 4 sec.
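
For illustration, a minimal sketch of what such constants could look like at the top of the integration test (the ENV override is a hypothetical convenience, not part of the actual change):

  # Data sizes are constants so the integration test can be tuned in one place
  # instead of hard-coding numbers inside the test cases.
  TRAINING_SIZE = (ENV['TRAINING_SIZE'] || 4_000).to_i  # hypothetical ENV override
  TESTING_SIZE  = (ENV['TESTING_SIZE']  || 1_000).to_i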

ibnesayeed commented on May 31, 2024

I would much prefer to move the backend initialization and configuration to be run once per test suite rather than for each test case. However, Minitest does not provide this functionality out of the box. It is possible with a custom runner, though.

Ch4s3 commented on May 31, 2024

If you're willing to write a custom runner, I'm open to it. The only reason we use minitest is b/c it was originally done with test unit.

ibnesayeed commented on May 31, 2024

If you're willing to write a custom runner, I'm open to it. The only reason we use minitest is b/c it was originally done with test unit.

I am fairly new to actually writing tests, so I will have to understand the dynamics of what is involved and how that custom runner would work. Test::Unit had various lifecycle callback methods, including before and after each test case and suite, but for some reason Minitest decided not to make them available. I initially tried writing an initialize method in my test class, but it turned out that it would be called for each test case, so it was not going to help. However, we can perhaps cache the instance variable that holds the reference to the Redis backend. On the other hand, I feel like it would be a micro-optimization, because currently the slowest test is the integration one with lots of data.

That said, #97 can still be merged and we can revisit the custom runner option later.

Ch4s3 commented on May 31, 2024

I'll leave this open for now so we can continue to discuss and explore.

ibnesayeed commented on May 31, 2024

It looks like writing the custom runner will not benefit us in this case. I did some experiments and here are my findings:

  • Caching instance variables (such as @redis_backend or @classifier) that are defined in the setup method is not going to help, because each test case runs as an independent instance with its own initialization.
  • Changing those instance variables to class variables (such as @@redis_backend) would allow caching, but it does not improve the overall execution time (evident from the empirical results); see the sketch after this list.
  • Connecting to the Redis server and creating the @redis_backend instance variable takes less than 0.3ms on my machine (average ≈0.25ms). We are currently running 23 tests that require @redis_backend. We are also creating an @alternate_redis_backend each time (though it is only needed in some tests). This means we are connecting to Redis 23 * 2 = 46 times, for a total of around 23 * 2 * 0.25 = 11.5ms.
  • So, is it really worth the effort? Not to mention the added complexity it would bring to the test code, along with the ugliness of putting data in the class rather than in instances.
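
For reference, a minimal sketch of the class-variable caching that was tried (the class name is an assumption, and as noted above it did not improve the overall execution time):

  require 'minitest/autorun'
  require 'classifier-reborn'

  class BayesRedisBackendTest < Minitest::Test  # hypothetical test class name
    def setup
      # Minitest creates a fresh instance of this class for every test method, so an
      # instance variable assigned here is rebuilt each time. A class variable survives
      # across test cases and caches the Redis connection instead.
      @@redis_backend ||= ClassifierReborn::BayesRedisBackend.new
      @classifier = ClassifierReborn::Bayes.new 'Interesting', 'Uninteresting',
                                                backend: @@redis_backend
    end
  end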

Ch4s3 commented on May 31, 2024

Excellent profiling. I'd say based on your results, it's safe to close this.

ibnesayeed commented on May 31, 2024

The integration test is where all the time is consumed. Here are some stats based on different values of the training and testing data sizes (time values are the sum of the Memory and Redis backends for the same task):

TRAINING_SIZE = 4000
TESTING_SIZE = 1000

Load data (training: 4000) + (testing: 1000) => 2.80472ms
Train 4000 records => 5790.959149ms
Test 1000 records => 2065.361196ms
Untrain 2000 records => 6426.361111ms
Test 1000 records => 2320.733856ms

Time spent in Memory classifier => 410.059577ms
Time spent in Redis classifier => 16948.311882ms

TRAINING_SIZE = 1000
TESTING_SIZE = 100

Load data (training: 1000) + (testing: 100) => 0.760892ms
Train 1000 records => 1522.433455ms
Test 100 records => 200.561411ms
Untrain 500 records => 1623.415373ms
Test 100 records => 193.476481ms

Time spent in Memory classifier => 85.74332ms
Time spent in Redis classifier => 3573.399931ms

TRAINING_SIZE = 100
TESTING_SIZE = 10

Load data (training: 100) + (testing: 10) => 0.114501ms
Train 100 records => 151.393102ms
Test 10 records => 22.735926ms
Untrain 50 records => 187.779914ms
Test 10 records => 18.576174ms

Time spent in Memory classifier => 10.057782ms
Time spent in Redis classifier => 363.122439ms
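
For context, a rough sketch of how such phase timings can be collected with Ruby's Benchmark module (training_data and classifier are placeholder names, not the actual variables used in the test):

  require 'benchmark'

  # Time one phase and report it in milliseconds, matching the format above.
  elapsed = Benchmark.realtime do
    training_data.each { |category, text| classifier.train(category, text) }
  end
  puts "Train #{training_data.size} records => #{(elapsed * 1000).round(6)}ms"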

Ch4s3 commented on May 31, 2024

Cool. Let's keep an eye on it. I think after #97 it's pretty good.

ibnesayeed commented on May 31, 2024

I have added some more data to the previous comment to indicate how much cumulative time the integration test took in each of the two backends. It looks like Memory is about 40 times faster than Redis, but obviously the benefit of Redis is in persistence and collaboration.

Ch4s3 commented on May 31, 2024

not terribly surprising. We may want to document that.

ibnesayeed commented on May 31, 2024

not terribly surprising. We may want to document that.

With #98 it is not only documented but reproducible as well. If by documenting you mean we should add it to the README file, then yes, we can add a summarized note there.

Ch4s3 commented on May 31, 2024

Yeah, I meant the readme. I plan to restructure the readme to make it a bit more cohesive soon, so any note will work.

ibnesayeed commented on May 31, 2024

Yeah, I meant the readme. I plan to restructure the readme to make it a bit more cohesive soon, so any note will work.

I have added a brief note about the performance of the Redis backend in PR #103. However, I agree that the README needs some restructuring, with more headings and sub-headings, to reorganize the information that has accumulated over the years.

Ch4s3 commented on May 31, 2024

Yeah, the readme is a bit of a mess, but it shouldn't be too hard to reorganize. I'll probably throw up a quick GitHub page or something similar to help with discovery as well.

ibnesayeed commented on May 31, 2024

I'll probably throw up a quick GitHub page or something similar to help with discovery as well.

This is a rather good idea, actually. We can keep the static documentation files in the /docs folder and configure the repo to automatically serve static pages from that folder on the master branch. This way the documentation will be maintained along with the code.

Ch4s3 commented on May 31, 2024

That sounds good.
