
Comments (19)

ibnesayeed commented on May 31, 2024

I usually test in isolated Docker containers and I didn't notice anything unusual, except that it is taking longer than the memory backend. Additionally, the integration test takes noticeably longer because it needs to train about 7K instances, classify 1K instances, untrain 2K instances, and classify 1K instances again, for both the Memory and Redis backends. If Redis is not running or cannot be connected to, those tests are skipped. I think the lag is coming from the periodic save operations, when Redis logs the write operations into the dump.rdb file. However, since we don't need persistence on disk, we can disable this behavior. I will give it a try and see if the speed improves.
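
A minimal sketch of how the periodic snapshots could be disabled from the test setup using the redis-rb client (the exact call may vary by gem version, and it only changes the configuration of the running server):

  require 'redis'

  redis = Redis.new
  # Turn off periodic RDB snapshots so test writes never trigger a dump.rdb save.
  # This is the client-side equivalent of running `CONFIG SET save ""` in redis-cli.
  redis.config(:set, 'save', '')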

Ch4s3 commented on May 31, 2024

Let's try disabling that. If the slower tests are mostly due to the larger test case, I'm fine with that. We aren't running tons of builds back to back, so I think better test quality beats speed here.

ibnesayeed commented on May 31, 2024

Data size does affect the testing time. I have refactored the code to make the sizes configurable as constants rather than hard-coding numbers in the test itself.

TRAINING_SIZE = 4000, TESTING_SIZE = 1000 => About 17 sec.
TRAINING_SIZE = 1000, TESTING_SIZE = 100 => About 4 sec.
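
For illustration, a minimal sketch of what such constants could look like at the top of the integration test (the ENV override is a hypothetical convenience, not part of the actual change):

  # Data sizes are constants so the integration test can be tuned in one place
  # instead of hard-coding numbers inside the test cases.
  TRAINING_SIZE = (ENV['TRAINING_SIZE'] || 4_000).to_i  # hypothetical ENV override
  TESTING_SIZE  = (ENV['TESTING_SIZE']  || 1_000).to_i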

ibnesayeed commented on May 31, 2024

I would much prefer to move the backend initialization and configuration to be run once per test suite rather than for each test case. However, Minitest does not provide this functionality out of the box. It is possible with a custom runner, though.

Ch4s3 commented on May 31, 2024

If you're willing to write a custom runner, I'm open to it. The only reason we use minitest is b/c it was originally done with test unit.

ibnesayeed commented on May 31, 2024

If you're willing to write a custom runner, I'm open to it. The only reason we use minitest is b/c it was originally done with test unit.

I am fairly new to actually writing tests, so I will have to understand the dynamics of what is involved and how that custom runner would work. Test::Unit had various lifecycle callback methods, including before and after each test case and suite, but for some reason Minitest decided not to make them available. I initially tried writing an initialize method in my test class, but it turned out that it would be called for each test case, so it was not going to help. However, we can perhaps cache the instance variable that holds the reference to the Redis backend. On the other hand, I feel like it would be a micro-optimization, because currently the slowest test is the integration one with lots of data.

That said, #97 can still be merged and we can revisit the custom runner option later.

Ch4s3 commented on May 31, 2024

I'll leave this open for now so we can continue to discuss and explore.

ibnesayeed commented on May 31, 2024

It looks like writing the custom runner will not benefit us in this case. I did some experiments and here are my findings:

  • Caching instance variables (such as @redis_backend or @classifier) that are defined in the setup method is not going to help, because each test case runs as an independent instance with its own initialization.
  • Changing those instance variables to class variables (such as @@redis_backend) would allow caching, but it does not improve the overall execution time (evident from the empirical results); see the sketch after this list.
  • Connecting to the Redis server and creating the @redis_backend instance variable takes less than 0.3ms on my machine (average ≈0.25ms). We are currently running 23 tests that require @redis_backend. We are also creating an @alternate_redis_backend each time (though it is only needed in some tests). This means we are connecting to Redis 23 * 2 = 46 times, for a total of around 23 * 2 * 0.25 = 11.5ms.
  • So, is it really worth the effort? Not to mention the added complexity it would bring to the test code, along with the ugliness of putting data in the class rather than in instances.
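
For reference, a minimal sketch of the class-variable caching that was tried (the class name is an assumption, and as noted above it did not improve the overall execution time):

  require 'minitest/autorun'
  require 'classifier-reborn'

  class BayesRedisBackendTest < Minitest::Test  # hypothetical test class name
    def setup
      # Minitest creates a fresh instance of this class for every test method, so an
      # instance variable assigned here is rebuilt each time. A class variable survives
      # across test cases and caches the Redis connection instead.
      @@redis_backend ||= ClassifierReborn::BayesRedisBackend.new
      @classifier = ClassifierReborn::Bayes.new 'Interesting', 'Uninteresting',
                                                backend: @@redis_backend
    end
  end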

Ch4s3 commented on May 31, 2024

Excellent profiling. I'd say based on your results, it's safe to close this.

ibnesayeed commented on May 31, 2024

The integration test is where all the time is consumed. Here are some stats based on different values of the training and testing data sizes (time values are the sum of the Memory and Redis backends for the same task):

TRAINING_SIZE = 4000
TESTING_SIZE = 1000

Load data (training: 4000) + (testing: 1000) => 2.80472ms
Train 4000 records => 5790.959149ms
Test 1000 records => 2065.361196ms
Untrain 2000 records => 6426.361111ms
Test 1000 records => 2320.733856ms

Time spent in Memory classifier => 410.059577ms
Time spent in Redis classifier => 16948.311882ms

TRAINING_SIZE = 1000
TESTING_SIZE = 100

Load data (training: 1000) + (testing: 100) => 0.760892ms
Train 1000 records => 1522.433455ms
Test 100 records => 200.561411ms
Untrain 500 records => 1623.415373ms
Test 100 records => 193.476481ms

Time spent in Memory classifier => 85.74332ms
Time spent in Redis classifier => 3573.399931ms

TRAINING_SIZE = 100
TESTING_SIZE = 10

Load data (training: 100) + (testing: 10) => 0.114501ms
Train 100 records => 151.393102ms
Test 10 records => 22.735926ms
Untrain 50 records => 187.779914ms
Test 10 records => 18.576174ms

Time spent in Memory classifier => 10.057782ms
Time spent in Redis classifier => 363.122439ms
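
For context, a rough sketch of how such phase timings can be collected with Ruby's Benchmark module (training_data and classifier are placeholder names, not the actual variables used in the test):

  require 'benchmark'

  # Time one phase and report it in milliseconds, matching the format above.
  elapsed = Benchmark.realtime do
    training_data.each { |category, text| classifier.train(category, text) }
  end
  puts "Train #{training_data.size} records => #{(elapsed * 1000).round(6)}ms"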

Ch4s3 commented on May 31, 2024

Cool. Let's keep an eye on it. I think after #97 it's pretty good.

ibnesayeed commented on May 31, 2024

I have added some more data to the previous comment to indicate how much cumulative time the integration test took in each of the two backends. It looks like Memory is about 40 times faster than Redis, but obviously the benefit of Redis is in persistence and collaboration.

Ch4s3 commented on May 31, 2024

not terribly surprising. We may want to document that.

ibnesayeed commented on May 31, 2024

not terribly surprising. We may want to document that.

With #98 it is not only documented but reproducible as well. If by documenting you mean we should add it to the README file, then yes, we can add a summarized note there.

Ch4s3 commented on May 31, 2024

Yeah, I meant the readme. I plan to restructure the readme to make it a bit more cohesive soon, so any note will work.

ibnesayeed commented on May 31, 2024

Yeah, I meant the readme. I plan to restructure the readme to make it a bit more cohesive soon, so any note will work.

I have added a brief note about the performance of the Redis backend in PR #103. However, I agree that the README needs some restructuring, with more headings and sub-headings, to reorganize the information that has accumulated over the years.

Ch4s3 commented on May 31, 2024

Yeah, the readme is a bit of a mess, but it shouldn't be too hard to reorganize. I'll probably throw up a quick GitHub page or something similar to help with discovery as well.

ibnesayeed commented on May 31, 2024

I'll probably throw up a quick GitHub page or something similar to help with discovery as well.

This is a rather good idea, actually. We can keep the static documentation files in the /docs folder and configure the repo to automatically serve static pages from that folder on the master branch. This way the documentation will be maintained along with the code.

Ch4s3 commented on May 31, 2024

That sounds good.
