
Comments (22)

pbos commented on July 27, 2024

Ouch. I first thought --serialize_test_cases might work for you, but it still executes tests individually, just not concurrently. The way this is designed, all tests are addressed individually (so they can pass/fail/be rerun individually); as far as I'm aware there's no trivial way to extract pass/fail for individual tests if we don't run them individually.

Is there a way you could run test discovery as a separate step, store the result to disk, and then use something like --test_cache=foo.txt to reuse the already-discovered tests? gtest-parallel supports forwarding additional arguments after --, so in that case you could run: ./gtest-parallel my_binary -- --test_cache=foo.txt. Hopefully that would be a lot faster than scanning the directory tree, and if it makes startup time low enough, it might even be faster than batching.

I don't see a good way to implement batching with how this works. We're interested in individual test runtimes and pass/fail results, so without a way to extract those from a run I'm not sure how to do batching well at all.


pbos commented on July 27, 2024

FWIW this is relevant for all kinds of tests where code before RUN_ALL_TESTS() has significant overhead, or when process startup time is significant (I'm looking at you, Windows). I wish we had a better answer, but I'm not aware of any easy ways to accomplish it.


pbos commented on July 27, 2024

.. also relevant: parsing stdout/stderr adds non-trivial processing cost, especially in Python, so streaming test logs into the tool is an option I'd like to stay away from.


pbos commented on July 27, 2024

I guess we could consider any failure within the batch a failure for all of them, let runtime be 1/N, and just document those caveats. We could rerun tests in failing batches as individual tasks. I'm not sure how to easily restructure the code to accommodate this; let me know if the two-pass solution works for you in the meantime (it should provide better or at least comparable speedup for you, since it's still fewer directory scans).


thelema commented on July 27, 2024

If we're not going to get individual times for the tests in each batch, then what I crossed out in my edit applies as a better way (than the 1/N assumption) to handle only having aggregate runtimes. The two-pass solution sounds good enough for me; I'd be fine with just having batch failures, with logs captured as a whole, to work from.

Doesn't gtest have alternate output modes where it can output test results as json/xml/something? Is that an option for getting detailed results from a batch?


pbos commented on July 27, 2024

If we're doing batches in the main script I'd prefer that this were properly supported (including recording test times) to avoid surprising users, and it would be nice if we supported retries for failing tests in batches. I'm not too keen on recording batch times since that information is hard to use. I haven't found other output modes that are compiled in by default; there are C++ test event handlers that could emit this, but they have to be registered from within the binary, so that doesn't work (as we want this to work with all gtest binaries). If you do find output options, let me know; that would make this a viable option.

I'm not entirely sure that batch runs should be part of the "main" script if we can't get any timing data or test results out of it, since a lot of logic in there is based around that. It could be doable but that'd require refactoring that makes the components inside the script easier to reuse.

What you'd need for the batch case is probably: (1) extract all test cases, (2) shuffle them just to hopefully avoid a long tail, (3) split them into N-sized chunks and (4) run the chunks in parallel with X workers. Maybe (1) and (4) could be broken out into components from the script to permit easier building of a batch runner, or (2) could be an injectable component that normally sorts by recorded runtimes. I'm just spitballing here; this looks like non-zero work, and I'm mostly concerned about maintainability.
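
For illustration, a rough sketch of that flow as a standalone Python script (not existing gtest-parallel code), using only the stock --gtest_list_tests and --gtest_filter flags; the binary path, chunk size and worker count are placeholders:

```python
# Hypothetical standalone batch runner, not part of gtest-parallel.
import random
import subprocess
from concurrent.futures import ThreadPoolExecutor

def list_tests(binary):
    # (1) Extract all test cases via gtest's built-in listing flag.
    out = subprocess.run([binary, '--gtest_list_tests'],
                         capture_output=True, text=True, check=True).stdout
    tests, suite = [], ''
    for line in out.splitlines():
        if not line.strip() or line.startswith('Running main()'):
            continue
        if not line.startswith(' '):
            suite = line.split('#')[0].strip()                 # "SuiteName."
        else:
            tests.append(suite + line.split('#')[0].strip())   # "SuiteName.TestName"
    return tests

def run_chunk(binary, chunk):
    # (4) One process per chunk; gtest filters are ':'-separated patterns.
    # Note: very large chunks can run into command-line length limits.
    filt = '--gtest_filter=' + ':'.join(chunk)
    return subprocess.run([binary, filt]).returncode

def main(binary, chunk_size=25, workers=4):
    tests = list_tests(binary)
    random.shuffle(tests)                                       # (2) shuffle to avoid a long tail
    chunks = [tests[i:i + chunk_size]
              for i in range(0, len(tests), chunk_size)]        # (3) N-sized chunks
    with ThreadPoolExecutor(max_workers=workers) as pool:       # (4) X workers
        return max(pool.map(lambda c: run_chunk(binary, c), chunks), default=0)
```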


pbos commented on July 27, 2024

There's a --gtest_output=xml:filename flag, which sounds interesting.


thelema commented on July 27, 2024

https://github.com/google/googletest/blob/master/googletest/docs/AdvancedGuide.md#generating-an-xml-report

Theory confirmed; it wouldn't be unreasonable to dump test results to XML and get individual test times (and results) from this.
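
For reference, a minimal sketch of pulling per-test names, runtimes and pass/fail out of such a report with Python's standard library (the report path is whatever was passed to --gtest_output=xml:...; the example test names are made up):

```python
# Sketch: extract per-test name, runtime and pass/fail from a gtest XML report.
import xml.etree.ElementTree as ET

def parse_gtest_xml(path):
    results = []
    root = ET.parse(path).getroot()           # <testsuites> element
    for suite in root.iter('testsuite'):
        for case in suite.iter('testcase'):
            name = suite.get('name') + '.' + case.get('name')
            time = float(case.get('time', '0'))
            failed = case.find('failure') is not None
            results.append((name, time, failed))
    return results

# e.g. after: ./my_tests --gtest_output=xml:report.xml --gtest_filter=Foo.*
# parse_gtest_xml('report.xml') -> [('Foo.DoesThing', 0.012, False), ...]
```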


pbos commented on July 27, 2024

It would also help deduce binary startup time even in the single-test case, which could be very powerful, and possibly a good first step to implementing batching.


thelema commented on July 27, 2024

I admit that adding batch support will have nonzero cost, but it seems that the current design allows this cost to be restricted to Task and TaskManager. AFAICT, the best way to start this change is to extend Task to be able to encompass a batch of tests. Once BatchTask and TaskManager can coordinate to behave properly on a group of tests, injecting a smarter BatchTask creation routine seems non-invasive.


thelema commented on July 27, 2024

Binary startup time could usually be deduced from the time it takes to run <executable> --gtest_filter=. But I don't know what could be done with this other than automatically recommending batching or reporting the total amount of CPU time burned starting each task in a separate process.
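
A rough sketch of measuring that: an empty --gtest_filter= matches no tests, so the wall time is essentially process startup plus whatever runs before RUN_ALL_TESTS() (the binary path is a placeholder):

```python
# Sketch: estimate per-process startup overhead by running zero tests.
import subprocess
import time

def measure_startup(binary, samples=5):
    times = []
    for _ in range(samples):
        start = time.monotonic()
        subprocess.run([binary, '--gtest_filter='],
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        times.append(time.monotonic() - start)
    return min(times)   # min is least affected by other load on the machine
```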


thelema commented on July 27, 2024

Just for the record, I used --gtest_filter= to measure my startup cost and saw between 0.4 and 1.0 seconds of startup time. Across ~3500 tests, this means I'm burning roughly 29 minutes of CPU time (actually, mostly disk I/O, which is even more precious on the system I'm working on) in startup overhead alone in order to run all tests.


pbos commented on July 27, 2024

Note that your best runtime probably still comes from saving the list of tests to disk and then loading it, but it's less automatic:

./my_tests --dump_test_list=foo.txt && gtest-parallel ./my_tests -- --use_test_list=foo.txt.

Good sharding is especially important if you have long-running tests, as putting two long-running tasks in the same bucket can double total runtime.

If you're doing automatic batching, you want to add enough small tests that their runtime isn't dwarfed by startup time. If we're going for a more automatic approach, I think it'd be reasonable to base the batch sizes on startup time (and have a target batch time based on it; say, 10% of task time should be startup time).

I'd probably want to treat failing or new tests as having infinite runtime and run them alone in a batch, because the cost of accidentally running two long-running tests in a bucket is huge and unpredictable (it can 2x the total runtime).
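
As a rough sketch of that sizing rule (the 10% target and the run-unknown-tests-alone rule are just the heuristics discussed above, not existing gtest-parallel behavior; names are hypothetical):

```python
# Sketch: fill one batch until startup cost is ~10% of total batch time.
# Tests with no recorded runtime (new or previously failing) run alone.
def next_batch(pending, runtimes, startup, overhead_target=0.10):
    # pending: list of test names; runtimes: dict of recorded times in seconds.
    if not pending:
        return []
    if pending[0] not in runtimes:
        return [pending.pop(0)]                    # unknown runtime: batch of one
    target = startup / overhead_target - startup   # e.g. 0.5s startup -> 4.5s of tests
    batch, total = [], 0.0
    while pending and pending[0] in runtimes and total < target:
        total += runtimes[pending[0]]
        batch.append(pending.pop(0))
    return batch
```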


thelema commented on July 27, 2024

You're right; measuring the startup time allows automatic determination of the most efficient batch size. I didn't think about this before. But if the goal is to optimize for minimum total runtime, the best strategy is to take the number of workers as the number of bins and pack tasks into that many bins so that the largest bin is as small as possible (corresponding to the fastest completion time). I don't think I expected to go that far, but looking at it now, this seems like the logical conclusion: only pay the startup cost once per worker. Of course this breaks down a bit if there's uncertainty in how long jobs can take, and handling singleton jobs in this system isn't obvious.
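
A sketch of that packing idea using the classic greedy longest-processing-time heuristic (sort tests by recorded runtime and always drop the next one into the currently smallest bin); it's an approximation rather than an optimal packing, and the helper is hypothetical:

```python
# Sketch: pack tests into one bin per worker, keeping the largest bin small (LPT heuristic).
import heapq

def pack_into_bins(runtimes, num_workers):
    # runtimes: dict test_name -> recorded runtime in seconds.
    bins = [(0.0, i, []) for i in range(num_workers)]   # (total_time, index, tests)
    heapq.heapify(bins)
    for test, t in sorted(runtimes.items(), key=lambda kv: kv[1], reverse=True):
        total, i, tests = heapq.heappop(bins)           # currently smallest bin
        tests.append(test)
        heapq.heappush(bins, (total + t, i, tests))
    return [tests for _, _, tests in bins]
```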

Having a fixed target (<10% batch time used for startup) might lead to inefficient use of workers when startup is long and tasks are fast, as the number of batches might be less than the number of workers, but I imagine that's fixable.

I completely agree on running tests with unknown runtime by themselves.


pbos commented on July 27, 2024

IMO batching should be done on the fly, so that if some tests fail they can be repeated without repeating the batch. I'd avoid BatchTask, but maybe the runner can fetch more than one task and be responsible for executing all of them. This should still be a pool of workers that, whenever they are free, fetch X tasks from the queue (instead of 1 task), but X should not be ridiculously large.

I think the tasks should still be single tests, but the task runner should be able to fetch X tests (trying to target Y ms, where Y is based on startup cost). I think we should pay the X% overhead and not pack more into a single worker; these test times are often highly variable, and you end up with a long tail on a single worker if you get this wrong.

.. a first version could just have workers take N tasks (--batch-size=N) from the task queue and run those. If you want this sharded better, take the slowest task and the N-1 fastest tasks from the queue.
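
A minimal sketch of that fetch heuristic, assuming the pending tests are kept sorted by recorded runtime, slowest first (the helper is hypothetical, not existing code):

```python
# Sketch: a worker grabs the slowest remaining test plus the N-1 fastest ones.
# `queue` is a list of (expected_runtime, test_name), sorted slowest-first.
def fetch_batch(queue, batch_size):
    if not queue:
        return []
    batch = [queue.pop(0)]                      # slowest remaining test
    while len(batch) < batch_size and queue:
        batch.append(queue.pop())               # then the fastest ones, from the tail
    return [name for _, name in batch]
```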


thelema commented on July 27, 2024

I like the strategy of batching on the fly even better, although at the moment it's Task that takes care of the return code and time, and that would have to change to support multiple tests in one execution. I'll start thinking about how to move that out of Task into the TaskManager, but I suspect there won't be enough left in Task if we do this, and a better division of work would still need a BatchTask with different behavior on failure.


pbos commented on July 27, 2024

I think it makes more sense to move things out of Task and have it be more of a data object. Maybe the task runner can print logs instead of the tasks themselves, and the result of a Task can be either a batch failure (if the process crashes), an individual crash (if batch size == 1 and the process crashes), an individual failure (with runtime), or a success (with runtime).

A crash is probably when we get a non-zero exit code and no XML output, or when a particular test has no entry in the XML output. This probably needs to be tested; I'm not sure if there's any exception / signal handler in gtest that would still generate output if the process crashes. We also need to make sure that stdout/stderr output looks similar when the XML output flag is specified, so that the error message isn't hard to parse.

I think it'd make sense for the worker to fetch N tasks (based on expected runtime and number of inactive workers) instead of transforming them into BatchTask instances.
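
A sketch of what that data-object shape could look like (the names are hypothetical, not existing gtest-parallel code; the crash rule follows the non-zero-exit-plus-missing-XML idea above):

```python
# Sketch: Task result as a plain data object; the runner classifies each outcome.
import enum
from dataclasses import dataclass
from typing import Optional

class Outcome(enum.Enum):
    BATCH_FAILURE = 'batch failure'      # process crashed, batch size > 1
    CRASH = 'crash'                      # process crashed, batch size == 1
    FAILURE = 'failure'                  # test ran and failed (runtime known)
    SUCCESS = 'success'                  # test ran and passed (runtime known)

@dataclass
class TaskResult:
    test_name: str
    outcome: Outcome
    runtime: Optional[float] = None      # None when no XML entry was produced
    exit_code: int = 0

def classify(test_name, exit_code, xml_entry, batch_size):
    # xml_entry: (runtime, failed) parsed from the report, or None if missing.
    if xml_entry is None:
        outcome = Outcome.CRASH if batch_size == 1 else Outcome.BATCH_FAILURE
        return TaskResult(test_name, outcome, None, exit_code)
    runtime, failed = xml_entry
    outcome = Outcome.FAILURE if failed else Outcome.SUCCESS
    return TaskResult(test_name, outcome, runtime, exit_code)
```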


thelema commented on July 27, 2024

I think we're largely in agreement; the worker fetches N tests and runs them as a Task. That Task handles running the batch of tests and presents the result of the execution as you described.


thelema commented on July 27, 2024

Just to record one more detail: when starting multiple tasks (such as at the beginning of a run), if all tasks can be run simultaneously (possibly in batches), the batches should aim for equal runtime to minimize time to completion. For example, if there are 20 processors and 20 tests, the tests should all be run in parallel instead of all in a single batch.

Maybe this can be done with a second pass after greedily grouping tests into tasks, before the tasks are started. This second pass could move tests from the biggest task to the biggest below-average-runtime task, as long as the move doesn't push it above the average, until no move is possible (i.e. take the smallest test in the biggest task and move it to some task that won't become too big).
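
A rough sketch of that rebalancing pass, assuming the greedy grouping already produced a list of batches and per-test runtimes are known (hypothetical helper, not existing code):

```python
# Sketch: second pass that moves small tests out of the biggest batch into
# below-average batches, as long as no batch is pushed above the average.
def rebalance(batches, runtimes):
    # batches: list of lists of test names; runtimes: dict test -> seconds.
    def total(batch):
        return sum(runtimes[t] for t in batch)
    while True:
        avg = sum(total(b) for b in batches) / len(batches)
        biggest = max(batches, key=total)
        if len(biggest) <= 1:
            return batches
        smallest_test = min(biggest, key=lambda t: runtimes[t])
        # Candidate targets: below-average batches that stay at or below average.
        targets = [b for b in batches
                   if b is not biggest and total(b) < avg
                   and total(b) + runtimes[smallest_test] <= avg]
        if not targets:
            return batches
        target = max(targets, key=total)     # the biggest below-average batch
        biggest.remove(smallest_test)
        target.append(smallest_test)
```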


pbos commented on July 27, 2024

After thinking about this some more, I'm feeling like this change is likely to make the code a lot more complicated / harder to maintain (I'll gladly be proven wrong here, but I'm not inclined to add a lot more complexity for this case). I feel like the best way of supporting this use case is still to separate the test scanning from the test execution as follows:

make test_binary && ./test_binary --dump_test_list=test_cache.txt && gtest-parallel ./test_binary -- --test_list=test_cache.txt

This could of course be placed in your own shell script or similar to make sure you don't have to keep these parameters in mind. The overhead for parsing the test_cache here is hopefully somewhat negligible. It could also be possible to statically link or otherwise compile the test list into your binary.

Otherwise, if this isn't an acceptable solution for you, I just wanted to give a heads-up that I expect this to be complicated and hard to get right, and as such it might be hard to get landed. Do feel free to maintain your own branch if you can get it to work well enough for your use case, though.


null77 commented on July 27, 2024

Just FYI, I ran my project's simplest test suite through gtest-parallel and saw a 10x performance penalty on a 4-thread laptop. For test suites with expensive init this could be several times slower still.

I might look into a similar C++ harness for best performance, but batching will likely be useful. Even given stdout parsing and long command lines, it might yield a 10x speed improvement or more.


pbos commented on July 27, 2024

Did you end up profiling any of those runs? This tool assumes that spawning processes is fast-ish, that test setup is cheap-ish, and that tests are not fully utilizing cores. It's (for obvious reasons) incredibly effective for code / tests that sleep(). Tests that don't fall under the above assumptions are not likely to benefit as much (if at all, as in your case) from this tool.

Batching's not likely to happen unless there's a way to do it without adding a lot of code complexity. You can for sure make a batching runner that might suit your project better. This tool isn't intended to be a silver bullet for all projects, but we've found it to reduce total runtime in multiple large-scale projects.

