
Comments (7)

mikaylagawarecki commented on September 24, 2024

cc @clee2000 since you added 3b7d60b


clee2000 commented on September 24, 2024

I can move the option to save XML into an argument. The paths to the XMLs are generated by run_test.py though, is that ok?


Flamefire commented on September 24, 2024

I can move the option to save XML into an argument. The paths to the XMLs are generated by run_test.py though, is that ok?

You mean this code here:

def get_report_path(argv=UNITTEST_ARGS, pytest=False):
    test_filename = sanitize_test_filename(argv[0])
    test_report_path = TEST_SAVE_XML + LOG_SUFFIX
    test_report_path = os.path.join(test_report_path, test_filename)
    if pytest:
        test_report_path = test_report_path.replace('python-unittest', 'python-pytest')
        os.makedirs(test_report_path, exist_ok=True)
        test_report_path = os.path.join(test_report_path, f"{test_filename}-{os.urandom(8).hex()}.xml")
        return test_report_path
    os.makedirs(test_report_path, exist_ok=True)
    return test_report_path
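
Tracing that by hand with made-up values (these values are assumptions on my side, not output from an actual run, and I assume sanitize_test_filename just strips the ".py" suffix here):

# Assumed: --save-xml=/tmp/test-reports/report, --log-suffix=-shard-1, running test_torch.py under pytest
import os
TEST_SAVE_XML, LOG_SUFFIX = "/tmp/test-reports/report", "-shard-1"
test_filename = "test_torch"  # sanitize_test_filename("test_torch.py")
report_dir = os.path.join(TEST_SAVE_XML + LOG_SUFFIX, test_filename)
# the 'python-unittest' -> 'python-pytest' replacement is a no-op for this assumed path
report_path = os.path.join(report_dir, f"{test_filename}-{os.urandom(8).hex()}.xml")
# -> /tmp/test-reports/report-shard-1/test_torch/test_torch-<16 hex chars>.xml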

To me it looks like all XML files go into the folder passed via --save-xml, though they may be in subfolders and may have random names, unless --log-suffix is used (e.g.

command = [sys.executable] + argv + [f'--log-suffix=-shard-{i + 1}'] + test_batches[i]

), in which case a string is appended to the folder name. Still usable though: with --save-xml=/tmp/test-reports/report all XMLs end up somewhere under /tmp/test-reports.

The main question is: is that even a feasible approach? I.e. does it reliably get all failed tests, especially with the retry logic, where I'm not sure whether it will skip the "consistently failing" test and still run all the others (before and after that failing one).

As those XMLs don't seem to be used (otherwise it would already have been noticed that they got broken), is there any other/better approach?
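
For reference, the kind of XML parsing I have in mind is roughly this minimal sketch (the report directory and the JUnit attribute layout are assumptions on my part, not verified against actual CI output):

import glob
import os
import xml.etree.ElementTree as ET

def collect_failed_tests(report_dir="/tmp/test-reports"):
    """Collect unique failed test ids from all XMLs under the --save-xml folder."""
    failed = set()
    for xml_file in glob.glob(os.path.join(report_dir, "**", "*.xml"), recursive=True):
        for case in ET.parse(xml_file).getroot().iter("testcase"):
            # A testcase with a <failure> or <error> child counts as failed;
            # the set dedups reruns of the same test across report files.
            if case.find("failure") is not None or case.find("error") is not None:
                failed.add(f"{case.get('classname')}.{case.get('name')}")
    return sorted(failed)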


clee2000 commented on September 24, 2024

I'm thinking something more along the lines of #126690. Ignore my previous comment as well.

Have you tried using --save-xml as an additional unittest arg?

Our CI makes XMLs and saves them to S3; you can download them on HUD to see what format and file paths they use. There are already a couple of scripts in the repository that parse the XMLs to get certain information, so you can probably copy a bunch of code from those.

It doesn't get all failed tests; there are some cases when XMLs don't get made (#123882), but I think the log parsing method you used previously would have also not been able to handle this. It will handle most retries correctly, but you will have to dedup based on test name.

I guess it's also worth noting that you could parse the logs since all the test names should be there if you use the verbose option?


Flamefire commented on September 24, 2024

Our CI makes XMLs and saves them to S3

Where does that happen, i.e. where are the xmls created?

It doesn't get all failed tests; there are some cases when XMLs don't get made (#123882)

There seem to be two issues: force-terminated tests and reruns. I'd expect the XML files to also contain the rerun tests even currently, since the file names get randomized.

but I think the log parsing method you used previously would have also not been able to handle this

IIRC when plain pytest-rerunfailures was used (prior to 3b7d60b) there was still a full summary line for each test file, like "2 failed, 128 passed, 2 rerun", which we could use.
For force-terminated test files we treated it as a hard failure or excluded the test(s).
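
For illustration, parsing such a summary line is straightforward (a minimal sketch, not our actual code):

import re

summary = "2 failed, 128 passed, 2 rerun"
counts = {kind: int(num) for num, kind in re.findall(r"(\d+) (\w+)", summary)}
assert counts == {"failed": 2, "passed": 128, "rerun": 2}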

I guess it's also worth noting that you could parse the logs since all the test names should be there if you use the verbose option?

We actually do that already, to be able to print individual failed tests in addition to the summaries (per test file and total). It even allowed us to validate that we got all individual failed tests by checking against the summary. But that already turned out to be error-prone due to the different formats. Some examples:

    # === FAIL: test_add_scalar_relu (quantization.core.test_quantized_op.TestQuantizedOps) ===
    # --- ERROR: test_all_to_all_group_cuda (__main__.TestDistBackendWithSpawn) ---
    # FAILED test_ops_gradients.py::TestGradientsCPU::test_fn_grad_linalg_det_singular_cpu_complex128 - [snip]
    # FAILED [22.8699s] test_sparse_csr.py::TestSparseCompressedCPU::test_invalid_input_csr_large_cpu - [snip]
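
A rough sketch of the kind of patterns this requires (simplified, and only covering the example formats above; our actual parsing has to handle more variants):

import re

# Simplified patterns for the example formats above; the real logs have more variants.
UNITTEST_RE = re.compile(r"(?:FAIL|ERROR): (\S+) \(([\w.]+)\)")       # unittest-style lines
PYTEST_RE = re.compile(r"FAILED (?:\[[^\]]+\] )?(\S+?\.py)::(\S+)")   # pytest-style lines

def failed_tests_from_log(lines):
    found = set()
    for line in lines:
        if m := UNITTEST_RE.search(line):
            found.add(f"{m.group(2)}.{m.group(1)}")
        elif m := PYTEST_RE.search(line):
            found.add(f"{m.group(1)}::{m.group(2)}")
    return found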


clee2000 commented on September 24, 2024

The save-xml argument is set by default by the CI environment variable, which is true in our CI. I guess the other option would be for you to set it as well, but it also turns on a couple of other things, so you'd have to be careful.

I'm not sure what a force-terminated test is, but the case where it doesn't generate XML is when a test segfaults, which leads to the XML not getting created and also the === x failed, y skipped === line not getting printed. Then the rerun starts from the segfaulted test, so all information about the tests before the segfaulted test gets lost. I can't remember what the previous behavior was, but depending on how far back it was, I'm pretty sure the previous test information still gets lost. Other than this, the reruns should get XML; it's just that it'll be in a different file, so it's a bit more parsing.

I edited my comment because I realized I had understood the usage of the save-xml argument incorrectly, so I'm not sure if you saw this, but have you tried using --save-xml as an additional unittest arg? nvm, this also doesn't work the way I think it does


Flamefire commented on September 24, 2024

The save-xml argument is set by default by the CI environment variable, which is true in our CI. I guess the other option would be for you to set it as well, but it also turns on a couple of other things, so you'd have to be careful.

Is that export CI=true set by e.g. GitHub? I found

pytorch/test/run_test.py, lines 993 to 997 in b36e018:

if IS_CI:
    # Add the option to generate XML test report here as C++ tests
    # won't go into common_utils
    test_report_path = get_report_path(pytest=True)
    pytest_args.extend(["--junit-xml-reruns", test_report_path])
This sets --junit-xml-reruns in run_test.py, but the same flag is set in common_utils only when --save-xml is passed (https://github.com/pytorch/pytorch/blob/b36e01801b89a516f4271f796773d5f4b43f1186/torch/testing/_internal/common_utils.py#L1108-L1111), and the former only applies to tests run with pytest. So it looks incomplete to me.

I'm not sure what a force-terminated test is, but the case where it doesn't generate XML is when a test segfaults, which leads to the XML not getting created

Yes, I meant test (files) terminated by a signal like SIGSEGV, but also SIGIOT (which we have observed).

all information about the tests before the segfaulted test gets lost. I can't remember what the previous behavior was, but depending on how far back it was, I'm pretty sure the previous test information still gets lost.

Yes, it was the same here: most of the information was lost (e.g. that summary line), but some individual tests could be found by parsing the log. That is fragile though, see my previous comment.

Other than this, the reruns should get XML; it's just that it'll be in a different file, so it's a bit more parsing

Just to be sure I understood this correctly: for segfaulted tests there won't be any XMLs even with the new rerun system?

