I’ve tried to understand which validators are run and when. This is particularly interesting with respect to input validation.
There are several sources of information about this, in particular “the specification” (which exists in many different forms), and two toolchains that are inspired by the specification.
To highlight the issue, there is currently no consensus about what the following entry in testdata.yaml means:
input_validator_flags: foo
But there are more complicated issues as well. Here follows an attempt at an overview.
Before the deep dive, here is an attempt at a crisp summary:
1. Is the name input_validator_flags or input_validators? Henceforth I’ll call it iv.
2. What are the semantics of the value of iv when it is a string? Is it a string of flags, such as --max_n 10 --connected, or the name of a validator, such as grammar.ctd?
3. If the value of iv is not a string, then what is it? Can it be a list of dicts?
4. If a validator is listed in the name key of iv (or, if it’s a list, in any of its entries), does that mean that only the listed validators are run on the testcases in the given testgroup?
5. If secret and secret/foo both have iv keys in their testdata.yaml, how is inheritance handled? If secret has a testdata.yaml/iv key but secret/foo has testdata.yaml but no testdata.yaml/iv, then does secret/foo inherit the values from its parent? Does a flag for validator grammar.py specified in secret apply to secret/foo? Does that depend on whether secret/foo has testdata.yaml, has testdata.yaml/iv, or mentions grammar.py in testdata.yaml/iv?
The most important issue is 2, because it breaks core functionality: it makes two toolchains (problemtools and BAPCtools) incompatible.
Specification
What does the specification say? At https://www.kattis.com/problem-package-format/spec/problem_package_format it says:
Key input_validator_flags
Type String or map with the keys "name" and "flags".
Default empty string.
Comments arguments passed to the input validator for this test data group. If a string this is the name of the input validator that will be used for this test data group. If a map then this is the name as well as the flags that will be passed to the input validator.
However, I have learned that this is not the intended mode of reading. The only source for the intended meaning is currently the source code, which is here:
input\_validator<s class="dep kattis">\_flags</s></s> |
String or map with the keys "name" and "flags"</s> |
empty string |
arguments passed to the input validator for this test data group.</s> If a string this is the name of the input validator that will be used for this test data group. If a map then this is the name as well as the flags that will be passed to the input validator.</s> |
The <s> tags seem to be unbalanced, but one way to make sense of this is the following two readings:
Key: input_validator_flags
Type: String or map with the keys name and flags
Default: empty string
Comments: arguments passed to the input validator for this test data group
and
class "dep kattis", presumably meaning “relevant to Kattis, deprecated”
Key: input_validator
Type: String or map with the keys name and flags
Default: empty string
Comments: arguments passed to the input validator for this test data group. If a string this is the name of the input validator that will be used for this test data group. If a map then this is the name as well as the flags that will be passed to the input validator.
However, I’m a bit out of my depth as to how to interpret the tags.
Current behaviour of input_validator_flags: foo
The most important thing for me is to align on what input_validator_flags: foo in data/a/b/testdata.yaml should mean. (This feature is extremely useful and widely used.) It should have high priority that there exists a nonzero number of accessible and authoritative sources that specify the name and intended functionality of this flag.
I believe the current implementation of problemtools
- runs all validators in input_validators on all test-cases and
- sends the string foo as an argument to all(?) of those validators for the testcases in data/a/b.
BAPCtools does something else: it
- runs validator foo on the testcases in data/a/b.
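To make the difference concrete, here is a minimal Python sketch of the two readings as I have described them above. It is not taken from either code base: the function names, the validator list, and the use of shlex to split the flag string are all my own assumptions for illustration.

```python
import shlex


def plan_flags_reading(all_validators, iv_value):
    """Reading attributed above to problemtools (hypothetical helper).

    The string is a flag string: every validator runs, and each one
    receives the same parsed flags.
    """
    flags = shlex.split(iv_value)
    return [(validator, flags) for validator in all_validators]


def plan_name_reading(all_validators, iv_value):
    """Reading attributed above to BAPCtools (hypothetical helper).

    The string names one validator: only that validator runs, and it
    receives no extra flags.
    """
    if iv_value not in all_validators:
        raise ValueError(f"unknown validator: {iv_value!r}")
    return [(iv_value, [])]


# Hypothetical contents of the input_validators directory.
validators = ["foo", "grammar.ctd"]

print(plan_flags_reading(validators, "foo"))
print(plan_name_reading(validators, "foo"))
```

Under the first reading both validators run and each is passed the argument foo; under the second only the validator called foo runs. The same line in testdata.yaml therefore produces two different validation plans.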
The three traditions (specification, problemtools, BAPCtools) further disagree on inheritance, having very different opinions on what to do when data/a/testdata.yaml exists and sets input_validator_flags: bar, and
- data/a/b/testdata.yaml exists but does not contain input_validator_flags. Specification says: bar is not set. Problemtools: travels upwards in the settings tree and finds bar.
- data/a/b/testdata.yaml does not exist. Should bar, when interpreted as a validator, run? As the only validator? Or is the absence of a specified validator at b an indication that all validators should run?
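The two inheritance rules in play can be sketched in Python as follows. This is only my reading of the behaviours described above, not code from any of the toolchains; the settings dictionary (group path mapped to the parsed contents of its testdata.yaml, with a missing path meaning "no file") and both function names are assumptions made for the example.

```python
from pathlib import PurePosixPath

KEY = "input_validator_flags"


def lookup_per_key(settings, group):
    """Walk upwards until some testdata.yaml actually sets KEY."""
    path = PurePosixPath(group)
    while True:
        value = settings.get(str(path), {}).get(KEY)
        if value is not None:
            return value
        if path == path.parent:  # reached the top without finding KEY
            return ""  # the documented default: empty string
        path = path.parent


def lookup_per_file(settings, group):
    """Walk upwards until some testdata.yaml exists; consult only that file."""
    path = PurePosixPath(group)
    while True:
        if str(path) in settings:
            return settings[str(path)].get(KEY, "")
        if path == path.parent:  # no testdata.yaml anywhere above
            return ""
        path = path.parent


# Case 1: data/a/b/testdata.yaml exists but does not contain the key.
case1 = {"data/a": {KEY: "bar"}, "data/a/b": {}}
# Case 2: data/a/b/testdata.yaml does not exist.
case2 = {"data/a": {KEY: "bar"}}

print(lookup_per_key(case1, "data/a/b"), repr(lookup_per_file(case1, "data/a/b")))
print(lookup_per_key(case2, "data/a/b"), lookup_per_file(case2, "data/a/b"))
```

In case 1 the per-key rule finds bar while the per-file rule falls back to the default; in case 2 both rules find bar. Neither rule answers what bar then means, which is the question from the previous section.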
Provided validators
The specification says:
All input validators provided will be run on every input file.
Alas, the semantics of the word “provided” are not clear from the rest of the specification. It may mean “all input validators found in input_validators”. Or it may mean that validators named somewhere in testdata.yaml settings files can restrict or extend which validators are “provided”.
Testdata settings inheritance
The specification says:
In each test data group, a file testdata.yaml may be placed to specify how the result of the test data group should be computed. If such a file is not provided for a test data group then the settings for the parent group will be used.
I don’t think that is what is implemented by Problemtools (but it is implemented by BAPCtools). Suggestions:
(i) slightly better as “… if such a setting is not provided for a test data group then the …” (that is, with “a file” replaced by “a setting”).
(ii) slightly better as “… to specify settings for a test data group, such as grading or validation”.
Speculative behaviour of input_validator_flags / input_validator(s)
This should have low priority.
What BAPCtools seems to implement is this:
Key input_validator_flags
Type String or map with the keys name and flags, or nonempty list of such maps.
Default empty string.
Comments If a string this is the name of the input validator that will be used for this test data group; no other validators are run. If a map then this is the name as well as the flags that will be passed to the named input validator; no other validators are run. If a list then exactly the named validators in the list are run, with the given flags.
(What is new is the “list” type.) This may indeed be the intended definition of the speculative part of the specification. However, the inheritance rules are very unclear to me; there is no agreement on whether keys in testdata.yaml files are inherited, much less about what happens when those keys are lists of dicts.
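For reference, the string / map / list definition above could be normalised roughly like this in Python. It is a sketch of my reading of that definition, not BAPCtools code. In particular, the function name is hypothetical, and treating the default empty string as "no restriction, run everything" is my own assumption, since the definition does not say what the default means.

```python
def normalize_iv(value):
    """Turn an iv value into a list of (validator_name, flags) pairs.

    Returns None for the default empty string, which this sketch
    ASSUMES to mean "no restriction"; the definition above is silent
    on that point.
    """
    if value is None or value == "":
        return None
    if isinstance(value, str):
        # A string names the single validator to run, without flags.
        return [(value, "")]
    if isinstance(value, dict):
        # A map names one validator and gives its flags.
        return [(value["name"], value.get("flags", ""))]
    if isinstance(value, list):
        if not value:
            raise ValueError("the list form must be nonempty")
        # A list names exactly the validators to run, each with its flags.
        return [(entry["name"], entry.get("flags", "")) for entry in value]
    raise TypeError(f"unsupported iv value: {value!r}")


print(normalize_iv("grammar.ctd"))
print(normalize_iv({"name": "grammar.py", "flags": "--max_n 10"}))
print(normalize_iv([{"name": "grammar.py", "flags": "--connected"}, {"name": "grammar.ctd"}]))
```

Writing it out this way also shows why inheritance is hard to define for the list form: it is unclear whether a child list should replace, extend, or be merged per validator name with the list of its parent.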