kattis / problem-package-format
Kattis problem package format specification
Should `name` actually be required?
We can (and do) get the name from `\problemname` in LaTeX. We could do something similar with Markdown.
How should we handle the case where the problem name is specified in both the problem statement and `problem.yaml`?
Do we want it the way it's currently specified?
The problem format forces a set of allowable licenses. This is nice in practice, but does it belong there?
Doesn't have to be exhaustive, but should mention the basics:
- `.interaction` files for samples and their syntax
- the `.in` file. For BAPCtools I was assuming this, but it seems that at least @simonlindholm thinks otherwise. I agree that not putting a `.in` file makes sense. (`attachments/1.in`)
- `validation: custom interactive` in `problem.yaml`
- `\begin{Interaction} ... \end{Interaction}` for the problem statement

E.g. reusing numbers in the statement and a validator.
Suggested by @jsannemo.
Introduce something like `problem_format_version` in `problem.yaml`.
This would be required going forward, but tools could (and likely will) treat a missing `problem_format_version` as a "compatibility mode".
(See also discussions in #15)
Tooling is needed that will prepare problem packages for DOMjudge. It will need to:
- adjust `time_limit` accordingly
- read `testdata.yaml` files and convert them into a form DOMjudge understands
- other stuff?
I’ve tried to understand which validators are run and when. This is particularly interesting with respect to input validation.
There are several sources of information about this, in particular “the specification” (which exists in many different forms), and two toolchains that are inspired by the specification.
To highlight the issue, there is currently no consensus about what the following entry in `testdata.yaml` means:

    input_validator_flags: foo
But there are more complicated issues as well. Here follows an attempt at an overview.
Before the deep dive, here is an attempt at a crisp summary:

1. Is the key called `input_validator_flags` or `input_validators`? Henceforth I'll call it `iv`.
2. What is `iv` when it is a string? Is it a string of flags, such as `--max_n 10 --connected`, or the name of a validator, such as `grammar.ctd`?
3. If `iv` is not a string, then what is it? Can it be a list of dicts?
4. If a validator is named in the `name` key of `iv` (or, if it's a list, in any of its entries), does that mean that only the listed validators are run on the testcases in the given testgroup?
5. If `secret` and `secret/foo` both have `iv` keys in their `testdata.yaml`, how is inheritance handled? If `secret` has a `testdata.yaml/iv` key but `secret/foo` has a `testdata.yaml` with no `iv` key, does `secret/foo` inherit the values from its parent? Does a flag for validator `grammar.py` specified in `secret` apply to `secret/foo`? Does that depend on whether `secret/foo` has `testdata.yaml`, `testdata.yaml/iv`, or mentions `grammar.py` in `testdata.yaml/iv`?

The most important issue is 2, because it breaks a core functionality: it makes the two toolchains (problemtools and BAPCtools) incompatible.
What does the specification say? At https://www.kattis.com/problem-package-format/spec/problem_package_format it says:
Key: `input_validator_flags`
Type: String or map with the keys "name" and "flags".
Default: empty string.
Comments: arguments passed to the input validator for this test data group. If a string this is the name of the input validator that will be used for this test data group. If a map then this is the name as well as the flags that will be passed to the input validator.
However, I have learned that this is not the intended mode of reading. The only source for the intended meaning is currently the source code, which is here:
input\_validator<s class="dep kattis">\_flags</s></s> |
String or map with the keys "name" and "flags"</s> |
empty string |
arguments passed to the input validator for this test data group.</s> If a string this is the name of the input validator that will be used for this test data group. If a map then this is the name as well as the flags that will be passed to the input validator.</s> |
The `<s>` tags seem to be unbalanced, but one way to make sense of this is the following two readings:
Key: `input_validator_flags`
Type: String or map with the keys `name` and `flags`
Default: empty string
Comments: arguments passed to the input validator for this test data group

with class `dep kattis`, presumably meaning "relevant to Kattis, deprecated", and
Key: `input_validator`
Type: String or map with the keys `name` and `flags`
Default: empty string
Comments: arguments passed to the input validator for this test data group. If a string this is the name of the input validator that will be used for this test data group. If a map then this is the name as well as the flags that will be passed to the input validator.
However, I’m a bit out of my depth on how to interpret the tags.
`input_validator_flags: foo`

The most important thing for me is to align on what `input_validator_flags: foo` in `data/a/b/testdata.yaml` should mean. (This feature is extremely useful and widely used.) It should be a high priority that there exists a nonzero number of accessible and authoritative sources that specify the name and intended functionality of this flag.
I believe the current implementation of problemtools runs all validators in `input_validators` on all test cases, passing `foo` as an argument to all(?) of those validators for the testcases in `data/a/b`. BAPCtools does something else: it runs only the validator named `foo` on the testcases in `data/a/b`.

The three traditions (specification, problemtools, BAPCtools) further disagree on inheritance, having very different opinions on what to do when:

1. `data/a/testdata.yaml` exists and sets `input_validator_flags: bar`, and `data/a/b/testdata.yaml` exists but does not contain `input_validator_flags`. The specification says `bar` is not set; problemtools travels upwards in the settings tree and finds `bar`.
2. `data/a/b/testdata.yaml` does not exist. Should `bar`, when interpreted as a validator, run? As the only validator? Or is the absence of a specified validator at `b` an indication that all validators should run?

The specification says:
All input validators provided will be run on every input file.
Alas, the semantics of the word “provided” is not clear from the rest of the specification. It may mean “all input validators found in `input_validators`”. Or it may mean that naming validators somewhere in `testdata.yaml` settings files can restrict or extend which validators are “provided”.
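To make the disagreement concrete, here is a minimal sketch of the two inheritance strategies for a key in `testdata.yaml`. The function names are hypothetical, and neither toolchain's actual code is quoted; `settings` maps a testgroup path to the parsed contents of its `testdata.yaml` (absent if the file does not exist).

```python
from pathlib import PurePosixPath

def lookup_nearest_file(settings, group, key):
    """Spec-style reading: only the nearest testdata.yaml matters.

    If the group has its own testdata.yaml, missing keys are NOT
    inherited from ancestors; otherwise the parent's file applies.
    """
    path = PurePosixPath(group)
    for candidate in [path, *path.parents]:
        if str(candidate) in settings:  # nearest existing testdata.yaml wins
            return settings[str(candidate)].get(key)  # may be None
    return None

def lookup_upwards(settings, group, key):
    """Problemtools-style reading: walk upwards until the key itself is found."""
    path = PurePosixPath(group)
    for candidate in [path, *path.parents]:
        value = settings.get(str(candidate), {}).get(key)
        if value is not None:
            return value
    return None

# data/a sets the flag; data/a/b has a testdata.yaml without it:
settings = {
    "data/a": {"input_validator_flags": "bar"},
    "data/a/b": {},
}
```

With this `settings`, the two readings already diverge: the nearest-file lookup yields nothing for `data/a/b`, while the upward walk finds `bar`.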
The specification says:
In each test data group, a file
testdata.yaml
may be placed to specify how the result of the test data group should be computed. If such a file is not provided for a test data group then the settings for the parent group will be used.
I don’t think that is what is implemented by problemtools (but it is implemented by BAPCtools). Suggestions:
(i) slightly better as “…if ~~such a file~~ a setting is not provided for a test data group then the…”
(ii) slightly better as “…to specify settings for a test data group, such as grading or validation”.
`input_validator_flags` / `input_validator(s)`

This should have low priority.
What BAPCtools seems to implement is this:

Key: `input_validator_flags`
Type: String or map with the keys `name` and `flags`, or nonempty list of such maps.
Default: empty string.
Comments: If a string, this is the name of the input validator that will be used for this test data group; no other validators are run. If a map, this is the name as well as the flags that will be passed to the named input validator; no other validators are run. If a list, exactly the named validators in the list are run, with the given flags.
(What is new is the “list” type.) This may indeed be the intended definition of the speculative part of the specification. However, the inheritance rules are very unclear to me; there is no agreement on whether keys in testdata.yaml
files are inherited, much less about what happens when those keys are lists of dicts.
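For reference, the string / map / list forms could be normalized into one canonical shape roughly like this. This is a sketch of the BAPCtools-style reading described above, not actual BAPCtools code; the function name is hypothetical.

```python
def normalize_iv(value):
    """Normalize input_validator_flags into a list of {name, flags} dicts.

    Under the BAPCtools-style reading:
      - a string names a single validator (only that one is run),
      - a map gives a name plus flags,
      - a list gives exactly the validators to run.
    An empty value means "run all validators with no flags".
    """
    if value == "" or value is None:
        return []  # run everything, no flags
    if isinstance(value, str):
        return [{"name": value, "flags": ""}]
    if isinstance(value, dict):
        return [{"name": value["name"], "flags": value.get("flags", "")}]
    if isinstance(value, list):
        return [normalize_iv(entry)[0] for entry in value]
    raise ValueError(f"unsupported type for input_validator_flags: {type(value)}")
```

A downstream tool would then only need to handle the list-of-dicts shape, whatever form the YAML used.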
If a more granular set of credits should be supported, then what should be supported?
See also RagnarGrootKoerkamp/BAPCtools#295
I thought this had been the case at some point. It's something I was bitten by recently in Coding Cup, having a solution that was incorrect (in a crashy way), but which sometimes manifested as a WA and sometimes as a TLE.
TLE also allows WA, since we only want to verify that /some/ test case times out, but the above case doesn't seem to be clearly mappable in the spec right now?
Especially important when the program is a directory
We say:

The author must be contacted for every additional use.

I think we mean `rights_owner`. Right?
This is just a place to list some issues that were created in the BAPCtools repo. Mostly regarding the generators.yaml spec that will need to be updated accordingly (#2).
RagnarGrootKoerkamp/BAPCtools#293
inheritance of testdata.yaml
in generators.yaml
RagnarGrootKoerkamp/BAPCtools#283
All testdata.yaml
must be specified in generators.yaml
RagnarGrootKoerkamp/BAPCtools#286
terminology around generators.yaml
RagnarGrootKoerkamp/BAPCtools#271
Allow generating multiple testcases at once
RagnarGrootKoerkamp/BAPCtools#272
Allow hardcoding testcase input/answer in the generators.yaml
RagnarGrootKoerkamp/BAPCtools#262
Support generating `include`d testcases.
RagnarGrootKoerkamp/BAPCtools#274
Testcase numbering: there is a single global scope for all testcase numbers. Testgroups are numbered separately within each directory.
RagnarGrootKoerkamp/BAPCtools#273
Deprecate support for `{name}` in commands. Instead, they should simply read from/write to `testcase.{in,ans,png,...}` in the current working directory. (`{name}` is substituted by `testcase` for backwards compatibility.)
RagnarGrootKoerkamp/BAPCtools#259
fix documentation around input_validator_flags
RagnarGrootKoerkamp/BAPCtools#256
Adds BAPCtools support for i18n; should be compatible with the spec as-is. Not actionable; just FYI.
Testcases should have problem-unique names. Everything would be simpler.
The formatting on kattis.com/problem-package-format/ is not working.
Also, the markup for not-implemented is broken for `input_validator_flags`, not showing what is not yet implemented. Looks like some tags that should be there are missing?
Allow specifying of test data that must be failed by some input validator.
Suggested by @jsannemo.
Suggested by @jsannemo.
Maybe something like:
command-line arguments are given in a file called `<test case>.args`, and available files are put in a directory called `<test case>.files`.
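A judge could consume such a convention roughly as follows. This is only a sketch of the proposal above, with hypothetical helper names; the `.args`/`.files` names come from the proposal, not from any existing spec.

```python
import shlex
import shutil
import subprocess
import tempfile
from pathlib import Path

def run_case(submission_cmd, case_base):
    """Run one test case under the proposed convention.

    <case>.args (if present) holds command-line arguments,
    <case>.files (if present) is copied into the working directory,
    and <case>.in is fed to stdin as usual.
    """
    base = Path(case_base)
    args = []
    args_file = base.with_suffix(".args")
    if args_file.exists():
        args = shlex.split(args_file.read_text())
    with tempfile.TemporaryDirectory() as workdir:
        files_dir = base.with_suffix(".files")
        if files_dir.is_dir():
            shutil.copytree(files_dir, workdir, dirs_exist_ok=True)
        with open(base.with_suffix(".in")) as stdin:
            return subprocess.run(
                submission_cmd + args, stdin=stdin,
                capture_output=True, text=True, cwd=workdir,
            )
```

The key design question the proposal settles is that per-case arguments and auxiliary files live next to the `.in` file, so generators and validators can treat them like any other testcase artifact.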
The current specification starts with esoterica about what the entry point to a `prolog` problem is, then explains problem metadata such as rules for `source_url`, licensing, and what the system defaults for `compilation_memory` are. Halfway down the text(!) we finally get to `problem_statement`.
Apart from this (and related) ordering mistakes in a document aimed at humans, the documentation also gives undue weight to issues such as custom output validation or grading, which are used in very few problems but take up a lot of room.
Instead, there should be a way to read the documentation “breadth-first”, so that a relatively short document completely explains the relevant parts for 90% of the problems, enough for somebody who wants to make their first own problem: pass-fail, default output validation with default settings.
I’ve given this some thought, and come up with the structure laid out in https://problem-package-format-copy.readthedocs.io/en/latest/index.html.
In particular, the section Core Problem Package Format contains everything you need to make “Add two numbers”, with a running example. I think that core section is quite good by now.
How to cut the cake is somewhat arbitrary, but in my outline the core section is followed by more advanced concepts, which are:
- `testdata.yaml`, manipulating `time_limit`, and the default output validator
- `testdata.yaml` and grading

There’s an attempt at adding useful examples, terminological stringency, and even a budding glossary.
The details are up in the air, but I want something like this. I need it (because I teach a course and need better documentation than what we have right now), so I’ll maintain it anyway, but for me the best outcome would be to move the documentation to such a structure that explains common things first and esoterica later.
Please discuss.
Currently, the name of the problem directory is most often called `short_name`. This is confusing because many things have names (validators, test cases, submissions), so “the short name” is not very informative.
I think better suggestions are:
- `problem_id`. I prefer this; it can also appear in body text as ”the problem ID”. Short enough to be used in longer strings such as `<problem_id>/data/sample`. Also separates the concept “name” (like “Hello World!”, “Bonjour le Monde!”) from the concept ID (like `hello`).
- `problem_short_name`. This would necessitate “the problem’s short name `problem_short_name`”. Worse, because `listgame2` is not a short form of the name “A Different List Game”.

Honest question. The specification is silent about when a test case must provide `.ans`.
- For `validation: default` it is true that if `<base_name>.in` exists then `<base_name>.ans` must also exist. We should say that.
- For `validation: custom`: this is highly implicit, but since `judge_answer` in the specification needs a value, the adventurous reader can guess that this value was taken to be `<base_name>.ans`. I’m not sure, though.
- For `validation: interactive: True` there need be no such file in `data/sample`.

Is the rule “whenever an input file `.in` exists then a default answer file `.ans` must also exist”? (I think so.)
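If the rule really is “every `.in` needs a sibling `.ans`, except for interactive problems”, a package verifier could check it with a few lines. This is a sketch under that assumed rule, with a hypothetical function name, not problemtools code.

```python
from pathlib import Path

def missing_answer_files(data_dir, interactive=False):
    """Return paths of .in files that lack a sibling .ans file.

    Assumes the rule discussed above: every .in requires an .ans,
    except for interactive problems, where no .ans is required.
    """
    if interactive:
        return []
    data = Path(data_dir)
    return sorted(
        str(f) for f in data.rglob("*.in")
        if not f.with_suffix(".ans").exists()
    )
```

A tool like `verifyproblem` could then report each returned path as a package error.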
The `scoring`/`objective` key in `problem.yaml` defaults to `max` and is almost never anything else.
In theory a minimization problem could always be implemented as a maximization problem.
We should drop this setting.
The only problems currently on Kattis using minimization are:
https://baylor.kattis.com/problems/baylor.linearregression
https://baylor.kattis.com/problems/baylor.logisticregression
https://baylor.kattis.com/problems/baylor.gradientdescent
https://kth.kattis.com/problems/kth.adk.castingheuristic
The following languages are supported by Kattis, but not listed among the languages. It would be good to standardize their codes, extensions, etc.:
APL
Bash
COBOL
Dart
F#
Fortran
Gerbil
Julia
TypeScript
Visual Basic
@niemela could you take point on that?
The types of `keywords` and `languages` are `String or sequence of strings`, but then there is text:

Languages
A space separated list of language code from the table in the overview section or all.
If a list is given, the problem may only be solved using those languages.

“Space separated list” is confusing in itself.
I've never used either of these so I don't know what is currently assumed. Either is fine, but we should be clearer.
(I think I have a slight preference for splitting a string on spaces, but an actual YAML list of keywords/languages is also good.)
note: sequence
is YAML terminology for a list object
Python 2 has now been EOL'd for quite some time. Looking at e.g. NCPC 2020, Python 3 is >10x more popular than Python 2, and I expect this has only increased since.
I think it's time for the default Python version in the spec to be Python 3.
This is backwards-incompatible, but easily fixed for old problems: add a shebang to Python 2 code if the validator, submission, etc. crashes (typically they will have a `print blah` call that causes a syntax error).
Thoughts? e.g. @niemela @ghamerly @simonlindholm
Interactive problems have a 'hidden' `.in` file that is never seen by contestants/submissions, only by the interactor. Nevertheless we can put a `sample.in` next to a `sample.interaction` to run the testcase as a sample, e.g. for separate judging feedback.
These sample.in files should not (always) be shown to contestants nor be available for download, since they are meaningless.
We often provide a small test interactor in the attachments/ directory, but that may follow a different spec for its input files, incompatible with the sample.in used by the interactor.
What exact LaTeX environment can/must be assumed?
Scoring problems today allow the keys accept_score
and reject_score
for a test group. These set the default score of a test case for which no scoring validator exists.
When a scoring validator exists, they are instead treated as a fallback in case the validator, for some reason, doesn't output a score for a given test case. The number of times this makes sense should be small (we have never encountered it on PO).
I propose that accept_score and reject_score are instead treated as a multiplier to whatever the validator outputs, and that an accept score of 1
and a reject score of 0
is defined to be the default for a scoring problem if the output validator doesn't output anything. This is essentially compatible with the behaviour today, except for the case where a validator only sometimes outputs a score.
The benefit of this change is that for scoring problem where different groups have different maximum scores, a validator can be used to denote what percentage of that group's score should be given. This is used every now and then in olympiad problems, where for example constructing a valid solution vs determining whether a solution exists or not can give 100% vs 50% of the score. Today, this requires passing input validator flags per-group (not implemented in Kattis) or writing a custom grader. This seems like a simpler way.
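The proposed semantics can be pinned down in a few lines. This is a sketch of the proposal only, not existing grader behaviour, and the function name is hypothetical.

```python
def case_score(accept_score, reject_score, accepted, validator_score=None):
    """Proposed semantics: accept_score/reject_score act as multipliers.

    If the output validator prints no score, the multiplicand defaults
    to 1 on accept and 0 on reject, reproducing today's behaviour for
    validators that never output a score.
    """
    if validator_score is None:
        validator_score = 1.0 if accepted else 0.0
    base = accept_score if accepted else reject_score
    return base * validator_score
```

For example, a group with `accept_score: 100` whose validator outputs `0.5` for a partially correct construction would yield 50, i.e. 50% of that group's score, without any custom grader.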
As discussed with @niemela and @simonlindholm
Right now `.png`, `.jpg`, and `.jpeg` are supported file types. However, encoding files of these types yourself is quite difficult due to compression and checksums. I would suggest adding a simpler format (like `.bmp`) which allows you to create raster images in any language you want without additional libraries.
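To illustrate the point: an uncompressed 24-bit BMP can be written by hand in a few lines with no image library, just fixed-layout headers and raw pixel rows. This is a sketch (the spec does not currently mention `.bmp`, and the function name is made up):

```python
import struct

def write_bmp(path, pixels):
    """Write a 24-bit uncompressed BMP; pixels is a row-major list of
    rows of (r, g, b) tuples. No compression, no checksums."""
    height = len(pixels)
    width = len(pixels[0])
    row_pad = (-3 * width) % 4                 # each row is padded to 4 bytes
    image_size = (3 * width + row_pad) * height
    header_size = 14 + 40                      # file header + BITMAPINFOHEADER
    data = struct.pack("<2sIHHI", b"BM", header_size + image_size,
                       0, 0, header_size)
    data += struct.pack("<IiiHHIIiiII", 40, width, height, 1, 24, 0,
                        image_size, 2835, 2835, 0, 0)
    for row in reversed(pixels):               # BMP stores rows bottom-up
        for r, g, b in row:
            data += bytes((b, g, r))           # BMP uses BGR order
        data += b"\x00" * row_pad
    with open(path, "wb") as f:
        f.write(data)
```

The equivalent for PNG or JPEG would require zlib/DCT encoding and CRCs, which is exactly the difficulty the issue describes.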
Over the past few days, I have edited a parallel document to the specification hosted here. You can check it out at …
- There is a complete example problem `increment` in `/problems/increment` that is included piecemeal in the documentation. There should be more of those (in particular a scoring problem, but maybe also an interactive problem and a custom pass-fail output validator).
- `test.sh` validates much of the yaml in the repo against the schemas. There could be more of this (all yaml should be validated, and ideally the complete example problems should be verifyproblemed. This is all doable and easy).

I strongly suggest migrating the documentation in the direction of my suggestion. There are a number of ways to do that.
Moreover,
a. We let a GitHub build script render the RST. See https://www.sphinx-doc.org/en/master/tutorial/deploying.html. I have no experience with this but trust that it’s quite easy.
b. We host using Read the Docs. For this, GitHub user Kattis needs to get a Read the Docs account.
I am for 2b but have no strong opinion. But a decision about these choices is important for how to proceed.
Following up on #83 (comment)
As described in #26, `input_validator_flags` was specified to be the name of the input validator to run.
The actual behaviour in problemtools seems to be to run all validators with the given string as flags.
I implemented some preliminary support for this at some point in BAPCtools that actually follows the spec, but I don't think that code was ever used. (BAPCtools doesn't properly support testgroups at the moment anyway.)
Can we fix the legacy spec to say that a string argument is a flag passed to all validators?
Consider the following combination:

```yaml
name: Foo
type: scoring
validation:
  scoring: true
```
The problem (for me) is that the string `scoring` means two very different things here (the first is a claim about how grading is done and how the UI presents the result, the other is about the existence and behaviour of a program in `output_validators` that produces `score.txt`).
I think this is very unclear to the uninitiated. In fact, the following combination looks very weird to me:

```yaml
name: Foo
type: scoring
validation:
  scoring: false
```
I think this is a mess anyway. I welcome suggestions for cleaning this up, in particular if they also could decouple the ICPC part from the full specification.
Can `type` be specified in the icpc subset? Because we write

Any unknown keys should be treated as an error.

This would make it an error to specify `type: pass-fail` in a `problem.yaml` that tries to conform to the spec.
The alternative is to enforce an optional key with a singleton value

```
// icpc spec
type?: "pass-fail"
```

and allow in the full specification

```
// full spec
type?: *"pass-fail" | "scoring"
```
(Update after some head-scratching: this is not possible, at least in CUE, because of [actually good reasons]. See below.)
These things are not the same (because of the quoted passage above).
We should define a place to store solution slides / editorials.
At a minimum in the form of a PDF, but preferably (?) as LaTeX and/or Markdown (i.e. similar to problem statements). This of course introduces more of the #31 issues.
(See also #67)
Currently we identify a problem author by their name as a string. We discussed allowing for a more unique identifier such as email or orcid.
This could look something like this:
```yaml
author:
  - name: Firstname Lastname
    email: [email protected]
    orcid: https://orcid.org/0000-0001-5727-2427
```

`email` and `orcid` would be optional, and just a string would be interpreted as the `name`.
Thoughts?
Currently there is no way to see which parts of the specification pertain to the ICPC subset short of reading HTML tags in the source. (Sometimes, a pair of buttons in the left-hand margin enables this, but it’s fragile, implicit, and not obvious to outsiders.)
- the differences between `legacy` and `2023-07-draft` (cf. https://ccs-specs.icpc.io/2023-06/#changes-compared-to-the-code2022-07code-version)
- the `legacy` page a bit

Suggestion: remove `show_test_data_groups` and leave how to deal with this to the CCS.
Currently they are specified as
This is basically what you get if you grade with 'priorities', where AC < WA < TLE < RTE. In practice, though, ICPC uses 'first failure' mode.
This means that these directories do not actually correspond to the expected final verdict.
Does anybody implement priority-based verdicts for testing these submissions?
If not, I'd say we should change them to just mean that the final verdict (using whatever grader is in place) must be X.
Here are some ideas:

```
// Defines valid contents of testdata.yaml
package problemformat

import "strings"
import "strconv"

#filename: =~ "^[a-zA-Z0-9][a-zA-Z0-9_.-]*[a-zA-Z0-9]$"
#path: =~ "[a-zA-Z0-9_.-/]*"

#testdata_settings: {
	on_reject: *"break" | "continue"
	grading: *"default" | "custom"
	grader_flags: *"" | string
	if grading == "default" { grader_flags?: #default_grader_flags }
	input_validator_flags: *"" | string
	output_validator_flags: *"" | string
	accept_score: *"1" | #score
	reject_score: *"0" | #score
	range: *"-inf +inf" | string

	// Verify that the scores and ranges make sense:
	_range_syntax: strings.Split(range, " ") & [#score, #score] // space-separated scores
	_lo: strconv.ParseFloat(strings.Split(range, " ")[0], 64) // parses to float (including '-inf')
	_hi: strconv.ParseFloat(strings.Split(range, " ")[1], 64)
	_wa: strconv.ParseFloat(reject_score, 64)
	_ac: strconv.ParseFloat(accept_score, 64)
	_order: true & _lo <= _wa & _wa <= _ac & _ac <= _hi
}

// matches "1", "06", "21", ".4", "-1.2", but not 'inf'
#score: =~ "^-?([0-9]+|[0-9]*\\.[0-9]+)$"

// Default grader
// --------------
#default_grader_flags: this={
	string
	_as_struct: { for w in strings.Split(this, " ") { (w): _ } } // convert to struct and ...
	_as_struct: #valid_default_grader_fields // ... validate its fields
}

// Default grader flags (converted to fields of a CUE struct for validation)
#valid_default_grader_fields: {
	#verdict_aggregation_mode? // at most one verdict aggregation mode
	#score_aggregation_mode? // at most one score aggregation mode
	ignore_sample?: _ // two more optional flags
	accept_if_any_accepted?: _
}

#verdict_aggregation_mode: {first_error: _} | *{worst_error: _} | {always_accept: _}
#score_aggregation_mode: {min: _} | {max: _} | *{sum: _} | {avg: _}
```
And here’s a specification of `problem.yaml`:

```
#problem_settings: {
	// Should these fields all also accept null? If so, what's the semantics of, say,
	// license:
	// TODO: typical-problem.yaml violates spec (name missing)
	// TODO: maximal-problem.yaml violates spec (name missing, validator should be validator_flags)
	name: string | { [string]: string }
	type?: *"pass-fail" | "scoring"
	author?: string
	source?: string
	if source != _|_ { source_url?: string } // only allow source_url if source is specified
	license?: *"unknown" | "public domain" | "cc0" | "cc by" | "cc by-sa" | "educational" | "permission"
	rights_owner?: string
	limits?: #limits
	validation?: *"default" | #custom_validation
	validator_flags?: string
	scoring?: {
		objective: *"max" | "min"
		show_test_data_groups: *false | true
	}
	keywords?: string | [...string]
	uuid?: string
	languages?: string | [...string]
}

#custom_validation: this={
	string
	_as_struct: { for w in strings.Split(this, " ") { (w): _ } }
	_as_struct: close({
		custom: _, // Must include "custom",
		score?: _, // can include "score" ...
		interactive?: _ // ... and "interactive"
	})
}

#limits: {
	// All are optional, right?
	// Is the empty limits dictionary valid? (I guess yes)
	// TODO: are these all ints?
	time_multiplier?: *5 | int
	time_safety_margin?: *2 | int
	memory?: int
	output?: int
	code?: int
	compilation_time?: int
	compilation_memory?: int
	validation_time?: int
	validation_memory?: int
	validation_output?: int
}
```
We had some problems with randomized solutions where we guarantee n=1000, but that is impractical in the samples, so we had samples with much smaller n, e.g. n=10.
It would be nice to officially support samples that are only shown, but not run.
I implemented this with a `.{in,ans}.statement` extension, but I'm open to other options.
Are the testdata features rich enough to make these an unnecessary complication?