korap / koral

Translation of query languages to serialized KoralQuery protocol

License: BSD 2-Clause "Simplified" License

ANTLR 2.17% GAP 1.98% Java 95.85%

koral's Issues

Move foundry/layer/key in spans to term

This is moved from the old Trac Ticket #202

jbingel reported:

Currently, the foundry, layer and key constraints in spans are serialised differently than in tokens. For spans, <mate/c=np> is serialised as

{
  "@type":"korap:span",
  "foundry":"mate",
  "layer":"c",
  "key":"np"
}

whereas for tokens, [mate/p=N] is wrapped in a term:

{
  "@type":"korap:token",
  "wrap": {
    "@type":"korap:term",
    "foundry":"mate",
    "layer":"p",
    "key":"N"
  }
}

There is a reason for this, namely that the constraints in tokens can be combined by ANDs and ORs using a korap:termGroup. That's not possible (yet?) for spans.

A reason against this approach is to minimise the specifications. A korap:span would then have a significantly lower number of attributes, as they're outsourced to korap:term, which can hold the attributes foundry/layer/key already. So the span serialisation would be like this:

{
  "@type":"korap:span",
  "wrap": {
    "@type":"korap:term",
    "foundry":"mate",
    "layer":"c",
    "key":"np"
  }
}
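
As a minimal sketch of the proposed structure, the span could be built from plain maps in Java. The class and method names (SpanWrapSketch, term, span) are hypothetical and not part of Koral; the attribute names are the ones from the example above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Koral: builds the proposed span serialisation.
class SpanWrapSketch {

    // Builds a korap:term holding the foundry/layer/key constraint.
    static Map<String, Object> term(String foundry, String layer, String key) {
        Map<String, Object> term = new LinkedHashMap<>();
        term.put("@type", "korap:term");
        term.put("foundry", foundry);
        term.put("layer", layer);
        term.put("key", key);
        return term;
    }

    // Wraps a term in a korap:span, mirroring how korap:token wraps its term.
    static Map<String, Object> span(Map<String, Object> wrapped) {
        Map<String, Object> span = new LinkedHashMap<>();
        span.put("@type", "korap:span");
        span.put("wrap", wrapped);
        return span;
    }

    public static void main(String[] args) {
        // <mate/c=np> under the proposed scheme
        System.out.println(span(term("mate", "c", "np")));
    }
}
```

With this shape, a span and a token differ only in their outer @type, so the same term builder serves both.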

Any more pros/cons before we decide whether to put this into effect?

Maybe we should really support the sort of logical connection for span key that we already have for tokens, i.e. allow something like

<cnx/c=vp | cnx/c=advp>

Then we'd definitely need a similar mechanism as in tokens, naturally using term/termGroup in a "wrap" attribute, like so:

{
  "@type":"korap:span",
  "wrap": {
    "@type":"korap:termGroup",
    "operation":"operation:or",
    "operands":[ {
      "@type":"korap:term",
      "foundry":"cnx",
      "layer":"c",
      "key":"vp"
    }, {
      "@type":"korap:term",
      "foundry":"cnx",
      "layer":"c",
      "key":"advp"
    } ]
  }
}

Possibly problematic: syntactic ambiguity between attributes (root=true) and layer/key definitions (c=NP).

Akron replied:
But effectively this is identical to <cnx/c=vp>|<cnx/c=advp>, right? So I don't see at least any benefit for Poliqarp+. Regarding spans this might work - although deserialization becomes a bit more complicated (especially with negative matches).

Replace LinkedHashMap

I don't see why requestMap and queryMap (such as in the C2 serialization) should be LinkedHashMaps. Is the order of the attributes in KoralQuery important?
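
For context, JSON objects are unordered collections of name/value pairs, so a consumer should not depend on attribute order. The following sketch (class and method names are hypothetical) shows that in Java the two map types hold equal content; they differ only in iteration order, which matters mainly when serialized strings are compared literally.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical demonstration, not taken from Koral.
class MapOrderSketch {

    // Puts the same three entries into whichever map implementation is passed in.
    static Map<String, Object> fill(Map<String, Object> map) {
        map.put("@type", "koral:token");
        map.put("layer", "orth");
        map.put("key", "Baum");
        return map;
    }

    // Map.equals compares entries only, so insertion order does not matter.
    static boolean sameContent() {
        return fill(new HashMap<>()).equals(fill(new LinkedHashMap<>()));
    }

    public static void main(String[] args) {
        System.out.println(sameContent());
    }
}
```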

Rewrite placeholders in Cosmas-II QL

The Cosmas-II Query "Schiff+ahrt" is correctly interpreted as a placeholder in a wildcard, as described here. However, the placeholder + is not rewritten to ?, which is a failure.
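
A minimal sketch of the missing rewrite step, assuming that + is the only placeholder that needs renaming and that the term has already been recognised as a wildcard term. The class and method names are hypothetical, not Koral's.

```java
// Hypothetical helper, not part of Koral.
class PlaceholderRewriteSketch {

    // Assumption: '+' is the only Cosmas-II placeholder that needs renaming;
    // every other character is passed through unchanged.
    static String rewrite(String wildcardTerm) {
        return wildcardTerm.replace('+', '?');
    }

    public static void main(String[] args) {
        System.out.println(rewrite("Schiff+ahrt"));
    }
}
```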

koral:span inconsistency regarding wrap

koral:span may have key, foundry, layer, and value attributes.
In the former specification, these attributes are specified at the same level as @type.

{
  "@type": "koral:span",
  "layer": "c",
  "foundry": "corenlp",
  "match": "match:eq",
  "key": "VP"
}

According to the Koral documentation, however, they should be placed in a wrap and specified as a term.

{
  "@type": "koral:span",
  "wrap": {
    "@type": "koral:term",
    "layer": "c",
    "foundry": "corenlp",
    "match": "match:eq",
    "key": "VP"
  }
}

Annis: foundry/layer value with regex

Regular expressions should be possible for foundry/layer values in Annis queries, e.g. tt/l=/A.*/, which is similar to the Poliqarp query [tt/p="A.*"]. Thus, the serialization should also be the same.

"query": {
    "@type": "koral:token",
    "wrap": {
      "@type": "koral:term",
      "foundry": "tt",
      "key": "A.*",
      "layer": "p",
      "match": "match:eq",
      "type": "type:regex"
    }
  }

Support extended composite operators in Cosmas II

Cosmas II now supports a wide range of operators to find composites. I don't know if we can support them directly, as this information needs to be part of the index (so probably restricted to Glemm).

Annis QL: Serialization of dominance with type

Dominance with type operator should be serialized into a relation (dominance) with attribute (the specified type). Currently it is serialized as an AND relation between the dominance and the type. For example, the dominance operator in this query

corenlp/c="VP" & corenlp/c="NP" & #1 >[malt/d="PP"] #2

is serialized into

 "operation": "operation:relation",
    "relation": {
      "@type": "koral:relation",
      "wrap": {
        "@type": "koral:termGroup",
        "operands": [
          {
            "@type": "koral:term",
            "foundry": "malt",
            "key": "PP",
            "layer": "d",
            "match": "match:eq"
          },
          {
            "@type": "koral:term",
            "layer": "c"
          }
        ],
        "relation": "relation:and"
      }
    }

Annis: lemma keyword

The Annis lemma keyword is not supported yet.

Specify rewrite types

Report types are not described in much detail in the Koral specification. There is no good explanation of how to use them or which types of operations are supported.

Kustvakt currently uses:

operation:injection
operation:deletion
operation:override
operation:insertion

In addition, Kalamar also accepts operation:modification.

We should discuss and define the different rewrite operations here and how they should be integrated in KoralQuery.

The scope was originally introduced for deleted items, but is now used for other types and comments as well. It should be defined what is meant to be in scope.

This is a reaction to KorAP/Kalamar#51.

Rename "collection"

Currently we use the term collection to describe the construction query of a virtual corpus - for historical reasons, when we used the term "virtual collection". I think it's better to rename collection to corpus, as a collection is a very general term to describe a set (so it could also name a collection of matches, a collection of spans etc.).

Command line interface

The command line implementation seems to be disabled at the moment.
The command

java -jar target/Koral-0.25.jar "der alte" "Poliqarp"

results in nothing.

Distance with optionality

Currently Koral transforms sequences with intermediate any-tokens into distances, e.g. der [] Mann into distance(+1, der, Mann). This works and is fine with Krill. Unfortunately, Koral does the same for distances with optional anchors, like der [] Mann?. The problem here is that the distance alters the meaning. Without a distance, the query means: "Find a sequence of two tokens, with the first being 'der', followed by 'Mann'". With a distance, the query means: "Find a span with a single token distance between 'der' and 'Mann', where 'Mann' is optional". I think that, in case of optionality on one or both sides, the any-token shouldn't be rewritten to a distance.

Support Certainty/Confidence attributes in Queries

Some annotations have associated confidence values (e.g. Treetagger). Krill supports attached confidence values for terms, spans and relations, encoded as a byte (values between 0 and 255). There should be a mechanism in KoralQuery to constrain matches to minimum confidence values. This constraint should also be expressible in Poliqarp+.

Proposal:

[cnx/p=NN@!>=70%]

This would constrain matches of cnx/p=NN to all terms with a confidence of at least 70%. The @ symbol introduces attributes to terms/spans/relations, normally written as key-value pairs. The ! would mark a special attribute name for confidence. This may only be a shortcut for a more elaborate attribute name.
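
Since the query states the threshold as a percentage while Krill stores a byte, some conversion would be needed. The sketch below assumes a linear scale from 0-100% to 0-255, which is an assumption and not something the specification defines; all names are hypothetical.

```java
// Hypothetical helper; the linear percent-to-byte scale is an assumption.
class ConfidenceSketch {

    // Returns the smallest value in 0..255 that is at least `percent` percent of 255.
    static int percentToByte(int percent) {
        if (percent < 0 || percent > 100) {
            throw new IllegalArgumentException("percent must be between 0 and 100");
        }
        return (percent * 255 + 99) / 100; // integer ceiling of percent * 2.55
    }

    // True if a stored confidence value satisfies a ">= minPercent" constraint.
    static boolean accepts(int storedConfidence, int minPercent) {
        return storedConfidence >= percentToByte(minPercent);
    }

    public static void main(String[] args) {
        System.out.println(percentToByte(70));
        System.out.println(accepts(200, 70));
    }
}
```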

Do not generate empty collection objects

Currently Koral always creates KoralQuery objects with "collection" : {} when no collection was set. This is not a valid collection object and makes all queries of that type fail.
Another problem is empty warnings, errors and messages. They are not wrong, but they are annoying when no error, warning or message was set.
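
One way to avoid both problems is to drop empty top-level entries before serialization. This is a minimal sketch over plain maps, with hypothetical names; where Koral would actually perform such a step is not specified here.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical post-processing step, not part of Koral.
class EmptyFieldPruneSketch {

    // Returns a copy without top-level entries whose value is an empty map or collection.
    static Map<String, Object> prune(Map<String, Object> koralQuery) {
        Map<String, Object> result = new LinkedHashMap<>(koralQuery);
        result.values().removeIf(EmptyFieldPruneSketch::isEmpty);
        return result;
    }

    static boolean isEmpty(Object value) {
        if (value instanceof Map) {
            return ((Map<?, ?>) value).isEmpty();
        }
        if (value instanceof Collection) {
            return ((Collection<?>) value).isEmpty();
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, Object> query = new LinkedHashMap<>();
        query.put("@type", "koral:token");
        Map<String, Object> root = new LinkedHashMap<>();
        root.put("query", query);
        root.put("collection", new LinkedHashMap<String, Object>());
        root.put("errors", new ArrayList<Object>());
        System.out.println(prune(root).keySet());
    }
}
```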

Handle queries gracefully

Often Koral blames the user for making a mistake when the query is merely confusing, not wrong. Here we should collect queries that fail, and that fail in weird ways:

Frage/1 in Poliqarp

Support negation of MORPH() classes in Cosmas II

The MORPH() operator in Cosmas-II supports negation to exclude certain morphosyntactic annotations from the result set, like MORPH(mate/p=ADV).
Currently this throws a parsing error in Koral.

C2 OP IN

wegen #IN <s>
is serialized into:

{
    "@context": "http://korap.ids-mannheim.de/ns/koral/0.3/context.jsonld",
    "query": {
        "classRefCheck": ["classRefCheck:includes"],
        "operation": "operation:class",
        "operands": [{
            "operation": "operation:position",
            "frames": [],
            "operands": [
                {
                    "operation": "operation:class",
                    "operands": [{
                        "wrap": {
                            "@type": "koral:term",
                            "layer": "orth",
                            "match": "match:eq",
                            "key": "wegen"
                        },
                        "@type": "koral:token"
                    }],
                    "@type": "koral:group",
                    "classOut": 130
                },
                {
                    "operation": "operation:class",
                    "operands": [{
                        "wrap": {
                            "@type": "koral:term",
                            "key": "s"
                        },
                        "@type": "koral:span"
                    }],
                    "@type": "koral:group",
                    "classOut": 129
                }
            ],
            "@type": "koral:group"
        }],
        "@type": "koral:group",
        "classOut": 131,
        "classIn": [
            129,
            130
        ]
    }
}

operation:position with empty frames does not really make sense. This should possibly be reduced to operation:class only, but it does not allow multiple operands.

Besides, wegen #IN(%) <s> should be serialized with "classRefCheck": ["classRefCheck:disjoint"].

operation:position and exclude

While implementing a draft for the exclude option in operation:position, I had the assumption that the Koral description

If true, negate positional relations.

is not correct. If someone searches for, e.g., containsNot(<dereko/s=s>, [orth=Baum]), I would not expect a mere flip of the frames, which would find a <dereko/s=s> that contains something that is not [orth=Baum]. Instead, it should find a <dereko/s=s> that does not contain [orth=Baum].

In case we agree on that interpretation, I would say this should be a different operation, operation:exclusion, also with frames, but with the result being the span of the first operand only.

Introduce docGroupRef for defined VCs

To make working with defined VCs (as implemented by @margaretha) more flexible, I propose an additional corpus/collection object type koral:docGroupRef, that references a defined VC in the corpus/collection query.

{
  "@type" : "koral:docGroupRef",
  "ref" : "https://korap.ids-mannheim.de/@ndiewald/MyCorpus"
}

The reference is a unique ID that will be resolved by the KoralQuery consumer (in our case Kustvakt) by injecting the requested KoralQuery fragment in the corpus definition (or by injecting a stored ID, in case the VC is persistent in the backend as a list of text IDs, see our discussion on dynamic vs. persistent VCs). Therefore defined VCs can also be part of more complex VCs.

In the frontend's VC builder the reference can be shown and created like "referTo @ndiewald/MyCorpus" along with a ...-symbol (in addition to x, and and or). When the user clicks on ... the reference is resolved and shown in the frontend.

In the Koral VC language we would need to introduce a syntax like referTo {URL}.

Update: Removed desc and type from the example, and replaced @ref with ref after comments by @margaretha.
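
The resolution step described above could look roughly like the following sketch, which works on plain maps and lists. The class name, the storedVcs registry and the error handling are assumptions for illustration; this is not how Kustvakt is known to implement it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical resolver; storedVcs maps a reference ID to a stored corpus fragment.
class DocGroupRefSketch {

    // Recursively replaces every koral:docGroupRef node with the fragment
    // registered under its "ref"; all other nodes are copied unchanged.
    // Cyclic references are not detected in this sketch.
    @SuppressWarnings("unchecked")
    static Object resolve(Object node, Map<String, Object> storedVcs) {
        if (node instanceof List) {
            List<Object> copy = new ArrayList<>();
            for (Object child : (List<Object>) node) {
                copy.add(resolve(child, storedVcs));
            }
            return copy;
        }
        if (!(node instanceof Map)) {
            return node;
        }
        Map<String, Object> map = (Map<String, Object>) node;
        if ("koral:docGroupRef".equals(map.get("@type"))) {
            Object fragment = storedVcs.get(map.get("ref"));
            if (fragment == null) {
                throw new IllegalArgumentException("Unknown VC reference: " + map.get("ref"));
            }
            // A stored VC may itself contain references, so resolve it as well.
            return resolve(fragment, storedVcs);
        }
        Map<String, Object> copy = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : map.entrySet()) {
            copy.put(entry.getKey(), resolve(entry.getValue(), storedVcs));
        }
        return copy;
    }
}
```

Because the resolver recurses into operands, a reference can appear anywhere inside a larger corpus definition, which matches the idea that defined VCs can be part of more complex VCs.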

Serialization of Spans in Cosmas-II

Currently #ELEM(base/s=s) is serialized as

"query": {
    "@type": "koral:span",
    "attr": {
      "@type": "koral:term",
      "foundry": "base",
      "key": "s",
      "layer": "s",
      "match": "match:eq"
    }
  }

instead of

 "query": {
    "@type": "koral:span",
    "wrap": {
      "@type": "koral:term",
      "foundry": "base",
      "key": "s",
      "layer": "s"
    }
  }

That's pretty bad, and means all #ELEM() queries fail.

Support Variables in Poliqarp-Queries

This issue was moved from Trac ticket #204

Akron reported:

Elena pointed me to the use of variables in Poliqarp, which I wasn't aware of. They are not really easy to support, but we could try, in case we find some time ... ;)
This has no high priority - I think we are quite busy with our current tasks - but it should be documented here.

Variables work pretty much like captures in PCRE, and are used for things like agreement.

Specification of grammatical classes and grammatical categories may contain variables (having the form $n, where n is a single digit), whose values will be set only during execution of the query. For example, the following query for an adjective and a following noun agreeing in case:

    [case=nom & pos=adj] [case=nom & pos=subst] | [case=gen & pos=adj] [case=gen & pos=subst] |
    [case=dat & pos=adj] [case=dat & pos=subst] | [case=acc & pos=adj] [case=acc & pos=subst] |
    [case=inst & pos=adj] [case=inst & pos=subst] | [case=loc & pos=adj] [case=loc & pos=subst] |
    [case=voc & pos=adj] [case=voc & pos=subst]

can be simplified to:

    [case=$1 & pos=adj] [case=$1 & pos=subst]

Source: (http://nkjp.pl/poliqarp/help/ense3.html#x4-90003.4)

How could we achieve this? Here's a simple idea:
We serialize the case to a korap:term with "type":"type:reference" (as this works pretty much like class references, though referring not to the same span but to the same surface). The term will have a number for key.

In the Lucene index we would need a SpanCapturedTermQuery (or a regex query of sorts) that searches for "foundry/layer=case:/.+?/" and captures the term as a variable in a computed payload (like 1#case:acc - however, we have to keep in mind that we need a minimal payload length here so as not to interfere with other payloads). Now, we need "somewhere"(tm) a wrapper around the query, checking that no terms set in the payloads are contradictory, i.e. if we have a reference term "1#case:acc" and another reference "1#case:acc" everything is fine, but the span is no match if we have a reference "1#case:acc" and a reference "1#case:dat".

The wrapper could wrap the query as a whole (cheap solution) or could be a bit more clever and wrap only the subquery that contains all references, e.g. in "contains(<s>, [case=$1][case=$2])" the SpanTermReferenceCheck could be wrapped around the sequence, so the results could be filtered before the costly SpanWithinQuery would have to deal with them.
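
The contradiction check itself can be sketched independently of Lucene. The following assumes payloads are available as strings of the form "variable#layer:value", as in the examples above; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical check, independent of Lucene; not an existing Krill class.
class ReferenceCheckSketch {

    // Each payload has the form "<variable>#<layer>:<value>", e.g. "1#case:acc".
    // A span matches only if every variable is bound to a single term.
    static boolean isConsistent(List<String> payloads) {
        Map<String, String> bindings = new HashMap<>();
        for (String payload : payloads) {
            int separator = payload.indexOf('#');
            if (separator < 0) {
                continue; // not a reference payload
            }
            String variable = payload.substring(0, separator);
            String term = payload.substring(separator + 1);
            String previous = bindings.putIfAbsent(variable, term);
            if (previous != null && !previous.equals(term)) {
                return false; // same variable bound to two different terms
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isConsistent(Arrays.asList("1#case:acc", "1#case:acc")));
        System.out.println(isConsistent(Arrays.asList("1#case:acc", "1#case:dat")));
    }
}
```

This covers positive agreement only; negated references such as [case!=$1] would need a separate rule.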

What do you think?

jbingel replied:

Interesting, I wasn't aware of this either. It's not in the official Poliqarp documentation.

This strongly resembles the "equalvalue" (==) operator in Annis. This was introduced in a fairly recent version of the language (3.1.6) and we decided not to support it because of its recency and technical challenge. Now, I wanted to read about it again, but the detailed explanation that was given in the Annis 3.1.6 documentation disappeared in the current 3.2.2 documentation. Oddly, the 3.1.6 documentation has been taken off the net and is not included in the release. Aargh!

Anyway, given that Poliqarp variables and AQL "(not)equalvalue" have the same behaviour, it might be worthwhile to support them. Nils' suggestion sounds relatively easy on the serialisation level, so I'm up for it. We'd just need to make sure that we can express negative equality, too (e.g. [case=$1][case!=$1]), but that could probably be done in the usual way, i.e. using "match:ne", right?

Akron replied:
Damn - negation! I didn't think about that ... my solution doesn't cover that. I would have to think about it further. But for CoralQuery I guess it's fine to use match:ne in that case.
Please report your Annis findings here in the ticket. I can't remember it correctly.

Document error codes

The error codes (in the 3xx range) should be documented and localizable. There is a status code map (de.ids_mannheim.korap.query.serialize.util), but for every error there is a different error message. Error messages should be identical for each error code and further information (character position etc.) should be added in the KoralQuery error messages in a position > 2.

Wrong Serialization of #ELEM() in Cosmas II QL

Currently #ELEM(base/s=s) serializes to

{
  "attr": {
    "layer": "s",
    "match": "match:eq",
    "foundry": "base",
    "key": "s",
    "@type": "koral:term"
  },
  "@type": "koral:span"
}

I think, base/s=s needs to be the span definition.

Support Cosmas2's #REG()

The #reg() operator was newly introduced to Cosmas2 and is described in the documentation.

Serialize Poliqarp's Alignment Operator

At the moment, alignment is only defined by a list of class numbers, which may not be enough. For example

baum ^ {1:test}?

isn't serializable at the moment, as the first class may not be part of the result. This should be fixed by having a more robust serialization mechanism.

This issue is moved from Trac (old number is #214)

Make group operations parametric types

Currently group operations are passed as @id to the operation key of a koral:group type. That makes koral:group a huge spec. I guess it would be better to have operations defined as parametric objects. Consumers could simply stay backwards compatible by checking whether an object is passed instead of a string. Unfortunately, there is no way to keep Koral backwards compatible.

This would be a huge improvement for the spec descriptions as well.

Annis QL: Pointing relation serialization

Pointing relations are incorrectly interpreted to work in the following manner:
node & node & #2 ->antecedent[malt/d="PP"] #1
The serialization ignores the relation type (antecedent) and defines the annotation from the specified relation label.

A correct typed pointing relation query is for example
pos=/P.*/ & pos=/V.FIN/ & #2 ->dep[func="sbj"] #1
where dep is a relation type and [func="sbj"] is an additional label similar to that in the dominance operator.

To be more detailed, foundry and layer should also be defined in the type.
pos=/P.*/ & pos=/V.FIN/ & #2 ->malt/d="PP"[func="sbj"] #1

Suggestion for the serialization:

 "operation": "operation:relation",
    "relation": {
      "@type": "koral:relation",
      "wrap": {
        "@type": "koral:term",
        "foundry": "malt",
        "key": "PP",
        "layer": "d",
        "match": "match:eq"
      },
      "attr": {
        "@type": "koral:term",
        "key": "func:sbj",
        "match": "match:eq"
      }
    }

Currently, an attr object is not allowed in koral:relation; it should probably have the same foundry as the relation.

Theoretically layer and foundry can also be specified for attr in KoralQuery. But it is kind of awkward in AQL since it is supposed to be a type-value pair and not a term with annotation.
pos=/P.*/ & pos=/V.FIN/ & #2 ->malt/d="PP"[malt/d=func="sbj"] #1

Support verbatim string values in Poliqarp

Currently strings can not be passed to Koral when they contain special characters.
So [orth=http://spiegel.de] does not work and [orth="http://spiegel.de"] only kind of works - as it translates the string to a regular expression. I would propose [orth='http://spiegel.de'] to be a way to pass verbatim string values. Currently this construct translates to regular expressions as well, but I don't think this is necessary.

Support i and x as valid layer names in Poliqarp

Due to the support of value flags like /i and /x, these characters (and their capital counterparts) are not valid layer names in Poliqarp, e.g. [mate/i=Baum] throws a parsing error, while [mate/b=Baum] is fine.

Unable to escape RegEx symbols

I created a branch escape-regex including a failing test regarding regexes in Poliqarp containing escaped symbols.
The PQ+-Query

"a\."

is interpreted as

which is wrong, I guess.

Annis: Relation Annotation with Regex

In Annis, it is possible to specify a regex as the annotation value of a relation, e.g.
node ->malt/d[func=/D.*/] node

Failed test in the CollectionQueryDuplicateTest in the master-branch

Failed tests: testCollectionQueryDuplicateThrowsAssertionException(de.ids_mannheim.korap.query.serialize.CollectionQueryDuplicateTest)

duplicate collection nodes

Whenever QuerySerializer.toJson() is called twice in a row, the private merge collection merges the original requestMap collection segment and the result from the collection query processor. The second call however leads to the nodes being already identical. They are then wrongly merged. Failing test is in dupColl branch.

Introduce a "text" type in "koral:doc"

Currently, koral:doc fields are type:string by default. Strings support the operations eq, ne, contains and containsnot. However, the meaning of contains and containsnot differs depending on the underlying field in the database: in text fields, contains and containsnot are now treated as a phrase query with tokens. In strings, we thought of treating them as a regular expression with dot-star circumfix. The problems:
a) The frontend may know if a field is a string or a text field, but Koral doesn't, so in the KoralQuery serialization of the corpus query, this needs to be unspecified.
b) The backend needs to check whether a field is stored as a string or as a text prior to formulating the query, which is, at the very least, inelegant.
c) contains and containsnot work totally differently depending on the field implementation in the backend, which is bad when the user is not aware of it (because there is no difference in the frontend for these field types).
d) contains and containsnot are redundant for strings when regular expressions are supported.

I propose to introduce a text type for koral:doc that supports contains and containsnot. The string type will no longer support contains and containsnot. Unspecified koral:doc fields are treated as type:text by default. The description in the Koral documentation needs to be rephrased to talk about "subsequences" instead of "substrings".

This is a rather complicated design problem so I would like to hear your ideas.

Upgrade Cosmas II grammars to ANTLR 4

Cosmas II grammars are written in the ANTLR 3 language, which is quite different from ANTLR 4. The tree outputs would probably also be different.

The other QLs (Poliqarp, Annis, FCSQL) use ANTLR 4. Thus, we use both the ANTLR 3 and ANTLR 4 libraries, which fortunately do not conflict with each other.

ANNIS QL: Serialization of dominance

Dominance is serialized as a relation with the layer c and without a key, which is not a valid koral:term object.

Suggestion:
node & node & #2 > #1
node & node & #2 ->dominance #1

could be serialized identically as a relation with key dominance.

 "operation": "operation:relation",
    "relation": {
      "@type": "koral:relation",
      "wrap": {
        "@type": "koral:term",
        "key": "dominance"
      }
    }

Improve error messages (to be similar to report types)

Here I would propose a new improved error format, that is more in line with KoralQuery report types.

{
  "msg" : [{
    "@type" : "koral:msg",
    "type" : "message:warning",
    "src" : "Krill",
    "value" : "No query given",
    "code" : 700,
    "param": []
  }]
}

The benefit of such a scheme would be:

Better distinction of message and parameters (e.g. position of failing expressions in the query, internationalization etc.)
Better tracing capabilities regarding the error source
Improved logging

Regular expressions in corpus queries

When using regular expressions in corpus queries, some characters are failing to parse. I think, this is similar to the behaviour in Poliqarp, where regex-parsing was fixed by reading the characters verbatim instead of parsing.

I added a failing test in 9de87c2 (branch fix-vc-regex).

Rename rewrite operations

Currently Koral uses both operation:insertion and operation:injection to describe the same kind of rewrite. Although it's not yet formally specified in the KoralQuery doc, I would use only one term. In addition to operation:injection, operation:modification should be used for modifications in the KoralQuery.
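
Until the terms are unified at the source, a consumer could normalise them. The sketch below assumes operation:injection is the term to keep, which is how the paragraph above reads but is not yet formally specified; the names are hypothetical.

```java
// Hypothetical normaliser; assumes operation:injection is the term to keep.
class RewriteOperationSketch {

    // Maps the duplicate term onto the kept one and leaves all others untouched.
    static String normalize(String operation) {
        return "operation:insertion".equals(operation) ? "operation:injection" : operation;
    }

    public static void main(String[] args) {
        System.out.println(normalize("operation:insertion"));
    }
}
```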

Update @context

The current context should be:

http://korap.ids-mannheim.de/ns/koral/0.3/

Operation: merge (C2 MAX)

Argument MAX in C2 groups together matches where multiple hits occur within the same span. For instance,
let Y contain X1, X2, X3; then X #IN(N,MAX) Y returns only 1 match.

Such grouping can be achieved with operation:merge with only one operand. This operation (in Krill) should check whether the spans still have the same start and end position and collect the payloads containing the classes of the hits (Xs). Or is any of the classRefOp able to do this?

Serialization of wegen #IN(N,MAX) <s> would be

{
    "@context": "http://korap.ids-mannheim.de/ns/koral/0.3/context.jsonld",
    "query": {
        "operation": "operation:merge",
        "operands": [{
            "operation": "operation:position",
            "frames": [
                "frames:isWithin",
                "frames:matches"
            ],
            "operands": [
                {
                    "operation": "operation:class",
                    "@type": "koral:group",
                    "classOut": 1,
                    "operands": [{
                        "wrap": {
                            "@type": "koral:term",
                            "layer": "orth",
                            "match": "match:eq",
                            "key": "wegen"
                        },
                        "@type": "koral:token"
                    }]
                },
                {
                    "wrap": {
                        "@type": "koral:term",
                        "key": "s"
                    },
                    "@type": "koral:span"
                }
            ],
            "@type": "koral:group"
        }],
        "@type": "koral:group"
    }
}
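
The grouping that operation:merge is meant to perform can be sketched outside of Krill as follows. The Match type and its string payloads are hypothetical simplifications; Krill's real spans and payloads are structured differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the grouping; not Krill code.
class MergeSketch {

    // One match of the position query: the context span (Y) plus the payloads of a hit (X).
    static final class Match {
        final int start;
        final int end;
        final List<String> payloads;

        Match(int start, int end, List<String> payloads) {
            this.start = start;
            this.end = end;
            this.payloads = payloads;
        }
    }

    // Merges matches that share start and end into one match and collects their payloads.
    static List<Match> merge(List<Match> matches) {
        Map<String, Match> bySpan = new LinkedHashMap<>();
        for (Match m : matches) {
            String key = m.start + ":" + m.end;
            Match merged = bySpan.get(key);
            if (merged == null) {
                bySpan.put(key, new Match(m.start, m.end, new ArrayList<>(m.payloads)));
            } else {
                merged.payloads.addAll(m.payloads);
            }
        }
        return new ArrayList<>(bySpan.values());
    }

    public static void main(String[] args) {
        List<Match> matches = new ArrayList<>();
        matches.add(new Match(0, 10, Arrays.asList("X1")));
        matches.add(new Match(0, 10, Arrays.asList("X2")));
        matches.add(new Match(0, 10, Arrays.asList("X3")));
        for (Match m : merge(matches)) {
            System.out.println(m.start + "-" + m.end + " " + m.payloads);
        }
    }
}
```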

Support quantifiers for morph() operators in Cosmas II

Cosmas II now supports Quantifiers for morph(). As this is a feature similar to Poliqarp+ it's rather trivial to support, I guess (already part of KoralQuery). But it has to be adopted by Koral.

When serializing Annis distance queries, the minimal distance can't be 0

There seems to be a bug in Koral, and it's illustrated with the query that has been sometimes used as an example: "so" & "nicht" & #1 .0,6 #2 -- the minimal distance can't be zero, so the query should read "so" & "nicht" & #1 .1,6 #2, with the interpretation that it has when you currently run it with a "0".

(It might be that the indirect precedence operator .* suggests that "0" would be OK here, but it's apparently meant to associate with the wildcard (rather than the Kleene star). If you say "confusing" and point at the Kleene star that Annis uses in regexes, I'll have to agree...)

Freshly verified in Annis

Poliqarp Plus query: empty token with quantifier.

query: []{3}
should generate a koral:group:
{
  "@type": "koral:group",
  "operation": "operation:repetition",
  "operands": [{"@type": "koral:token"}],
  "boundary": {"@type": "koral:boundary", "min": 3, "max": 3}
}
Currently, it only generates "query":{"@type":"koral:token"}

See de.ids_mannheim.korap.query.serialize.PoliqarpPlusQueryProcessorTest.testEmptyTokens()
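
For illustration, the expected group can be assembled from plain maps as in the sketch below. The class and method names are hypothetical and this is not the code path PoliqarpPlusQueryProcessor uses.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical builder for the expected serialization.
class RepetitionSketch {

    // Wraps an operand in a repetition group with the given boundary.
    static Map<String, Object> repetition(Map<String, Object> operand, int min, int max) {
        Map<String, Object> boundary = new LinkedHashMap<>();
        boundary.put("@type", "koral:boundary");
        boundary.put("min", min);
        boundary.put("max", max);

        List<Object> operands = new ArrayList<>();
        operands.add(operand);

        Map<String, Object> group = new LinkedHashMap<>();
        group.put("@type", "koral:group");
        group.put("operation", "operation:repetition");
        group.put("operands", operands);
        group.put("boundary", boundary);
        return group;
    }

    public static void main(String[] args) {
        Map<String, Object> emptyToken = new LinkedHashMap<>();
        emptyToken.put("@type", "koral:token");
        System.out.println(repetition(emptyToken, 3, 3)); // []{3}
    }
}
```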

Useless warnings on datelike strings

In collections, warnings are sometimes raised by the assumption that a value is a date. This is sometimes completely confusing (see below) and sometimes wrong, as document identifiers may look like dates.
Failing example test:

    @Test
    public void testNotDate() throws JsonProcessingException, IOException {
        collection = "author=\"firefighter1974\"";
        qs.setQuery(query, ql);
        qs.setCollection(collection);
        res = mapper.readTree(qs.toJSON());
        assertEquals("koral:doc", res.at("/collection/@type").asText());
        assertEquals("author", res.at("/collection/key").asText());
        assertEquals("firefighter1974", res.at("/collection/value").asText());
        assertEquals("match:eq", res.at("/collection/match").asText());
        assertEquals("", res.at("/errors/0/0").asText());
        assertEquals("", res.at("/warnings/0/0").asText());
    }
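
A stricter date check would at least avoid the firefighter1974 case. The sketch below assumes dates are written as YYYY, YYYY-MM or YYYY-MM-DD and requires the whole value to match; the names are hypothetical. It does not solve the second problem, since identifiers that really look like dates remain ambiguous.

```java
import java.util.regex.Pattern;

// Hypothetical check; the accepted date formats are an assumption.
class DateCheckSketch {

    private static final Pattern DATE = Pattern.compile("\\d{4}(-\\d{2}(-\\d{2})?)?");

    // matches() requires the whole string to match, so trailing digits in a
    // longer value such as "firefighter1974" are not mistaken for a date.
    static boolean looksLikeDate(String value) {
        return DATE.matcher(value).matches();
    }

    public static void main(String[] args) {
        System.out.println(looksLikeDate("firefighter1974"));
        System.out.println(looksLikeDate("2014-05-12"));
    }
}
```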

Support operation:merge

The MIN and MAX attributes of C2-QL allow for grouping of matches that occur in the same context. After discussions and clarifications by Franck, we thought about different ways to serialize that. We agreed that we will need classes to point to the context relevant for the merge. So we may want to stick to operation:merge as a group operation wrapping the query and a classIn for the relevant context.
@margaretha mentioned the problem that sometimes the context may change (for distance operations with any order, if I understand that correctly). How can we deal with that?

Log foundry/layers used in a query as part of KoralQuery

When parsing a query, foundry and layer information is found and serialized to KoralQuery. To avoid reparsing to get this information, a separate array containing all foundry/layers requested in the query should be part of KoralQuery.

This information is useful for at least two features:

Easy check for the need of annotation rewrites by Kustvakt (if an annotation is listed, that is limited to users or if a layer is listed without an accompanied foundry, needing default foundry injection)
Recommending a limitation to documents having these annotations as part of the VC

This information may be listed in the meta section of KoralQuery.
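
Collecting the pairs amounts to a walk over the serialized query tree. The sketch below operates on plain maps and lists; the class name and the "/layer" notation for a layer without a foundry are assumptions for illustration, not part of KoralQuery.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical collector; the output notation is an assumption.
class FoundryLayerCollectorSketch {

    // Returns all "foundry/layer" pairs found anywhere in the query tree.
    static Set<String> collect(Object node) {
        Set<String> found = new TreeSet<>();
        walk(node, found);
        return found;
    }

    private static void walk(Object node, Set<String> found) {
        if (node instanceof Map) {
            Map<?, ?> map = (Map<?, ?>) node;
            Object layer = map.get("layer");
            if (layer instanceof String) {
                Object foundry = map.get("foundry");
                // A layer without a foundry is recorded as "/layer".
                found.add((foundry instanceof String ? foundry : "") + "/" + layer);
            }
            for (Object child : map.values()) {
                walk(child, found);
            }
        } else if (node instanceof Iterable) {
            for (Object child : (Iterable<?>) node) {
                walk(child, found);
            }
        }
    }
}
```

An entry that starts with "/" then signals that default foundry injection is needed, which is the first use case listed above.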

Failed C2 distance query serialization

Fails:
so /+w1 nicht
http://korap.ids-mannheim.de/kalamar?q=so+%2F%2Bw1+nicht&collection-name=&collection=&ql=cosmas2&cutoff=1

"so" /+w1 nicht
http://korap.ids-mannheim.de/kalamar?q=%22so%22+%2F%2Bw1+nicht&collection-name=&collection=&ql=cosmas2&cutoff=1

Works:
"so" /+w1 "nicht"
http://korap.ids-mannheim.de/instance/wiki?q=%22so%22+%2F%2Bw5+%22nicht%22&collection-name=&collection=&ql=cosmas2&cutoff=1

so /+w1 "nicht"
http://korap.ids-mannheim.de/kalamar?q=so+%2F%2Bw1+%22nicht%22&collection-name=&collection=&ql=cosmas2&cutoff=1

"match:containsnot" should probably be "match:without"

I don't think "match:containsnot" fits into the wording scheme of KoralQuery, that's why I propose "match:without" instead.

Special characters in collections are treated wrong

There seems to be a massive whitespace character handling problem in the collection builder, so whitespaces are removed even in collection constraint values. This leads to massive problems with, e.g. Author names or similar complex data types.
The failing test is in https://github.com/KorAP/Koral/tree/specialcharacterfix branch.