json-schema-org / json-schema-spec
The JSON Schema specification
Home Page: http://json-schema.org/
License: Other
Most modern Web/hypermedia formats support IRIs instead of just URIs, which are limited to 7-bit ASCII. IRIs are a superset of URIs that support full Unicode. For standards that only support URIs, IRIs have to be converted/escaped into a URI-compatible format.
Right now, links that use variables that aren't defined are just cast as empty. There should be some way to specify a default value, or to depend on certain variables existing and skip the hyperlink if it doesn't exist.
For example, if there's a schema like
{
  "links": [
    { "href": "/doc/{uuid}", "rel": "self" }
  ]
}
but multiple posts don't have a "uuid" property, then they all get the same URI of /doc/, and all of a sudden we're saying multiple different posts are actually the "same". Oops!
This can sort of be done right now like so:
{
  "anyOf": [
    {
      "required": ["uuid"],
      "links": [ { "href": "/doc/{uuid}", "rel": "self" } ]
    }
  ]
}
... but this is bulky.
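A quick pure-Python sketch of why casting undefined variables to empty is dangerous (the `expand` helper is hypothetical, mimicking the current draft behavior): every post missing "uuid" collapses to the same "self" URI.

```python
import re

def expand(template, instance):
    """Expand {var} in an href template; undefined variables become
    empty strings, mirroring the current draft behavior."""
    return re.sub(r"\{(\w+)\}",
                  lambda m: str(instance.get(m.group(1), "")),
                  template)

post_a = {"title": "First post"}   # no "uuid" property
post_b = {"title": "Second post"}  # no "uuid" property

uri_a = expand("/doc/{uuid}", post_a)
uri_b = expand("/doc/{uuid}", post_b)

print(uri_a)            # "/doc/"
print(uri_a == uri_b)   # True: two different posts, one URI
```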
This was originally proposed on the old wiki at https://github.com/json-schema/json-schema/wiki/multilingual-meta-data-(v5-proposal) by @geraintluff with further contributions from @brettz9 and @sonnyp
The translation alternative discussed at the end of this comment was originally proposed at https://github.com/json-schema/json-schema/wiki/translations-(v5-proposal) by @geraintluff based on an email thread with @fge.
This proposal modifies the existing properties:
title
description
This proposal would also apply to the named enumerations proposed in issue #57 , if that makes it in.
This modification would allow inclusions of multiple translated values for the specified properties.
Currently, schemas can only specify meta-data in one language at a time. Different localisations may be requested by the client using the HTTP Accept-Language header, but that requires multiple (largely redundant) requests to get multiple localisations, and is only available over HTTP (not when pre-loading schemas, for instance).
In addition to the current string values (which are presumed to be in the language of the document), the values of these keywords may be an object.
The keys of such an object should be IETF Language Tags, and the values must be strings.
When the value of the keyword is an object, the most appropriate language tag should be selected by the client, and the string value used as the value of the keyword.
{
"title": {
"en": "Example schema",
"de": "..."
}
}
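The selection step described above ("the most appropriate language tag should be selected by the client") might be sketched as follows. The matching here is a deliberately simplified stand-in for full RFC 4647 lookup, and the German string is illustrative (the original example elides it):

```python
def pick_title(title, preferred):
    """Resolve a title that may be a plain string or a {lang-tag: string}
    object, using the client's preferred language tags in order."""
    if isinstance(title, str):
        return title
    for tag in preferred:
        if tag in title:               # exact tag match, e.g. "en-GB"
            return title[tag]
        primary = tag.split("-")[0]
        if primary in title:           # primary-subtag fallback, e.g. "de"
            return title[primary]
    return next(iter(title.values()))  # last resort: any translation

schema_title = {"en": "Example schema", "de": "Beispielschema"}
print(pick_title(schema_title, ["de-AT", "en"]))  # "Beispielschema" via "de"
```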
Schemas with many languages could end up quite bulky. In fact, the Accept-Language option is in many ways more elegant, as the majority of the time only one language will be used by the client (and the other localisations will simply be noise). However, this option is not available in all situations. One might also avoid the extra bulk by using JSON references (and thereby also enable localisation files to contain all translatable text).
An alternative approach to the above would be to reserve localeKey as a property for any schema object or sub-object, and localization-strings as a top-level property:
{
"localization-strings": {
"en": {
"example": {
"title": "Example schema",
"description": "Example schema description"
}
},
"de": {
"example": {}
}
},
"type": "object",
"localeKey": "example"
}
The advantage to this approach would be that, as typically occurs with locale files (for reasons of convenience in independent editing by different translators), all language strings could be stored together. Thus, if leveraging JSON references, it would be a simple matter of:
{
"localization-strings": {
"en": {
"$ref": "locale_en-US.json"
},
"de": {
"$ref": "locale_de.json"
}
},
"type": "object",
"localeKey": "example"
}
or yet simpler:
{
"localization-strings": {"$ref": "locales.json"},
"type": "object",
"localeKey": "example"
}
This alternative proposes a translations keyword which would sit alongside title and description.
The value of translations would be an object. Its keys would be JSON Schema meta keywords (title, description), and its values would themselves be objects, where each property key MUST be a language tag in accordance with RFC 3066.
When translating title and description, the keys under each meta keyword are RFC 3066 language tags:
{
"title": "postal code",
"description": "A postal code.",
"translations": {
"title": {
"en-GB": "postcode",
"en-US": "zip code",
"de": "Postleitzahl",
"fr": "code postal"
},
"description": {
"en-GB": "A Royal Mail postcode.",
"en-US": "A USPS ZIP code.",
// ...
}
}
// ...
}
"What would be left to specify is of course what "relevant" is here.
Apart from "title", there is "description". But I don't think we want any other keyword to be affected."
It would be great if there were a logical if schema in the vein of and/or/not. It could look like
{ "if": [ conditional, consequent ] }
whose semantics would be exactly equivalent to:
{ "or": [ { "not": conditional },
{ "and": [ conditional, consequent ] } ] }
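The proposed desugaring is just material implication; a pure-Python truth-table check (treating each sub-schema's outcome as a boolean) confirms the two forms agree:

```python
from itertools import product

def if_keyword(conditional, consequent):
    """Proposed {"if": [c, q]}: the consequent applies only when the
    conditional holds; otherwise validation vacuously succeeds."""
    return consequent if conditional else True

def desugared(conditional, consequent):
    """The equivalent {"or": [{"not": c}, {"and": [c, q]}]} form."""
    return (not conditional) or (conditional and consequent)

# Exhaustively check all validation outcomes for the two sub-schemas.
for c, q in product([True, False], repeat=2):
    assert if_keyword(c, q) == desugared(c, q)
print("the 'if' form and the or/not/and form agree on all cases")
```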
Thank you!
Originally proposed by @geraintluff at https://github.com/json-schema/json-schema/wiki/baseUri-(v5-proposal)
The content below is exactly as it appears on the old wiki:
baseUri
For convenience, specify a base URI against which schema-defined links will be resolved. This allows shorter href values.
baseUri must be a URI Template (resolved against the current base URI, or the request URI).
(v4 actually mentioned that rel="self" links could be used for this, but that's not ideal.)
{
"baseUri": "/items/{id}/",
"links": [
{
"rel": "comments",
"href": "comments/"
},
{
"rel": "related",
"href": "related/"
}
]
}
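A sketch of the intended resolution for the example above, assuming plain RFC 3986 reference resolution after template expansion (the `resolve_link` helper and its naive template expansion are hypothetical):

```python
from urllib.parse import urljoin

def resolve_link(base_uri_template, href, variables):
    """Expand the baseUri template, then resolve the link's href
    against it as a relative reference (RFC 3986)."""
    base = base_uri_template
    for name, value in variables.items():
        base = base.replace("{%s}" % name, str(value))
    return urljoin(base, href)

print(resolve_link("/items/{id}/", "comments/", {"id": 5}))  # /items/5/comments/
print(resolve_link("/items/{id}/", "related/", {"id": 5}))   # /items/5/related/
```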
Does this propagate into children? Either:
- baseUri for every schema that defines links
- baseUri applies to the data - at which point, what if multiple schemas have multiple values?
Ideally, each schema would use its own baseUri for its own links, but that gets complicated when it comes to child properties.
Originally written by @geraintluff at https://github.com/json-schema/json-schema/wiki/propertyLinks-(v5-proposal)
propertyLinks
Currently, we can describe the format of a parent object (with title/description/etc.), and we can describe the format of a child instance (with title/description/etc.) but we cannot describe the relationship between the two.
For instance, say you have an instance representing a programming project:
{
"title": "AwesomeNet",
"author": {
"name": "Petra the Programmer",
"homepage": "..."
}
}
When describing the format for this instance, you can easily write a schema for the parent format:
{
"title": "Programming project",
"description": "A description of a programming project",
"type": "object",
...
}
... and for the child format:
{
"title": "Person",
"description": "A representation of a person",
"type": "object",
...
}
These are completely accurate descriptions of the format, but there is no way to express anything about the relationship between the two - considered on its own, the entry in "author"
is indeed a "Person", but its relationship to the parent is more than that.
One current hack is to extend the "Person" format, so you can give it a title/whatever. However, that isn't really accurate - nothing has changed about the format at all, it's the parent-child link that's the interesting part.
The value of propertyLinks would be an object. The values would themselves be objects, containing zero or more of the following properties:
- title - a name for the parent/child relationship
- description - a more detailed description of the parent/child relationship
- rel - a link relation (URI) representing the parent/child relationship
Extending the "Motivating example" above:
{
"title": "Programming project",
"description": "A description of a programming project",
"type": "object",
"properties": {
...,
"author": {"$ref": "/schemas/user"}
},
"propertyLinks": {
"author": {
"title": "Author",
"rel": "http://schema.org/author"
}
}
}
In that example, you end up listing "author" twice in the schema. However, the alternatives are either an intermediate object (hard-to-read and not concise) or sticking extra info in the child schema (which requires ugly/awkward allOf extension, and still suffers from similar conceptual concerns to the existing workaround).
NOTE: This is a request for clarification in v5, and is not a proposal for changed behavior.
There are several underlying principles to validation which are currently poorly articulated, or even just implied. Some of the more contentious arguments over feature proposals are due to unclear understanding of these principles. Plainly stating these in the specification will help keep the evolution of JSON Schema focused and reduce feature debate noise.
You can index into JSON data by a property name or an array index. This can be written in JavaScript access form, e.g. A["foo"], A.foo, or A[0].
Indexing into a schema by a property name or array index number will, within this issue, mean finding the schema that would validate a similarly indexed instance. So if schema X validates instance A, then:
X.foo is the schema that is used to validate A.foo in the course of validating A with X.
X[5] is similarly the schema used to validate A[5]
Note that X.foo will in truth be one of:
X.properties.foo
X.patternProperties.patternThatMatchesFoo
X.additionalProperties # if neither of the above and additionalProperties is a schema
{} # the blank schema, if none of the above and additionalProperties is true
Similarly, X[5] will in truth be one of:
X.items[5] # if items is an array with at least six members
X.additionalItems # if items is an array with fewer than six members and additionalItems is a schema
X.items # if items is a schema rather than an array
{} # if none of the above and additionalItems is true
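The lookup rules above can be sketched as two small functions (all names here are hypothetical; `BLANK` stands for the `{}` schema). For brevity the property lookup returns only the first matching patternProperties entry, though strictly all matching patterns apply (an implicit allOf), and the `false` cases for additionalProperties/additionalItems are glossed over:

```python
import re

BLANK = {}  # the blank schema, which validates anything

def index_property(schema, name):
    """Return the subschema validating instance[name] (X.foo above)."""
    props = schema.get("properties", {})
    if name in props:
        return props[name]
    for pattern, sub in schema.get("patternProperties", {}).items():
        if re.search(pattern, name):
            return sub
    extra = schema.get("additionalProperties", True)
    return extra if isinstance(extra, dict) else BLANK

def index_item(schema, i):
    """Return the subschema validating instance[i] (X[5] above)."""
    items = schema.get("items", {})
    if isinstance(items, list):
        if i < len(items):
            return items[i]
        extra = schema.get("additionalItems", True)
        return extra if isinstance(extra, dict) else BLANK
    return items if isinstance(items, dict) else BLANK

X = {"properties": {"foo": {"type": "string"}},
     "items": [{}, {}, {}, {}, {}],
     "additionalItems": {"type": "number"}}
print(index_property(X, "foo"))  # {'type': 'string'}
print(index_item(X, 5))          # {'type': 'number'} - items has fewer than six members
```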
"allOf"/"anyOf"/"oneOf"/"not" involve special considerations, which we will revisit within the principles below. Here are the basics of how indexing applies to them:
if X is an "allOf" with two branches X1 and X2, then:
X.foo is {"allOf": [X1.foo, X2.foo]}
if X is an "anyOf" or "oneOf" with two branches X1 and X2, then X.foo must only take into account the schema(s) that validated A. In the case of "anyOf" that may be both or just one, while in the case of "oneOf" it will always be just one of the branches.
If X2 is the branch of "oneOf" that validates A, then X.foo is X2.foo
If both X1 and X2 validate A in an "anyOf", then X.foo is {"anyOf": [X1.foo, X2.foo]}
if X is a "not" schema {"not": Y}, then there is no meaningful index into X. Depending on the rest of how Y is defined, Y.foo may or may not validate against A.foo, even though Y as a whole is guaranteed to fail validation with A due to the "not".
I am totally making these up off the top of my head. They are a starting point: some are missing, and some are probably wrong. Some are defined, and others are more of a request for someone to explain the principle involved.
Validation of a schema should succeed or fail independent of whether or where it appears within another schema.
A corollary of this is that if instance A validates against schema X, then indexing into both will produce a sub-instance that validates against the sub-schema. Since A.foo validates against X.foo in the context of A and X, it must also validate when pulled out to stand alone.
Notably, if X is {"not": Y}, the impact of this principle is unclear because there is no meaningful X.foo. The overall context of the "not" must be taken into account in order to say anything.
That this is an underlying principle is clear from reading the spec. However, I have not seen any explanation as to the benefit. Is it intended to facilitate extensibility somehow? Is it to avoid burdening validator implementors with expensive and difficult checks? If it is the latter, is having the validation succeed the only possible solution to this requirement?
One generalized example is section 4.1 of draft 04, which says: "Some validation keywords only apply to one or more primitive types. When the primitive type of the instance cannot be validated by a given keyword, validation for this keyword and instance SHOULD succeed."
Why should a schema of {"type": "string", "maximum": 10} which is clearly nonsensical validate cleanly against the string "foo"?
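Section 4.1's rule means a keyword simply opts out when the instance has the wrong primitive type. A sketch of "maximum" under that rule (the `check_maximum` helper is hypothetical) shows why the nonsensical schema above validates cleanly:

```python
def check_maximum(instance, maximum):
    """Per draft-04 section 4.1: "maximum" only applies to numbers, so
    validation SHOULD succeed (vacuously) for any other primitive type."""
    if not isinstance(instance, (int, float)) or isinstance(instance, bool):
        return True            # keyword does not apply -> success
    return instance <= maximum

# For {"type": "string", "maximum": 10} against "foo": "type" passes,
# and "maximum" passes vacuously, so the whole schema validates cleanly.
print(check_maximum("foo", 10))  # True
print(check_maximum(12, 10))     # False
```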
Furthermore, why should default or enum values that fail validation be allowed?
A minimal validator may ignore all annotation fields, all hypermedia fields, and all semantic validation fields (currently "format" is the only semantic field).
This is important for answering the objection that a new annotation field (for instance) places a burden on validator implementors. Since any minimal validator must already ignore any unrecognized fields in a schema, there is no validator burden for non-validation schema fields.
This principle can be inferred from what is marked required or optional and how each field behaves, but clearly articulating it will avoid some arguments based on observations of other issue discussions.
Originally written by @geraintluff at https://github.com/json-schema/json-schema/wiki/constant-(v5-proposal)
constant
For ordinary use, this would be equivalent to a single-valued enum, simply tidier.
The only real difference comes in its behaviour with $data-substitution. $data-substitution would be allowed for this keyword, which means that this keyword is not capable of specifying literal values of the form {"$data": ...}, because these would be interpreted as $data-substitutions. However, literal values of this form can still be specified using enum, so there is no loss of functionality.
The value of this keyword would be any value - however, it would be subject to $data-substitution.
Instances are only valid if they are exactly equal to the value of this keyword.
{
"type": "object",
"properties": {
"five": {
"constant": 5
}
}
}
Valid: {}, {"five": 5}
Invalid: {"five": 0}, {"five": "5"}
Using $data to specify equality:
{
"type": "object",
"properties": {
"a": {"type": "string"},
"b": {
"constant": {"$data": "1/a"}
}
},
"required": ["a", "b"]
}
Valid: {"a": "foo", "b": "foo"}, {"a": "bar", "b": "bar"}
Invalid: {"a": "foo", "b": "bar"}
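A sketch of the proposed semantics for the example above (all helper names are hypothetical, and the Relative JSON Pointer resolution is deliberately minimal - a leading integer walks up that many levels, and the remainder is an ordinary pointer path):

```python
def resolve_relative_pointer(root, current_path, pointer):
    """Minimal Relative JSON Pointer resolution, e.g. "1/a" from /b
    means: go up one level, then take property "a"."""
    up, _, rest = pointer.partition("/")
    path = current_path[:len(current_path) - int(up)]
    value = root
    for step in path + (rest.split("/") if rest else []):
        value = value[step]
    return value

def check_constant(instance, constant, root, current_path):
    """Proposed "constant": the instance must exactly equal the value,
    after $data substitution when the value is a {"$data": ...} object."""
    if isinstance(constant, dict) and "$data" in constant:
        constant = resolve_relative_pointer(root, current_path, constant["$data"])
    return instance == constant

doc = {"a": "foo", "b": "foo"}
print(check_constant(doc["b"], {"$data": "1/a"}, doc, ["b"]))  # True
print(check_constant(5, 5, {"five": 5}, ["five"]))             # True
```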
enum
Unless $data is being used, the same effect can be obtained using fewer actual characters:
- {"constant":"whatever"} - 23 characters
- {"enum":["whatever"]} - 21 characters
However, when used in combination with $data, it opens up possibilities that are not otherwise available.
JSON Schema presently specifies to use the "profile" link relation, e.g.:
Link: <http://example.com/Schema.json>;rel="profile"
However, the "profile" link relation is not supposed to be dereferenced. There's no way for an automated user agent to be able to follow this link and actually download the schema if it doesn't already have it.
Perhaps publish a link relation type, similar to profile, that is supposed to be downloaded?
The current draft talks about a profile media type parameter, and recommends using it to link to a JSON schema. The existing https://tools.ietf.org/html/rfc6906 about profiles (which also talks about "profile" media type parameters) is not referenced, but at least there is some overlap. Please consider that the idea of profiles (as specified in RFC 6906) is not to interlink instances and schemas; instead, the idea is that a profile identifies a set of rules applied to an instance, which can be processed with or without knowing the profile-specific processing rules.
EXI for JSON https://www.w3.org/TR/exi-for-json/ is a method of compressing JSON into a compact binary form, using the algorithm defined by EXI (originally defined for XML).
For draft-5, we need not define any specific compatibility features, but we should consider the ways it might be used.
Draft 5 should (finally!) normalize the behavior of $ref and id, both in text and in the form of language-agnostic test files.
Some of the edge cases were discussed e.g. in
I can't do this so... @awwright ?
I mean, I can make the templates and make a PR if it would help time wise.
It may be useful to define, in somewhat mathematical terms, what it means to validate an instance, and which inputs are used.
I imagine the validation function being defined as such:
Validate[collection, schema, version, iriBase, instance] → Boolean ∪ Indeterminate
Where:
This may also help to resolve issue #4. If the validation function is defined to have no side-effects, then we can just reiterate that point within the "default" keyword. We can also say the keyword is "not used for validation, but may be used for other purposes not defined here."
This is not to say that JSON Schema libraries can't implement other functions, they might desire to implement a "coerce" function that turns an arbitrary JSON instance into a validating one (casting strings to numbers, filling in missing required values using the default, etc).
Aside: Defining a "coerce" might be something useful for v6 (or, the next version with feature additions).
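The side-effect-free Validate function above might be sketched as follows (all names are hypothetical; version and iriBase are dropped for brevity, and only a toy "type" check is implemented). Indeterminate is a distinct sentinel for cases like an unresolvable $ref in the schema collection:

```python
INDETERMINATE = object()  # neither valid nor invalid, e.g. unresolvable $ref

def validate(collection, schema, instance):
    """Pure validation: the same inputs always give the same output, and
    neither collection, schema, nor instance is ever mutated."""
    if "$ref" in schema:
        target = collection.get(schema["$ref"])
        if target is None:
            return INDETERMINATE  # cannot decide without the referent
        return validate(collection, target, instance)
    expected = schema.get("type")
    if expected == "string":
        return isinstance(instance, str)
    if expected == "number":
        return isinstance(instance, (int, float)) and not isinstance(instance, bool)
    return True  # unknown or absent keywords impose no constraint

print(validate({}, {"type": "string"}, "hi"))                       # True
print(validate({}, {"$ref": "#/missing"}, "hi") is INDETERMINATE)   # True
```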
The JSON Hyper-schema spec currently lists the following example:
{
"title": "Post a comment",
"rel": "create",
"href": "/{id}/comments",
"method": "POST",
"schema": {
"type": "object",
"properties": {
"message": {
"type": "string"
}
},
"required": ["message"]
}
}
But this isn't a useful link relation because there's no such thing as a "create" link.
The place that would best define how to create resources would probably be AtomPub; see https://tools.ietf.org/rfc/rfc5023.txt
The specification does later define "create", but it should normatively reference the IANA registry instead.
This proposal originally written by @geraintluff at https://github.com/json-schema/json-schema/wiki/linkSource-(v5-proposal)
This proposal would introduce the following keyword to LDOs:
linkSource
Currently, links described in links apply to the instance being described by that schema.
Sometimes, however, it would be good to be able to describe links for other data items.
The value of linkSource would be a Relative JSON Pointer.
When parsing a link definition, the substitution (for href and possibly rel) would be processed as normal. Once the link had been determined, though, the Relative JSON Pointer in linkSource would be resolved. The result of resolving that pointer should be considered the "source" of the link, instead of the current instance.
Take this data for example:
{
"postType": "blog",
"authors": [
"someuser123",
"otheruser"
],
...
}
The entries in "authors" represent authors for the post - but the best we can currently do is to define a rel="author" link on the string itself (e.g. "someuser123"), or perhaps just define a rel="full" link (not specifying an author link at all).
This would be incorrect - the links shouldn't apply to the individual entries in "authors", but to the post itself. Using linkSource, we could represent this as:
{
"type": "object",
"properties": {
"authors": {
"type": "array",
"items": {
"type": "string",
"links": [{
"rel": "author",
"href": {
"template": "/users/{username}",
"vars": {"username": "0"}
},
"linkSource": "2"
}]
}
}
}
}
If link definitions can be defined outside of the data they describe, then in order to find all the links that apply to the instance, it would no longer be enough to process the "immediate" schemas for that data - tools would have to inspect the schemas for all children in the entire instance.
'Nuff said. Edit: Well, just to be clear, the intent is for keys.
The json-schema.org homepage points to json-schema/json-schema, where it should point to json-schema-org/json-schema-spec.
Misbehaved clients might pose a problem if they pull a schema over the network every time it's being validated against, when it's instead possible to cache for a long period of time. Server owners won't like JSON Schema very much if this becomes a problem.
JSON Schema does not rely on or need HTTP, even if schemas are referenced with an http or https URI. However, in some hypermedia cases, it is still useful to download schemas over the network.
For these cases, add a section about behavior of clients when they make HTTP requests:
- Instead of e.g. User-Agent: curl/7.43.0, use User-Agent: so-cool-json-schema/1.0.2 curl/7.43.0. Since symbols are listed in decreasing order of significance, the JSON Schema library name/version goes first, then the more generic HTTP library name (if any).
- Send a From header so that server operators can contact the owner of a potentially misbehaving script.
https://groups.google.com/forum/#!searchin/json-schema/format/json-schema/WInNIGWSL4U/UtTl29b-3GIJ
https://groups.google.com/forum/#!searchin/json-schema/format/json-schema/74XLt7R4ISE/FPOnAh6rq_UJ
I agree with the comment in the first post, it is too confusing; may I suggest for draft 5 or 6, JSON Schema goes back to one specification document with instance validation and format types inside it?
It has since been released as https://tools.ietf.org/html/rfc6901
In cases where oneOf is used for multiple mutually exclusive options, it is frequently the case that the option to pick is determined by a single key within the instance. E.g. the "type" property will be from the enum "Animal, Vegetable, Mineral" and the appropriate schema to apply is picked based on the value of this property.
Right now, all schemas must be tested (similar to an O(n) operation). Only one schema corresponding with an e.g. "type" property need be tested (similar to an O(log n) or O(1) operation), and only errors against that schema need be reported. Otherwise we get bizarre, non-helpful errors like so:
(1 of 1) instance.o3 is not exactly one from <http://example.org/Animal>,<http://example.org/Vegetable>,<http://example.org/Mineral>
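A sketch of the discriminator-style dispatch being asked for (the schema URIs and the "type" property come from the example above; the dispatch table and helper are hypothetical): pick the one schema worth testing in O(1), and report errors only against it.

```python
# Map discriminator value -> the single schema worth testing.
DISPATCH = {
    "Animal":    {"$ref": "http://example.org/Animal"},
    "Vegetable": {"$ref": "http://example.org/Vegetable"},
    "Mineral":   {"$ref": "http://example.org/Mineral"},
}

def pick_schema(instance):
    """O(1) selection by the "type" property, instead of trying every
    oneOf branch and reporting confusing errors from all of them."""
    try:
        return DISPATCH[instance["type"]]
    except KeyError:
        raise ValueError("unknown or missing discriminator 'type'")

print(pick_schema({"type": "Animal"}))  # {'$ref': 'http://example.org/Animal'}
```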
"extends" was removed in draft 4. Can you allow anyOf, oneOf, allOf with one value only to allow for the same functionality (inheritance)? That way, the JSON schemas would remain compliant with the latest draft of the spec...
http://json-schema.org/latest/json-schema-validation.html#anchor145
I would like to see standardization on the following properties for any given schema (especially the root): autoIncrement and keyPath, as allowed for stores in IndexedDB. Perhaps these could be within a store-metadata property or such.
I'd also like to see an indexes property which could standardize on at least three subproperties (as used in IndexedDB indexes): unique, multiEntry, and keyPath (and I'd hope the i18n work (I still intend to expand on this in PR #12) would also tie into Mozilla's locale property, so that such a property would not need to be added in this location as well). Perhaps a name property could also be provided to allow for complete store and index generation.
Such properties could be used for auto-generating database stores based on the schemas, and one would not need to rebuild according to database implementation. IndexedDB is a particularly suitable choice, imo, since it is (or at least seems it will be) ubiquitous in browsers, and can also be used as an API in server-side implementations (the most promising, imo, at least for Node, appears to me to be IndexedDBShim (with this PR), and possibly also for NoSQL implementations not having an IndexedDB API).
@ACubed
I am the current maintainer of JSON Schema Form (including the popular Angular Schema Form)
https://github.com/json-schema-form
That project has had a view definition for years, but recently I have begun working with other form building tool providers to create a json-ui-schema to define a universal view definition to marry up with a json-schema model definition to generate forms with more precision than you could, or perhaps should, with a json-schema definition alone. We have a pure JavaScript core and Angular implementation and potentially React, Node and other ports in the works soon.
I am concerned that some v5 proposals appear to be suggesting more view capability in json-schema (like choices) and I am interested in having discussion with the maintainers as I have not been able to get any email response from the json-schema github org members, @ACubed I would love to talk to you and any other json-schema maintainers about collaboration options. You can reach me at, iamanthropic, at g mail .
The contents of the wiki at json-schema/json-schema/wiki have not been migrated/redirected to their new home here.
Enumerations are often cryptic, particularly when they exist to match legacy systems that valued storage efficiency over readability. While it is possible to include more information with the title and description fields at the same level as the enum, it is not possible to associate any additional information with each enum value.
There are two use cases:
1. Human-readable documentation: this falls squarely within JSON Schema's goals, and is simply about providing an easily-understood-by-humans string for each enum value.
2. UI generation: this is analogous to the value+label tuples common in web application frameworks geared towards producing select widgets. While JSON Schema is intended to help build UIs, it is debatable as to whether this is enough of a core goal to motivate features on its own. See also issue #55
There have been several proposals to address this. The options so far are:
- a parallel array of names alongside "enum"
- replacing "enum" with an array of tuples
- supplementing "enum" with an array of tuples of (enumValue, humanName)
Due to "enum" values supporting any JSON type, it is not possible to have a JSON object mapping values to names. This is why lists of tuples are proposed instead.
@geraintluff proposed the parallel array of names, under the keyword "enumNames": https://github.com/json-schema/json-schema/wiki/enumNames-(v5-proposal)
@nemesisdesign proposed replacing with a tuple array, using the keyword "choices", drawn from web app frameworks: https://github.com/json-schema/json-schema/wiki/choices-(v5-proposal-to-enhance-enum)
@sam-at-github proposed the parallel-ish array of tuples, under the keyword "enumLut" (although this is more or less the same as the proposed transitional period for moving to "choices"). See the comments in the issue filed for "choices" at the old repository (and also for a discussion of the validity of UI generation as a goal): json-schema/json-schema#211
- Replacing "enum" with a new keyword that holds tuples is disruptive, and combines validation and annotation into one keyword, which we've otherwise avoided.
- Replacing "enum", however, matches how many web development frameworks set up <select> inputs in forms.
In terms of schema design purity, the parallel array of names is the best solution. "enum" remains a validation property, and "enumNames" (or whatever we call the parallel array) is an annotation property.
In terms of ease of use, replacing the current value list with a tuple list is the best option. It removes any possibility of mis-matching values and names, and avoids any duplication. The cost is some syntactic noise for unnamed enums as the entries need to be tuples whether there are names or not.
In terms of flexibility, the parallel-ish array of tuples, which is keyed by the value rather than matched strictly by order, is the best option. It allows unnamed enums to continue to work exactly as they already do. We also preserve the validation vs annotation property separation. And it is not vulnerable to mismatches by miscounting. The cost is needing to duplicate the enum values, and then the values can get out of sync.
We should decide whether the separation of validation and annotation keywords is a fundamental part of the JSON Schema approach (again, see issue #55). If it is, then we can discard the "replace with a list of tuples" option, as it would be used for both validation and annotation. It would be the only annotation that leaves noise in the validation syntax even when it is not used. The value itself may be a tuple, so the top level must always be a tuple in order to avoid ambiguity, even if there is no name present.
If we do settle on the validation/annotation split principle, we're down to either adding a list of names that must be strictly parallel to the list of values, or we must add a list of tuples that are correlated by the value in the tuple. The former option is likely to get out of order or end up with the wrong number of entries, while the latter is likely to end up with values out of sync.
For simple values, keeping the values in sync should be pretty easy, but if enums supply complex data structure values, bugs are likely. I suspect that complex values in enums are quite rare.
For small sets of values, keeping lists in parallel should be easy, but long enums will lead to bugs. I suspect that long lists are more common than complex values.
If long lists are more common than complex values, we should choose the option that is more robust for long lists, which is the list of tuples. I'd appropriate the "enumNames" keyword for it, even though that was proposed for the list of names, because it clearly ties the list of tuples to the "enum" property.
One mitigation for bugs involving values getting out of sync is that a debug mode could easily check that every value in the tuple list is an actual value of the corresponding enum. I am NOT proposing this as a step in validating instances; JSON Schema seems to generally be fine with nonsensical schemas (although that's another principle that we should confirm in issue #55). I am just speculating about an additional tool, like a linter for JSON Schema.
The point being that it would be possible to detect the most likely bugs from using a list of tuples with a theoretical linter, but the only thing such a linter could check with the list of names is that it is not longer than the enumeration. I think this, plus the likelihood of long enumerations vs complex values, gives the list of tuples alongside the existing "enum" list the edge.
This would solve all the addressing woes currently plaguing JSON Schema.
If id becomes informational only, then it may be completely ignored as a requirement for addressing, which has the very nice consequence that addressing is now completely unambiguous, since it only relies on JSON Reference and JSON Pointer.
Implementations wishing to rely on id to define the current resolution context MAY do so; however, such implementations MUST NOT expect that peer implementations use this mechanism.
A net win for everybody.
(I said I wasn't involved in JSON Schema anymore, but this problem is still nagging me, and this simple change makes implementation of JSON Schema MUCH easier)
Another topic related to IndexedDB (as in issue #17) and perhaps leveraging the same syntax proposed in #15 ...
Although versioning of schemas may not be of large consequence when server-side databases are in use (since an upgrade can in many cases be forced at once on all new visitors), with client-side IndexedDB in particular (though also for server-side databases being interacted with by Ajax and maintaining separate stores for a given user), users may need to be allowed to continue operating with an old version of the database. When the version change can occur, there needs to be a safe migration path (even potentially needing to go through multiple schema upgrades if the user is making changes to the database offline long before visiting the site online again).
IndexedDB has an upgradeneeded event which can be leveraged for such migrations (and service workers could be used to grab the latest upgrades without the user needing to load a new page or refresh the old one), but it would be handy for the IndexedDB-friendly JSON Schema (proposed in issue #17) to also have a formal JSON definition for expressing diffs between schemas (even if it would not be able to have the robustness of all potential programmatic changes, such as changing individual records between versions) and in a way which would also cause changes in the instance documents.
For example, one might wish to indicate that for version 2 of a schema, such-and-such a store should be added and an object modified, while for version 3, one store should be deleted, one schema object should be renamed, and one object should be moved elsewhere within the schema (and data also migrated--at least when "move" and "copy" operations are used on the schema diffs).
I personally feel that using the Google Group for help is fine, but for people asking questions with clear problem statements, I would much rather direct them to StackOverflow (with a tag).
I strongly believe that specific solved problems and questions should be easy to find, and they would be much better suited to SO.
If agreed, I'll add a link under "more" on the website, but we will need to add a note about this on the Google Group, and agree that we should direct people there.
(Speaking of which, should we expand those who have admin rights on the Google Group? Myself and @awwright ?)
Sorry for the somewhat verbose title there, but this is a proposal to discuss addition of a feature similar to Joi's when.
Joi lets you do something like:
{
  query: Joi.object({
    type: Joi.string().required(),
    value: Joi.string().max(255).when('type', {
      is: 'optionalValue',
      then: Joi.optional(),
      otherwise: Joi.required()
    })
  })
}
This is a bit of a contrived (and simple) example, but I think it shows how easy the mental mapping is versus json-schema. json-schema supports the same functionality, but it very quickly becomes difficult to define: you need to mix and match dependencies, and in the worst case you have to specify duplicate schemas with a single key's definition changed and jam them into an anyOf or oneOf. What's worse, with the anyOf approach you generally end up with resultant errors that are incredibly difficult to reason about, when all you really need is a single error saying that a given field doesn't match the criteria (not, e.g., 3 errors: one indicating that the field was incorrect according to one schema, another indicating it is incorrect in the other possible schema, and finally that neither of the schemas provided in anyOf was matched).
Is there any room for discussion on adding these types of features to later versions of json-schema?
Just surfacing this proposal from https://github.com/json-schema/json-schema/wiki/contains-%28v5-proposal%29
It would greatly simplify JSON Schemas such as this one:
https://github.com/w3c/web-annotation-tests/blob/master/common/contextValue.json#L15-L26
It has been implemented in ajv.
This is more a question that needs answering as opposed to an issue. If anyone can shed some light on this, please do comment.
Also remove "hopefully temporary" from the tagline of this org. What makes you think it is temporary?
There are a lot of broken features defined in JSON Hyper-Schema, and I want to ask implementors how much I can be allowed to "break" (i.e. make compliant with normative references).
Mostly things like quirks about how it defines URI templates, uses "rel", and uses "method".
Anyone?
@awwright It already happened once when the owner of json-schema became unreachable. In this organisation the situation is even worse - you are the only member. Could you maybe add these people as owners to this org?
The "format" keyword is currently defined as an optional feature of JSON Schema. This frees implementations from the relatively burdensome requirements of performing the specified semantic validations, but also intentionally makes the feature unreliable. As a result, schema authors frequently re-define validation schemas for fields that could be completely described with the "format" keyword were its implementation consistent.
This places an undue burden on schema writers who wish to both take advantage of any full implementations and work around any minimal implementations.
Here is an example of a document (written in YAML for human-friendliness) that provides JSON Schemas for ipv4 and ipv6 addresses, used in other schemas from the same product in place of the "format" keyword:
https://support.riverbed.com/apis/sh.common/1.0/service.yml
JSON Schema can provide a standard "pattern"-based schema for each format value in its meta-schema, which will provide a documented level of purely syntactical validation for instances. This requires only trivial additional work from implementations as shown below under "Mechanism".
Each such schema MUST successfully validate against all possible valid instances. They MAY also successfully validate invalid instances due to the limits of regular expressions or the decision of the JSON Schema standard that the full pattern is too complex or has too much of a performance impact to support at all.
A "formats" section would be added to the "definitions" within the meta-schema:
{
    "definitions": {
        "formats": {
            "definitions": {
                "ipv4": {
                    "minLength": 7,
                    "maxLength": 15,
                    "pattern": "^(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])$"
                },
                "email": {
                    "pattern": "you-get-the-idea"
                }
            }
        }
    }
}
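As a quick sanity check, the ipv4 pattern above can be exercised directly. This is a hypothetical sketch of the fallback behaviour an implementation might apply (the `is_ipv4` helper name is illustrative, not from any spec):

```python
import re

# The "pattern", "minLength" and "maxLength" constraints from the
# proposed meta-schema "formats" section - purely syntactic validation.
IPV4 = re.compile(
    r"^(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}"
    r"([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])$"
)

def is_ipv4(value):
    """True if `value` is a dotted-quad IPv4 address with octets 0-255."""
    return 7 <= len(value) <= 15 and IPV4.match(value) is not None
```

Note that, as the proposal says, this validates syntax only; it accepts every valid address and rejects obvious garbage, which is exactly the level of assurance the fallback is meant to provide.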
The purpose of the nested "definitions" section is to clearly differentiate between definitions used only for format validation and definitions used to build the actual meta-schema.
If an implementation does not handle "format": "ipv4" directly, then the schema:
{
    "$schema": "http://json-schema.org/schema#",
    "type": "string",
    "readOnly": true,
    "format": "ipv4"
}
should be interpreted as:
{
    "$schema": "http://json-schema.org/schema#",
    "allOf": [
        {
            "type": "string",
            "readOnly": true
        },
        { "$ref": "http://json-schema.org/schema#/definitions/formats/definitions/ipv4" }
    ]
}
combining the fallback schema with whatever schema elements beyond "format" were already present.
While all of the formats can be at least somewhat validated by regular expressions, several are either extremely complex to fully validate or cannot be entirely validated by a regex. Is this a problem? I argue that it is not, because properly implemented this provides substantial validation assistance that schema authors are otherwise writing each time themselves. Schema authors may examine the supplied regexes and determine whether or not they are sufficient for the given application, and re-implement them accordingly if they are not. This is no worse than what currently happens.
Due to the complexity of the regular expressions involved, the performance impact of using them is a valid concern. However, the "format" specification already states that implementations SHOULD provide an option to disable the keyword. That requirement should be left as-is. Disabling the "format" keyword should disable it entirely, including the fallback validation.
Originally written by @geraintluff at https://github.com/json-schema/json-schema/wiki/%24data-(v5-proposal)
NOTE: JSON Relative Pointer is defined as an extension of JSON Pointer, which means that an absolute JSON pointer is legal anywhere that a relative pointer is mentioned (but not vice versa).
Absolute JSON Pointers always begin with /, while relative JSON pointers always begin with a digit. Resolving a pointer beginning with / behaves the same whether or not it is being resolved relative to a specific location, just as the URI path "/foo/bar" resolves the same whether or not the base URI has an existing path component.
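To make the absolute-vs-relative distinction concrete, here is a minimal Python sketch of a combined resolver. The helper name and the `current_path` representation are assumptions for illustration, and RFC 6901 `~0`/`~1` escaping and the relative-pointer `#` suffix are deliberately omitted:

```python
def resolve_pointer(doc, current_path, pointer):
    """Resolve `pointer` against `doc` from the location `current_path`
    (a list of keys/indices). Absolute pointers start with '/';
    relative pointers start with a digit counting levels to ascend."""
    if pointer.startswith("/") or pointer == "":
        path = pointer                      # absolute JSON Pointer
    else:
        i = 0
        while i < len(pointer) and pointer[i].isdigit():
            i += 1
        up = int(pointer[:i])               # levels to ascend
        base = current_path[:len(current_path) - up]
        path = "".join(f"/{p}" for p in base) + pointer[i:]
    node = doc
    for token in [t for t in path.split("/") if t]:
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node
```

With `doc = {"smaller": 5, "larger": 7}` and the current location at `larger`, the relative pointer `1/smaller` and the absolute pointer `/smaller` both resolve to 5, matching the claim that an absolute pointer is legal anywhere a relative one is.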
$data
This keyword would be available (in the form {"$data": ...}) for the following schema properties:
minimum / maximum
exclusiveMinimum / exclusiveMaximum
minItems / maxItems
enum
It would also be available (in the form {"$data": ...}) for the following LDO properties:
href
rel
title
mediaType
This keyword would allow schemas to use values from the data, specified using Relative JSON Pointers.
This allows more complex behaviour, including interaction between different parts of the data.
When used inside LDOs, this allows extraction of many more link attributes/parameters from the data.
Wherever it is used, the value of $data is a Relative JSON Pointer. If the $data keyword is used as the value of one of the permitted schema or LDO properties, then before any further processing of that schema/LDO, the value of $data is interpreted as a Relative JSON Pointer, resolved against the instance data, and the result is used as the effective value of the property. For example:
{
    "$schema": "http://json-schema.org/draft-04/schema#",
    "type": "object",
    "properties": {
        "smaller": {"type": "number"},
        "larger": {
            "type": "number",
            "minimum": {"$data": "1/smaller"},
            "exclusiveMinimum": true
        }
    },
    "required": ["larger", "smaller"]
}
In the above example, the "larger" property must be strictly greater than the "smaller" property.
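A validator supporting this would effectively do the following. This is a sketch of just this one example, not a general $data implementation, and the helper name is illustrative:

```python
def validate_larger(instance):
    """Check the example schema: "minimum" for "larger" is
    {"$data": "1/smaller"} - up one level from "larger", then /smaller,
    i.e. the sibling "smaller" value. "exclusiveMinimum" is true, so
    the comparison is strict."""
    minimum = instance["smaller"]   # what "1/smaller" resolves to
    return instance["larger"] > minimum
```

So `{"smaller": 3, "larger": 5}` validates, while `{"smaller": 5, "larger": 5}` does not.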
Currently, validation is "context-free", meaning that one part of the data has minimal effect on the validation of another part. This has an effect on things like referencing sub-schemas. Changing this is a big issue, and should not be done lightly.
Some interplay of different parts of the data can currently be specified using oneOf (and the proposed switch) - but crucially, these constraints are specified in the schema for a common parent node, meaning that sub-schema referencing is still simple.
The use of $data also (in some cases) limits the amount of static analysis that can be done on schemas, because their behaviour becomes much more data-dependent. However, the expressive power it opens up is quite substantial.
It's also tempting to allow its use for all schema keywords - however, not only is that a bad idea for keywords such as properties/id, but it also might present an obstacle to anybody extending the standard.
enum values
It should be noted that while {"enum": {"$data":...}} would extract a list of possible values from the data, {"enum": [{"$data":...}]} would not - it would in fact specify that there is only one valid value: the literal object {"$data":...}.
Similar concerns would exist with an extra keyword like constant - what if you want the constant value to be a literal {"$data":...}? However, perhaps constant could be given this data-templating ability, and if you want a literal {"$data":...}, then you can still use enum.
The existing mechanics of $ref can be nicely described using a rel="full" link relation. The mechanics of $data, however, would be impossible to even approach in the meta-schema. We could describe the syntax, but nothing more. Is this a problem?
Originally written by @geraintluff at https://github.com/json-schema/json-schema/wiki/Extended-templating-syntax-(v5-proposal)
Uses existing keyword - proposes extension of href in LDOs.
(Should this syntax also be allowed inside "$ref" to allow templating of references?)
Currently, the only values available for templating in href are the object itself and the immediate children of the object (which must be referred to by their exact name).
This proposed new syntax would allow more powerful templating, specifying values from the data using Relative JSON Pointer.
It would also allow the re-naming of template variables. This is useful because in some templates, the variable name is actually included in the results, e.g.
/prefix/{?foo,bar,baz}
-> /prefix/?foo=1&bar=2&baz=3
In addition to the existing string values/behaviour for href, the following is proposed.
The value of href may be an object, containing the following properties:
template - containing a URI template
vars - an object, where all the values are Relative JSON Pointers
To obtain the URI for the link, the URI template in template is expanded. When a variable is referenced by the template and is present in vars, its value in vars is interpreted as a Relative JSON Pointer and resolved relative to the current data instance.
(Note the complete lack of pre-processing rules - they are not needed here, due to the expressive power of Relative JSON Pointers.)
Data:
{
    "author": {"id": 241, "name": "Jenny"},
    ...
}
Schema:
{
    "links": [
        {
            "rel": "author",
            "href": {
                "template": "/users/{authorId}",
                "vars": {
                    "authorId": "0/author/id"
                }
            }
        }
    ]
}
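Under the proposed object form, expansion could be sketched like this. The helper is hypothetical: only bare `{name}` expansion is handled (not the full RFC 6570 operator set), and relative-pointer resolution is simplified (no `~0`/`~1` escaping):

```python
import re

def expand_href(href, doc, current_path):
    """Expand the object form of "href": resolve each entry in "vars"
    as a (simplified) Relative JSON Pointer against `doc` from
    `current_path`, then substitute into the URI template."""
    values = {}
    for name, rel_ptr in href["vars"].items():
        m = re.match(r"(\d+)(.*)", rel_ptr)
        up, tail = int(m.group(1)), m.group(2)
        base = current_path[:len(current_path) - up]
        path = "".join(f"/{p}" for p in base) + tail
        node = doc
        for token in [t for t in path.split("/") if t]:
            node = node[int(token)] if isinstance(node, list) else node[token]
        values[name] = node
    return re.sub(r"\{(\w+)\}", lambda mm: str(values[mm.group(1)]), href["template"])
```

For the example above, `authorId` is `0/author/id` resolved from the root object, so the link expands to `/users/241`.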
This syntax is in many ways much simpler than the existing syntax, because there is no need for escaping rules. (The current syntax does pre-processing using (, ) and $.)
We are faced with a choice, then - to make the new syntax equally complex, or to have complex pre-processing rules for URI Templates in some situations but not others. (Or, of course, remove the old plain-string syntax, which will impact brevity as well as backwards-compatibility.)
$ref
Allowing templating inside $ref would force all validators to implement link-parsing - currently, validators can ignore all hyper-schema aspects, which is convenient.
Use inside $ref would limit static analysis for schemas. However (like $data), allowing this keyword in $ref would open up quite a lot of expressive power.
Use inside $ref would also ruin our ability to describe $ref relationships in the meta-schema. Currently, the $ref behaviour is characterised by a full link, but allowing templating would undermine that.
rel="describedby"
The behaviour of templating $ref can currently be mirrored by adding a rel="describedby" link:
{
    "links": [{
        "rel": "describedby",
        "href": "/schemas/{type}"
    }]
}
The only difference is that validators are not obliged to take any notice of links. "Hyper-validators" should, but it is not expected that plain validators would.
Right now, the proposed $data reference only supports relative JSON pointers. While this is useful and more compact for referencing data nearby, it's a big pain in the ass for referencing data that is further away. As an example (note that I'm using YAML for readability):
type: object
properties:
  foo:
    enum:
      $data: "1/enumList"
  bar:
    type: object
    properties:
      baz:
        enum:
          $data: "2/enumList"
  enumList:
    type: array
At a small scale, the only real issue is the lack of code reuse, as you can't reuse the enum if the data is closer or further from the root. However, the problem gets progressively worse the larger and more complex the schema gets. It's not, however, an insurmountable problem. The real problem is when you start using recursive schemas:
type: object
properties:
  foo:
    $ref: "#/definitions/recursiveSchema"
  enumList:
    type: array
definitions:
  recursiveSchema:
    type: object
    properties:
      bar:
        enum:
          $data: "2/enumList"
      foobar:
        $ref: "#/definitions/recursiveSchema"
Example Data:
foo:
  bar: "Allowed"
  foobar:
    bar: "Not Allowed"
    foobar:
      bar: "Allowed"
enumList: ["Allowed"]
Now, what I expected to happen here was that the relative pointer would only work on the first level, and then simply fail to resolve the enum for every level after. What actually happens is that AJV decides the relative pointer is completely out of whack, assumes something is wrong with the JSON, and rejects any data you give it that follows the schema. Remove either the recursion or the enum, and whatever is left works just fine.
With regular JSON pointers, the fix is easy: instead of "[number]/enumList", it would simply be "/enumList", and it would resolve properly regardless of where you are in the document. All absolute pointers start with a slash, and all relative pointers start with a number, so there would never be any confusion about which is which.
Adding in JSON pointers shouldn't be hard for the people who've already implemented the $data reference, but it would be nice if it was part of the actual specification.
It seems that some of the problems with $ref and id could be alleviated by disallowing fragments in id URIs. Currently such fragments blur the distinction between root schemas (identified by id) and inner schemas (which can have fragments in id).
Originally written by @epoberezkin at https://github.com/json-schema/json-schema/wiki/Custom-error-messages-(v5-proposal)
with additional requests by @the-t-in-rtf at json-schema/json-schema#222
Add a keyword errors that would contain error messages, potentially templated, that would be added to the errors reported by validators when some keyword fails validation. Example:
{
    "properties": {
        "age": {
            "minimum": 13,
            "errors": {
                "minimum": "Should be at least ${schema} years, ${data} years is too young."
            }
        },
        "gender": {
            "enum": ["male", "female"],
            "errors": {
                "enum": {
                    "text": "Gender should be ${schema/0} or ${schema/1}",
                    "action": "replace"
                }
            }
        }
    }
}
They can be merged using absolute or relative JSON pointers:
{
    "properties": {
        "age": { "minimum": 13 },
        "gender": { "enum": ["male", "female"] }
    },
    "errors": {
        "#/properties/age/minimum": "Should be at least ${schema} years, ${data} years is too young.",
        "#/properties/gender/enum": {
            "text": "Gender should be ${schema/0} or ${schema/1}",
            "action": "replace"
        }
    }
}
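A sketch of how a validator might fill such templates, assuming ${schema} and ${data} refer to the failing keyword's schema value and the instance value, with an optional pointer-like suffix for indexing into them (the helper name and exact semantics are illustrative, not part of the proposal text):

```python
import re

def render_error(template, schema_value, data_value):
    """Fill an "errors" message template. ${schema} and ${data} insert
    the failing keyword's schema value and the instance value; a
    trailing path such as ${schema/0} indexes into that value."""
    def sub(m):
        root, _, path = m.group(1).partition("/")
        value = {"schema": schema_value, "data": data_value}[root]
        for token in [t for t in path.split("/") if t]:
            value = value[int(token)] if isinstance(value, list) else value[token]
        return str(value)
    return re.sub(r"\$\{([^}]+)\}", sub, template)
```

For the "age" example above, a failing instance value of 11 against `"minimum": 13` would render as "Should be at least 13 years, 11 years is too young."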
The HTTP Link header, HTML, and Atom each define slightly different attributes on link relations. Things like hints at the target resource's media type, language, title, and other metadata that would otherwise require dereferencing the resource.
Perhaps JSON Schema should normatively reference these link-extensions or something similar.
CBOR (RFC 7049, http://cbor.io/) is considered a binary version of JSON; however, it implements a superset of JSON's functionality, including native dates, byte (octet) strings (JSON strings are Unicode text), integers, URIs, and different storage formats for floating-point numbers and for fixed- and variable-sized integers.
For draft-5, we need not add features specific to CBOR, but we should consider the ways CBOR might be used and make sure there are no definitions in outright opposition to this goal.
Originally written by @geraintluff at https://github.com/json-schema/json-schema/wiki/contains-(v5-proposal)
contains
We also might want an equivalent for objects (like containsProperty).
Specifying that an array must contain at least one matching item is awkward. It can currently be done, but only using some inside-out syntax:
{
    "type": "array",
    "not": {
        "items": {
            "not": {... whatever ...}
        }
    }
}
This would replace it with the much neater:
{
    "type": "array",
    "contains": {... whatever ...}
}
It would also enable us to specify multiple schemas that must be matched by distinct items (which is currently not supported).
The value of contains would be either a schema or an array of schemas.
If the value of contains is a schema, then validation succeeds only if at least one of the items in the array matches the provided sub-schema.
If the value of contains is an array, then validation succeeds only if it is possible to map each sub-schema in contains to a distinct array item matching that sub-schema. Two sub-schemas in contains cannot be mapped to the same array index.
{
    "type": "array",
    "contains": {
        "type": "string"
    }
}
Valid: ["foo"], [5, null, "foo"]
Invalid: [], [5, null]
{
    "type": "array",
    "items": {"type": "object"},
    "contains": [
        {"required": ["propA"]},
        {"required": ["propB"]}
    ]
}
Valid:
[{"propA": true}, {"propB": true}]
[{"propA": true}, {"propA": true, "propB": true}]
Invalid:
[]
[{"propA": true}] - no match for second entry
[{"propA": true, "propB": true}] - entries in contains must describe different items
The plain-schema case is simple.
The array case is equivalent to finding a matching in a bipartite graph (cf. Hall's marriage theorem). There are relatively efficient solutions for the general problem - but I suspect a brute-force search will be surprisingly effective and efficient (due to the relatively small number of entries in contains).
It may or may not be worth warning schema authors against stuffing hundreds of entries into contains, because a naive implementation could easily end up with O(n³m) complexity.
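The brute-force approach described above can be sketched in a few lines. This is illustrative only: predicates stand in for compiled sub-schemas, and no attempt is made at an efficient matching algorithm:

```python
from itertools import permutations

def contains_all(items, predicates):
    """Check the array form of "contains": each predicate must be
    matched by a distinct item. Brute force over ordered choices of
    distinct indices - fine for the small lists "contains" is
    expected to hold, exponential in the worst case."""
    if len(predicates) > len(items):
        return False
    return any(
        all(pred(items[i]) for pred, i in zip(predicates, chosen))
        for chosen in permutations(range(len(items)), len(predicates))
    )
```

Using the propA/propB example above, `[{"propA": true}, {"propB": true}]` passes, while a single item carrying both properties fails, since the two sub-schemas cannot map to the same index.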
Behaviour for the array form may be slightly complicated. For example:
{
    "type": "array",
    "contains": [
        {"enum": ["A", "B"]},
        {"enum": ["A", "B", "C"]},
        {"enum": ["A", "D"]}
    ]
}
In this case, ["A", "B", "C"] is valid.
However, this is not due to the syntax - it's simply a complex constraint.
And again, those are the only two reliable mechanisms allowing for truly extending schemas in an unambiguous fashion (yes, I love unambiguous definitions). Recall:
$merge relies on JSON Merge Patch, aka RFC 7396;
$patch relies on JSON Patch, aka RFC 6902.
Why, then, define $patch? Simply because it allows for schema alterations which $merge cannot do. However, $merge answers the vast majority of cases.
Recall of the rules:
$ref still takes precedence.