washingtonpost / ans-schema

JSON schema definition and supporting example/validation code for The Washington Post's ANS specification

License: MIT License

JavaScript 100.00%

washington-post json-schema schema-files arc arc-publishing arcxp

ans-schema's Issues

Proposal: Scheduled Publish Information

Problem

Information about future publish events is not included on story objects currently.

Proposal

Include information about future and past publish/unpublish events. (E.g., Story API & Content Ops API data) This data would appear on unpublished versions of stories ONLY.

Example (Updated)

{
    "publishing": {
        "has_published_edition": true,
        "editions": {
            "default": {
                "edition": "default",
                "edition_date": "2017-",
                "first_published_date": "2017-03-",
                "published": false
            },
            "print": {
                "edition": "print",
                "publish_date": "2017-03-03",
                "revision_id": "",
                "published": true
            }
        },
        "operations_scheduled": {
            "publish_edition": [{
                "type": "story_operation",
                "operation": "publish_edition",
                "date": "2017-04-15T15:00:00Z",
                "publish_date": "2017-04-01T",
                "display_date": "2017-04-02T",
                "revision_id": "CAIP33ODLZD5LESQ3CH6MLP6IM",
                "editions": [ "default" ]
            }],
            "unpublish_edition": [{
                "type": "story_operation",
                "operation": "unpublish_edition",
                "date": "2017-04-15T15:00:00Z",
                "editions": [ "default" ]
            }]
        }
    }
}
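
A minimal sketch of how a consumer might read the proposed structure, assuming the field names shown in the example above. The helper name `listScheduledOperations` is hypothetical, and the second date in the sample data is made up for illustration.

```javascript
// Sketch: collect every scheduled operation from a story's `publishing`
// object and return them sorted by their scheduled `date`.
// `listScheduledOperations` is a hypothetical helper, not part of ANS.
function listScheduledOperations(story) {
  const scheduled =
    (story.publishing && story.publishing.operations_scheduled) || {};
  // Each key (e.g. "publish_edition") maps to an array of operations.
  const all = Object.values(scheduled).flat();
  // Sort a copy by the ISO 8601 `date` field, earliest first.
  return all.slice().sort((a, b) => Date.parse(a.date) - Date.parse(b.date));
}

// Illustrative data only; the unpublish date is an assumption.
const story = {
  publishing: {
    operations_scheduled: {
      unpublish_edition: [
        { type: "story_operation", operation: "unpublish_edition", date: "2017-04-20T15:00:00Z", editions: ["default"] }
      ],
      publish_edition: [
        { type: "story_operation", operation: "publish_edition", date: "2017-04-15T15:00:00Z", editions: ["default"] }
      ]
    }
  }
};

const upcoming = listScheduledOperations(story);
console.log(upcoming.map((op) => `${op.date} ${op.operation}`));
```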

Add Oembed response data

Problem

Add an oembed representation to ANS video/image documents.

Proposal

Should include all general fields, video-specific and photo-specific fields from
http://oembed.com/#section2

type
version
title
author_name
author_url
provider_name
provider_url
cache_age
thumbnail_url
thumbnail_width
thumbnail_height
html
width
height
url

Example

{
  "type": "video",
  "version": "0.5.8",
  "headlines": {
    "basic": "My Video"
  },
  
  "oembed": {
    "type": "video",
    "version": "1.0",
    "title": "My Video",
    "author_name": "The Washington Post",
    "author_url": "https://www.washingtonpost.com",
    "provider_name": "The Washington Post",
    "provider_url": "https://video.washingtonpost.com/oembed",
    "cache_age": 86400,
    "thumbnail_url": "https://video.washingtonpost.com/thumbnail/my-video.jpg",
    "thumbnail_width": 640,
    "thumbnail_height": 480,
    "html": "<iframe width=\"560\" height=\"315\" src=\"https://video.washingtonpost.com/video/my-video\" frameborder=\"0\" allowfullscreen></iframe>",
    "width": 1920,
    "height": 1080
  }
}
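
For illustration, a rough sketch of how the oembed block could be derived from an ANS video document. The function `toOembed` and its `provider` and `embed` arguments are hypothetical; only the oEmbed field names come from the list above. It assumes the embed URL is trusted and does no HTML escaping.

```javascript
// Sketch: build an oEmbed "video" object from an ANS video document.
// `toOembed`, `provider`, and `embed` are assumptions made for this example.
function toOembed(ansVideo, provider, embed) {
  return {
    type: "video",
    version: "1.0", // the oEmbed version, not the ANS version
    title: ansVideo.headlines && ansVideo.headlines.basic,
    provider_name: provider.name,
    provider_url: provider.url,
    // `embed.src` is assumed to be a trusted URL; it is not escaped here.
    html: `<iframe width="${embed.width}" height="${embed.height}" src="${embed.src}" frameborder="0" allowfullscreen></iframe>`,
    width: embed.width,
    height: embed.height
  };
}

const video = { type: "video", version: "0.5.8", headlines: { basic: "My Video" } };
const oembed = toOembed(
  video,
  { name: "The Washington Post", url: "https://video.washingtonpost.com/oembed" },
  { src: "https://video.washingtonpost.com/video/my-video", width: 560, height: 315 }
);
console.log(JSON.stringify(oembed, null, 2));
```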

`content_element` could be an `anyOf` definition

content_elements is defined in trait_content_element.json as an array of content_element, which is defined in utils/content_element.json. What the specification doesn't explain is that each item could also be any of the schemas defined in story_elements.

Shouldn't the definition of content_element reference each of those schemas?

Current behavior:

{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "id": "https://raw.githubusercontent.com/washingtonpost/ans-schema/master/src/main/resources/schema/ans/0.10.5/utils/content_element.json",
  "title": "An element that can be listed as part of content elements",
  "description": "An item that conforms to this schema can be rendered in a sequence",
  "type": "object",
  "additionalProperties": {},
  "properties": {
    "type": {
      "type": "string"
    },
    "_id": {
      "$ref": "https://raw.githubusercontent.com/washingtonpost/ans-schema/master/src/main/resources/schema/ans/0.10.5/traits/trait_id.json"
    },
    "subtype": {
      "$ref": "https://raw.githubusercontent.com/washingtonpost/ans-schema/master/src/main/resources/schema/ans/0.10.5/traits/trait_subtype.json"
    },
    "channels": {
      "$ref": "https://raw.githubusercontent.com/washingtonpost/ans-schema/master/src/main/resources/schema/ans/0.10.5/traits/trait_channel.json"
    },
    "alignment": {
      "$ref": "https://raw.githubusercontent.com/washingtonpost/ans-schema/master/src/main/resources/schema/ans/0.10.5/traits/trait_alignment.json"
    },
    "additional_properties": {
      "$ref": "https://raw.githubusercontent.com/washingtonpost/ans-schema/master/src/main/resources/schema/ans/0.10.5/traits/trait_additional_properties.json"
    },
    "gallery_properties": {
      "$ref": "https://raw.githubusercontent.com/washingtonpost/ans-schema/master/src/main/resources/schema/ans/0.10.5/traits/trait_gallery_properties.json"
    }
  },
  "required": [ "type" ]
}

Expected behavior:

{
  "anyOf": [
    {
      "$schema": "http://json-schema.org/draft-04/schema#",
      "id": "https://raw.githubusercontent.com/washingtonpost/ans-schema/master/src/main/resources/schema/ans/0.10.5/utils/content_element.json",
      "title": "An element that can be listed as part of content elements",
      "description": "An item that conforms to this schema can be rendered in a sequence",
      "type": "object",
      "additionalProperties": {},
      "properties": {
        "type": {
          "type": "string"
        },
        "_id": {
          "$ref": "https://raw.githubusercontent.com/washingtonpost/ans-schema/master/src/main/resources/schema/ans/0.10.5/traits/trait_id.json"
        },
        "subtype": {
          "$ref": "https://raw.githubusercontent.com/washingtonpost/ans-schema/master/src/main/resources/schema/ans/0.10.5/traits/trait_subtype.json"
        },
        "channels": {
          "$ref": "https://raw.githubusercontent.com/washingtonpost/ans-schema/master/src/main/resources/schema/ans/0.10.5/traits/trait_channel.json"
        },
        "alignment": {
          "$ref": "https://raw.githubusercontent.com/washingtonpost/ans-schema/master/src/main/resources/schema/ans/0.10.5/traits/trait_alignment.json"
        },
        "additional_properties": {
          "$ref": "https://raw.githubusercontent.com/washingtonpost/ans-schema/master/src/main/resources/schema/ans/0.10.5/traits/trait_additional_properties.json"
        },
        "gallery_properties": {
          "$ref": "https://raw.githubusercontent.com/washingtonpost/ans-schema/master/src/main/resources/schema/ans/0.10.5/traits/trait_gallery_properties.json"
        }
      },
      "required": [ "type" ]
    },
    {
      "$ref": "https://github.com/washingtonpost/ans-schema/blob/master/src/main/resources/schema/ans/0.10.5/story_elements/blockquote.json"
    },
    {
      "$ref": "https://github.com/washingtonpost/ans-schema/blob/master/src/main/resources/schema/ans/0.10.5/story_elements/code.json"
    }
    ...
  ]
}

Use case:
I'm using a tool that maps JSON Schema to classes, and I have to import all of those schemas separately because nothing references them.
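
As a rough sketch of the expected behavior, the anyOf wrapper could be generated from a list of story element names rather than written by hand. The helper `buildContentElementSchema` is hypothetical, and the base schema is abbreviated here.

```javascript
// Sketch: wrap the existing content_element definition in an `anyOf`
// that also references each story element schema.
// `buildContentElementSchema` is a hypothetical helper, not part of the repo.
function buildContentElementSchema(baseSchema, baseUrl, elementNames) {
  return {
    anyOf: [
      baseSchema,
      ...elementNames.map((name) => ({
        $ref: `${baseUrl}/story_elements/${name}.json`
      }))
    ]
  };
}

const baseUrl =
  "https://raw.githubusercontent.com/washingtonpost/ans-schema/master/src/main/resources/schema/ans/0.10.5";

// Abbreviated stand-in for the current content_element definition.
const baseSchema = { type: "object", required: ["type"] };

const contentElementSchema = buildContentElementSchema(baseSchema, baseUrl, [
  "blockquote",
  "code"
]);
console.log(JSON.stringify(contentElementSchema, null, 2));
```

Note that because the base definition only requires a type string, any element that passes one of the specific schemas also passes the base one, so this change would help code generators discover the types without making validation any stricter.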

Vulnerabilities in master branch

                                                                                
                       === npm audit security report ===                        
                                                                                
# Run  npm install [email protected]  to resolve 3 vulnerabilities
SEMVER WARNING: Recommended action is a potentially breaking change
┌───────────────┬──────────────────────────────────────────────────────────────┐
│ High          │ Regular Expression Denial of Service                         │
├───────────────┼──────────────────────────────────────────────────────────────┤
│ Package       │ minimatch                                                    │
├───────────────┼──────────────────────────────────────────────────────────────┤
│ Dependency of │ mocha                                                        │
├───────────────┼──────────────────────────────────────────────────────────────┤
│ Path          │ mocha > glob > minimatch                                     │
├───────────────┼──────────────────────────────────────────────────────────────┤
│ More info     │ https://npmjs.com/advisories/118                             │
└───────────────┴──────────────────────────────────────────────────────────────┘


┌───────────────┬──────────────────────────────────────────────────────────────┐
│ Critical      │ Command Injection                                            │
├───────────────┼──────────────────────────────────────────────────────────────┤
│ Package       │ growl                                                        │
├───────────────┼──────────────────────────────────────────────────────────────┤
│ Dependency of │ mocha                                                        │
├───────────────┼──────────────────────────────────────────────────────────────┤
│ Path          │ mocha > growl                                                │
├───────────────┼──────────────────────────────────────────────────────────────┤
│ More info     │ https://npmjs.com/advisories/146                             │
└───────────────┴──────────────────────────────────────────────────────────────┘


┌───────────────┬──────────────────────────────────────────────────────────────┐
│ Low           │ Regular Expression Denial of Service                         │
├───────────────┼──────────────────────────────────────────────────────────────┤
│ Package       │ debug                                                        │
├───────────────┼──────────────────────────────────────────────────────────────┤
│ Dependency of │ mocha                                                        │
├───────────────┼──────────────────────────────────────────────────────────────┤
│ Path          │ mocha > debug                                                │
├───────────────┼──────────────────────────────────────────────────────────────┤
│ More info     │ https://npmjs.com/advisories/534                             │
└───────────────┴──────────────────────────────────────────────────────────────┘


found 3 vulnerabilities (1 low, 1 high, 1 critical) in 65 scanned packages
  3 vulnerabilities require semver-major dependency updates.

Proposal: Nested blockquotes & credit

Problem

Blockquotes are frequently used to excerpt large amounts of content from other sources. Currently the blockquote element includes only one exclusive field, content, which expects an HTML string. This makes quotes longer than a single paragraph difficult (either embed the paragraph breaks within the content as HTML, or use a series of separate blockquote elements); it makes mixed content such as images and video difficult (the only option is to render it all to HTML); and it makes nested blockquotes -- an actual use case -- difficult for the same reason.

Furthermore, when rendered, quotes often have a credit or source associated with them. Currently this is not represented in ANS and requires a separate text element, not clearly semantically linked.

Proposed Solution

Add a quote element to ANS. A quote will differ from a blockquote in two key ways:

content_elements attribute -- Instead of a flat content string, this will include a list of sub-elements (similar to an element_group or sub-story)
citation -- (Optional) A text element describing the source of the quote

For convenience, blockquote will continue to exist in its current form for ANS 0.5.8 and be deprecated and upverted in a later release.

Example

{
  "type": "quote",
  "content_elements": [
    { 
       "type": "text",
       "content": "Now is the time for all good men to come to the aid of their party."
     },
     {
       "type": "image",
       "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/0/05/Underwoodfive.jpg/220px-Underwoodfive.jpg"
     }
  ],
  "citation": {
    "type": "text",
    "content": "Charles E. Weller; <em>The Early History of the Typewriter</em>, p. 21 (1918)"
   }
}
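
To show why a list of sub-elements makes nesting straightforward, here is a minimal recursive renderer. It is only a sketch: `renderElement` is a hypothetical function, the HTML it emits is one possible choice, and content is assumed to be trusted HTML (as blockquote content already is).

```javascript
// Sketch: render a proposed `quote` element, including nested quotes,
// by recursing over `content_elements`. Hypothetical renderer.
function renderElement(el) {
  switch (el.type) {
    case "text":
      return `<p>${el.content}</p>`;
    case "image":
      return `<img src="${el.url}">`;
    case "quote": {
      const inner = (el.content_elements || []).map(renderElement).join("");
      const cite = el.citation ? `<cite>${el.citation.content}</cite>` : "";
      return `<blockquote>${inner}${cite}</blockquote>`;
    }
    default:
      return ""; // unknown element types are skipped in this sketch
  }
}

const nestedQuote = {
  type: "quote",
  content_elements: [
    { type: "text", content: "Outer paragraph." },
    {
      type: "quote",
      content_elements: [{ type: "text", content: "Inner paragraph." }]
    }
  ],
  citation: { type: "text", content: "Some source" }
};

const quoteHtml = renderElement(nestedQuote);
console.log(quoteHtml);
```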

Potential concerns

Should any content elements be forbidden from being included in a quote's content elements?

Proposal: Add a Legal Trait

Problem

For legal permissions, blah, blah

Example

{
  "type": "story",
  "version":"0.5.8",
  "legal": {
    "viewable": true
  }
}

Add support for "Attached Redirects"

Problem

There are cases where users want to store content in the Content API (or related systems) but want that content, when directly requested by url, to return another piece of content instead. Information about where the content will redirect if navigated to directly is also useful.

Proposal

Allow related_content.redirect, if present, to be a list of redirect objects instead of content or reference objects. Content API will use redirect information from that redirect object in the fetch by canonical url endpoint.

Example

{
  "_id": "ID_A",
  "type": "video",
  "version": "0.5.8",
  "canonical_url": "/path/to/some/video",
  "headlines": {
    "basic": "This video should only be shown embedded within an article."
  },
  "streams": [
    {
      "height": 360,
      "width": 640,
      "filesize": 5369656,
      "stream_type": "ts",
      "url": "https://videos.posttv.com/washpost-production/The_Washington_Post/20170223/58af2f07e4b0c9994378e6ff/58af3662e4b09a0a172ed097_t_1487877764511_mobile.m3u8",
      "bitrate": 300,
      "provider": "elastictranscoder"
    }
  ],
  "related_content": {
    "redirect": [
      {
        "type": "redirect",
        "version": "0.5.8",
        "canonical_url": "/path/to/some/video",
        "redirect_url": "/path/to/some/article"
      }
    ]
  }
}
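
A small sketch of how a fetch-by-canonical-url handler might consult the attached redirect, assuming the shape in the example above. `resolveRedirect` is a hypothetical name, and matching on canonical_url is an assumption about how the lookup would work.

```javascript
// Sketch: return the redirect target for a document, or null if the
// document should be served as-is. Hypothetical helper.
function resolveRedirect(doc) {
  const redirects = (doc.related_content && doc.related_content.redirect) || [];
  // Assumption: a redirect applies when its canonical_url matches the document's.
  const match = redirects.find(
    (r) => r.type === "redirect" && r.canonical_url === doc.canonical_url
  );
  return match ? match.redirect_url : null;
}

const redirectedVideo = {
  _id: "ID_A",
  type: "video",
  canonical_url: "/path/to/some/video",
  related_content: {
    redirect: [
      {
        type: "redirect",
        canonical_url: "/path/to/some/video",
        redirect_url: "/path/to/some/article"
      }
    ]
  }
};

console.log(resolveRedirect(redirectedVideo));
console.log(resolveRedirect({ type: "video", canonical_url: "/other" }));
```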

Proposal: Ads config

Problem

Need a place to store ad configuration.

Proposal

ads_config

ad_unit_id: What WaPo calls commercial_node. An identifier attached to a given page so that it displays ads from a particular source (rather than network or house ads)
show_ads: A boolean determining whether ads are shown on the content object (false hides them)

Example

{
  "type": "story",
  "version": "0.5.8",
  "ads": {
    "ad_unit_id": "cool_client",
    "show_ads": true
  }
}

Add corrections as a top level attribute in the video json

Add corrections json (see json below) as a top level attribute in video ANS json as it is in the story ANS json. It should be at the same level in the video json as _id, version, subtype, channels, etc..

JSON to be added:

corrections: {
  "$ref": "https://raw.githubusercontent.com/washingtonpost/ans-schema/master/src/main/resources/schema/ans/0.5.8/traits/trait_corrections.json"
}

We don't anticipate any issues with this change as it is a new element in the video ANS.

Add dates to source

Problem

There is no way to represent dates and times that come from the source CMS.

Proposal

Add the following to the source object:

last_published
first_published
last_updated
created
publication_start
publication_end

These fields are intended for informational purposes only and will not be affected or populated by any Arc Content API systems. They do not affect control flow or scheduled publishing.

Example

{
  "type": "story",
  "version": "0.5.8",
  "source": {
    "source_type": "Staff",
    "name": "Methode",
    "last_published": "2017-01-01T00:00:01Z",
    "first_published": "2017-01-01T00:00:01Z",
    "last_updated": "2017-01-01T00:00:01Z",
    "created": "2017-01-01T00:00:01Z",
    "publication_start": "2017-01-01T00:00:01Z",
    "publication_end": "2017-01-01T00:00:01Z"
  }
}

ans validator errors-out

Using the master branch, running
$ npm install
results in warnings:

npm WARN Invalid version: "1.10.7.1"
npm WARN ans-schema No description
npm WARN ans-schema No repository field.
npm WARN ans-schema No README data
npm WARN ans-schema No license field.

Attempting to validate the a-story.json story file with
$ npm run-script ans -- --ansfile=src/a-story.json --version=0.10.8 validate
results in:

npm ERR! Invalid version: "1.10.7.1"

npm ERR! A complete log of this run can be found in:
npm ERR!     /home/arcxp/.npm/_logs/2021-05-19T01_02_22_923Z-debug.log

The log file contains:

0 info it worked if it ends with ok
1 verbose cli [ '/home/arcxp/.nvm/versions/node/v10.17.0/bin/node',
1 verbose cli   '/home/arcxp/.nvm/versions/node/v10.17.0/bin/npm',
1 verbose cli   'run-script',
1 verbose cli   'ans',
1 verbose cli   '--',
1 verbose cli   '--ansfile=src/a-story.json',
1 verbose cli   '--version=0.10.8',
1 verbose cli   'validate' ]
2 info using npm@6.11.3
3 info using node@v10.17.0
4 verbose stack Error: Invalid version: "1.10.7.1"
4 verbose stack     at Object.fixVersionField (/home/arcxp/.nvm/versions/node/v10.17.0/lib/node_modules/npm/node_modules/normalize-package-data/lib/fixer.js:191:13)
4 verbose stack     at /home/arcxp/.nvm/versions/node/v10.17.0/lib/node_modules/npm/node_modules/normalize-package-data/lib/normalize.js:32:38
4 verbose stack     at Array.forEach (<anonymous>)
4 verbose stack     at normalize (/home/arcxp/.nvm/versions/node/v10.17.0/lib/node_modules/npm/node_modules/normalize-package-data/lib/normalize.js:31:15)
4 verbose stack     at final (/home/arcxp/.nvm/versions/node/v10.17.0/lib/node_modules/npm/node_modules/read-package-json/read-json.js:428:5)
4 verbose stack     at then (/home/arcxp/.nvm/versions/node/v10.17.0/lib/node_modules/npm/node_modules/read-package-json/read-json.js:161:5)
4 verbose stack     at /home/arcxp/.nvm/versions/node/v10.17.0/lib/node_modules/npm/node_modules/read-package-json/read-json.js:382:12
4 verbose stack     at /home/arcxp/.nvm/versions/node/v10.17.0/lib/node_modules/npm/node_modules/graceful-fs/graceful-fs.js:115:16
4 verbose stack     at FSReqWrap.readFileAfterClose [as oncomplete] (internal/fs/read_file_context.js:53:3)
5 verbose cwd /home/arcxp/LVI/github/arcxp/arc-xp/ans-schema
6 verbose Linux 5.4.0-73-generic
7 verbose argv "/home/arcxp/.nvm/versions/node/v10.17.0/bin/node" "/home/arcxp/.nvm/versions/node/v10.17.0/bin/npm" "run-script" "ans" "--" "--ansfile=src/a-story.json" "--version=0.10.8" "validate"
8 verbose node v10.17.0
9 verbose npm  v6.11.3
10 error Invalid version: "1.10.7.1"
11 verbose exit [ 1, true ]

I tried using node 12 and node 10 and got the same errors in both cases.

I imagine others have run into this already. Any help in getting the local ANS validator working would be greatly appreciated; I just need to validate a few story documents locally.

Thank you
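
For context, the stack trace shows the error is raised by npm while normalizing package.json (fixVersionField), before any validator code runs: npm expects the version field to be valid semver, and "1.10.7.1" has four numeric components. The check below is a simplified sketch of that rule, not npm's actual implementation, and `looksLikeSemver` is a hypothetical name. Editing the version in a local copy of package.json to a three-part value may get past the error, but that is an assumption rather than a confirmed fix.

```javascript
// Simplified semver shape check: MAJOR.MINOR.PATCH with optional
// -prerelease and +build suffixes. This is not the full semver grammar
// and not the code npm itself uses.
function looksLikeSemver(version) {
  return /^\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?(\+[0-9A-Za-z.-]+)?$/.test(version);
}

console.log(looksLikeSemver("1.10.7.1")); // four numeric components
console.log(looksLikeSemver("1.10.7")); // three numeric components
```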

Content Element Groups

Problem

ANS has no way to represent image alignment, side by side images, or other relationships between individual content elements.

Proposed Solution

A general group element that indicates a relationship between two or more sub-elements.

Example

{
  "type": "story",
  "version": "0.5.8",
  "content_elements": [
    {
      "_id": "1",
      "type": "element_group",
      "version": "0.5.8",
      "content_elements": [
        {
          "_id": "1a",
          "type": "image",
          "url": "http://ichef-1.bbci.co.uk/news/660/cpsprodpb/1325A/production/_88762487_junk_food.jpg"
        },
        {
          "_id": "1b",
          "type": "text",
          "content": "Inventore corrupti reprehenderit sunt eaque fugit repellendus. Saepe architecto totam porro. Voluptatem deserunt sequi corporis. Autem ducimus eligendi ut."
        }
      ]
    }
  ]
}

Pros

Seemingly accommodates at least two immediate use cases

Cons

Pretty bare-bones

Referent Merges

Problem

Need a way to merge data from one content item to a specific field in another content item. This can be generally thought of as a particular post-processing action to be taken when ingesting new content.

Proposal

Add post_processors as an array to all top level content types. Add referent_merge type as a valid post processor. Each post processor will be applied in sequence after denormalization but before persisting.

Example

{
  "_id": "ABC",
  "type": "story",
  "version": "0.5.8",
  "additional_properties": {
  },
  "post_processors": [
    {
       "type": "merge_reference",
       "referent": {
          "type": "story",
          "id": "DEF",
          "provider": ""
       },
       "source": ".content_elements[0].content",
       "destination": ".additional_properties.foo"
      }
  ]
}

{
  "_id": "DEF",
  "type": "story",
  "version": "0.5.8",
  "content_elements": [
    {
       "type": "text",
       "content": "Now is the time for all good men to come to the aid of their party."
    }
  ]
}

would produce, when ingesting ABC:

{
  "_id": "ABC",
  "type": "story",
  "version": "0.5.8",
  "additional_properties": {
     "foo": "Now is the time for all good men to come to the aid of their party."
  },
  "post_processors": [
    {
       "type": "merge_reference",
       "referent": {
          "type": "story",
          "id": "DEF",
          "provider": ""
       },
       "source": ".content_elements[0].content",
       "destination": ".additional_properties.foo"
      }
  ]
}
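
The merge step above can be sketched roughly as follows. This assumes the source and destination strings use only `.name` and `[index]` segments, as in the example; `parsePath`, `getPath`, `setPath`, `applyPostProcessors`, and the `lookup` callback are all hypothetical names, and the real denormalization and persistence steps are out of scope.

```javascript
// Sketch of a `merge_reference` post processor. All names are hypothetical.

// Split ".content_elements[0].content" into ["content_elements", 0, "content"].
function parsePath(path) {
  const tokens = [];
  const pattern = /\.([A-Za-z_][A-Za-z0-9_]*)|\[(\d+)\]/g;
  let m;
  while ((m = pattern.exec(path)) !== null) {
    if (m[1] !== undefined) {
      // Refuse keys that could modify object prototypes.
      if (["__proto__", "constructor", "prototype"].includes(m[1])) {
        throw new Error(`unsafe path segment: ${m[1]}`);
      }
      tokens.push(m[1]);
    } else {
      tokens.push(Number(m[2]));
    }
  }
  return tokens;
}

// Read the value at `path`, or undefined if any segment is missing.
function getPath(obj, path) {
  return parsePath(path).reduce(
    (cur, key) => (cur == null ? undefined : cur[key]),
    obj
  );
}

// Write `value` at `path`, creating intermediate objects/arrays as needed.
function setPath(obj, path, value) {
  const tokens = parsePath(path);
  let cur = obj;
  for (let i = 0; i < tokens.length - 1; i++) {
    if (cur[tokens[i]] == null) {
      cur[tokens[i]] = typeof tokens[i + 1] === "number" ? [] : {};
    }
    cur = cur[tokens[i]];
  }
  cur[tokens[tokens.length - 1]] = value;
}

// Apply each merge_reference processor in sequence to a copy of `doc`.
// `lookup(referent)` is assumed to return the referenced document or undefined.
function applyPostProcessors(doc, lookup) {
  const result = JSON.parse(JSON.stringify(doc)); // deep copy of plain JSON
  for (const processor of result.post_processors || []) {
    if (processor.type !== "merge_reference") continue;
    const referent = lookup(processor.referent);
    if (!referent) continue;
    setPath(result, processor.destination, getPath(referent, processor.source));
  }
  return result;
}

const DEF = {
  _id: "DEF",
  type: "story",
  content_elements: [{ type: "text", content: "Now is the time." }]
};
const ABC = {
  _id: "ABC",
  type: "story",
  additional_properties: {},
  post_processors: [
    {
      type: "merge_reference",
      referent: { type: "story", id: "DEF", provider: "" },
      source: ".content_elements[0].content",
      destination: ".additional_properties.foo"
    }
  ]
};

const merged = applyPostProcessors(ABC, (ref) => (ref.id === "DEF" ? DEF : undefined));
console.log(merged.additional_properties);
```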

Inconsistent naming convention, or possibly a typo?

I notice that in story.json the language property ref points to trait_locale.json.

I propose either

changing the name of the reference to trait_language, or
changing the story property to locale.

This is to make the link between the two more clear, and to make it consistent with the pattern of the other references.

Proposal: Comments Config

Problem

Comments can be configured on a per-page basis, and some comment metadata is useful when rendering.

Proposal

A comments object with the following properties (suggested by @StephanieDClark )

comments_period - integer indicating how long (in days) until comments are closed
allow_comments - boolean
display_comments - boolean
moderation_required - boolean

Example

comments: {
  allow_comments: true,
  comments_period: 14,
  display_comments: true,
  moderation_required: false,
}
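
One way a renderer might use these fields, sketched under a loud assumption: the proposal does not say what comments_period is counted from, so this example counts days from a publish date supplied by the caller. `commentsOpen` and its arguments are hypothetical.

```javascript
// Sketch: decide whether comments are still open for a piece of content.
// Assumption: `comments_period` is counted in days from `publishedAt`.
function commentsOpen(comments, publishedAt, now) {
  if (!comments || !comments.allow_comments) return false;
  // Assumption: with no period set, comments never close automatically.
  if (typeof comments.comments_period !== "number") return true;
  const msPerDay = 24 * 60 * 60 * 1000;
  const closesAt = Date.parse(publishedAt) + comments.comments_period * msPerDay;
  return now.getTime() < closesAt;
}

const commentsConfig = {
  allow_comments: true,
  comments_period: 14,
  display_comments: true,
  moderation_required: false
};

console.log(commentsOpen(commentsConfig, "2017-01-01T00:00:00Z", new Date("2017-01-10T00:00:00Z")));
console.log(commentsOpen(commentsConfig, "2017-01-01T00:00:00Z", new Date("2017-01-20T00:00:00Z")));
```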

Proposal: Alignment Property on Content Elements

Problem

Need a way to indicate that images (or other elements) should have text wrap around them on various sides. (Equivalent to HTML align=left or CSS float: left)

Proposed Solution

A property alignment that can appear on any content element.

Example

{
    "type": "story",
    "version": "0.5.8",
    "content_elements": [
        {
            "type": "image",
            "url": "http://foo.com/foo.jpg",
            "alignment": "left"
        },
        {
            "type": "text",
            "content": "Lorem ipsum forum fipsum"
        }
    ]
}

Proposal: Corrections should exist as both top-level field and as a content element

Problem

Sometimes you want to correct a paragraph. Sometimes you want to retract a story.

Proposal

Corrections should exist as both top-level field and as a content element.

Also, change .text to .content in corrections for consistency.

Example

{
  "type": "story",
  "version": "0.5.8",
  "headlines": {
    "basic": "Dewey Defeats Truman"
  },
  "content_elements": [
    {
      "type": "text",
      "content": "Dewey and Warren won a sweeping victory in the presidential election yesterday. The early returns showed the Republican ticket leading Truman and Barkley pretty consistently in the western and southern states\" and added that \"indications were that the complete returns would disclose that Dewey won the presidency by an overwhelming majority of the electoral vote"
    },
    {
      "type": "correction",
      "correction_type": "clarification",
      "content": "An earlier version of this article misspelled President-Elect Truman's name as Tramun."
   }
  ],
  "corrections": [
    {
      "type": "correction",
      "correction_type": "correction",
      "content": "The following article is not factually correct and is a fabrication by the author."
    } 
  ]
}

Proposal: denote IS-A relationship between types

Problem
Information about IS-A relations between types (schemas?) is missing. For example, story contains a collection of content_elements, which is supposed to be a base type for other element types, like text. But there's no formal link between the text and content_element types; one can only guess it from a description.

Proposal
Denote the IS-A relationship between types with an "extends" keyword; for example, the text schema will have an additional keyword "extends" with a reference to content_element:

{
    "$schema": "http://json-schema.org/draft-04/schema#",
    "id": "https://raw.githubusercontent.com/washingtonpost/ans-schema/master/src/main/resources/schema/ans/0.5.8/story_elements/text.json",
    "description": "A textual content element",
    
    "extends": {
        "$ref": "https://raw.githubusercontent.com/washingtonpost/ans-schema/master/src/main/resources/schema/ans/0.5.8/utils/content_element.json"
    }

    ....
}

This will add a missing piece of information and, in addition, this property is supported by a Java code-generation tool that can generate POJOs from JSON schemas.

Proposal: Formalize de-facto schema for label, source

Problem

Data is already ingested in the source and label fields but the spec says very little about them.

Proposal

Let's make existing usage explicit to avoid confusion. label is a dictionary-style (like headlines) set of text strings (similar to a WaPo kicker.) They may have a url field attached. source contains information about how and where the content originated.

Examples

"label": {
  "basic": {
    "text": "Sports",
    "url": "/sports",
    "display": true
  }
  ...
}

"source": {
  "source_id": "123",  // Legacy CMS Id
  "name": "Reuters",  // Organization name (of content originator)
  "source_type": "Wires",       // "Wires" for ingested content; "Original" for CMS-entered 
  "system": "Arc I/O" // Name of CMS or app
}

Proposal: improve support for tags, add categories

Problem
ANS tags do not support WordPress categories.

Proposed Solution
Add to utils/tag.json:

slug = tag or category slug, e.g. 'my-category-slug'
text = tag or category name ~~(currently listed in spec as "tag")~~
description = tag or category description
additional_properties = (same as elsewhere, used for non-standard metadata)

Example

[
  {
    _id: "DGBTWSJS2BHNTPWLVVUYPM6T6E",
    text: "My Category Name",
    additional_properties: {wordpress_type: "category"},
    slug: "my-category-name",
    description: "My superfab category"
  },
  {
    _id: "AMKDOSHIJ5FYVJS5UASQZ2G2CY",
    text: "My Tag Name",
    additional_properties: {wordpress_type: "tag"},
    slug: "my-tag-name",
    description: "My unparalleled tag"
  }
]

Question
Currently, tags have two properties, on the api output: _id and "tag". Am I misreading the spec when it appears that the properties should be _id and "text"?

`table-row` and `table_cell` seem to be cruft

Those elements are not allowed as children of the table element; only text nodes are valid children.

Alternative Implementation of Tables

Problem

Washington Post needs an implementation of tables that includes additional_properties on every table, row, and cell, as well as multiple content elements within a cell.

Proposal

An all-new table spec -- "objects all the way down"

Example

{
  "type": "table",

  "header": {
    "type": "table_row",

    "additional_properties": {
      "foo": "bar"
    },

    "cells": [
      {
        "type": "table_cell",

        "additional_properties": {
          "bar":"foo"
        },

        "content_elements": [
          {
            "type": "text",
            "content": "Some text",
            "additional_properties": {
              "baz": "bo"
            }
          },
          {
            "type": "text",
            "content": "And some more text"
          },
          {
            "type": "text",
            "content": "In the same cell"
          }
        ]
      },
      {
        "type": "table_cell",

        "content_elements": [
          {
            "type": "text",
            "content": "Another cell"
          }
        ]
      }
    ]
  },


  "rows": [
    {
      "type": "table_row",

      "additional_properties": {
        "foo": "bar1"
      },

      "cells": [
        {
          "type": "table_cell",

          "additional_properties": {
            "bar":"foo"
          },

          "content_elements": [
            {
              "type": "text",
              "content": "Some text",
              "additional_properties": {
                "baz": "bo"
              }
            },
            {
              "type": "text",
              "content": "And some more text"
            },
            {
              "type": "text",
              "content": "In the same cell"
            }
          ]
        },

        {
          "type": "table_cell",

          "content_elements": [
            {
              "type": "text",
              "content": "Hi"
            }
          ]
        }
      ]
    }
  ]
}
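
To illustrate how a consumer could walk the "objects all the way down" structure, here is a small sketch that flattens a table into rows of plain text. `tableToText` is a hypothetical helper; it only looks at text elements and joins multiple elements in a cell with a space, which is just one possible choice.

```javascript
// Sketch: flatten the proposed table structure into an array of rows,
// each row an array of cell strings. Hypothetical helper.
function tableToText(table) {
  const cellText = (cell) =>
    (cell.content_elements || [])
      .filter((el) => el.type === "text")
      .map((el) => el.content)
      .join(" ");
  const rowText = (row) => (row.cells || []).map(cellText);

  const rows = [];
  if (table.header) rows.push(rowText(table.header)); // header row first
  for (const row of table.rows || []) rows.push(rowText(row));
  return rows;
}

const sampleTable = {
  type: "table",
  header: {
    type: "table_row",
    cells: [
      { type: "table_cell", content_elements: [{ type: "text", content: "Some text" }, { type: "text", content: "And some more text" }] },
      { type: "table_cell", content_elements: [{ type: "text", content: "Another cell" }] }
    ]
  },
  rows: [
    {
      type: "table_row",
      cells: [
        { type: "table_cell", content_elements: [{ type: "text", content: "Some text" }] },
        { type: "table_cell", content_elements: [{ type: "text", content: "Hi" }] }
      ]
    }
  ]
};

const grid = tableToText(sampleTable);
console.log(grid);
```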

Consolidate Operations and add order, urgency fields

Batch Changes

For the purposes of implementing batching, priority updates, and a little general cleanup, I propose consolidating all main content operations (story, image, video, content) into one operation type, with new required fields "order" and "urgency".

(See http://confluence.washpost.com/display/ETT/Content+API+Batching+Proposal for details about batching.)

The simple way of thinking about these fields is that order is used to ensure correctness and urgency is used for performance.

Order represents the precedence with which this operation should be applied to its item. (It can be thought of as, "the order this update will appear in the final commit log of this item.") When resolving operations that are received out-of-order, this field will be used to resolve conflicts. Higher order items supersede lower order items. Each distinct operation for the same content id should have a unique value for order so that there are no ties. Note also that even inserts and deletes for the same content item should include this field -- so for a sequence of [insert, delete, insert, delete] the final delete operation should have an order of at least 3. (This field is similar to how date is used now, but with the additional constraint of uniqueness. Date should still be included for use in metrics, logging, human-readability, etc.)

Urgency is used to communicate that a certain update should be applied as quickly as possible, independent of the item's order field, position in a Kafka queue, etc. It can be increased to indicate that a particular operation should jump the queue (e.g., if someone wants to immediately publish an update for a breaking story) or decreased to indicate that this operation can be put in a backlog (e.g., because it is part of a bulk import that must be finished "sometime.")

How downstream systems reconcile order and urgency is up to them, but one reasonable minimal implementation would be to process operations as they are received, discard those with a lower order than the one previously received for the content item in question, and simply ignore urgency.
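
The minimal implementation described above might look roughly like this. It is a sketch only: `createOperationFilter` is a hypothetical name, state is kept in memory, and treating an equal order as a duplicate to discard is an assumption (the proposal requires order values to be unique per content id).

```javascript
// Sketch: accept operations as they arrive and discard any whose `order`
// is not higher than the last one accepted for the same content id.
// Urgency is ignored entirely, as in the minimal implementation.
function createOperationFilter() {
  const lastOrderById = new Map(); // content id -> highest order accepted
  return function accept(op) {
    const last = lastOrderById.get(op.id);
    // Assumption: an equal order is a duplicate delivery and is discarded.
    if (last !== undefined && op.order <= last) return false;
    lastOrderById.set(op.id, op.order);
    return true;
  };
}

const accept = createOperationFilter();
const decisions = [
  { id: "A", operation: "insert", order: 1 },
  { id: "A", operation: "delete", order: 3 }, // arrives early
  { id: "A", operation: "update", order: 2 }, // stale: 3 was already applied
  { id: "B", operation: "insert", order: 1 } // different item, tracked separately
].map((op) => accept(op));

console.log(decisions);
```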

Misc Changes

For consistency and ease-of-use, I also suggest the following changes:

Use current content-operation type as basis for new object, with five possible operation values: insert, update, delete, publish, unpublish
Add additional_properties
Drop publish_date and display_date fields
Add Story API convenience fields (some downstream systems use these) as optional: editions, branch, revision_id
Use published to indicate a global published status for an object after an op (WebSked uses this)

Examples

Example of an image operation in the new proposed format:

{
  "type": "content-operation",
  "version": "0.5.8",
  "operation": "insert",
  "organization_id": "anglertest",
  "id": "AAAAAAAAAAAAAAAAAAAAAAAAAA",
  "order": 1,
  "date": "2016-10-24T15:46:06+00:00",
  "urgency": 10,
  "published": false,
  "additional_properties": {},

  "body": {

    "additional_properties": {
      "galleries": [],
      "originalName": "arc-icon-black.png",
      "originalUrl": "https://arc-anglerfish-arc2-prod-anglertest.s3.amazonaws.com/AAAAAAAAAAAAAAAAAAAAAAAAAA",
      "proxyUrl": "/photo/resize/JAJN4WuiYYtk-GPAxYqlYBose4A=/arc-anglerfish-arc2-prod-anglertest/AAAAAAAAAAAAAAAAAAAAAAAAAA",
      "published": false,
      "resizeUrl": "http://anglerfish-thumbor.internal.arc2.nile.works/JAJN4WuiYYtk-GPAxYqlYBose4A=/arc-anglerfish-arc2-prod-anglertest/AAAAAAAAAAAAAAAAAAAAAAAAAA",
      "roles": [],
      "version": 0
    },
    "address": {},
    "height": 144,
    "type": "image",
    "url": "https://arc-anglerfish-arc2-prod-anglertest.s3.amazonaws.com/AAAAAAAAAAAAAAAAAAAAAAAAAA",
    "version": "0.5.6",
    "width": 144
  }

}

and a story operation:

{
  "type": "story-operation",
  "version": "0.5.8",

  "operation": "publish",
  "id": "JWI4TJ6RO5FDZGZDDZ7ENCDVVY",
  "organization_id": "sfr",
  "order": 1,
  "date": "2016-10-17T17:18:50.337Z",
  "urgency": 10,

  "published": true,
  "editions": [
    "default"
  ],
  "branch": "default",
  "revision_id": "RXXKXBVQVNHBJCWOMMAHLJLT7U",

  "additional_properties": {},

  "body": {
    "status": "published",
    "short_url": "",
    "publish_date": "2016-06-21T18:41:00Z",
    "promo_items": {
      "basic": {
        "height": 0,
        "width": 0,
        "created_date": "2016-06-21T18:41:00Z",
        "credits": {
          "by": [
            {
              "org": "Santa Fe Reporter",
              "name": "Anson Stevens-Bollen",
              "type": "author"
            }
          ]
        },
        "url": "http:\/\/s3-us-west-1.amazonaws.com\/wp-sfr\/imgs\/media.images\/21186\/Cover-MAIN-Pride.jpg",
        "version": "0.5.8",
        "caption": "",
        "type": "image"
      }
    },
    "additional_properties": {
      "Category": "News",
      "wehaa_article_id": 12145,
      "SubCategory": "Features"
    },
    "taxonomy": {
      "tags": [],
      "keywords": [],
      "sites": [
        {
          "referent": {
            "provider": "",
            "id": "\/news\/features",
            "referent_properties": null,
            "type": "site"
          },
          "type": "reference"
        }
      ]
    },
    "language": "en",
    "description": {
      "basic": "Get ready for the rainbows, the glitter and the gyrating. Santa Fe Pride's 2016 celebrations this weekend are taking a few cues from bigger celebrations, with a five-part lineup on the Bandstand."
    },
    "subheadlines": {
      "basic": "Q&A with Santa Fe HRA's Richard Brethour-Bell"
    },
    "credits": {
      "by": [
        {
          "referent": {
            "provider": "",
            "id": "502",
            "type": "author"
          },
          "type": "reference"
        }
      ]
    },
    "display_date": "2016-06-21T18:41:00Z",
    "copyright": "SantaFeReporter 2015",
    "owner": {
      "id": "sfr"
    },
    "headlines": {
      "basic": "Pride On and Off the Plaza"
    },
    "canonical_url": "\/santafe\/article-12145-pride-on-and-off-the-plaza.html",
    "last_updated_date": "2016-10-17T17:18:50.123Z",
    "revision": {
      "branch": "default",
      "parent_id": "4PMCNZLL2VBV7NDDLLB7SO5ROM",
      "revision_id": "RXXKXBVQVNHBJCWOMMAHLJLT7U"
    },
    "created_date": "2016-06-21T18:41:00Z",
    "content_elements": [
      {
        "content": "<span class=\"drop-cap\">G<\/span>et ready for the rainbows, the glitter and the gyrating. Back downtown for the third year again after a foray to the Railyard, Santa Fe Pride's 2016 celebrations this weekend are taking a few cues from bigger celebrations, with a five-part lineup on the Bandstand. SFR  caught up with local organizer Richard Brethour-Bell, former board president for the Santa Fe Human Rights Alliance and also a regional director of InterPride, an international organization of Pride organizers that helped him visit Pride parties in other communities. <br\/>",
        "additional_properties": null,
        "type": "text",
        "_id": "NYN4FTQ3OJG4JJF5B6ST2G6RRA"
      },
      {
        "content": "<div class=\"galim\" style=\"width: 450px; float: right; margin: 10px 0px 10px 10px; border: medium none;\" title=\"\"><img height=\"300\" src=\"http:\/\/sfreporter.com\/santafe\/imgs\/media.images\/21181\/4-Pride-Plaza-MAIN.jpg\" width=\"450\"\/><div align=\"right\" class=\"caption\">Richard Brethour-Bell says that if the city\u00e2\u0080\u0099s LGBT community wants another Pride, they\u00e2\u0080\u0099re going to have to get more involved in the group that puts on the show. <\/div><div align=\"right\" class=\"Credits\">Julie Ann Grimm<\/div><\/div>",
        "type": "raw_html",
        "_id": ""
      }
    ],
    "version": "0.5.8",
    "type": "story",
    "_id": "JWI4TJ6RO5FDZGZDDZ7ENCDVVY"
  }
}

Schema

This is a breaking change that adds new required fields, so it would bump the ANS version to 0.5.8.

{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "id": "https://raw.githubusercontent.com/washingtonpost/ans-schema/master/src/main/resources/schema/ans/0.5.8/content_operation.json",
  "description": "An operation on a content item.",
  "type": "object",
  "additionalProperties": false,
  "allOf": [{
    "properties": {
      "type": {
        "description": "Identifies this as an ANS operation",
        "type": "string",
        "enum": [ "content-operation" ]
      },
      "version": {
        "type": "string",
        "description": "The version of ANS this operation (and its payload) is written in"
      },


      "operation": {
        "type": "string",
        "description": "The identifier of the operation being performed",
        "enum": [ "insert", "update", "delete", "publish", "unpublish" ]
      },
      "id": {
        "type": "string",
        "description": "The id of the item being operated on."
      },
      "organization_id": {
        "type": "string",
        "description": "The id of the organization that owns the item being operated on."
      },


      "order": {
        "type": "integer",
        "description": "The order that this operation should be applied to its item. When resolving operations that are received out-of-order, this field will be used to resolve conflicts. Higher order items supersede lower order items.  See http://confluence.washpost.com/display/ETT/Content+API+Batching+Proposal"
      },
      "date": {
        "description": "When the operation should be considered performed. Ideally should align with \"order.\" Useful for displaying metrics and other human-readable interfaces.",
        "type": "string",
        "format": "date-time"
      },
      "urgency": {
        "type": "integer",
        "description": "Indicates how quickly this operation should be processed. Note that this is orthogonal to order: higher order operations should supersede lower order operations. Operations with a higher urgency may be processed sooner than those with lower urgency, but will not be given any greater or lower precedence. (I.e., once all operations are received, they will be processed as if they had been received sequentially according to their order and urgency will be ignored.) See http://confluence.washpost.com/display/ETT/Content+API+Batching+Proposal"
      },

      "published": {
        "type": "boolean",
        "description": "If true, then this item is regarded as published (e.g., containing at least one version in a published state) by the source system after this operation is completed."
      },
      "branch": {
        "type": "string",
        "description": "The name of the branch that this operation occurs on, if any."
      },
      "revision_id": {
        "type": "string",
        "description": "The id of the specific revision that this operation occurs on, if any."
      },
      "editions": {
        "type": "array",
        "description": "A list of identifiers of editions that are changed in this operation, if any.",
        "items": {
          "type": "string"
        }
      },


      "additional_properties": {
        "$ref": "https://raw.githubusercontent.com/washingtonpost/ans-schema/master/src/main/resources/schema/ans/0.5.8/traits/trait_additional_properties.json"
      },
      "body": {
        "type": "object",
        "description": "The object being inserted/updated/deleted/published/unpublished. ",
        "anyOf": [
          {
            "$ref": "https://raw.githubusercontent.com/washingtonpost/ans-schema/master/src/main/resources/schema/ans/0.5.8/story.json"
          },
          {
            "$ref": "https://raw.githubusercontent.com/washingtonpost/ans-schema/master/src/main/resources/schema/ans/0.5.8/image.json"
          },
          {
            "$ref": "https://raw.githubusercontent.com/washingtonpost/ans-schema/master/src/main/resources/schema/ans/0.5.8/video.json"
          }
        ]
      }
    },
    "required": [ "type", "version", "operation", "id", "organization_id", "body", "order", "urgency" ]
  }]
}
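To make the required fields concrete, here is a small hand-rolled check that mirrors a few of the constraints in the schema above (required fields, the type and operation enums, integer order and urgency). It is only a sketch for illustration and not a substitute for running a real JSON Schema validator against content_operation.json. The function name check_operation is hypothetical.

```python
from typing import List

# Copied from the "required" and "enum" entries of the proposed schema.
REQUIRED = ("type", "version", "operation", "id",
            "organization_id", "body", "order", "urgency")
OPERATIONS = {"insert", "update", "delete", "publish", "unpublish"}


def check_operation(op: dict) -> List[str]:
    """Return a list of human-readable problems; an empty list means no problems found."""
    errors = [f"missing required field: {name}" for name in REQUIRED if name not in op]
    if "type" in op and op["type"] != "content-operation":
        errors.append("type must be 'content-operation'")
    if "operation" in op and op["operation"] not in OPERATIONS:
        errors.append(f"unknown operation: {op['operation']}")
    for name in ("order", "urgency"):
        if name in op:
            value = op[name]
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(value, bool) or not isinstance(value, int):
                errors.append(f"{name} must be an integer")
    return errors
```

For example, an operation that omits order would be reported as "missing required field: order", which is the sense in which this proposal is a breaking change for existing producers.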

Proposal: Add an Interstitial Link content element (Example)

Please add a new content element for interstitial links. These are used within the body of an article to link to a specific article that may be related.

An interstitial link differs from related content in that it is placed at a specific position in the body of an article, and thus is best represented as a content element rather than as a sub-item in the related_content field.

{
  "type": "story",
  "version": "0.5.5",
  "content_elements": [
    {
      "type": "interstitial-link",
      "url": "https://www.washingtonpost.com/politics/can-clinton-and-trump-ride-to-big-victories-in-next-weeks-acela-primary/2016/04/20/ea6454fc-064e-11e6-bdcb-0133da18418d_story.html?hpid=hp_rhp-top-table-main_5states-1025a%3Ahomepage%2Fstory",
      "content": "Can Clinton and Trump ride to big victories in next weeks Acela primary?"
    }
  ]
}

Proposal: Add Named Entities trait

Problem

Track Named Entities (things like organization names, people names, and locations) that are mentioned in a given story. This would allow people to search for stories about certain companies or people.

Proposal

{
  "title": "Named Entity",
  "type": "object",
  "required": [
    "_id",
    "type"
  ],
  "properties": {
    "_id": {
      "type": "string",
      "description": "The unique identifier for this Named Entity."
    },
    "type": {
      "type": "string",
      "description": "What this is a Named Entity of. E.g. Organization, Person, Location"
    }
  }
}

The object above shows what the actual definition would look like; it would go into the utils section. It would need to go into the clavis_operation type and the taxonomy type.

I haven't finalized if _id and type will be the only fields yet. I just wanted to create this proposal now to talk about it.

Example

"named_entities": [{
  "_id": "The Washington Post",
  "type": "organization"
}, {
  "_id": "Marty Baron",
  "type": "person"
}]
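To show how the search use case might work, here is a rough sketch that filters stories by a named entity. It assumes named_entities ends up under each story's taxonomy object, as the proposal suggests, and it compares type case-insensitively because the schema description uses "Organization" while the example uses "organization". The function name stories_mentioning is hypothetical.

```python
from typing import Iterable, List, Optional


def stories_mentioning(stories: Iterable[dict], entity_id: str,
                       entity_type: Optional[str] = None) -> List[dict]:
    """Return the stories whose named entities include the given _id (and type, if set)."""
    wanted_type = entity_type.lower() if entity_type else None
    matches = []
    for story in stories:
        # Assumed location: story["taxonomy"]["named_entities"].
        entities = (story.get("taxonomy") or {}).get("named_entities") or []
        for entity in entities:
            if entity.get("_id") != entity_id:
                continue
            if wanted_type is None or str(entity.get("type", "")).lower() == wanted_type:
                matches.append(story)
                break  # one match per story is enough
    return matches
```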

Proposal: Add budget line, internal note

Problem

WebSked needs to search and display fields for the budget line and internal note.

Proposal / Example

{
  "type": "story",
  "version": "0.5.8",
  "planning": {
    "internal_note": "Tweeted; Posted to Facebook",
    "budget_line": "allies killed by U.S. missile strike in Syria story 2017-04-13"
  }
}
