
agile-lab-dev / governance-decision-record


The Governance Decision Record (GDR) is a specification model for (computational) data governance policies inspired by the ADR (Architectural Decision Record).

License: Apache License 2.0

Language: CUE (100%)
Topics: architectural-decision-records, data, data-governance, data-management, data-management-platform, data-mesh, platform, federated-computational-governance, governance-decision-record, policy-as-code

governance-decision-record's Introduction

Governance Decision Record

The Governance Decision Record (GDR) is a specification model for (computational) data governance policies inspired by the ADR (Architectural Decision Record). Its goal is to enable the creation of version-controlled data governance policies that include:

  • a policy lifecycle state
  • a policy history state
  • the policy title
  • the context
  • the decision
  • the consequences and accepted trade-offs

These sections are essentially shared with the ADR model. This specification, which is tailored to the Data Mesh context but can also be used for other data management paradigms, adds further sections:

  • an implementation steward
  • where the policy becomes computational
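To make the list of sections concrete, here is a minimal Python sketch of a GDR as an in-memory record. The class name, field names and the Markdown layout it renders are hypothetical illustrations of the sections listed above. They are not the repository's template, which is a Markdown file.

```python
from dataclasses import dataclass, field


# Hypothetical model of a GDR: one field per section of the specification.
@dataclass
class GovernanceDecisionRecord:
    title: str
    lifecycle_state: str          # e.g. "DRAFT", "APPROVED", "REJECTED"
    history_state: str            # e.g. "NEW", "AMENDS", "SUPERSEDES"
    context: str
    decision: str
    consequences: str             # pros, cons and accepted trade-offs
    implementation_steward: str
    computational_points: list[str] = field(default_factory=list)

    def to_markdown(self) -> str:
        """Render the record as a simple Markdown document, one heading per section."""
        points = "\n".join(f"- {p}" for p in self.computational_points) or "- (none yet)"
        return "\n\n".join([
            f"# {self.title}",
            f"Lifecycle state: {self.lifecycle_state}",
            f"History state: {self.history_state}",
            f"## Context\n{self.context}",
            f"## Decision\n{self.decision}",
            f"## Consequences and accepted trade-offs\n{self.consequences}",
            f"## Implementation steward\n{self.implementation_steward}",
            f"## Where the policy becomes computational\n{points}",
        ])
```

A structured model like this is mainly useful if a platform needs to read policies programmatically. Humans would still edit the Markdown file directly.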

Documented, version-controlled policies also let the governance team (the federated governance team, in Data Mesh terms) work in a distributed, asynchronous, yet tracked and organized way.

Documents like these are usually written when making design decisions in IT. ADRs serve this purpose for software architecture. In the same way, GDRs aim to enable structured, versioned and governable federated work in a git repository that can also include code (policy-as-code). This closes the gap with the "platforms" world, where most governance decisions must be executed or put into effect. GDRs paired with policies as code can be accessed directly by a governance platform, which provides the "computational" policy capability. The picture is complete when this capability is also orchestrated as part of a broader lifecycle of technical assets, such as self-serve provisioning for data products. Agile Lab has put this vision into practice with Witboost Data Mesh Boost.

Let's deep dive into each section.

Policy Lifecycle State

This can be as simple as a label tracking the lifecycle state of a policy. Common states are:

  • DRAFT, when a policy is being developed and still needs to be formally approved, or has been submitted for approval;
  • APPROVED, when a policy has been formally approved: this makes it actionable and a reference for the overall governance;
  • REJECTED, when a policy has been formally rejected (after the approval process).
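A platform could enforce these lifecycle states as a small state machine. The Python sketch below makes an assumption: only a DRAFT can move to APPROVED or REJECTED, and an approved policy changes only through a new GDR that amends or supersedes it. The names `LifecycleState`, `transition` and `is_actionable` are hypothetical.

```python
from enum import Enum


class LifecycleState(Enum):
    DRAFT = "DRAFT"
    APPROVED = "APPROVED"
    REJECTED = "REJECTED"


# Assumed transitions: approval or rejection only happens from DRAFT;
# APPROVED and REJECTED are terminal for a given GDR file.
ALLOWED_TRANSITIONS = {
    LifecycleState.DRAFT: {LifecycleState.APPROVED, LifecycleState.REJECTED},
    LifecycleState.APPROVED: set(),
    LifecycleState.REJECTED: set(),
}


def transition(current: LifecycleState, target: LifecycleState) -> LifecycleState:
    """Return the new state, or raise if the move is not allowed."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"cannot move a policy from {current.value} to {target.value}")
    return target


def is_actionable(state: LifecycleState) -> bool:
    """Only approved policies are a reference for the overall governance."""
    return state is LifecycleState.APPROVED
```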

In the GDR template file, some pre-compiled web-rendered labels are provided.

Policy History State

This can be as simple as a label tracking the history state of a policy. Common states are:

  • NEW, when a policy is created for the first time and does not amend or supersede an existing one;
  • AMENDS or AMENDED, when an approved policy amends (or is amended by) another existing policy;
  • SUPERSEDES or SUPERSEDED, when a policy supersedes (or is superseded by) another existing policy;
  • DEPRECATED, when a policy ceases to be valid/applied and no other policy amends or supersedes it.

NOTE: in the case of AMENDS/AMENDED and SUPERSEDES/SUPERSEDED, the related policy should be linked.
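The linking rule in the note above is easy to check automatically. The following Python sketch is a hypothetical check. It assumes each policy carries its history state as a string plus an optional reference (for example a GDR file name) to the related policy.

```python
from typing import Optional

# History states that, per the note above, must link to a related policy.
LINKED_STATES = {"AMENDS", "AMENDED", "SUPERSEDES", "SUPERSEDED"}
OTHER_STATES = {"NEW", "DEPRECATED"}


def check_history(state: str, related_policy: Optional[str]) -> list[str]:
    """Return a list of problems with a policy's history state (empty if valid)."""
    if state not in LINKED_STATES | OTHER_STATES:
        return [f"unknown history state: {state}"]
    if state in LINKED_STATES and not related_policy:
        return [f"{state} requires a link to the related policy"]
    return []
```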

In the GDR template file, some pre-compiled web-rendered labels are provided.

Context

This section describes the context in which the policy applies (and why).

Decision

The decision the policy aims to apply.

Lifecycle

Declare which changes to the metadata (or anything else) are considered BREAKING and which are NOT BREAKING. This is important for implementing automations at platform level and for building a robust change management process based on trust between data producers and consumers.
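Once such a declaration exists, a platform can apply it automatically. The Python sketch below uses a hypothetical rule set, not one taken from the specification: removing a field or changing its type is BREAKING, and adding a field is NOT BREAKING. Schemas are modelled as plain `{field_name: type_name}` dictionaries.

```python
def classify_schema_change(old: dict[str, str], new: dict[str, str]) -> tuple[str, list[str]]:
    """Classify a schema change under an assumed BREAKING / NOT BREAKING rule set."""
    # Hypothetical rule 1: removing an existing field breaks consumers.
    reasons = [f"removed: {f}" for f in sorted(old.keys() - new.keys())]
    # Hypothetical rule 2: changing the type of an existing field breaks consumers.
    reasons += [
        f"type changed: {f}"
        for f in sorted(old.keys() & new.keys())
        if old[f] != new[f]
    ]
    # Anything else (e.g. adding a field) is treated as NOT BREAKING.
    return ("BREAKING" if reasons else "NOT BREAKING"), reasons
```

A real policy would list its own rules, possibly covering descriptions, SLAs or ownership as well as schemas. The point is that an explicit list can be turned into a check.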

Consequences and accepted trade-offs

What the organization accepts will happen while the policy is applied, including pros (improvements) and cons (impacts, rework, new accountabilities or requirements). There is no "universally optimal decision", so the policy should also report the trade-offs the organization is going to accept with it. In some scenarios, this could mean making accumulated tech debt explicit. (A note on tech debt: it is usually hidden and hard to track. Once it is explicit, it is easier to measure and keep track of the overall tech debt and the system quality in terms of architecture and behaviour.)

Implementation Steward

Who is expected to take care of the implementation. We talk about implementation because the policy, as in the context of Data Mesh, is meant to become as "computational" as possible. This leads to automating data management practices, probably with the help of a backing platform. The steward can also be the role accountable for following up on the application of the policy.

Where the policy becomes computational

The specific points in the architecture, platform, system, context, etc. where this policy (and its checks, if any) is implemented as an automation (thus becoming "computational").

This is split into LOCAL and GLOBAL policies:

  • LOCAL policies are implemented, applied and verified locally. In the context of Data Mesh, an example is a Data Product Owner who wants to calculate and measure Data Quality over data at rest in the DP's output ports. Such a policy is specific to that context and does not affect others, such as domains or other DP owners.
  • GLOBAL policies are applied globally. For example, the S3 buckets provisionable for Data Products' output ports can only be in the eu-central-1 AWS region.
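The global region example above translates directly into a check. The repository expresses such checks in CUE. The Python sketch below only illustrates the logic. The descriptor layout (`outputPorts`, `kind`, `name`, `region`) is a hypothetical assumption, not the repository's metadata model.

```python
# Global policy from the example above: S3 output ports only in eu-central-1.
ALLOWED_REGIONS = {"eu-central-1"}


def check_output_port_region(descriptor: dict) -> list[str]:
    """Return one violation message per S3 output port outside the allowed regions."""
    violations = []
    for port in descriptor.get("outputPorts", []):   # hypothetical field
        if port.get("kind") != "s3":                  # only S3 ports are in scope
            continue
        region = port.get("region")
        if region not in ALLOWED_REGIONS:
            violations.append(
                f"output port {port.get('name')!r}: region {region!r} is not allowed"
            )
    return violations
```

A GLOBAL policy like this would typically run in the platform's provisioning path for every data product. A LOCAL policy would be invoked only for the data product that declares it.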

If a descriptive modelling language is used, a metadata validation policy-as-code file can be provided. This file will likely be integrated into the platform, for example using the CUE language to validate YAML.


How to make use of this policy model?

An example of usage includes:

  1. set up a git repo
  2. (optional) install a tool so that every contributor follows the same process (which is worth documenting in the repo itself), e.g. adr-tools
  3. track the governance policies to be created using the git repo's issue tracking system, making use of all the features it provides (labels, epics, etc.)
  4. work on the policy issues, creating the related merge requests
  5. write the policy using the template provided here
  6. provide a metadata model, an example, and a validation (policy-as-code) file
  7. when the policy is ready, merge it (according to the governance process) and make it effective.

An important note on points 3 to 7: in the case of Data Mesh, the federated governance team should collaborate, with each member working within their own area of expertise. This team includes SMEs (Subject Matter Experts) from the most relevant units of the company, such as engineering, security and compliance, as well as representatives of the domains. A "core members" group of the Federated Governance Team (e.g. the Platform team) could take care of the final merge of the policies in point 7, thus also acting as a final validation.

Policies can (and will) evolve over the lifecycle of the data platform. To account for and embrace change, it is suggested to create a folder for every GDR. Each GDR (policy) file is then named using the notation xxxx-policy-content-or-decision.md (assuming the policy document is written in Markdown), where xxxx is a monotonically increasing id that tracks the policy's evolutions/versions. In general GDR practice, multiple different GDRs (addressing different decisions within the same area of application) are expected to coexist in the same folder. For governance policies this could make the incremental sequence id misleading, but nested folders/subfolders can still be used for grouping.
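The naming convention can be automated when a policy evolves. The Python sketch below derives the next file name from the names already in a policy folder. It assumes a four-digit zero-padded id and a lowercase hyphenated slug built from the policy title. Both are assumptions based on the `xxxx-...md` pattern, and `slugify` and `next_gdr_filename` are hypothetical helpers.

```python
import re

# Assumed pattern: four-digit id, hyphen, lowercase slug, ".md" extension.
GDR_NAME = re.compile(r"^(\d{4})-([a-z0-9-]+)\.md$")


def slugify(title: str) -> str:
    """Turn a policy title into a lowercase, hyphen-separated slug."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def next_gdr_filename(existing_names: list[str], title: str) -> str:
    """Compute the next monotonically increasing GDR file name for a folder."""
    ids = [int(m.group(1)) for name in existing_names if (m := GDR_NAME.match(name))]
    return f"{max(ids, default=0) + 1:04d}-{slugify(title)}.md"
```

In practice `existing_names` could come from `[p.name for p in pathlib.Path(folder).iterdir()]`. Files that do not match the pattern, such as a README, are ignored.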

When evolving an existing policy, it is important to keep the policy's lifecycle (and history) state up to date, especially when amending or superseding existing policies. With a 1:1 ratio between folders and policies, it is straightforward to identify the most recent (and supposedly currently valid) policy for each context.
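With one folder per policy, finding the most recent version reduces to picking the highest id. The Python sketch below assumes the `xxxx-...md` naming above. `latest_policy` is a hypothetical helper that works on a list of file names.

```python
import re
from typing import Optional

# Assumed pattern: four-digit id followed by the rest of the file name.
GDR_ID = re.compile(r"^(\d{4})-.+\.md$")


def latest_policy(names: list[str]) -> Optional[str]:
    """Return the GDR file with the highest id, or None if there is none."""
    candidates = [(int(m.group(1)), n) for n in names if (m := GDR_ID.match(n))]
    if not candidates:
        return None
    return max(candidates)[1]
```

The highest id is only "supposedly" valid. The latest file might still be a DRAFT or might have been REJECTED, so a real tool would also read its lifecycle state before treating it as the policy in force.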

NOTE: it could be worthwhile to also have a very high-level document describing the current state of the system/company in terms of all the decisions that led to the current status quo.

Example

A fairly exhaustive example policy, with related metadata and policy-as-code validation files, is provided in the example folder. In this example, the specific architectural decision (here, a GDR) describes how an Output Port of type "FILES" should be defined, provisioned, configured, described and validated. The folder contains 3 files:

The GDR versioning assumes this is the first policy created to address this governance topic.

The overall vision is reported in the top level strategy file.

The policy metadata can be validated with the policy-as-code file using the CUE CLI (if installed):

cue vet example/data-mesh/data-product/output-port/files/0001-data-product-output-port-files-example.yaml example/data-mesh/data-product/output-port/files/0001-data-product-output-port-files.cue

Coming next

Future releases could include:

  • an organizational process for the governance meetings
  • a workflow to manage the policies lifecycle
  • more examples

License

The proposed approach, template, examples and policy-as-code files are shared with the community under the Apache License 2.0.

governance-decision-record's People

Contributors

erond

Forkers

matteobovetti

governance-decision-record's Issues

Add example top-level document integrating the GDR and show the high-level vision

Scenario summary

The repo has a dedicated example section where some example GDRs are provided.

Problem statement

Each GDR details a single decision, so individually they do not provide an overall view of the current state of the art resulting from those decisions.

Proposed solution

Add a Markdown document at the root of example/ that tells the whole high-level story. This top-level document should reference the related GDR ids for the decisions it describes.

Add policy-as-code file and related metadata example

After the first draft has been published, it's time to add some more meat on the grill.

The example policy can be updated according to a real use case (e.g. Financial Services), and be paired with related metadata example and policy-as-code validation file, to demonstrate how the policy can become "computational" (assuming a platform will then take care of it).

The basic policy documentation can be augmented with:

  • different states for lifecycle and history
  • license description
  • next steps

Policy template

Scenario Summary

In the context of ADRs, policies are created by implementing a well-known and structured model.

Problem Statement

If no explicit model is provided, users may create policies following their own models. This could lead to unnecessary repeated work and reduce the overall clarity of the policies.

Proposed Solution

Create a template for a governance policy, according to the specification. The initial proposed format, which is well known to suit a version-controlled repository, is Markdown.
