oarepo / rfcs

RFCs for OArepo

rfcs's Introduction

OARepo RFCs

The OARepo RFC (Request For Comments) process is a communication tool inspired by Invenio RFCs. Its purpose is to:

coordinate the design process
document design decisions
produce consensus among OARepo stakeholders

The RFC process is not meant to be heavy or long. It is an agile process that aids communication between geographically dispersed teams and documents OARepo development, so that we avoid knowledge loss when people leave and ease knowledge transfer when people join.

The OARepo RFC process is not an official approval process of the kind you might know from other RFC processes.

TL;DR

Request a new RFC

Process overview

Request RFC (focus on scope): Before starting to write and implement a new RFC, you first have to request the approval of the OARepo architects by opening an issue. This helps to scope the RFC, avoids duplication and saves everyone time.
Write RFC (focus on content): If the request is accepted by the architects, the RFC is written using the RFC template. This part of the process focuses on the content. The goal is that most discussion and alignment on a solution happens in this writing phase.
Review RFC (focus on quality): Once the RFC document is written, it needs to be reviewed and approved. The review focuses on the quality of the RFC and its readiness for implementation, not on its content (which should already have been agreed upon in the writing phase).
Merge RFC: The RFC is merged into the repository (an RFC does not need to be complete to be merged, as long as unresolved questions are listed in the RFC document and its quality has passed the review).
Start implementing RFC: Create implementation issues for the RFC features in the repositories affected by the RFC and assign developers to implement them.
Finish RFC implementation: When all implementation issues are resolved and closed, the user stories from the Motivation section are checked against the implementation and the RFC document is then marked as Implemented.

When to write an RFC?

You need to write an RFC to change the functionality of OARepo modules. A change could be:

Adding/removing larger features and/or modules.
Changing existing features/APIs.
Changing design patterns, idiomatic usage or conventions.

Step 1: Request an RFC

Requestor

Open an issue.
- Document the 1) Motivation 2) Summary of proposed changes and 3) Expected resources needed

Platform architects:

Label and assign the issue
- All new RFCs should have the "Proposal: Pending" label and a label for the product if applicable (e.g. "NR").
Review the request
If rejected:
- Add a justification to the comments of the issue.
- Change the label to "Proposal: Rejected".
- Close the issue.
If accepted:
- Change the label to "Proposal: Accepted".
- Create an RFC draft document in a new branch by running a new-rfc GitHub Action, which should:
  1. Take the issue number (e.g. 10) as user input.
  2. Create and check out a new branch named rfc-10.
  3. Copy 0000-template.md to docs/0010-your-rfc-issue-title-slugified.md.
  4. Fill the RFC document header: the current date as Start Date, the issue author as Authors, and the issue title in place of <RFC title>.
  5. Create a Pull Request from rfc-10 to the main branch.
  6. Add a link to the created PR to the RFC document header.
  7. Link the issue with the PR.
- Assign the point-of-contact architect (person responsible for drafting the RFC document)
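
The naming conventions in steps 2 and 3 of the action can be sketched in a few lines of Python. This is only an illustration, not the actual workflow: the helper names `slugify`, `rfc_branch` and `rfc_doc_path` are hypothetical, and the exact slug rules (for example whether a "[Proposal]" prefix is stripped from the issue title) are an assumption.

```python
import re


def slugify(title: str) -> str:
    """Lowercase the title and join its alphanumeric runs with hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", title.lower()))


def rfc_branch(issue_number: int) -> str:
    """Branch name in the rfc-<issue number> format."""
    return f"rfc-{issue_number}"


def rfc_doc_path(issue_number: int, issue_title: str) -> str:
    """Target path of the draft copied from 0000-template.md."""
    # The issue number is zero-padded to four digits, as in docs/0010-...
    return f"docs/{issue_number:04d}-{slugify(issue_title)}.md"
```

With these helpers, `rfc_branch(10)` gives `rfc-10` and `rfc_doc_path(10, "Your RFC issue title")` gives `docs/0010-your-rfc-issue-title.md`.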

Step 2: Write the RFC

The following is optional. It is just advice for writing the RFC in a collaborative and efficient manner:

Choose an editor, i.e. the person responsible for this phase (e.g. the architect or another OARepo team member)
Brainstorming phase:
- Fill the template with unstructured bullet points and high-level outline.
- Try to clearly define scope - what's included, and what's excluded.
- Try to identify multiple options for solutions.
- What issues should be addressed?
- Identify possible stakeholders, and include them in the discussions.
Reading phase (moderated by editor):
- Add a "Questions" subsection to each section.
- Read the RFC and add questions/comments to the questions sections (prefix each question with <name>: ...). Be clear about the purpose of each question and support it with examples.
- The purpose is to identify sections that need further discussion.
Discussion phase (moderated by editor):
- Expect discussions on semantics, naming and scope to be long. Have these discussions first, since subsequent discussions will then be much faster.
- Identify discussion points and list them in the document
- Meet live to discuss discussion points
  - Moderator takes notes as bullet points for each discussion point/question.
  - Moderator must ensure everybody is explicitly asked about their opinion.
  - Conclusion:
    - Ask for preferred solution: Once sufficient discussion has taken place, the moderator asks each person for their preferred solution. The goal is to identify whether there is consensus or disagreement.
    - Propose conclusion: The moderator looks for a consensus solution and proposes it.
    - Explicitly ask everyone whether they agree.
    - If consensus is not possible, the conclusion can be TBD. It may then need more research and/or input from non-designated architects.
- Meet live to discuss all questions
  - Use same procedure as for discussion points.
Cleaning phase:
- Clean up the RFC document. It should be readable and coherent for a third party who was not part of the discussions.
- Write up the summary, focusing on explaining the gist of the RFC to a third party.
Reviewing phase:
- Ask for input from the non-designated architects and other stakeholders.

You can jump around between phases.

Disagreement resolution

Fight for what you believe, but gracefully accept defeat.

Please do your utmost to avoid unresolvable disagreements! The more senior you are, the more responsible you are for avoiding them.

In case all attempts to reach consensus have failed, and really only as a very very last resort, the architects can resolve the conflict by taking a decision. This decision should be properly documented.

Step 3: Review the RFC

Q/A Team

Comment on quality of the RFC, not the chosen solutions (this was already done in step 2)!
Can the RFC be understood by an experienced third-party that didn't participate in the discussions?
Is the RFC coherent?
Are the unresolved questions properly documented?
Set RFC status to Ready

OARepo platform Managers

Are there sufficient resources to implement the RFC?

Step 4: Merge the RFC

Platform architects:

As soon as the RFC has reached sufficient quality level and consensus it can be merged into this RFC repository. The RFC does not need to be fully completed to be merged, as long as unresolved questions have been listed in the RFC.

Step 5: Start RFC implementation

Platform architects:

Create implementation issues in repositories affected by the RFC, assign developers
Update RFC document's Implemented in: header with links to issues
Set RFC status to Being implemented

Step 6. Finish RFC implementation

When all implementation issues are resolved and closed:

Q/A Team

Check that the user stories from the Motivation section are fulfilled by the implementation

Platform architects:

Set RFC status to Implemented

RFC States overview

Draft: The RFC has reached sufficient quality to be merged and some discussion has happened, but there are open questions and it is not yet ready to be implemented.
Ready: The design is ready, enough discussion has happened to reach a reasonable consensus, and the quality of the RFC is good.
Being implemented: The RFC implementation has started.
Implemented: The RFC has been implemented in the community or code.
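
The four states can be read as a simple forward-only progression. The sketch below is one possible reading, not part of the process definition: the transition order is inferred from Steps 3, 5 and 6, and the names `RFCStatus`, `NEXT_STATUS` and `advance` are hypothetical.

```python
from enum import Enum


class RFCStatus(Enum):
    DRAFT = "Draft"
    READY = "Ready"
    BEING_IMPLEMENTED = "Being implemented"
    IMPLEMENTED = "Implemented"


# Assumed transitions: Draft -> Ready (Step 3), Ready -> Being implemented
# (Step 5), Being implemented -> Implemented (Step 6).
NEXT_STATUS = {
    RFCStatus.DRAFT: RFCStatus.READY,
    RFCStatus.READY: RFCStatus.BEING_IMPLEMENTED,
    RFCStatus.BEING_IMPLEMENTED: RFCStatus.IMPLEMENTED,
}


def advance(status: RFCStatus) -> RFCStatus:
    """Return the next status, or raise if the RFC is already Implemented."""
    if status not in NEXT_STATUS:
        raise ValueError(f"{status.value} is a final status")
    return NEXT_STATUS[status]
```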

The OARepo RFC process owes its initial inspiration to the Invenio RFC process.

rfcs's Issues

[Proposal] Submission of rfcs

Motivation

We need to define a process to handle change requests for OARepo platform from multiple parties.

Summary

Request RFC (focus on scope): Before starting to write a new RFC, you first have to request the approval of the OARepo architects by opening an issue. This helps to scope the RFC, avoids duplication and saves everyone time.
Write RFC (focus on content): If the request is accepted by the architects, you start collaborative writing of the RFC document using the template. This part of the process focuses on the content, and an architect is assigned to support you in writing the RFC. The goal is that most discussion and alignment on a solution happens in the writing phase.
Review RFC (focus on quality): Once the RFC is complete, you submit a pull request with the new RFC for final review. The review focuses on the quality of the RFC, not on its content (which should already have been agreed upon in the writing phase).
Merge RFC: The RFC is merged into the repository (an RFC does not need to be complete to be merged, as long as unresolved questions are listed in the RFC and its quality has passed the review).

Resources

Timeline ASAP. Should be done by architects.

[Proposal] Communities backend

Motivation

Organize users and record submissions into community interest groups (similar to Invenio communities). Members of a community with elevated roles (editor, curator) should be able to manage the approval process of record submissions inside the community. We should take into account synchronization of community members from Perun AAI. Each community could have a different approval process (e.g. some steps could be skipped), with each role having different permissions.

Summary

Implement a library that provides:

REST APIs to manage & fetch configured user communities
DB models to store community configuration (permissions, approval process...)
models or necessary request classes that enable record submission to communities
synchronization tools for synchronization of community members with Perun AAI groups
admin interface

Resources

High priority: at least a basic approval workflow (skipping most of the approval steps) is needed for the next planned milestone

Primary assignees: @mirekys

Implement new-rfc github workflow

Implement new-rfc github action workflow as specified in Submission of rfcs

Update README with specification from RFC0001

Update processes described in README with contents from https://github.com/oarepo/rfcs/blob/master/docs/0001-submission-of-rfcs.md

[Proposal] Loose validation

OARepo Loose validation

Motivation

To process data harvested from external sources, it must be possible to store invalid records. The relevant error messages must be indexed so that they can later be aggregated and retrieved. Invalid records also need to be marked as non-valid.

Summary

Within the loosely validated models both valid and non-valid records are stored. Errors in data can be of two types:

Structural errors (wrong data type, non-existent field…)
Non-structural errors (too many characters in string, not enough values in object…)

In case of structural error

The problematic field will be erased from the record and its value will be stored in the field for non-valid values, with information about the original field. The respective error message will be stored in the field for error messages, with information about the original field. The record will be labeled as non-valid.

In case of non-structural error

The problematic field will be stored as it is. The respective error message will be stored in the field for error messages, with information about the original field. The record will be labeled as non-valid.
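
The two cases can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not the library's implementation: the function name `apply_errors` and the `(path, message, is_structural)` tuple format are hypothetical, and paths are assumed to address dictionary keys only (no list indices).

```python
import copy


def apply_errors(record, errors):
    """Return a copy of `record` with an "oarepo:validity" entry added.

    `errors` is a list of (path, message, is_structural) tuples, where
    `path` is a tuple of keys leading to the problematic field.
    """
    result = copy.deepcopy(record)
    validity = {"valid": not errors, "errors": [], "invalid_fields": []}
    for path, message, is_structural in errors:
        dotted = ".".join(path)
        validity["errors"].append({"path": dotted, "message": message})
        if is_structural:
            # Structural error: erase the field and keep its original value.
            parent = result
            for key in path[:-1]:
                parent = parent[key]
            value = parent.pop(path[-1])
            validity["invalid_fields"].append({"path": dotted, "content": value})
        # Non-structural error: the field stays in the record as it is.
    result["oarepo:validity"] = validity
    return result
```

For example, a too-short title reported as non-structural stays in the record, while an unknown key reported as structural is moved to `invalid_fields`; in both cases `valid` becomes `False`.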

Detailed design

oarepo:validity field

Field added to the top level of the JSON schema, the Elasticsearch mapping and the Marshmallow schema. It is used to store non-valid values and error messages.

JSON schema

{
  "oarepo:validity": {
    "type": "object",
    "properties": {
      "valid": {
        "type": "boolean"
      },
      "errors": {
        "type": "array",
        "items": {
          "type": "object",
          "properties": {
            "path": {
              "type": "string"
            },
            "message": {
              "type": "string"
            }
          }
        }
      },
      "invalid_fields": {
        "type": "array",
        "items": {
          "type": "object",
          "properties": {
            "path": {
              "type": "string"
            },
            "content": {}
          }
        }
      }
    }
  }
}

Elasticsearch mapping

{
  "oarepo:validity": {
    "type": "object",
    "properties": {
      "valid": {
        "type": "boolean"
      },
      "errors": {
        "type": "nested",
        "properties": {
          "path": {
            "type": "keyword"
          },
          "message": {
            "type": "keyword"
          }
        }
      },
      "invalid_fields": {
        "type": "object",
        "properties": {
          "path": {
            "type": "keyword"
          },
          "content": {
            "type": "flattened"
          }
        }
      }
    }
  }
}
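
Because `errors` is mapped as `nested`, retrieving or aggregating records by error message needs nested queries and nested aggregations. The request bodies below are a hedged sketch built as plain Python dictionaries: the helper names are hypothetical, and the `field` prefix is an assumption that has to match where `oarepo:validity` actually sits in the index (for instance `metadata.oarepo:validity` if it is stored under `metadata`).

```python
def errors_with_message(message, field="oarepo:validity"):
    """Search body matching records that carry the given error message."""
    return {
        "query": {
            "nested": {
                "path": f"{field}.errors",
                "query": {"term": {f"{field}.errors.message": message}},
            }
        }
    }


def error_message_counts(field="oarepo:validity", size=20):
    """Search body counting error entries per distinct error message."""
    return {
        "size": 0,  # only the aggregation is of interest, not the hits
        "aggs": {
            "errors": {
                "nested": {"path": f"{field}.errors"},
                "aggs": {
                    "messages": {
                        "terms": {"field": f"{field}.errors.message", "size": size}
                    }
                },
            }
        },
    }
```

Mapping `message` as `keyword` is what makes the exact-match `term` query and the `terms` aggregation work on whole messages.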

Marshmallow schema

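from marshmallow import Schema, fields as ma_fields

# One validation error; mirrors the items of "errors" in the JSON schema.
class ErrorsSchema(Schema):
    path = ma_fields.String()
    message = ma_fields.String()

# Original value of a field that was removed from the record.
class InvalidSchema(Schema):
    path = ma_fields.String()
    content = ma_fields.Raw()

class ValiditySchema(Schema):
    valid = ma_fields.Bool(required=True)
    errors = ma_fields.List(ma_fields.Nested(ErrorsSchema))
    invalid_fields = ma_fields.List(ma_fields.Nested(InvalidSchema))

class RecordMetadataSchema(Schema):
    _validity = ma_fields.Nested(ValiditySchema(), data_key='oarepo:validity', attribute='oarepo:validity', required=True)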

Record model

JSON schema

The JSON schema will be as permissive as possible. Fields will have no constraints defined and all validation will be handled by Marshmallow. That is, fields in the JSON schema define only their names and types (and, for objects, their property names and types). Additional properties are allowed.

Marshmallow

Marshmallow schema fields contain all defined model restrictions. The base record schema inherits from a modified Marshmallow base schema, which will be defined in the oarepo-loose-validity library and will provide the entire validation logic.

Error analysis

The type of an error will be detected from the content of the respective error message. Basic error messages are defined here. It is also possible to define custom validation rules and errors. Because of that, it needs to be decided how to distinguish structural errors from non-structural ones.
Options:

Use regular expressions: every error is treated as structural unless its message matches a list of specified non-structural error messages. Custom error messages carry a marker saying that the error is non-structural (for example the suffix “Loose validation”).
Specify all non-structural errors in the application config (they need to be regexes because of errors of the type “less than XY” etc.)
Something else?
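
The first option could look roughly like the sketch below. It is an illustration only: the pattern list is deliberately incomplete and covers just the default messages of marshmallow's `Length` validator, and the names `NON_STRUCTURAL_PATTERNS`, `LOOSE_SUFFIX` and `is_structural` are hypothetical. The same list could equally be loaded from the application config, which would correspond to the second option.

```python
import re

# Illustrative list only: default messages of marshmallow's Length validator.
NON_STRUCTURAL_PATTERNS = [
    re.compile(r"^Length must be between \d+ and \d+\.$"),
    re.compile(r"^Shorter than minimum length \d+\.$"),
    re.compile(r"^Longer than maximum length \d+\.$"),
]

# Assumed marker appended to custom messages that are non-structural.
LOOSE_SUFFIX = "Loose validation"


def is_structural(message: str) -> bool:
    """Treat every error as structural unless it is known to be non-structural."""
    if message.endswith(LOOSE_SUFFIX):
        return False
    return not any(p.match(message) for p in NON_STRUCTURAL_PATTERNS)
```

With this list, `"Length must be between 5 and 10."` is classified as non-structural, while `"Unknown field."` and `"Not a valid string."` fall through to the structural default.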

Example

Data Schema

from marshmallow import Schema, fields as ma_fields, validate as ma_valid

# ValiditySchema is the schema from the "Marshmallow schema" section.
class AuthorSchema(Schema):
    first_name = ma_fields.String(validate=[ma_valid.Length(min=5, max=None)])
    last_name = ma_fields.String(validate=[ma_valid.Length(min=5, max=None)])

class RecordMetadataSchema(Schema):
    title = ma_fields.String(validate=[ma_valid.Length(min=5, max=10)], required=True)
    authors = ma_fields.Nested(AuthorSchema)

    _validity = ma_fields.Nested(ValiditySchema(), data_key='oarepo:validity', attribute='oarepo:validity', required=True)

Harvested data

{
  "metadata": {
    "title": "jej",
    "authors": {
      "first_name": "yxyxy",
      "last_name": "xyxyx",
      "something": "wrong"
    }
  }
}

Stored data

{
  "updated": "1970-10-19",
  "id": "hmb7c-ryf20",
  "created": "1970-10-19",
  "metadata": {
    "oarepo:validity": {
      "valid": false,
      "errors": [{"path": "metadata.title", "message": "Length must be between 5 and 10."},
                 {"path": "metadata.authors.something", "message": "Unknown field."}],
      "invalid_fields": [{"path": "metadata.authors.something", "content": "wrong"}]
    },
    "authors": {
      "first_name": "yxyxy",
      "last_name": "xyxyx"
    },
    },
    "title": "jej"
  },
  "links": {
    "self": "/validity_example/hmb7c-ryf20"
  }
}
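
The `path`/`message` entries in `errors` can be derived from the nested error dictionary that marshmallow reports on validation failure (field name mapped to a list of messages, or to another dictionary for nested schemas). The helper below is a minimal sketch of that flattening step; the name `flatten_errors` and the `metadata` prefix are assumptions, not part of the proposed library.

```python
def flatten_errors(messages, prefix="metadata"):
    """Turn a nested {field: [messages] or {...}} dict into path/message entries."""
    entries = []
    for key, value in messages.items():
        path = f"{prefix}.{key}"
        if isinstance(value, dict):
            # Nested schema: recurse with the extended dotted path.
            entries.extend(flatten_errors(value, path))
        else:
            entries.extend({"path": path, "message": m} for m in value)
    return entries
```

For the harvested data in this example, the input would be `{"title": ["Length must be between 5 and 10."], "authors": {"something": ["Unknown field."]}}`, which flattens to one entry for `metadata.title` and one for `metadata.authors.something`.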

Discussion

The oarepo:validity field will be a system field, i.e. it will not live at the metadata level but as a separate attribute in the table
Handle error messages with a loose validation wrapper plus generation of custom messages, because of problems with errors such as "wrong date format"
Where do taxonomy errors belong? If the value has the correct structure, it can be stored, so it is a non-structural error even when the value is not in the vocabulary
A missing required field is a non-structural error
Rename oarepo:validity to oarepo:metadataValidity to make it clear that these are validation errors in metadata, not in documents
A model builder plugin needs to be created in a separate library and attached via `oarepo:use : [loose-validity]`
The UI has to account for the fact that no field may be filled in
In the future, also introduce strict errors that can never be stored (for example for the needs of a user form)
