
sdp's Introduction

State Description Protocol

State Description Protocol is designed to allow the serialization of a description of the current state of a computer system, for purposes such as auditing and monitoring. It is deliberately simplistic and is designed to transmit the details of things whose importance we don't know in advance. For this reason it doesn't contain dedicated ways of describing files, packages etc., since it doesn't presume to know what it is describing, other than the fact that it is "state".

This has been somewhat inspired by Z Notation

Below is an example of an Item in SDP serialized as JSON for readability:

{
    "type": "dns",
    "uniqueAttribute": "name",
    "attributes": {
        "attrStruct": {
            "ips": [
                "2606:4700:4700::1001",
                "2606:4700:4700::1111",
                "1.0.0.1",
                "1.1.1.1"
            ],
            "name": "one.one.one.one"
        }
    },
    "scope": "global",
    "linkedItemQueries": [
        {
            "type": "ip",
            "query": "2606:4700:4700::1001",
            "scope": "global",
            "method": "GET"
        },
        {
            "type": "ip",
            "query": "2606:4700:4700::1111",
            "scope": "global",
            "method": "GET"
        },
        {
            "type": "ip",
            "query": "1.0.0.1",
            "scope": "global",
            "method": "GET"
        },
        {
            "type": "ip",
            "query": "1.1.1.1",
            "scope": "global",
            "method": "GET"
        }
    ]
}

Attributes

The attributes of an item convey all information about that item. The attrStruct contains string keys and values of any type as long as they are supported by google.protobuf.Struct.
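For illustration, here is a minimal Go sketch of building such an attribute map with the structpb helpers from the protobuf runtime. The exact wrapper type that carries these attributes in a given SDP library may differ; only the google.protobuf.Struct part is prescribed by the protocol.

package main

import (
	"fmt"

	"google.golang.org/protobuf/types/known/structpb"
)

func main() {
	// Build the attributes for the example DNS item above. Any value is
	// allowed as long as google.protobuf.Struct can represent it.
	attrs, err := structpb.NewStruct(map[string]interface{}{
		"name": "one.one.one.one",
		"ips": []interface{}{
			"2606:4700:4700::1001",
			"2606:4700:4700::1111",
			"1.0.0.1",
			"1.1.1.1",
		},
	})
	if err != nil {
		panic(err)
	}

	// The unique attribute ("name") can be read back out of the Struct.
	fmt.Println(attrs.GetFields()["name"].GetStringValue())
}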

Naming Convention

Attributes: It is convention that attribute key names follow camelCase with the first letter lowercase. Child keys should also follow this convention, unless they contain information that has been returned directly from the underlying system in an already-structured format; in that case it is at the developer's discretion whether to use camelCase or keep the existing case.

Additional Dynamic Data

Other than the methods that are generated from the protocol buffers, we provide the following methods for convenience on all platforms. Certain libraries may provide more functionality beyond the methods listed below, but at least these methods will be present and return consistent results across all libraries. Note however that the naming of the methods might change to reflect best practices in a given library.

item.UniqueAttributeValue

Returns the value of whatever the Unique Attribute is for this item

item.Reference

Returns an SDP reference for the item

item.GloballyUniqueName

Returns a string that uniquely identifies the Item globally. This is a combination of the following values:

  • scope
  • type
  • uniqueAttributeValue

They are concatenated with dots (.), e.g. global.dns.one.one.one.one for the example item above.

reference.GloballyUniqueName

Same as for Item

item.Hash

Returns a 12-character hash for the item. This is likely but not guaranteed to be unique. The hash is calculated as follows (a Go sketch follows this list):

  • Take the SHA-1 sum of the GloballyUniqueName
  • Encode the SHA-1 binary value using base-32 with a custom encoding
    • The custom encoding is designed to ensure that all hashes are also valid variable names in DGraph and other databases. To this end the following encoding string is used: abcdefghijklmnopqrstuvwxyzABCDEF
    • The encoding is also non-padded, though this likely won't matter since we strip the end off anyway
  • Return the first 12 characters of the resulting string
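A minimal Go sketch of this calculation, assuming only the standard library (the function name and layout here are illustrative, not part of any SDP library):

package main

import (
	"crypto/sha1"
	"encoding/base32"
	"fmt"
	"strings"
)

// The custom base-32 alphabet described above, chosen so that hashes are
// also valid variable names in DGraph and other databases.
const hashAlphabet = "abcdefghijklmnopqrstuvwxyzABCDEF"

func itemHash(scope, itemType, uniqueAttributeValue string) string {
	// The GloballyUniqueName is scope, type and uniqueAttributeValue joined with dots.
	gun := strings.Join([]string{scope, itemType, uniqueAttributeValue}, ".")

	// Take the SHA-1 sum of the GloballyUniqueName.
	sum := sha1.Sum([]byte(gun))

	// Encode with the custom, non-padded base-32 alphabet.
	enc := base32.NewEncoding(hashAlphabet).WithPadding(base32.NoPadding)
	encoded := enc.EncodeToString(sum[:])

	// Return the first 12 characters.
	return encoded[:12]
}

func main() {
	fmt.Println(itemHash("global", "dns", "one.one.one.one"))
}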

reference.Hash

Same as item.Hash

Querying State

SDP is designed to be usable over a message queue infrastructure where a single request may be responded to by many, many respondents. In order to have this process work efficiently and provide a reasonable amount of feedback to the user about how the query is going and how much longer it might be expected to take, interim responses are used.

An interim response is designed to give the requester the following information:

  • How many responders are currently working on the request
  • Whether responders have stopped responding or are just taking a long time to execute the query
  • When things have finished

The communication looks as follows:

sequenceDiagram
requester->>responder: Initial Request
Note right of responder: The initial request will include<br>the following subjects for the<br>responders to send responses on:<br>* Items<br>* InterimResponses<br>* Errors
responder->>requester: Interim Response: WORKING
Note right of responder: While the responder works<br>it will keep sending interim<br>responses so that the requester<br>knows it's still working
responder->>requester: Interim Response: WORKING
responder->>requester: Item
responder->>requester: Item
Note left of responder: time passing
responder->>requester: Interim Response: WORKING
responder->>requester: Item
responder->>requester: Final Response: DONE

The subjects upon which the responses and items should be sent are determined by the requester and sent within the request itself. The naming conventions for this are found below.
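A minimal sketch of the responder side of this pattern, assuming a NATS connection and the query's response subject; the payloads, interval and subject below are placeholders, since the real protocol sends protobuf-encoded response messages:

package main

import (
	"log"
	"time"

	"github.com/nats-io/nats.go"
)

// respond keeps sending interim WORKING responses on the supplied subject
// while the query executes, then sends a final DONE response.
func respond(nc *nats.Conn, responseSubject string, work func()) {
	done := make(chan struct{})

	go func() {
		ticker := time.NewTicker(5 * time.Second)
		defer ticker.Stop()
		for {
			select {
			case <-ticker.C:
				// Interim response: still working on the query.
				nc.Publish(responseSubject, []byte("WORKING"))
			case <-done:
				return
			}
		}
	}()

	work() // run the actual query, publishing Items as they are found
	close(done)

	// Final response: the query has finished.
	nc.Publish(responseSubject, []byte("DONE"))
}

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Drain()

	respond(nc, "query.bcee962c-ca60-479b-8a96-ab970d878392", func() {
		time.Sleep(12 * time.Second) // placeholder for real discovery work
	})
}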

Item Uniqueness

An item is uniquely identified by the combination of:

  • Type
  • UniqueAttributeValue
  • Scope

While the UniqueAttributeValue will always be unique for a given type, the same item may exist in many scopes. An example could be the same package installed on many servers, or the same deployment in many Kubernetes namespaces. Hence the scope is required to ensure global uniqueness.

Message Queue Topics/Subjects

When implementing SDP over a message queue (usually NATS), you should follow the naming convention below for topics/subjects. Note that the naming of subjects shouldn't influence how messages are actually handled; for example a Query that came through the subject request.all should be treated the same as one that came from request.scope.{scope}. All of the information needed to process a message is contained in the message itself, and the subjects are currently only used for convenience and routing.

request.all

Everything will listen on this subject for requests. Requests sent to this subject should have a scope of * and will therefore be responded to by everything. It is of course possible to send a request to this subject that has only one specific scope, but this would be incredibly wasteful of network bandwidth as the message would be relayed to all consumers and then discarded by all but one.

request.scope.{scope}

All sources should listen on a subject named with the above naming convention for each scope they are able to find items for. In some cases, such as agent-based sources on a physical server or VM, this will likely be only one, e.g. request.scope.webserver01. However some sources may be able to connect to many scopes and will therefore subscribe to one subject for each. An example could be a Kubernetes source which is able to connect to many namespaces. Its subscriptions could look like:

  • request.scope.cluster1.namespace1
  • request.scope.cluster1.namespace2

Dots are valid in scope names and should be used for logical separation as above.
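As an illustration, a source using the Go NATS client could set up its subscriptions as follows. This is a minimal sketch; a real source would unmarshal the SDP Query from the message body and respond on the subjects that the query itself names:

package main

import (
	"log"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Drain()

	// Subjects this source is responsible for: the wildcard request subject
	// plus one subject per scope it can find items in.
	subjects := []string{
		"request.all",
		"request.scope.cluster1.namespace1",
		"request.scope.cluster1.namespace2",
	}

	for _, subject := range subjects {
		if _, err := nc.Subscribe(subject, func(m *nats.Msg) {
			// A real source would unmarshal the Query here and start work.
			log.Printf("received query on %s (%d bytes)", m.Subject, len(m.Data))
		}); err != nil {
			log.Fatal(err)
		}
	}

	select {} // block forever while queries are served
}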

query.{uuid}

All items, errors and status updates are sent over a subject named after the Query's UUID. This UUID should be in five groups separated by hyphens, in the form 8-4-4-4-12, for a total of 36 characters, with all letters being lower case, e.g. bcee962c-ca60-479b-8a96-ab970d878392. See QueryResponse.

cancel.all

This subject exists to allow cancellation requests to be sent. Cancellations should be sent to this subject if the initial Query was sent to the corresponding subject: request.all

cancel.scope.{scope}

Cancellation requests for specific scopes should be sent on this subject.

Lifecycle APIs

Some APIs drive server-side lifecycle/workflow state. This section captures their behavior.

---
title: ChangeStatus
---
stateDiagram-v2
    [*] --> UNSPECIFIED: CreateChange()
    UNSPECIFIED --> DEFINING: CalculateBlastRadius()
    DEFINING --> HAPPENING: StartChange()
    HAPPENING --> PROCESSING: EndChange()
    PROCESSING --> DONE: Server-side processing completes
    DONE --> [*]

Errors

Errors that are encountered as part of a request will be sent as a QueryResult on the query.{uuid} subject. A given request may have zero or many errors, depending on the number of sources that are consulted in order to complete the request. If all sources fail, the responder will respond with a status of ERROR; however, as long as some sources were able to complete, the responder will respond with COMPLETE.

It is up to the client to determine how best to surface errors to the user depending on the use case.

Sources that encountered errors will send them on the query.{uuid} subject as a QueryError. The structure of these errors is as follows (a handling sketch is shown after the list):

  • UUID: The UUID of the item request that caused the error
  • errorType: The error type (enum)
    • NOTFOUND: NOTFOUND means that the item was not found. This is only returned as the result of a GET request since all other requests would return an empty list instead
    • NOSCOPE: NOSCOPE means that the item was not found because we don't have access to the requested scope. This should not be interpreted as "The item doesn't exist" (as with a NOTFOUND error) but rather as "We can't tell you whether or not the item exists"
    • OTHER: This should be used for all other failure modes, such as timeouts, unexpected failures when querying state, permissions errors etc. Errors that return this type should not be cached as the error may be transient.
    • TIMEOUT: The request timed out
  • errorString: The string contents of the error
  • scope: The scope from which the error was raised
  • sourceName: The name of the source that raised the error
  • itemType: The type of item that was being queried
  • responderName: The responder which encountered the error
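A minimal Go sketch of a client deciding how to surface these errors, using an illustrative struct that mirrors the fields listed above (the generated protobuf type in a real SDP library will differ):

package main

import "fmt"

// ErrorType mirrors the errorType enum described above.
type ErrorType int

const (
	NOTFOUND ErrorType = iota
	NOSCOPE
	OTHER
	TIMEOUT
)

// QueryError is illustrative only; it mirrors the fields listed above but is
// not the generated protobuf message.
type QueryError struct {
	UUID          string
	ErrorType     ErrorType
	ErrorString   string
	Scope         string
	SourceName    string
	ItemType      string
	ResponderName string
}

// surface shows one way a client might present an error to the user.
func surface(e QueryError) string {
	switch e.ErrorType {
	case NOTFOUND:
		// Only returned for GET requests: the item definitely does not exist.
		return fmt.Sprintf("%s not found in scope %s", e.ItemType, e.Scope)
	case NOSCOPE:
		// We can't say whether the item exists; we lack access to the scope.
		return fmt.Sprintf("no access to scope %s (source %s)", e.Scope, e.SourceName)
	case TIMEOUT:
		return fmt.Sprintf("query to %s timed out", e.ResponderName)
	default:
		// OTHER: transient or unexpected failures; these should not be cached.
		return fmt.Sprintf("error from %s: %s", e.SourceName, e.ErrorString)
	}
}

func main() {
	fmt.Println(surface(QueryError{ErrorType: NOSCOPE, Scope: "cluster1.namespace1", SourceName: "k8s-source"}))
}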

Building

First install the dependencies:

npm i

go install google.golang.org/protobuf/cmd/protoc-gen-go@latest
go install connectrpc.com/connect/cmd/protoc-gen-connect-go@latest

Run buf through npx to generate the stubs:

npx buf generate

Note: depending on your use case, symlink or checkout the sdp-js and sdp-go repositories into the gen/ directory. See sdp/.github/workflows/update.yml for details.

sdp's People

Contributors

dylanratcliffe, davids-ovm, renovate[bot], getinnocuous, tphoney, dependabot[bot]

Stargazers

Igor Ivanov, Malthe Karbo, Ryan Lahfa

sdp's Issues

Extract risks status into separate endpoint

Since we are planning to add a lot more detail to the risk calculation status, we will need an endpoint that we can poll. Currently the risks come back as part of ChangeMetadata; we should move this to its own endpoint that can be polled.

When risk calculation is in progress we will return:

  • Current Step
  • Total Steps
  • Step Description

Which can then be rendered directly in the UI or CLI. These will be flexible rather than hardcoding all the possible steps in an enum like we do with change status

Rework blast radius settings to use presets

We are planning to use presets in the UI for the blast radius, and we should make sure that the API reflects this. This will involve making the blast radius settings optional (probably) and having an enum for the preset the user has selected.


This would have been a lot easier to design then build rather than build, re-build, design, re-build, but you live and you learn

Add RPCs for beginning and ending a change

Once a change has its blast radius defined (i.e. it's in the DEFINING stage), a user should be able to start the change at any point. Since we already have all the info we need, we should be able to take the required snapshot and transition the change to the STATUS_HAPPENING state. Once the change is done we should also be able to transition to the STATUS_DONE state by doing basically the same thing again.

We need to be able to hit a very simple API for this so that we can do it with a single button in the UI, or a github action etc.

Change Statuses & Transitions

  • STATUS_UNSPECIFIED
    • I think that a change would go to DEFINING once a blast radius has been calculated. However by the same token I see no reason why you couldn't do this more than once.
    • How will people set the affectedItemsBookmarkUUID in the first place? In the UI it'd be manual in the bookmark editor. If it was parsed from Terraform we'd probably just have another service that goes from TF Plan to bookmark and returns the UUID, or sets it directly.
    • CalculateBlastRadius() Uses the affectedItemsBookmarkUUID to calculate the blast radius. If the blastRadiusBookmarkUUID is already set, it should be deleted and replaced. This is the method that should move the thing to defining
  • STATUS_DEFINING
    • StartChange() Takes the snapshot and moves to the next state
  • STATUS_HAPPENING
    • EndChange() Takes the end snapshot, moves to the next stage
  • STATUS_PROCESSING <- Potentially this will be skipped for now since there isn't really any post-processing to do, or maybe the snapshot happens during this phase...
  • STATUS_DONE

Create gateway messages that allow bookmarks to work server-side

We want the server to be able to store session state for two purposes:

  • So that if a user refreshes their browser they can get back to the same state
  • So that users can create "bookmarks" and resume from them

In order to do this the server-side needs to have an accurate view of the state on the client so that it can be reconstructed. To do this we need to add the following new capabilities:

  • Remove a request: If the user runs a request, then presses Ctrl + Z, we want to be able to remove that request from the list
  • Exclude an item: Users should be able to select an item and delete it. This means that when constructing the state again that item should be ignored and never sent to the user
  • Unexclude an item: If the user deletes an item by accident they should be able to Ctrl + Z and undo that action

Lower priority:

  • Expand an item: Rather than the user running all linked item requests for a given item and us storing those requests, we should instead store the fact that they have "expanded" it. This means that in the future if the item returns different linked item requests, they should also be expanded.
    • This probably isn't the most important thing ever. I was thinking that something like Kubernetes services and the pods they are made of would be a good example of this, but I've remembered that the linked item request is a SEARCH, so the request itself doesn't change even if the pods change, just the results. This should maybe be left for now since I can't think of a super clear use case immediately

Create data type for Risk

We will need to be able to attach risks to Changes, this requires a new data type with the following stuff:

  • Title
  • Severity
  • Description (Markdown)
  • Related Items

Once this is created it will need to be added to a Change. We should add the risks, but also the status i.e.

  • Not yet calculated since we are still defining the change
  • Done
  • Change too large and risks can't be generated
  • Error
  • Skipped (plus a reason)

Add ListAppChanges method

This method should list all the changes for a given app. It will be used when looking at an app to get a list of recent changes

e.g.

  rpc ListAppChanges({app name}) returns ({list of changes});

Allow items to return "health"

Items should be able to (optionally) return information about their health. This will be used in the GUI for a bunch of things and consists of the following options:

  • Unknown
  • OK
  • Warning
  • Failed

Create GetUsage SDP definitions

As discussed in #941, the data returned from Stripe will look similar to below.

type UsageData struct {
	id       string
	cost     float64
	tiers    []*stripe.PriceTier
	quantity int64
}

Store the list of items a user can select from in onboarding

At step 3 in the onboarding a user selects their app.


As part of this the frontend will also calculate a list of things that are slightly outside that app, which we will then let the user select in the next screen.


This needs to be stored somewhere in the onboarding info, as it's possible that the user would leave here and come back (or refresh the page) and we don't want this data to be lost. This needs to be added to OnboardingProperties.

Rename all "undo" requests

Requests that undo something should be named using the following naming convention:

undo{requestType}

For example to undo a request created with the newRequest type we should use undoNewRequest

This means the following for GatewayRequest messages:

design Query extension to linkDepth to consider link flavor

In some situations, it would be convenient to recursively query items while restricting the graph traversal based on attributes of the edge itself, specifically whether or not the edges are "located in" links.

What are the specific capabilities that we really need, and how can we encode that in the Query messages?

Refactor SDP/gateway messages to have a top-level `msgID` for correlation

With the improved strictness of message-association in the sdp-go client, uncorrelated Error messages (e.g. when query.Type == "" in https://github.com/overmindtech/gateway/blob/09197e492f4cb5c99439dd33356308bf0aa46df0/gateway/request_session.go#L510-L517) have to abort all running requests, as there is no way to tag a specific query at the moment. Note that sdp-go doesn't abort either, instead hanging until the deadline!

At the same time, when sending a bunch of queries, the only way to detect that all of them have finished is to wait for a GatewayResponse_Status with no active responders. This is error- and race-prone and does not allow for fully parallel querying either.

To solve this, add a top-level MsgID property to GatewayRequest and GatewayResponse that all clients fill in and can use on responses to multiplex them back to the specific request that caused them. This can be a net-new addition, and individual SDP protocol users (clients and servers) can start sending and processing this message ID independently as long as they can handle empty IDs. Once everything is in place, the client can make use of the new capability for fine-grained request status processing.

Create change simulation RPC

We need an API that "simulates" a change. This should accept the ID of a change that is in the DEFINING state, and create fake snapshots that simulate a change. The details of what exactly should be faked are up for debate. This should probably be a streaming RPC also as it's similar to the existing Start and Stop ones

Add the ability to request file content

Problem

Secondary sources will need to be able to request the contents of files so that they can be parsed. The SDP protocol will need to have some way of handling this. Along with #16 this will require a substantial amount of thought around exactly how we want the protocol to work. For example, should a file with content just be a regular item request with some sort of extra flag? Should it be just a new backend, i.e. file_content, that returns the content of the file without requiring any actual changes to the protocol, or should it be an entirely new thing?

Large Files

What about file size? This is probably the most complex part as we want to be able to handle large files reasonably efficiently. Ideally this mechanism would never be used for large files as it doesn't really make sense as a file transfer mechanism, but on the other hand if you give customers something like this they will definitely abuse it so we need to make sure it doesn't fall over at the first sign of misuse.

Likely there won't actually be anything wrong with the protocol in terms of reading large files, as it will be reading raw bytes and just sending them over the wire. The serialisation overhead should be pretty low. However, what implementations will definitely do is store the entire file in memory in order to send it, and changing this will likely be very hard. To be fair, just storing some binary data in memory isn't the end of the world as long as we don't do something stupid like try to insert it into a database or parse it with a regex...

Once Complete

Rename SDP "context" and "Find()"

Before we go too much further we need to rename some things in SDP to have them make more sense. These are:

  • context should become scope
  • Find() should become List()

This will require a huge amount of rework, in the following order:

Add the ability to run commands

Problem

We want to be able to run commands as part of a secondary source. Need to think about what the edge cases could be here. Do I just need to run the command and return the output? Is there more to it? How should errors be handled? What NATS tech should this use? It could easily use actual response inboxes for example, as it's only expecting a single response...

Once Complete

Replicate the bookmarks API for snapshots

The Bookmarks and Snapshots API definitions only differ in the name and the use of IsBookmark in the database.

After finishing #45, (re)create the Snapshot API definitions from the existing Bookmark implementation to avoid these two drifting apart.

Remove old ways of calculating the blast radius

We have the following old ways of calculating the blast radius:

  • CalculateBlastRadius
  • UpdatePlannedChanges which internally calls CalculateBlastRadius

Once https://github.com/overmindtech/frontend/issues/1092 is complete, nothing will use the API endpoints anymore and we can remove them. We can also remove any references to ChangingItemsBookmark as this isn't used in the new process.

In API server this means we can also remove PlannedChangesStored from the database

Create a standard for response subjects

Current State

At the moment when a request is generated, the subjects for itemSubject, responseSubject, errorSubject and linkeditemSubject are randomly generated using the NATS codebase for generating inboxes.

In future (soon) we will be building a linker which will need to have a full feed of items being discovered so that it can:

  • Execute linked item requests that have not been resolved
  • Trigger the discovery of secondary sources

There is already a mechanism for this in that all items are supposed to be transmitted on items.{context} allowing a wildcard subscription on items.> to achieve the above. However this adds complexity as the item needs to be transmitted twice, and the agent needs to remember to implement this functionality.

Solution

Create a standard for inbox naming that follows a hierarchy and allows for wildcard subscriptions without the item having to be transmitted twice. This hierarchy could be:

  • return.response.somethingr4nd0m: For Response messages, with the last part being dynamically generated
  • return.item.somethingr4nd0m: For Item messages, with the last part being dynamically generated

Note: If we are doing the above it would also be a good time to do the following:

  • Change all subjects to singular
  • Update documentation
  • Remove items.all
  • #7
  • #10

Given the above structure, a linker could simply listen on return.item.* and would get all messages. Similarly if we wanted to do some kind of large scale monitoring we could listen on return.response.* and see the state of the responses

Create bookmark summary type & endpoints

When listing bookmarks there isn't much reason to include all of the queries and excluded items, especially since this requires multiple database queries on the backend. We should change ListBookmarks to just return a summary that doesn't require joins i.e.

  • Name
  • Description
  • UUID
  • Created

After this we'll need to create tickets to implement it

Add the ability to request recurring requests

In order to have high resolution metrics for certain things I will need to be able to have recurring queries. For example you might want to watch the CPU usage of a process in graph form, but to do that you'd need to query an item many times and be sending the data to a time series database. It should be possible to make a request that recurs.

Allow ignoring cache when loading bookmark

When starting and ending a change, we should ignore the cache as the items might have changed. We need to change the LoadBookmark messages so that they can be told to ignore the cache, and they will pass that on to the SDP queries that they run

design protocol support for indicating affects/locates on linked items/edges

As discussed this morning, edges between items can come with a special flavor:
"located in/location of" edges indicate that a specific item is physically, on the network layer, logically, or otherwise located in the linked item. Examples are how a service is located on a VM, the VM is located on a specific hypervisor, which is located in a specific rack, room, data center, building, or a specific availability zone, region, VPC, account and cloud.

Compared to the "regular" (now also called "affects/affected by") edges, locating edges act as one-way barriers for transferring risk when doing changes. E.g. we expect rebooting a VM to have no risk for the surrounding hypervisor, but changes in the hypervisor might affect the VMs running on it.

Note that the direction of the link (where we discover the edge) is unrelated to the direction of the location/affects relationship.

To address this task, extend the SDP Query/Item/Edge/LinkedItems definitions to include a flavor or kind indication to be able to communicate this new edge information from the sources through the discovery process to any clients.

Move all source responses to one subject

I'm having issues with race conditions where a source will send a completion response, but the gateway will still be processing some of the items which will cause some of them to be dropped. The best way to solve this would be to have NATS send everything back to the requester on a single subject (items, responses and errors). NATS provides guarantees around ordering so this would solve our race condition

The only reason it's not like this in the first place was that I didn't know how to do oneOf in the protobuf message.

Implementing this would require:

  • Create a new wrapper message type. Consider what else should be included in this message? Any metadata? Tracing?
  • Change the docs to reflect the new naming convention for the new return subject
  • Update sdp-go to use the new protobufs, to subscribe to the correct subjects and to process responses correctly
  • Remove sleeps and other workarounds from sdp-go since we won't need them anymore
  • Change discovery libraries to send on this new subject
  • Update the permissions that are granted by api-server to ensure that sources and users can pub/sub to the new subjects
  • Update all sources and gateway to use these new libraries

Consider removing error subject

Likely it would be a lot simpler if a response could also represent an error. Especially since that way the full lifecycle of the query would be told by the responses

Refactor Bookmark/Snapshot data modelling

Pushing both machine-generated and user-supplied data into the BookmarkDescriptor (sdp/bookmarks.proto, lines 25 to 36 in ee80818):

message BookmarkDescriptor {
    // unique id to identify this bookmark
    bytes UUID = 1;
    // timestamp when this bookmark was created
    google.protobuf.Timestamp created = 2;
    // user supplied name of this bookmark
    string name = 3;
    // user supplied description of this bookmark
    string description = 4;
    // number of items in this bookmark
    uint32 size = 5;
}

makes the overall API definition brittle (requiring listing the attributes multiple times) and confusing (needing to understand which properties are optional in each usage).

Making that distinction explicit instead, by establishing a *Properties vs *Metadata pattern (sdp/bookmarks.proto, lines 92 to 114 in ee80818):

// a complete Bookmark with user-supplied and machine-supplied values
message Bookmark {
    BookmarkMetadata metadata = 1;
    BookmarkProperties properties = 2;
}

// The user-editable parts of a Bookmark
message BookmarkProperties {
    // user supplied name of this bookmark
    string name = 1;
    // user supplied description of this bookmark
    string description = 2;
}

// Descriptor for a bookmark
message BookmarkMetadata {
    // unique id to identify this bookmark
    bytes UUID = 1;
    // timestamp when this bookmark was created
    google.protobuf.Timestamp created = 2;
    // number of items in this bookmark
    uint32 size = 3;
}

allows flexible re-use of the properties definition (e.g. in UpdateBookmarkRequest, sdp/bookmarks.proto, lines 127 to 132 in ee80818):

message UpdateBookmarkRequest {
    // unique id to identify this bookmark
    bytes UUID = 1;
    // new attributes for this bookmark
    BookmarkProperties properties = 2;
}

while supplying the required metadata when returning data from the service (e.g. ListBookmarkResponse, sdp/bookmarks.proto, lines 123 to 125 in ee80818):

message ListBookmarkResponse {
    repeated Bookmark bookmarks = 3;
}

This needs to be applied to all Bookmark and Snapshot messages.

Create pagination

Currently none of our APIs have pagination. The most obvious candidate for adding it is ListHomeChanges, since this loads every change every time the UI is loaded, and that is where we should start. However it would be good to have a standard way of doing it that will work for everything.

When designing the SDP for this we'll need to consider how the underlying database queries will work (TODO: @dylanratcliffe to ask about the best way to do this) and make sure that the method is appropriate for all (or most) of the other list calls.

Make caching a first-class citizen

The protocol should know about caching. This means:

  • Metadata should include information about whether or not a thing was cached
  • Requests should explicitly be able to request that the cache be ignored

Once Complete

Add request to metadata

The metadata should contain the request itself that caused this item to be generated. This would allow us to have secondary sources be triggered by items alone, since the item would contain all of the information needed to submit another item request that would form part of the same overall request, namely:

  • context
  • itemSubject
  • responseSubject
  • errorSubject

Note however that the first action on this ticket would need to be creating a method to track all of the other projects that would require modifications as a result of this.

Another substantial implication of this is that the item will no longer be able to be fully cached. The ItemRequest will need to be updated, even if you are returning an item from the cache as it always needs to be up to date in order to allow secondary sources to latch on to the details from this request. This could cause issues in various codebases and should be reviewed before any work is started.

Move from 'UUID' to 'ID'; use explicit naming for all reference fields

UUID is a pain to type and does not help in identifying what is pointed at. Instead we should have all the UUID fields be called "ID" (the format would still be a UUID and therefore bytes). This also means that whenever we refer to an ID field in a request we can call it AppId instead of UUID, which is more helpful for figuring out the relationships between objects.

Implement archiving for changes

We want changes to be able to be archived rather than deleted. Add a method that archives a change (or add a property to the update method). Also allow us to filter lists by archived or not

Allow cancellation of requests

Problem

It needs to be possible to cancel requests. If I accidentally issue a request that is much larger than I expected, or that is causing issues on the server, I want to be able to cancel it.

While I'm at it, it might be worth adding timeouts as a first-class citizen too, as they will work on a similar principle.

This work will likely require some pretty substantial changes to the protocol... It also would be worth looking at the reliability of NATS and whether or not it guarantees message delivery. If it doesn't I'll need to be adding things like sequence numbers and this would be a good time to do that too.

Once Complete

Review SDP RPCs for change view pages after a change is complete

Once the user has started and ended the change we need to be able to show them all the nice UIs and diffs that Vasile has designed. We need to review the SDP spec to see what needs to be defined before it can be implemented.

Once it's defined, create tickets for implementation

Add tags to items

Background

Most of the services that Overmind is likely to integrate with have a concept of tags. This includes AWS, GCP, Azure and Kubernetes (labels).

Most companies use these tags to store business information, so they would be a good way to surface business-specific information to the user without actually needing to integrate with any new APIs. If we were to standardise on string keys and string values, we could also index these and allow for fast searching by tag if we needed to.

Example usage: Grouping by tag

In the UI we have had some feedback that it would be good to be able to group things by some arbitrary criteria, tags are an excellent place to start here. From research it seems that almost all companies use tags, but also most companies have no consistent naming or usage for these tags. Despite this we could still create some very nice grouping by combining multiple tags into a "group", then combining their values again if required

This is explained a bit more in this Figma file. This is just one of the use cases that we could use tags for, but I think this one specifically is going to become important very quickly once the new exploration features are released.


Potential Problems

  • Some AWS resources don't return tags as part of their API response; you need to make a subsequent request. This will substantially increase API hits and therefore rate limiting

Required Changes

  • Storage: Not required. Since the database stores items as binary blobs, we don't need to do anything here
  • Sources: Sources would need to be updated to support tags for each resource. For some (k8s) this will be easy since the API is consistent, for AWS this will be more work
  • Gateway: Not required. Since tags wouldn't change the way items are handled, they would simply be passed through
  • Frontend: This would require design work at the very least. Tags shouldn't be listed in the same section as attributes, and if we were to add grouping, this would require more work again. Though the frontend is where we would get most of the benefit of tags so this isn't a bad thing

Dependency Dashboard

This issue lists Renovate updates and detected dependencies. Read the Dependency Dashboard docs to learn more.

Awaiting Schedule

These updates are awaiting their schedule.

  • chore(deps): lock file maintenance

Detected dependencies

github-actions
.github/workflows/update.yml
  • actions/checkout v4
  • actions/setup-go v5
  • actions/setup-node v4
  • oleksiyrudenko/gha-git-credentials v2-latest
npm
package.json
  • @bufbuild/buf ^1.15.0-1
  • @bufbuild/protoc-gen-es ^1.2.0
  • @connectrpc/protoc-gen-connect-es ^1.1.2
  • @connectrpc/protoc-gen-connect-query ^1.0.0

