atomicdata-dev / atomic-data-docs

Atomic Data is a specification to make it easier to exchange data.
Home Page: https://docs.atomicdata.dev
License: MIT License
This is going to be a mess of various thoughts regarding hierarchies - sorry for the lack of structure
On desktops, we generally use folders and sub-folders for various goals:
If Atomic Data is to be just as useful as a Unix filesystem or a Google Drive, we need to find solutions for the earlier mentioned five goals, too.
So let's check out some approaches to hierarchy
This is the one we're most familiar with. Each file has a path, and only one parent.
The five goals in the intro describe what we use these paths for.
That's a lot of responsibilities for a single string, but that has some merits:
But it also causes issues:
In this approach, resources can be linked to Tags. A Tag is like a parent folder, but it's a 1-n relationship. Every resource can have multiple Tags - contrary to how folders work in most systems. Google Drive does use a tag-like model, though: a folder can be placed in multiple places.
Tags can be nested, like folders. Should circular tags be possible? If they are, then implementations need to be very much aware of this, to prevent getting stuck in some loop.
These tags could easily be used for authorization. If you want to find out if an agent can read something, check the tag of the resource. Then, check the `agents-read-access` property (an array of Agents), and check if the requester is present in that array. If that is not the case, check the `groups-read-access` property (an array of Groups), and check if the requester is present in that group.
We could use an additive rights model for authorization. Check all the tags of a resource (and all parent tags of these tags) to find out whether the user has the correct rights. If the correct rights are present in any of the tags, you're good to go.
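The additive check described above could be sketched roughly like this. Note that everything here is hypothetical: the `Tag` struct, the `agents_read_access` field name, and the recursive walk are illustrations of the idea, not the actual atomic_lib data model. The `visited` set addresses the circular-tags concern raised earlier.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical tag record for the additive rights model sketched above.
struct Tag {
    agents_read_access: HashSet<String>,
    parents: Vec<String>,
}

/// Additive check: a right granted on any tag, or on any of its ancestor
/// tags, is enough. The `visited` set prevents getting stuck on circular tags.
fn can_read(tags: &HashMap<String, Tag>, tag_id: &str, agent: &str) -> bool {
    fn walk(
        tags: &HashMap<String, Tag>,
        tag_id: &str,
        agent: &str,
        visited: &mut HashSet<String>,
    ) -> bool {
        if !visited.insert(tag_id.to_string()) {
            return false; // already checked this tag: circular reference
        }
        match tags.get(tag_id) {
            None => false,
            Some(tag) => {
                tag.agents_read_access.contains(agent)
                    || tag.parents.iter().any(|p| walk(tags, p, agent, visited))
            }
        }
    }
    walk(tags, tag_id, agent, &mut HashSet::new())
}
```

If circular tags are allowed, every implementation needs a guard like `visited`; if they are forbidden, the server could reject Commits that would introduce a cycle instead.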
Folders are often also used for calculating disk space. This might be a bit harder with tags - you don't want to count items with multiple tags in each parent. So how do you decide how big a tag is, in bytes? One solution is to have a 'main' tag, which perhaps is simply the first tag. Only that one is counted.
All Atomic Data Resources that we've discussed so far have a URL as a subject.
Unfortunately, creating unique and resolvable URLs can be a bother, and sometimes not necessary.
If you've worked with RDF, this is what Blank Nodes are used for.
In Atomic Data, we have something similar: Nested Resources.
Let's use a Nested Resource in the example from the previous section:
["https://example.com/john", "https://example.com/lastName", "McLovin"]
["https://example.com/john https://example.com/employer", "https://example.com/description", "The greatest company!"]
By combining two Subject URLs into a single string, we've created a nested resource.
The Subject of the nested resource is `https://example.com/john https://example.com/employer`, including the space.
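Since a nested subject is simply two URLs joined by a single space, telling nested resources apart from regular ones is a matter of splitting on the first space. A minimal sketch (the function name is mine, not atomic_lib's):

```rust
/// A nested subject is the parent resource's subject and the property URL,
/// joined by a single space. Splitting on the first space tells nested
/// resources apart from regular ones.
fn parse_subject(subject: &str) -> (&str, Option<&str>) {
    match subject.split_once(' ') {
        // Nested resource: parent subject plus the property URL.
        Some((parent, property)) => (parent, Some(property)),
        // Regular resource: just a subject, no nesting.
        None => (subject, None),
    }
}
```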
So how should we deal with these in atomic_lib?
In both `Db` and `Store`, this would mean that we make a fundamental change to the internal model for storing data. In both, the entire store is a `HashMap<String, HashMap<String, String>>`.
We could change this to `HashMap<String, HashMap<String, StringOrHashMap>>`, where `StringOrHashMap` is some enum that is either a String or a HashMap. This will have a huge impact on the codebase, since the most used method (`get_resource_string`) changes. I don't think this is the way to go.
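For concreteness, the enum idea could look like this. The names `Value`, `as_string` and the `Store` alias are illustrative only; the point is that every caller expecting a plain string now has to handle the nested case, which is the ripple effect mentioned above.

```rust
use std::collections::HashMap;

/// Sketch of the `StringOrHashMap` enum mentioned above: a value is either a
/// plain string or a nested resource (itself a property -> value map).
enum Value {
    String(String),
    Nested(HashMap<String, Value>),
}

/// The store would then map subjects to property maps of such values.
type Store = HashMap<String, HashMap<String, Value>>;

/// Callers like `get_resource_string` would need to handle the nested case,
/// which is why this change ripples through the whole codebase.
fn as_string(value: &Value) -> Option<&str> {
    match value {
        Value::String(s) => Some(s),
        Value::Nested(_) => None,
    }
}
```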
In this approach, the nested resources are stored like all other resources, except that the subject contains two URLs separated by a space. This has a couple of implications:
Commits have a `set` property, which can be used for adding items to arrays.
However, this sometimes leads to unnecessarily large commits when an item is added to an array, for example when adding a paragraph to a document.
We could introduce one or more methods for doing this. However, keep in mind that extending commits has serious implications for all involved. They can be very complex to manage in stateful systems, such as forms in a front-end library.
The insert property takes an array of anonymous resources, each of which is an insert, which contains:
`push`, for which every key is a Property that is to be appended / pushed to.

I believe standardizing changes in data is very important, and the docs show some (perhaps too) ambitious goals for Deltas. Some of these rely on hashing: verifiability (that a Delta is authored by a specific actor) and prevention of conflicts (by using the hash of the previous delta to make sure that the delta was made with consideration of the previous one).
This is how it appears in the docs now. A delta mutates a single Atom. These are atomic - they are as small as possible. This makes them elegant and simple, it's a flat model without any nesting. This makes serialization very simple.
One problem with this approach is that it will lead to invalid resources, since some Classes could require multiple props.
Another problem is that it might create unnecessary overhead, like checking hashes after every single atom, if multiple are changed.
```rust
pub struct DeltaLine {
    pub method: String,
    pub subject: String,
    pub property: String,
    pub value: String,
    // Who issued the changes
    pub actor: String,
    // A signed hash, proving the actor
    pub signature: String,
    // Hash of the previous state (e.g. IPFS CID), makes sure it mutates the correct state
    pub prevhash: String,
}
```
Scoped to a single resource. This means that we can do a single hash check after processing the delta. It also seems very doable to implement transactionally. This approach also allows for a simple TPF query to fetch all deltas for some subject - which is nice for re-creating the state at any point in time.
```rust
pub struct Delta {
    // The set of changes
    pub lines: Vec<DeltaLine>,
    // Who issued the changes
    pub actor: String,
    // A signed hash, proving the actor
    pub signature: String,
    // Hash of the previous state (e.g. IPFS CID), makes sure it mutates the
    // correct state. Should be "init" if it's the first.
    pub prevhash: String,
    // Subject to be changed
    pub subject: String,
}

/// Individual change to a resource. Unvalidated.
pub struct DeltaLine {
    pub method: String,
    pub property: String,
    pub value: String,
}
```
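Applying such a resource-scoped delta could look roughly like this. The method names `"set"` and `"remove"` are assumptions for the sketch; the actual method vocabulary is still being decided above. Signature and `prevhash` checks would happen once, before this loop, which is the appeal of scoping to a single resource.

```rust
use std::collections::HashMap;

pub struct DeltaLine {
    pub method: String,
    pub property: String,
    pub value: String,
}

/// Apply the delta lines of a resource-scoped Delta to a single resource's
/// property map. Method names here ("set", "remove") are assumptions.
fn apply_lines(
    resource: &mut HashMap<String, String>,
    lines: &[DeltaLine],
) -> Result<(), String> {
    for line in lines {
        match line.method.as_str() {
            "set" => {
                // Overwrites any existing value for this property.
                resource.insert(line.property.clone(), line.value.clone());
            }
            "remove" => {
                resource.remove(&line.property);
            }
            other => return Err(format!("unknown method: {}", other)),
        }
    }
    Ok(())
}
```

Because all lines target one subject, the whole loop can run inside a single transaction and a single hash check, as noted above.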
Instead of using DeltaLines (triples with a method, property, value), it's also possible to create one or multiple nested resources. For example, we can create an `insert` property, with a nested Resource as a value. This nested resource will contain all the prop-value combinations that need to be inserted. If we need to delete some fields, we can do the same with a `remove` property, which again has a nested resource inside of it. Later, this can be extended. For example, we could have a 'changeSubject' property, which has an Atomic URL as resource.
```rust
pub struct Delta {
    // Who issued the changes
    pub actor: String,
    // A signed hash, proving the actor
    pub signature: String,
    // Properties to be inserted
    pub insert: HashMap<String, String>,
    // Properties to be deleted
    pub delete: Vec<String>,
}
```
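Applying this insert/delete style of delta is pleasantly simple, which is part of its appeal. A sketch (assuming insert overwrites existing keys, as described below):

```rust
use std::collections::HashMap;

/// Apply the nested-resource style Delta from the struct above: `insert`
/// overwrites existing keys, `delete` removes properties by URL.
fn apply_delta(
    resource: &mut HashMap<String, String>,
    insert: &HashMap<String, String>,
    delete: &[String],
) {
    for (property, value) in insert {
        // Existing keys are overwritten.
        resource.insert(property.clone(), value.clone());
    }
    for property in delete {
        resource.remove(property);
    }
}
```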
A delta is similar to the resource that is being modified. It contains all the props and values set during the modification. Existing keys will be overwritten.
Some props will be metadata (signature, subject, hash, actor). To explicitly ignore these props, it is useful to add a `deltaProps` prop, which is an array of propUrls to be ignored when applying the delta.
This is very much an 'insert first' approach - as it seems harder to apply other kinds of methods (e.g. `delete`).
A set of changes to any resource. This is similar to how linked-delta works, except with the addition of Actor and Signature.
This approach makes it hard / impossible to check if the requesting system is aware of the previous state.
```rust
pub struct Delta {
    // The set of changes
    pub lines: Vec<DeltaLine>,
    // Who issued the changes
    pub actor: String,
    // A signed hash, proving the actor
    pub signature: String,
}

/// Individual change to a resource. Unvalidated.
pub struct DeltaLine {
    pub method: String,
    pub subject: String,
    pub property: String,
    pub value: String,
}
```
`example.com/commits/QwSomeSHA256Hash`.

The Agent model is designed to be a publicly accessible and verifiable (decentralized) identity that can be re-used by various apps. However, re-using an identity costs privacy. Of course, users could create new identities to deal with this, but ideally, these users should be able to (if they want) prove that these identities were made by some specific user.
How to achieve this?
Basically, have an anonymous signature that proves the parent Agent has signed it.
However, this does mean a significant attack vector: simply try the public keys of agents that you suspect.
There must be a better way.
Way more elegant, but this allows the sub-agent to lie about who the parent is.
Still not ideal.
This is cryptographically sound and actually proves they are the same.
At first I felt like Agent was a bit more fitting, as it would also entail non-human actors. However, Agent feels unconventional and unfamiliar compared to 'user', even though it is more technically correct. Changing this name has quite an impact on URLs, documentation, and various implementations. Best to do it soon.
Currently, Commits have some serious ambiguity. They do not specify if they are editing a resource, or if they are creating a new one. This can lead to accidental overwrites of existing data. How to solve this?
Some resources function like an endpoint: they accept (optional) query parameters which modify the response. A Collection, for example, might use a `page` parameter and a `sortedBy` parameter. These available query parameters and their datatypes might be standardized.
How should the client know if a certain resource has query parameters available? How do we communicate this?
One approach is to make all query parameters available as nested resources in an array, under some new property. Each parameter needs a shortname, a datatype (perhaps defaulting to string), and a boolean indicating whether it's optional...
We could also have two property URLs: `requiredParams` and `optionalParams`.
Users need to store their agent's secret (which includes the private key) someplace safe, such as a password manager. However, this is still not optimal:
But still, I like the simplicity and the decentralized nature of the current authentication / authorization system.
One way to solve these issues (and some more) is to introduce a Companion App.
This is a native app for smartphones that is responsible for storing the secret, signing commits, and granting other permissions.
Step 1 feels trivial, but step 2 is still kind of mystifying.
Currently, Atomic Data sets classes for a Resource using the `is-a` Property. This supports multiple values, which means that Resources can be instances of multiple classes.
Classes are mainly used for:
However, I'm having some doubts on supporting a multi-class model.
In other words, I'm considering using an AtomicURL instead of a ResourceArray for the datatype of the isA property.
Let's use this issue to consider the merits / downsides of having multi-class support.
Imagine wanting to describe Jay-Z.
He's a person, so you might want to show properties like his `first name`, `gender`, `birthplace`, etc.
However, he's also a musical act, with its own `discography`, `labels` and `genres` properties.
We might say Jay-Z is just one Subject, with one URL, which has two classes: `Human` and `MusicAct`.
That is the multi-class approach.
A different approach, is to have a URL for Jay-Z the person, and a separate URL for Jay-Z the music act.
Having multiple classes makes things harder. For example, implementing a Form that combines all the Required / Recommended properties.
Also, when rendering a View for a resource, things become more complicated if there are multiple Views available - Imagine rendering something that's both a Calendar item and a Person - which view do you want?
IPFS (or other content-addressing protocols) is a very interesting technology, especially for linked data, as it helps make static resources highly available. Atomic Data Properties are examples of where this is very important: it is essential that these resolve, and it can be harmful if the owner decides to change a datatype, for example.
Relates to #64
One of the core concepts of Atomic Data is that users sign their changes using their identity. This allows for fully traceable, verifiable information, which is one of the core features of atomic data. However, this also poses a challenge: how do you manage the private key? How do you deal with users forgetting their key, how do you make sure that the key is stored safely in apps? In this issue, we'll discuss various topics related to key management of Agents.
Client apps need the key to sign Commits. We don't want to bother users by entering their key on every single commit, but we do want to protect them.
In the current implementation of atomic-data-browser, the private key is simply stored in the front-end app, which is not great for security (see issue).
If you don't know your secret (containing the private key), you can't log in anywhere or sign anything. Since an Agent is stored on some Server, the server's admin can always change the Agent's public key. This process should be facilitated through software, maybe even through a standardized endpoint. It might be a good idea to, as a backup, add your e-mail to your agent.
At this point in time, secrets are very long strings that are practically impossible to remember. This means users have to use some form of key-manager (e.g. bitwarden / lastpass / browser password store), which is a bit inaccessible for some. We could use some form of seed that is easier to remember, but this still needs a nice bit of randomness / entropy.
The BIP39 spec uses a set of words such as `welcome bar control expand desk wonder naive stove sight human furnace arrow ill exclude govern`.
If a user enters their secret and the app needs to store that secret to sign items, we will always have a risk of leaking that secret somewhere. If we scope keys to sessions, we can reduce this risk. We could let users enter their secret to prove their identity to the server, and sign some nonce from the server. This signed nonce could be used as the seed for a new session scoped private key, which signs the actual commits. In theory, Commits signed by these derived keys would still be traceable to the Agent.
Suggestion by Thom (@fletcher91).
The current design of Atomic Data requires that Properties specify a Datatype. That way, a triple can be parsed correctly by resolving the Property. However, this also means that these Properties must be resolved in order to properly parse the data. In practice, this would mean that Properties might be included by the server or cached by the client.
Alternatively, a datatype could be included in the serialized representation, similar to how this works in RDF.
Thom suggested using data URI schemes:
`data:[<media type>][;base64],<data>`
In an Atom, they would look like this:
["https://example.com/john","https://example.com/name","data:primitve/string;utf8,John"]
This would also mean that a primitive
MIME type would be introduced for some of the fundamental core models, such as integer
, string
and datetime
.
The RDF terminology `subject`, `predicate`, `object` is kind of unconventional in the computer world, and can confuse newcomers. It semantically makes sense, but since Atomic Data has subject-predicate uniqueness, it no longer treats these three parts the way RDF does.
Perhaps we should call them something else.
I'm pretty sure about Property (since this should always be an Atomic Property) and Value, but I'm not sure about the `subject` replacement.
Many values are generated not by users, but by some system. Think about things like counts, createdAt dates, and paginated lists.
Whether something is generated by a machine at runtime or not can be relevant for users of the data. For example, when you want to render an edit form. Should I be able to edit the 'editedBy' or 'editedAt' field? Or manually update the amount of comments something has? I'd rather hide these fields for the user doing the editing.
So where does this information come from? How does the client know that some fields can be edited, and some fields should not be edited?
One solution, is to have an optional property on Properties, which is a boolean that defaults to false. If it's true, the value should be considered dynamically generated or computed.
Currently, there only exists one implementation of an Atomic Data Server. However, in the future, there might be different, and each one might have different features enabled.
Most of these features are described as endpoints, such as a `path` endpoint, or an `all-versions` endpoint. Ideally, the client would check this collection, and depending on these, render different actions in a context menu.
But how would the client know where to find these endpoints?
Some ideas:
Atomic Paths can make it easy to traverse a graph. We could have a common starting point (the drive? the root URL of the server? something else?) that describes these.
Providing a standard for language strings (like RDF does) has some great benefits. For example, it allows for smart clients to show the right translation.
But how should it be modeled? There are at least a couple of options, and each one has some serious benefits and drawbacks:
This is basically what RDF does - add a separate field in every single statement. This solves the issue, but adding a column to the Core model (of Subject, Property, Value) is very costly in many regards. The "triple" suddenly becomes a "quad" - and every part of the ecosystem has to explicitly deal with that. Serialization formats, libraries... Most importantly, the mental model becomes more complex. In Atomic Data, it also collides with the `Subject Property` uniqueness - how would you add two translated strings for one `S P` combination? You'd have to replace `S P` uniqueness with `S P L` uniqueness, again making everything more complex. It also makes translation-heavy resources very large.
This basically means - create some custom datatype with some custom parsing. Again, every single library has to deal with this. Even if we choose something simple (e.g. a JSON array with objects containing `lang` and `text` tags), we still require all Atomic Data parsers to also have some JSON parser, and implement some custom logic.
Another downside, is that this doesn't play nice with Atomic Mutations - it would be impossible to add a single translation, you'd have to replace the entire Value.
Tempting, but no. Not offering a default go-to solution in this book will probably lead to a fragmented landscape, incompatible formats and a lot of frustration.
This actually makes a lot of sense, and does not require any weird parsing tricks. However, it requires clients to create these resources (and their respective identifiers) which can be a hassle. It also requires a model in between to provide the collections of translation resources themselves (like an array?), and that poses a new challenge: how do we make sure that the client is not required to fetch and parse every single translation, if its only interested in a single translation? Which brings us to...
We introduce a class for Translations, and create a property for every single language.
Similar to the method above, this does not require weird parsing tricks.
It creates far less Resources than the method above, which is also nice.
The resulting resource could get quite big, though, and clients need to fetch every single one.
Combined with the Atomic Data Shortnames, it would offer some cool and clean query options:
`harryPotter1.title.en` => Harry Potter and the Philosopher's Stone
`harryPotter1.title.nl` => Harry Potter en de Steen der Wijzen

- the `translation`, not the actual resource above it
- a `useLocalString` hook, which will know to fetch the URL of the linked Translation and render the locale variant

All in all, this final option seems like the best for now, but if I'm missing some options or important insights - let me know below!
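The "property per language" model lends itself to a very small lookup. This sketch treats a Translation resource as a map of language-tag properties to strings (names like `local_string` are mine); a `useLocalString`-style hook would do this plus the fetch:

```rust
use std::collections::HashMap;

/// Sketch of the "one property per language" Translation resource: keys are
/// language tags, values the translated strings. Falls back to any available
/// language when the requested locale is missing.
fn local_string<'a>(translation: &'a HashMap<String, String>, locale: &str) -> Option<&'a str> {
    translation
        .get(locale)
        // Fallback: any translation is better than none.
        .or_else(|| translation.values().next())
        .map(|s| s.as_str())
}
```

A smarter fallback (e.g. `nl-BE` falling back to `nl` before anything else) could be layered on top of this.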
People will follow these links and get 404s, and that limits how fast people will understand how this all works.
So, we have at least a few options for making the URLs resolve:
Any other ideas?
The #23 Invites model allows for granting some Agent read or write rights to an existing resource. But what if you want to invite people to create something new? One use case for this is surveys: #32
If we want to create a new thing, we need to know:
But we could also say: just invite people to a parent that contains this information (such as the class). A possible approach to this is constrained collections: #37
Tools like Airtable and Notion have re-defined how we think about tables. With these tools, users can:
In Atomic Data, we already have `Collections` that can be shown in a Table component (in Atomic Data Browser). However, `Tables` could offer some extra features:

- a `Class` as a child, though. We could also only constrain members instead of children.
- `/new` or something. Because we have Class constraints, we don't need to use long URLs in properties. Also, we could scope tokens / authorization to this collection, and remove the need for `Commit`s.
- `Views`. A view contains information about filter / sort / displayed properties. This is what Airtable and Notion do.
- `Collections` which often just query over existing items. Maybe this means it always filters data in some particular way.

More thoughts:
- `Collections` seems to insinuate some form of ownership, so maybe these should be renamed to `Queries` (thanks @Polleps)
- a `Class`? I'd assume so. And is this `Class` a `child` of the `Table`? This would not work if you constrain the children of a table.

The Collection Class already allows for filtering by property / value, and sorting on any property.
Maybe adding fields for `view` (e.g. grid, or some custom thing) makes sense?
But how should we indicate that it filters by parent? We could introduce a `parent` filter, which is an optional resource that performs a filter in the Query.
The `Table`:

- `class`: the required class, containing the required and recommended properties
- `view`: the default view (e.g. table / grid / list). Maybe we also have a list of `availableViews`
- `members`: the list of child resources. We could only show items here that are both children as well as fitting the shape of the table.
- `new`: endpoint that allows posting a new resource to this table.

The `View`:

- `properties`: order of properties shown in the header (question: how does this relate to properties in the class? Must they be the same? What if there are more properties in the Table, does that mean these are required?)
- `sort`, `filter`, `pagination`... all the collection stuff

Let's say I want to add a new or an existing item to a table, such as a Comment to some ChatBox. How do I do this?
We probably don't just want to determine who can read and edit tables, but also who can add items to it. I think we'll need a new right. #96
We could set the `parent` of the Comment to the `ChatBox`'s Table.
No extra properties required, because `parent` is always required. Seems clean!
But what if you don't want the one controlling the Table to control all its items? For example, if you start a `Thread` you might not want to give the creator of the thread edit rights to all resources.
Should we allow users
One of the core ideas in Atomic Data, is that the Property field (e.g. "example.com/birthdate") in an Atom should resolve to a Property Class.
This Property will tell something about:
- ... (`dot.syntax`) (e.g. "birthDate")

This is what gives Atomic Data typed data and shortnames, which enables ORM-style syntax (`thing.property`) with type safety. I think these things have proven to be very useful for developers.
This also means that the DataType is tightly coupled to the Property, and could therefore be omitted from serialization (contrary to RDF). Doing so, however, requires clients to dereference unknown Properties, and maintain some cache of Properties locally. It also means that when the Properties cannot be retrieved (e.g. the Property's server is offline), and the client does not have a Schema Complete stored, the datatype is not known. This is a downside of storing the Datatype somewhere else.
So here's the consideration: should Atomic Data, by default, include datatypes in serialized representations? Should it be optional, or maybe required?
https://docs.atomicdata.dev/schema/translations.html says to use the URL template `https://atomicdata.dev/languages/{langguageTag}`, but the example uses the URL template `https://atomicdata.dev/lang/{langguageTag}`.
One of the things that Atomic Data enables, is rendering Forms for data that you've just encountered. The client will be able to fetch the Properties and Classes, and will be able to determine which HTML input fields should be rendered. However, some of these properties may not be editable. Some fields are generated by the server at runtime, such as the `members` property in a Collection. This value is dependent on the filters being set in the Collection, or the query params that are being passed. The current implementation shows these fields
How to deal with this?
Add a `generates` property to classes, next to `requires` and `recommends`

So at this moment, Classes are responsible for generating most of the form. They define which properties are required and recommended. It therefore seems logical to also add a `generates` property, which tells the front-end that it doesn't need to show these fields in a form, because editing them doesn't make sense.
However, this means that the class becomes harder to re-use in a context that has different considerations. (see next item for example).
However, this means that the class becomes harder to re-use in a context that has different considerations. (see next item for example).
Add an `isGenerated` property to Properties

Instead of making Classes responsible for providing this information, we could let Properties describe which item is generated.
However, this would mean that re-using a property becomes harder. Say we have two servers, and server 1 generates the `fullName` property, while server 2 has it stored as a literal, editable value. They would need to use different Properties, which would make their data harder to combine!
In Atomic Commits, every change is signed by some author. This makes Commits truly atomic, and means that they can be shared as fully verifiable pieces of information, similar to W3C Verifiable Credentials (although Atomic Commits are specifically made to describe changes instead of current state).
However, this requires some implementation. Here's what I'm thinking:
I absolutely love SaaS services that provide one-click collaboration.
Example from HackMD:
I think we could have this same feature with Atomic Data. Standardizing how this works could help to make this something that all apps could get for free - without burdening devs with implementation details.
This means the front-end should check the URL for a `guestKey`, and the server should make sure it sets the correct grants for the selected resource(s).
Front-end reads query param that contains token. Front-end generates keypair, sends public key to back-end token service, which adds the newly created agent (pubkey) to the allowed posters to some scope.
Invites are a new Class that have some fields:
Target resource: Where the invite points to
Usage limit: is it a multi-use or single-use invite token?
Target Agents: An optional set of agents that are allowed to use the Invite?
Allows for short, query-param-free, easy-to-read URLs
Less clear what the identifier is of the actual resource being edited
When the client has fetched an Invite resource, it should know that it can post an Agent Subject (or public key?) to the Server, after which it gains the rights to edit the resource. Viewing is another question.
Atomic Data allows for multi-class resources. This means that we could have a resource with both a regular class (e.g. document) and an Invite class.
I just had a commit rejected on my server because the client made a commit that had a CreatedAt timestamp greater than the Now time of the server.
Now I'm not so sure if this check should occur. Of course, I can add some acceptable difference, which would probably resolve this issue, but it would still cause issues down the line when computer clocks are set incorrectly - either on the server or on the client.
Currently, I use Triple Pattern Fragments in Atomic for all my query needs. Combined with the Atomic Collections abstraction, it works pretty well for basic things: listing all instances of some class, sorting these items...
But this is still kind of limited. Most query languages allow for way more powerful kinds of queries.
Let's say I want to find all Users who are friends with John. And let's assume that the friendship relation is not a direct one, but is a Friendship class in between Persons. This happens when a relationship itself requires properties (e.g. how intense is the friendship, when was it started, etc.).
`Person -> hasFriendship -> Friendship -> friendsWith -> John`.
So how do I find all the Persons with a `hasFriendship` relation to a Friendship with a `friendsWith` relation to John?
Perhaps we can use the Atomic Paths concept here.
We could describe the question as `? hasFriendship friendsWith John`.
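A naive in-memory evaluation of such a path query could look like this. Everything here is a sketch: a toy store of string-valued resources, and the example property names from above; a real implementation would use indexes instead of scanning every subject.

```rust
use std::collections::HashMap;

type Resource = HashMap<String, String>;

/// Does `subject` reach `target` by following the given property chain?
fn resolves(store: &HashMap<String, Resource>, subject: &str, path: &[&str], target: &str) -> bool {
    if path.is_empty() {
        return subject == target;
    }
    match store.get(subject).and_then(|r| r.get(path[0])) {
        Some(next) => resolves(store, next, &path[1..], target),
        None => false,
    }
}

/// Naive scan: find all subjects that reach `target` via the property chain,
/// e.g. ["hasFriendship", "friendsWith"] -> John.
fn find_by_path(store: &HashMap<String, Resource>, path: &[&str], target: &str) -> Vec<String> {
    let mut hits: Vec<String> = store
        .keys()
        .filter(|subject| resolves(store, subject.as_str(), path, target))
        .cloned()
        .collect();
    hits.sort();
    hits
}
```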
To be continued...
In the current implementation of `atomic-server` we can find all versions of a resource by visiting the `/all-versions` endpoint. It works pretty well, but it is not discoverable. Also, it requires performing a search query on the back-end. This can be optimized, of course, but having an explicit link on a resource is more elegant and will also be more performant.
One alternative way of doing things is adding a `previous-commit` property to every resource after applying the commit. With this property, it becomes easy to find the commit, and with that, also the last editor and the edit date. And if these commits also link to previous commits, we can quickly find the first commit!
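Walking such a `previous-commit` chain back to the first commit is a simple linked-list traversal. A sketch (the map shape is an assumption: each commit URL points to its predecessor, `None` for the very first commit):

```rust
use std::collections::{HashMap, HashSet};

/// Walk the proposed `previous-commit` links back to the first commit.
/// `commits` maps each commit URL to the URL of its previous commit
/// (None for the very first commit). `seen` guards against malformed cycles.
fn history(commits: &HashMap<String, Option<String>>, latest: &str) -> Vec<String> {
    let mut chain = vec![latest.to_string()];
    let mut seen: HashSet<String> = chain.iter().cloned().collect();
    let mut current = latest.to_string();
    while let Some(Some(previous)) = commits.get(&current) {
        if !seen.insert(previous.clone()) {
            break; // cycle detected in malformed data
        }
        chain.push(previous.clone());
        current = previous.clone();
    }
    chain
}
```

The resource itself would only carry the latest link; everything older is reached by following the chain, commit by commit.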
I'm also a big fan of RDF, but I also recognize some limitations and trade-offs between complexity and utility.
So I was playing around with this:
Just leaving it here as food for thought!
Feedback welcome!
Atomic Data is a working title, and it might change. I like the way it sounds, how it refers to the smallest possible amount of data (indivisible, hence atomic), and how you can use the prefix Atomic to refer to specific elements (e.g. Atomic Mutations, Atomic Paths), but:
So let's take a moment to consider some alternatives.
I'd like to turn these:
https://atomicdata.dev/path?path=https%3A%2F%2Fatomicdata.dev%2F+https%3A%2F%2Fatomicdata.dev%2FfavMovies
https://atomicdata.dev/path?path=https%3A%2F%2Fatomicdata.dev%2Fagents%2F0XzHfi3he5xzUGAEpwg3H5PNlIrpHMrdQRV8oFqU9Fs%3D+name
Into something like these:
https://atomicdata.dev/_https://atomicdata.dev/favMovies
https://atomicdata.dev/agents/0XzHfi3he5xzUGAEpwg3H5PNlIrpHMrdQRV8oFqU9Fs=_name
What is the best separation character? Ideally something that rarely occurs in a URL, or else we need to parse URI-escaped URLs.
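One way to state the requirement before weighing candidates: a separator only works if joining the path segments and splitting again recovers the original segments, i.e. the separator never occurs inside a segment. A tiny sketch of that round-trip check:

```rust
/// A candidate separator only works if joining path segments and splitting
/// again recovers the original segments, i.e. the separator never occurs
/// inside a segment itself.
fn round_trips(segments: &[&str], separator: char) -> bool {
    let joined = segments.join(&separator.to_string());
    let split: Vec<&str> = joined.split(separator).collect();
    split == segments
}
```

This makes the '+' problem concrete: since '+' is a valid base64 character, a subject containing a base64 public key can fail the round-trip.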
- ` ` (space) is really clean, but will not be recognized by most linters / parsers as a URL, e.g. in a markdown document it would not appear as a clickable link. We could of course accept `%20` (the URL-encoded space), but that would be hard to read, which defeats the purpose.
- `_` (underscore) is very human readable, and often means "space".
- `+` is cool, but is also a base64 character.

There are many use cases for verifiable credentials.
One of the core features of Atomic Data is the Commit model, which makes data highly traceable. However, making sure that one specific value is 'accredited' by a specific individual is kind of a bothersome process: get a resource, find the commit which updated some specific field, get the agent (which includes the public key), check if the public key matches the property.
A single Verifiable Credential contains one or more Claims, one or more Proofs, and some metadata.
Generally, I think there are two ways of thinking about credentials. The first one is to think of credentials as just Resources with their own properties. This approach is the most familiar - just take the W3C VC model, create some atomic properties, and we're good to go.
But this approach introduces a few difficult problems:
How does a Credential relate to existing data, such as the `birthdate` property in my profile? Should we convert all Credentials to regular property-values?

Atomic Commits are, in essence, all signed credentials. There is a date, an author, a signature, a subject, and a (set of) properties. This means all Atomic Data created using Commits is entirely verifiable!
So we don't have to invent anything new, right?
Well, with Commits we've tackled an important part of the problem already, but the next step is discoverability.
How would you know that a specific property is actually a proven, verified one, instead of something that I just made up?
We'll need a way of finding the Commit. We could use an Endpoint for that.
One way of finding the credentials (the Commits) for a certain Atom is a `/verifiable-check?path="thing property"` endpoint, which takes an Atomic Path and returns a collection of Credentials.
For example, I might try to find a signed bachelor's degree from a university by visiting `/verifiable-check?path="profile bachelors-degree"`.
Maybe we could also filter by value.
It would return the Commit(s) that match that subject / property / value combination.
The client can then verify the signature, and check the `set` value to verify / validate the commit.
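The matching step of such an endpoint could look like the sketch below. The `Commit` shape and the `verifiableCheck` name are assumptions for illustration; signature verification is deliberately left to the client and omitted here:

```typescript
interface Commit {
  subject: string;              // resource the commit changed
  signer: string;               // Agent URL of the author
  set: Record<string, unknown>; // property-value pairs the commit wrote
  signature: string;            // to be verified by the client
}

// Server-side sketch of /verifiable-check: return the commits that wrote the
// given property on the given subject, optionally filtered by value.
function verifiableCheck(
  commits: Commit[],
  subject: string,
  property: string,
  value?: unknown,
): Commit[] {
  return commits.filter(
    (c) =>
      c.subject === subject &&
      property in c.set &&
      (value === undefined || c.set[property] === value),
  );
}
```

A client would then check each returned commit's signature against the signer's public key before trusting the claim.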
Resources move, change and get deleted. This can be problematic when people depend on these Resources. Atomic Data is largely designed to re-use external content all the time, which means that these external dependencies become more important. For example, when a Commit is parsed and validated, the Properties mentioned in the commit will need to be available (either cached, or fetched). When these are unavailable, the Commit will fail.
A partial solution to this problem is using IPFS #42, which helps make resources immutable. However, this still does not solve the problem of updates.
I think that we need two things:
Calculating a collection can be an expensive endeavor. So ideally, we'd use caching to prevent doing these expensive calculations.
One way of approaching this is by keeping track of the collections in which a resource is used. When the resource changes, we can 'invalidate' the cache. This could mean that we have a is-invalid
property on cached collections which is set to true whenever an item holding a foreign-key is updated. The function generating the collection can then check for this property, and if it is true, it can skip the expensive step of filtering all the existing resources.
This approach, however, would miss new resources, or resources that first didn't match, but after some commit do match. In other words, when a new resource is added, it will not invalidate a cache, even if it should. For example, the `todos` collection would not be invalidated when a new `todo` is added. What we could do is, for every commit, run all stored collections and see if they match for that specific resource. Which... kind of defeats the purpose of using foreign keys for this at all :').
Also, whenever the filters of a Collection change, that Collection should be invalidated too.
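The invalidation idea above can be sketched as follows. The `is-invalid` propval is modeled as a plain boolean here, and all names are hypothetical; note how the sketch reproduces the limitation from the text, that brand-new resources never invalidate anything:

```typescript
// Minimal sketch of collection cache invalidation via an `is-invalid` flag.
interface CachedCollection {
  subject: string;
  members: Set<string>; // resources currently in the cached collection
  isInvalid: boolean;   // the `is-invalid` propval
}

class CollectionCache {
  private collections: CachedCollection[] = [];

  add(collection: CachedCollection): void {
    this.collections.push(collection);
  }

  // Called for every commit: invalidate collections that contain the resource.
  // Limitation: a new resource that *should* match a collection's filter is
  // not in any `members` set yet, so nothing gets invalidated for it.
  onCommit(resourceSubject: string): void {
    for (const c of this.collections) {
      if (c.members.has(resourceSubject)) {
        c.isInvalid = true;
      }
    }
  }
}
```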
A `/subscribe` endpoint. A subscription has a `resource` (the resource to which is subscribed) and a `commitEndpoint` (the `/commit` endpoint to which the changes should be sent). Maybe we could later add subscription levels (e.g. 'delete only').

Although the current signature spec + implementations (server, client) work, they are very much custom and not well described. I'm still kind of new to all this crypto stuff, so I just picked a proper algorithm and defined a canonical JSON serialization to make both the client and server (which use different libraries) reach consensus on signatures. I didn't know I could just use all the existing JWT / JWE / JWS tooling...
Anyway, this needs to be reflected in the spec, and in both implementations. Gonna take quite a bit of time, but it's the right decision.
At https://atomicdata.dev/datatypes/string it says
Allows newlines with .
Source reveals that it is talking about a newline surrounded by backticks.
Seems the description field should escape those backticks to avoid them being parsed as a Commonmark code block.
Alternatively, perhaps this is a bug in the Atomic Data Browser that it parses plaintext description as Commonmark?
I was working on creating tooling for Versioning for Atomic Data. I want to make this functionality discoverable for users, for example through a menu or a button on the resource. But... Not all atomic data servers will have this feature. And even if they do, how would the front-end app know?
In this case, we could say that the `commits` endpoint should always be available at `/commits`. However, this would severely limit implementations and users in their choices for endpoint names, and it would make extending endpoint functionality impossible to do consistently. Not good.
Paths can be really useful to provide a predictable way to find specific things on a server.
For example, finding the first name of the owner of a server might look like `profile first-name`.
Similarly, the path to the `commits` service might be found with `endpoints commits`.
Now, the resource resolving to that path may be available on `example.com/commits`, but this is not required.
The resolve mechanism traverses resources, instead of using a predictable URL.
So, the front-end requests the root resource (the `Drive`), then checks the `endpoint` resource.
When the required endpoint is there, the front-end will show the action.
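The resolve mechanism can be sketched with an in-memory store standing in for HTTP fetches. Resources are keyed by shortname here for brevity (real resources use property URLs), and all subjects and shortnames are made up:

```typescript
// A resource is a map from shortname to value; a value that is a string is
// treated as a link to another resource in the store.
type Resource = Record<string, unknown>;
type Store = Map<string, Resource>;

// Resolve a path of shortnames (e.g. ["endpoints", "commits"]) from a root.
// Returns undefined when a step is missing - i.e. the feature is unsupported,
// so the front-end should hide the corresponding action.
function resolvePath(store: Store, root: string, path: string[]): unknown {
  let current: unknown = store.get(root);
  for (const step of path) {
    if (typeof current === "string") current = store.get(current); // follow link
    if (current === undefined || typeof current !== "object") return undefined;
    current = (current as Resource)[step];
  }
  return current;
}
```

Because the traversal follows links rather than guessing URLs, servers remain free to host their endpoints wherever they like.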
Currently, Shortnames only allow downcased characters and dashes. However, most properties currently use camelCase.
For example, https://atomicdata.dev/properties/isA has a path ending of `isA` but a shortname of `isa`, which I considered replacing with `is-a`.
I think a better way to go is to use `kebab-case` everywhere.
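Migrating existing shortnames could be mechanical. A minimal sketch (the function name is made up, and this only handles simple camelCase and snake_case inputs):

```typescript
// Convert a camelCase or snake_case shortname to kebab-case.
function toKebabCase(shortname: string): string {
  return shortname
    .replace(/([a-z0-9])([A-Z])/g, "$1-$2") // isA -> is-A
    .replace(/_/g, "-")                     // snake_case -> snake-case
    .toLowerCase();                         // is-A -> is-a
}
```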
I want to be able to:
When you want to notify some server of a change, simply send the commit there. Incoming messages or notifications are not anything special, they are simply commits.
This, however, does not solve filtering and following.
I suspect that every person working with RDF will, at one point, ask themselves: so how do I store ordered data? Some time ago I wrote a blog about this question, and it still feels... too complex. Why not just introduce a serialized Array datatype? Parse the object string as a JSON array, each value containing a URL.
Of course, this does not replace linked lists, which have their own merits (e.g. decentralized lists, which can be fun in games where everybody passes something along), and we still need to deal with very long lists (which require pagination).
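Parsing such a serialized Array datatype is straightforward: treat the object string as a JSON array and check that each element is a URL. A sketch (the function name is hypothetical):

```typescript
// Parse a serialized Array datatype value: a JSON array of URL strings.
// Throws when the value is not an array, or when an element is not a valid URL.
function parseResourceArray(value: string): string[] {
  const parsed: unknown = JSON.parse(value);
  if (!Array.isArray(parsed)) {
    throw new Error("ResourceArray must be a JSON array");
  }
  return parsed.map((item) => {
    if (typeof item !== "string") throw new Error("Array items must be strings");
    new URL(item); // throws on invalid URLs
    return item;
  });
}
```

Ordering is preserved for free, since JSON arrays are ordered.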
JAMstack is a way to manage websites, where you serve static (HTML) files, probably using a CDN, and re-build these websites automatically when data changes. This helps make apps fast and easy to manage. It often involves not managing a server.
Atomic Data could be really useful in this context:
We'll probably need some new tools / libraries / tutorials / templates for convincing people to use Atomic Data in a JAMstack app:
- `@tomic/react` already exists, so all that remains is writing a tutorial or boilerplate.

It would be cool if docs.atomicdata.dev itself ran on this stack.
The AtomicData Triples (AD3) format (described here) is inspired by HexTuples - which is designed to be more performant than other existing RDF serialization formats. But I'm not entirely sold on the idea of having that as the standard way to serialize Atomic Data:
- It doesn't offer the `thing.property.otherproperty[5]` syntax (i.e. it's not JSON).

Atomic Data, however, is a bit different from RDF. For one, Subject-property uniqueness means that we can use key-value stores and plain JSON objects without having key collisions. However, like RDF, Atomic Data uses URLs for keys (Properties / Predicates). Because these URLs are prone to typos and take too long to type, Atomic Data introduces the Shortname property, which means that these shortnames could be used as keys in serialization formats. This is what I've used in Atomic-Server for JSON-LD serialization - the keys are nice and short, while the long URLs are available in the `@context` object. This gives regular JSON users a familiar ORM-style syntax, whilst retaining a way to find out more about the properties.
But... Parsing JSON-LD is slow, if you want to actually use the linked data URLs and parse it as RDF. Using it as JSON with some embedded documentation is just fine, though.
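To illustrate the shortname-keys idea, here is a hypothetical JSON-LD resource (subjects, shortnames, and property URLs are made up for illustration): short keys for humans, full property URLs in `@context`.

```json
{
  "@context": {
    "name": "https://atomicdata.dev/properties/name",
    "description": "https://atomicdata.dev/properties/description"
  },
  "@id": "https://example.com/someResource",
  "name": "Some Resource",
  "description": "A resource using shortnames as keys."
}
```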
So let's explore some serialization ideas. I think taking JSON as a starting point makes a ton of sense. It has awesome support, it has highly performant optimized parsers, and developers are familiar with it.
```json
{
  "https://example.com/someResource": {
    "https://example.com/somePropString": "someval",
    "https://example.com/somePropBool": true,
    "https://example.com/somePropThatLinksToANestedResource": {
      "https://example.com/somePropString": "some nested value"
    }
  }
}
```
In JSON-LD, all links need `@id` objects, all lists need `@list` objects... Can't say I like that. The alternative, adding an `@context` object, also seems like a suboptimal way of doing things.

The page section https://docs.atomicdata.dev/interoperability/rdf.html#why-these-changes ends like this:
> I've asked two colleagues working on RDF about this constraint, and both were critical. The reason

The reason what?
Great text - which makes it so much more frustrating to leave the reader hanging in suspense over that detail... :-)
I've currently opted for a Unix Epoch integer for datetime representation, since it's simple to parse and leaves little room for error.
However, it's not human readable. ISO 8601 on the other hand, is.
Some more thoughts on the matter here.
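The trade-off is purely about the representation: both forms denote the same instant and converting between them is lossless, as this small sketch shows:

```typescript
// The two candidate representations of the same instant:
const epochMs = Date.UTC(2021, 0, 1); // months are 0-based, so this is Jan 1
const iso = new Date(epochMs).toISOString(); // human-readable ISO 8601 (UTC)

// Converting back is lossless:
const roundTripped = Date.parse(iso);
```

The epoch integer sorts and compares trivially; ISO 8601 is self-describing at a glance. Either can be derived from the other on demand.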
In the Atomic-Data-Browser app, I use Classes, Properties and Datatypes to render the form and validate its inputs. The Class dictates which required and optional properties are used, the properties determine the labels, and the datatypes determine the type of form input. A normal string is a simple single line input, a markdown field a multiline one (perhaps with some UI elements for markdown syntax), a ResourceArray a more complex field with ordering, and so forth. And then, there is possibly some validation. A Slug, for example, checks if a string consists of only dashes, numbers and letters.
The problem is, I feel like sometimes the property should do validation, and not the datatype. If only datatypes do validation, we might get a lot of datatypes. I'm not sure if that is a bad thing, but it might be.
I think we have (at least) two options:
- Datatypes get a `regex` field
- Properties get a `regex` field

Let's consider some (not yet existing) properties that might require validation. Think about the form field, the validation, and whether the datatype could be re-usable in various contexts.
An international phone number, such as +31123456789. Seems only relevant in a 'phonenumber' property.
A 32-byte, base64-serialized ed25519 public key. Don't think this is usable in any other property. The key can be validated with some JS - a regex won't suffice.
A hexadecimal color value, e.g. `EEFF22`. Could be usable in many semantic contexts, such as 'backgroundColor' and 'buttonColor'. Should offer a colorpicker, probably.
When describing concepts in physics, large numbers like `10^36` are pretty common. These could overflow normal `u64` ints, so they cannot be integers. We can come across these in many properties, such as `length` in meters or the `count` of a molar mass.
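For the cases that a regex can cover, the "Properties get a `regex` field" option might look like the sketch below. The interface, shortnames, and patterns are all illustrative assumptions, not part of the spec (and the phone pattern is only a rough E.164 shape):

```typescript
// Sketch: Properties carry an optional `regex` field that forms use to validate.
interface PropertySpec {
  shortname: string;
  datatype: string;
  regex?: RegExp;
}

const hexColor: PropertySpec = {
  shortname: "background-color",
  datatype: "string",
  regex: /^[0-9A-Fa-f]{6}$/, // e.g. EEFF22
};

const phoneNumber: PropertySpec = {
  shortname: "phonenumber",
  datatype: "string",
  regex: /^\+[1-9]\d{6,14}$/, // rough E.164 shape, e.g. +31123456789
};

// A property without a regex accepts anything; otherwise the pattern decides.
function validate(prop: PropertySpec, value: string): boolean {
  return prop.regex === undefined || prop.regex.test(value);
}
```

Cases like ed25519 keys would still need custom validation code, which is an argument for keeping regex optional rather than the only mechanism.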
I love `git`: it enables cloning a repo, making changes and giving these changes back. That's an incredibly powerful feature to have. Atomic Data already has Commits, and with this Cloning feature there's also Forking. Does it also need a way to merge changes and make suggestions? And should this use Cloning, or is Forking too different?
Let's assume a user wants to improve some piece of data on a webpage - let's call him the changer. Say the local grocery store has an issue in its 'open times' during covid, and a customer wants to edit this. From the perspective of the changer, they could click the data they want to change, make a change, and click 'share suggestion'. What might happen under the hood to enable this?
I've been trying out a couple of one-liner descriptions for atomic data, but I haven't really landed on something that feels entirely right.
Some considerations:
We don't have an `enum` datatype, but users have indicated that they want to constrain inputs to specific sets. I think we should add an optional property to the Property class.
Surveys are an interesting case for web applications and frameworks. How could surveys work in the Atomic Data paradigm (using concepts like Commits and Agents)?
Things that I want, as a survey creator:
And as a respondent, I want:
So, how to implement this? Here's some thoughts:
- We don't have an `enum`-like datatype yet (#27), which would be useful for multiple choice questions
- `write` access to a ResourceArray
- a `used: true` propval