consensus-shipyard / ipc Goto Github PK

🌳 Spawn multi-level trees of customized, scalable, EVM-compatible networks with IPC. L2++ powered by FVM, Wasm, libp2p, IPFS/IPLD, and CometBFT.

Home Page: https://ipc.space

License: Apache License 2.0

Shell 0.99% Makefile 0.91% Rust 69.98% Solidity 26.20% TypeScript 1.46% JavaScript 0.09% Python 0.15% Dockerfile 0.21% Gnuplot 0.01%

blockchain consensus evm wasm

ipc's People

Contributors

Stargazers

Watchers

ipc's Issues

ABCI Interface

Create an ABCI interface to process requests coming from Tendermint.

There are two examples of this:

tendermint-rs/abci is a synchronous version but currently only works with v0.34 because of a difference in the encoding of the length of the messages between v0.34 and v0.37; a fix can be seen here when v0.35 was being adopted.
tower-abci is an asynchronous version, also currently working with v0.34 encoding, and of course no ABCI++ as it's not yet released.

Ideally we'd be using an async solution with ABCI++ and v0.37 encoding. Possibly the simplest approach would be to code our own tokio server and have the async equivalent of an Application trait.

However, since even the official server implementations mention tower-abci, perhaps a more future proof way is to fork it and do a few updates to make it work with the 0.37 specific proto types and encodings. Presumably it will be updated at some point and we can abandon our fork.

It does have some neat solutions in it, for example the handling of flush() requests, and the fact that it prioritises the handling of consensus requests over queries, or that it uses the domain types rather than raw proto DTOs. It is a bit more complex, though. It's worth checking out the mandatory kvstore example to see how it fits together.

FRC42 and fvm_actor_utils

This repo from the Helix team includes the implementation of a set of template and convenient utils to work with native FVM actors. We should consider them for the implementation of our own actors (maybe even exposing FRC_42 and the utils publicly through our own fvm-utils API.)

Implement check_tx

Similar to the handling of exec_tx in consensus-shipyard/fendermint#14 maintain a state for checking transactions. This would be something like just the StateTree, to access nonces and balances.

Check what validations Forest does in its mempool and what error codes it returns.

Add cicd build pipeline to enforce linting and tests

Currently this repo is having quite a few clippy and formatting complaints. We should fix them and add cicd to run linting and checks before merging.

Remove already executed cross-messages from gateway actor state

We currently don't have a way to signal that a cross-net message has been successfully executed and that can be removed from the queue of cross-net messages in the corresponding gateway. This will generate the state in the gateway to increase indefinitely (and unnecessarily), increasing the gas costs of new cross-net messages, as it will keep expanding the memory requirements of the gateway.

For top-down messages, once the message is executed in the child subnet, the parent signal should be signalled to notify that the corresponding message can be removed from top_down_msgs for the subnet: https://github.com/consensus-shipyard/ipc-actors/blob/ec4dbe39d21c0ab3b29fd3f5f5f8d5edc7f50d93/gateway/src/subnet.rs#L28. Parents never sync with their children so we need some participant in the child to propagate the signal up to the parent. One way could be to include in the checkpoint information about all the cross-messages executed and that can be cleaned from this data structure.
For bottom-up messages, the only piece of information persisted in the gateway of the parent is the cross-message meta: https://github.com/consensus-shipyard/ipc-actors/blob/main/gateway/src/state.rs#L47. Executed crossMsgMeta can be more easily cleaned, as when the messages included in it are executed, the corresponding crossMsgMeta could be conveniently cleaned.
send_cross propagated messages are already being conveniently removed from the postbox: https://github.com/consensus-shipyard/ipc-actors/blob/ec4dbe39d21c0ab3b29fd3f5f5f8d5edc7f50d93/gateway/src/lib.rs#L857

Implementation

Top-down messages: To garbage collect top-down messages, we can propagate in checkpoints the latest nonce of the top-down message execute in the child subnet. The parent will pick up this nonce and garbage collect all the message below and including that nonce from its state.
Bottom-up messages: For bottom up messages it is easier. When a bottom up message for a new nonce is executed, the message meta for that nonce can be directly cleaned from the parent's state.

P2P: Subnet content resolution protocol

Background

While child subnets C are required to sync with their parents for their operation, it is not required for parent subnets, P, to sync with all their children. This means that while C can directly pull the top-down messages that need to be proposed and executed in C by reading the state of the parent through the IPC agent and getting the raw messages, this is not possible for bottom-up messages. Bottom-up messages are propagated inside checkpoints as a cid that points to the aggregate of all the messages propagated from C to P. P does not have direct access to the state of C to get the raw messages behind that cid and conveniently propose them for execution. This is where the subnet content resolution protocol comes into play.

This protocol can be used by any participant of IPC to resolve content stored in the state of a specific subnet. The caller performs a request specifying the type of content being resolved and the cid of the content and any participant of that subnet will pull the content from its state and share it respond to the request. This protocol is run by all IPC agents participant in an IPC subnet. Initially, the only type supported for resolution will be CrossMsgs, but in the future additional content types and handlers can be registered in the protocol.

Design

This design is inspired by the way we implemented the protocol in the MVP, but if you can come up with a simpler or more efficient design, by all means feel free to propose it and implement it that way. As we don't have a registry of all the IPC Agents or peers participating in a subnet, we leverage GossipSub for the operation of the protocol. Each IPC agent is subscribed to an independent topic for each of the subnets syncing with. Thus, if an IPC agent is syncing with P and C, it will be automatically subscribed to /ipc/resolve/P and /ipc/resolve/C.

In the MVP, the protocol was designed as an asynchronous request-response protocol on top of a broadcast layer (i.e. GossipSub). We implemented three types of messages:

/// Protocol supported messages
enum Messages {
  Pull
  Push
  Response
}

/// Supported types of content
enum ContentType {
  CrossMsgs(Vec<Msg>)
}

/// Requests pulling some content from a subnet
struct Pull {
   source: Option<MultiAddr>,   // multiaddr of the peerID initializing the request
   source_sn: Option<SubnetID>, // source subnetID
   type: ContentType,  // type of content being requested
   cid: cid:Cid  // cid of the content
}

/// Response to a pull request.
struct Response<T: Serialize> {
   type: ContentType,  // type of content being requested
   content: T  // content resolved
}

/// Proactively pushes new content into a subnet to let 
/// nodes decide if they want to preemptively cache it.
/// (its structure is the same as for `Response` with but it 
/// is handled differently by agents.
struct Push<T: Serialize> {
   type: ContentType,  // type of content being requested
   content: T  // content resolved
}

When an agent wants to resolve some content from a subnet it broadcasts a Pull message to the relevant broadcast topic for the destination subnet sharing information about the content to be resolved, and optionally either information about the subnet or the multiaddress of the source agent making the request.
When one of the agents subscribed in the subnet and syncing with the state sees the request it either responds by broadcasting a Response message to the topic of the source subnet if it was specified in the request, or it directly connects to the MultiAddr of the initiator of the request and send the Response directly to them.
- Broadcasting the message allows for caching and de-duplications but also increases the load of the network.
Finally, when a checkpoint with cross-messages is propagated from C to P, agents in C may choose to broadcast a Push message to the topic of P for the case where agents from validators in P may want to preemptively cache the content so they can propose the messages without having to resolve the content in a destination subnet.

Alternatives

Gossipsub + Bitswap

One alternative to this protocol would be to directly use Bitswap to resolve any cid from our existing connections. We could use GossipSub exclusively for peer discovery, i.e. so all IPC agents would subscribe to an ipc/agents/<subnet_id> topics for each subnet to mesh with other IPC agents syncing with these subnets and establish connections that can then be leveraged by Bitswap to resolve content. For this to work, all the content that we want to be "resolvable" in the IPC agent needs to be cached in a local datastore.

Point-to-point + DHT or Gossipsub for peer discovery.

Another option is to leverage a DHT for each subnet, or to subscribe to specific topics for each of the subnets in order to discover with peers syncing with the same subnets, and then build a direct peer-to-peer protocol for the content resolution with the same kind of messages proposed above for the MVP implementation. Actually, a peer-to-peer libp2p protocol on top of some peer discovery protocol could be the most efficient in terms of number of messages and network load.

IPLD Resolver: Disable Kademlia content storage

As described in https://adlrocha.substack.com/p/adlrocha-beyond-bitswap-i , IPFS uses Kademlia to store a list of peers who can serve a piece of content, and not just a list of peers. It's only by accident that some blockchains decided to start using Kademlia for peer discovery, and now (at least for me) that's how I mostly think about it, but the kad protocol in libp2p is full implementation and thus supports more kinds of interaction than just peer lookup.

In particular, there are AddProvider and PutRecord which allow peers to register themselves as hosts and also to put data into our node. This is a potential attack vector, as malicious users could put pressure on our memory.

These events are handled in on_connection_handler_event and if we look at the PutRecord handler we can see that whether anything gets stored depends on the record_filtering setting. By using KademliaStoreInserts::FilterBoth we only get an event, while KademliaStoreInserts::Unfiltered puts it in the store first. Unfiltered happens to be the default setting.

To prevent anyone from storing records, we can either use the FilterBoth setting, or we can stop these events from reaching the Kademlia behaviour in our discovery::Behaviour::on_connection_event_handler.

IPLD Resolver: Service

Part of #475

Create an IPLD Resolver Service that:

Creates a Swarm with an IPLD Resolver Behaviour wrapping #34 #467 and #466
Has a run method to start listening on an address and poll the Swarm for events
Interprets certain events that need to go between the constituent behaviour protocols, e.g. when Membership raises an event about an agent serving a new subnet it didn't know about before, then the service would prompt Discovery to look up its peer ID to ensure it has the address if needed.
Has a command enum to accept tasks to do over a channel from the rest of the program. Command would arrive with response channels - these would be things like resolving up CIDs.
Has a command queue which it also polls (along with the Swarm) to look for internal requests.
Wraps the command queue into an async interface where the response channels are created and completed, for convenience. Instances of this interface can be shared out to the application and they can be cloned as well, and once the service is running, this is the only way to talk to it.
Configuration which aggregates the configs of all constituent behaviours.
When running the resolve command, it should ask Membership for the list of peer IDs serving data from a topic, ask Discovery for the list of connected/known addresses, then decide how many to pass to Resolve and in which order - connected first, known addresses last; but maybe just a few at a time to not span the network, e.g. if we know 100 agents serving data in a subnet, we can send bitswap requests to 10 of them, and if it fails, then another 10, etc. Note that the Bitswap library is clever enough to only send want-have first to all-but-one, and then want-block to one at a time, but we might want to keep even the want-have within a limit. At some point we might even ask the parent subnet members for the data - a fully Bitswap implementation would keep the wants and complete them later, but not libp2p-bitswap.

Document Tendermint Target

Identify which version/branch of Tendermint we are targeting.

Orchestration: Cross-net controller

Note: The detailed design hasn't yet been included into the design doc.

Background

The checkpoint controller orchestrates all the low-level interaction required for the propagation and commitment of cross-net messages.

It initiates new cross-net messages for a subnet.
It tracks when there are unverified cross-net messages for a subnet that need to be committed: either through a top-down message triggered by a state-change in the parent; or bottom-up through the commitment of a checkpoint.
It proposed unverified cross-net messages to the consensus engine of subnets for their commitment.
It handles the propagation of high-level arbitrary cross-net messages by decomposing them into basic cross-net messages primitives (in the simplified design for IPC that we are implementing, arbitrary cross-net messages between any subnet in the system are not allowed, and complex cross-net messages are decomposed in a set of cross-net messages between parents and child for the same address).

Goal

Implement a controller that the IPC client can use to handle all cross-net-specific functionality.

Tasks

Verify all the low-level functionalities required by the controller in the rest of the interfaces are available.
Implementation of the cross-net controller

Persist application state

Follow up consensus-shipyard/fendermint#14 with a persistence mechanism for AppState. Look at how Forest does it. Are they using a separate column family? Couldn't see support for it in the RocksDB facade.

Declare SCA storage variables and their corresponding structs and enums

Message interpreter

Implement the execution of chain messages using the FVM. Called an interpreter as a homage to https://github.com/ChainSafe/forest/tree/main/vm/interpreter

What's different for us from Forest is:

We get messages one by one from Tendermint Core, because that part of the ABCI interface hasn't changed. In later stages of ABCI++ rollout, the BeginBlock -> DeliverTx -> EndBlock -> Commit cycle will become FinalizeBlock (plus maybe a Commit).
The methods will be asynchronous, so they have a chance to do side effects when the message contains CIDs that need to be resolved.

So let's create an interpreter crate like Forest, with a abstract trait to apply messages on some state. The state will depend on the use case: during block execution it will be backed by copy-on-write cache of a block store, and similarly during transaction checking it will have a state that is based on the last executed block, with pending transaction applied but not committed. In Milestone 2 it will also have access to some shared memory through STM where it can coordinate the resolution of CIDs with the other parts of the application.

IPC agent command line version should be injected in cicd pipeline

Existing ipc agent command line is using

#[command(name = "ipc", about = "The IPC agent command line tool", version = "{VERSION}")]
struct IPCAgentCliCommands {
    #[command(subcommand)]
    command: Commands,
}

The {VERSION} is hard coded, ideally it should be injected by the cicd to have the same value as the github commit.

Enforce gas_fee_cap during checking

Followup to consensus-shipyard/fendermint#28

The FVM allows the user to send messages with a fee cap which is lower than the base fee, even zero; the miner pays the difference. A validator will probably not want to foot all the bills, though, so the FvmMessageInterpreter should be able to reject messages which are not profitable.

Look at what Forest is doing in the mempool and implement something similar.

IPLD Resolver: Manage outgoing connections

#466 assume that it is possible to open new outgoing connections to peers in subnets we want to resolve content from. There is a limit on incoming connections in #465, so we might have to try a few peers until we find one with free capacity, but we assume we won't run into our own restriction, that if we need to connect to a new subnet and we don't have a connection to it, we'll make room.

For this reason, #465 sets no limit on outgoing connections. We have to make sure though, that the connections opened during peer discovery (unless it's using UDP) and Bitswap don't stay open forever.

One way of achieving the latter is to track in the Content behaviour who we are connected to and close down connections in subnets which are served by too many peers already.

Refactoring JsonRpcClient::subscribe

Current implementation of subscribe is of the form:
```
async fn subscribe(&self, method: &str) -> Result<Receiver<Value>>;
```
Which is returning a serde_json::Value, this value also contains the id and jsonrpc fields, also it's not handling the success and error. We should consider a more typed implementation:
```
async fn subscribe<T>(&self, method: &str) -> Result<Receiver<T>>;
```
And internally uses JsonRpcResponse to handle both success and error.
We could potentially consider also adding params to the function call as there could be cases where param is required.
Each subscription is actually opening a new ws connection. We should consider using connection pooling to manage active connections.
Lastly we could provide a unsubscribe function for convenient.

IPLD Resolver: Integration Test

Part of #475

Start an in-memory cluster of #465 and run tests on them to show that they discover each other, learn the membership table, and are able to resolve CIDs. Run scenarios for late joiners to see how Gossipsub behaves.

IPC: Maintain circ_supply and base_fee

As a follow up to consensus-shipyard/fendermint#14 figure out how to update the circ_supply. Are there tokens issued during FVM execution? There are burn fields, so the amount might decrease.

FVM and built-in actors dependency

Add a dependency to ref-fvm and the builtin-actors. We probably won't need all builtin actors, but most things depend on things like the InitActor, so I don't think we can not load an actor bundle.

Define SCA and SA interfaces

Store state hash per block height

A followup for #347

Currently the application only remembers the last committed state, however, if we stored the state hash at each past block height, or the last N heights, we could go back in state (it's not emptied from the IPLD store) and run multiple consistent queries on the same height.

Think about how to implement this in the view of a new block being produced every second:

Should we remember the last N heights, and purge the ones before?
Should we insert all and hope RocksDB will just store them?
Is there some data structure we could use to only store changes, but still have fast lookups? Should we do bisection to find which range a block falls into?

Tendermint will also store all all hashes as part of the headers, but it also has an option of how many blocks to retain. It would be reasonable to say: we support queries up to the last hour, or last 24 hours.

Subnet manager

Background

All commands for subnet management pertain to a specific subnet and are executed on behalf of an account. The subnet and the account need to be specified in the configuration, otherwise, the command is rejected.

In the CLI, the commands are generally of the form:

./ipc-agent <subnet|checkpoint|crossmsg> <command> [params] --as=<account>

Subnet Management Commands

The subnet management commands that need to be implemented are:

subnet create: Creates a SA in the parent subnet P for subnet C on behalf of A.
subnet join: Account A joins a subnet C as a validator. It sends a message to the parent P of C calling the Join method of the SA deployed on P.
subnet leave: Account A leaves a subnet C as a validator. It sends a message to the parent P of C calling the Leave method of the SA deployed on P.
subnet kill: Kills a subnet C on behalf of A. It calls the Kill method on the SA of C deployed on P. This does not destroy the SA.

More details about the specific implementation of this commands can be found in the IPC Agent design doc.

IPLD Resolver: Content

Part of #475

IPC agents running on the parent subnet need to pull data from IPC agents (also) running on the child subnet(s). To do so, we have multiple options:

A custom point to point protocol as described in #475, where a parent agent asks a child agent for the CID it sees in a checkpoint already present in the parent ledger. The child agent makes a JSON-RPC call to the IPC gateway actor in the child Lotus instance ledger, asking to gather all Messages indirectly pointed to by the CID (it directly points at a list of CIDs in the ledger), and return the fully fledged cross message bundle.
A generic Bitswap protocol instance either from libp2p-bitswap or iroh-bitswap (which I haven't yet looked at; it's potentially more feature rich but bigger), in which case the parent agent asks the protocol instance to recursively resolve a CID and insert the whole sub-graph into the Blockstore.

In this issue, I would like to start with option 2 as it's more general purpose and seems to fit the situation:

We have a recursive data structure (the checkpoint), aggregating a list of messages
We can rely on enveloping to avoid transmitting messages multiple times if the node already has them
It's more granular, so it should scale to larger checkpoints with hundreds of messages in it (although passing it between the agent and Lotus is out of scope here)
Currently Forest already uses Bitswap to resolve messages in Gossiped blocks, although it's a bit of a misuse because it waits for the results, and during historical syncing it uses a different protocol
Unlike Forest, we can afford to wait until the query completes
Doesn't require support from the Gateway actor

To support Bitswap, the child agent will need a way to pull arbitrary CIDs from the blockchain node it is connected to. If this is not desired, if we want to limit it to just checkpoint content, then we can potentially keep using the Gateway actor to only resolve CIDs if they are checkpoint related.

Ideally the parent agent can also check if it has a CID already, to avoid asking if it does, which is part of the recursion, by implementing BitswapStore::missing_blocks. Naively we could let the agent use a memory based store that always has nothing in it and pull all CIDs, but the parent might already have the messages - for example if they ran nodes across multiple subnets and shared the storage, or if we used Gossipsub to pre-load the CIDs onto the parent agents.

In this issue, create a Content behaviour wrapping a Bitswap behaviour that:

Has a resolve method to fetch a CID from a list of peer IDs. The RequestResponse protocol underlying the Bitswap will try to connect to any peer IDs it isn't connected to at the moment, for which the Swarm will ask #34 for addresses.
From a higher level component, call the resolve method with a list of peer IDs based on what #467 knows are agents serving data in a subnet.
Poll the Bitswap behaviour and when it signals that a query is completed, raise an event to show that a CID is ready in the block store.

The implementation of the BitswapStore is out of scope here, that will be the integration point with the IPC agent, or Fendermint.

Runtime crate add more description in README.md

Add more description/spec to runtime crate. The current description can be expanded much more.

Add slither Github action on push

Querying through ABCI

To interrogate the state we must implement the query method. To do so, create a new ADT to represent all possible queries, and create a new interpreter to handle it. It may be a good idea to split the interpreter modules into further pieces.

I'm not exactly sure how it should work with FEVM in the future. During integration testing, to query EVM contracts I had to send transactions, even if they were views. But there are plenty of other Web3 JSON-RPC methods that might be supported by queries. I'll look at Ethermint when the time comes.

For now, it would be enough to return a balance and the state root of the actor: the latter is useful because we can use general CBOR/IPLD resolution to actually get the state independently.

The ABCI query message allows the client to supply the block height at which they want the results to be evaluated. To support that, we'd need to save the last N state root hashes. Depends on what we want to do:

if we have to run two queries to serve the balance and the state separately, they can be out-of-sync if a block is updated
if we return the balance and the state in a single query (they are together as ActorState) then they will be consistent
but if we want to query two different accounts' balances/states, they can again be out-of-sync
if we allow the queries to be a list, rather than one by one, they can all run on the same root hash
or we allow specifying the block height, and return an error if it's no longer available; then we can query one by one

IPLD Resolver: Membership

Part of #475

We want to know which IPC agents we can contact to resolve content from a specific subnet. To do so, we can use Gossipsub.

First, I'm assuming we have a single P2P network of agents, with everyone potentially able to connect to anyone else, with every agent running a single Swarm on a single address, serving the needs of all subnets. Later we can consider alternatives.

There are at least two ways to use Gossipsub to achieve what we want:

Agents who are able to serve data from a subnet can publish a message into a topic, e.g. /ipc/subnet-agents. The message will spread across the network, and every node can build up a view of which agents do which subnets (a bipartite graph, as agents can participate in multiple subnets).
We can potentially forego even publishing if we use the topics themselves to signal membership. Agents who are able to serve data from a subnet can subscribe to a topic, e.g. /ipc/subnet/<subnet-id>. The subscription information spreads around the network and at least the Rust version of Gossipsub can be queried to list peers subscribed to any topic. The parent subnet agents would not subscribe to the child subnets, they would just query their Gossipsub behaviour who are the subscribers of a child subnet, and contact them when they need to. The drawback here is that we only learn the subscriptions of peers we are connected to, wheras with the first approach we learn about everyone.

Through Gossipsub, we would learn about PeerIds that are serving subnets, but not their address. To connect to them, we rely on #34 to learn their address.

In this PR, we develop a Membership behaviour, such that:

It wraps a Gossipsub instance
It has a join and a leave method that the agent can call to subscribe or publish to the necessary topics to signal membership to others
It listens to events from Gossipsub and either maintains a list of peer-subnet mappings, or is able to ask Gossipsub for the information (if it has to maintain it, we need to think about limits on the size of the collection).
(Optional) It raises events when it learns about new members in a topic, which could be used to instruct the Discovery behaviour to ensure we proactively learn their addresses, unless they are already known, by doing Kademlia lookups.
It provides a members method to return the list of peer IDs of data providers in a subnet.
If we publish, then periodically re-publish the current subnet membership information about the agent itself, to make sure new joiners are informed. (We could publish a list, which would mean we don't even need join/leave, just a serve message with all the subnets, and a timestamp).
If we publish, then track the last time we heard about the membership of an agent, and stop suggesting connecting to it if it's been too long - this is like a heartbeat so we don't end up trying to connect to agents long gone.

Questions:

How does a new joiner learn about the current memberships? Does Gossipsub receive subscriptions from peers it connects to? Should we periodically publish our memberships to keep them fresh in everyone's tables?
How does Gossipsub decide who to connect to in the first place? There is an option for explicit peers (we could use the same list as we do for bootstrapping Kademlia, but it looks like it also tries to connect to anyone whenever a connection is established by the Swarm, so the natural workings of Kademlia queries will prompt Gossipsub to connect as well.

Overall, the explicit publishing of messages is probably better, because:

The Go implementation would not serve subscription lists, it would only work for Rust.
It allows us to learn about peers we aren't connected to at the moment, but we might want to if we need their data, as opposed to only knowing about, say, the 50 agents we are actually connected to.

Asynchronous versions for Subnet Manager methods

As described here, the current implementation of the LotusSubnetManager assumes synchronous methods, where after sending the message to Lotus, the manager waits for the message to get through and be validated to return the result.

Certain use cases may prefer a fire-and-forget async implementation of the subnet manager, where the message is sent, and then the user is responsible for querying the result of the message when needed. In this case, the manager methods would return the Cid of the message for the user to query the result at a later time.

Endpoints to read from parent state

Related: consensus-shipyard/lotus#21

Background

An eudico node running in a child subnet will need to read state and listen to events from its parent's subnet for its operation. In order to decouple completely the child subnet from the specific implementation of its parent blockchain, the child subnet node relies on the IPC agent to pull the required information from its parent state. The IPC agent will have to provide a set of endpoints in its jsonrpc server to serve these requests to child subnet nodes.

Some of the operations that require reading state from the parent blockchain are:

The consensus algorithm, that needs to track the validator set from its parent to determine when a reconfiguration needs to be triggered due to a change of membership.
The validation of blocks including top-down cross-net messages. Whenever a block including cross-net message is proposed, one of the consensus checks performed by validators to determine if the blocks is valid is to check if the cross-net messages included in the block have been successfully accepted and finalized in the parent chain.

Implementation

Initially, we can implement this jsonrpc methods so that is Eudico the one that polls the IPC agent to pull information from the parent. Alternatively, we could consider adding also some endpoint in Eudico to support pushing information from the IPC agent to Eudico. This would require all subnet SMR systems to support this method in order to be able to run IPC.

Implement cron

Followup consensus-shipyard/fendermint#14 with the implementation of cron step in begin.

Orchestration: Checkpointing process

Background

Every check_period epochs, the state of a child subnet C must be checkpointed in its parent subnet P. A checkpoint for some epoch e includes metadata (e.g., subnet ID, epoch, etc.) and a proof of the state of C at epoch e. The form of this proof is an implementation detail of the respective subnet.

The validators of the subnet C are responsible for generating the proof. Its exact form and the number of validators involved in its generation is an implementation detail of the subnet. As of the time of this writing, in our reference implementation, this proof consists of the signatures of more than two-thirds of the validators of C.

The IPC Agent actively acts on behalf of one or more validators to orchestrate the general checkpointing activity. This includes constructing the checkpoint data, collecting proofs from the validators, and submitting the checkpoint to the subnet actor of C deployed on the parent subnet P. At a high level, for every account A on subnet C associated with the agent, the agent conducts the following steps:

Monitor subnet C for checkpoint epochs.
For every checkpoint epoch e:
Check if A is a validator at epoch e.
If yes, then build the checkpoint metadata.
Request a proof from A for the checkpoint.
Submit the checkpoint, consisting of the metadata and the proof, to the subnet actor of C on P.

Subnet manager command

checkpoint list: Lists all the checkpoints committed for a range of blocks for a subnet C.

Include path in our runtime abstraction

Propagate this small patch into our own runtime: filecoin-project/builtin-actors#951

Simplify the prepare/commit phases to just try_commit

A followup on #287

According to the RocksDB documentation, for OptimisticTransactionDB:

commit: will commit changes unless there is a conflict, in which case it will return Busy
prepare: writes to the WAL, so commit is simple, as the cost of making rollback more costly; it doesn't say that it would return Busy if there are conflicts

See:

If commit is the one which returns Busy then this is at odds with how STM expects it to work. Also, in STM we do the database prepare first, and the in-memory key checks second. We could do the other way around, though:

locks the keys in the STM transaction first, detect conflicts there
if a conflict is found, rollback the DB
if there are no conflicts, then try_commit the database
if try_commit returned true, write to the locked in-memory places
if try_commit returned false, release the locks and try again

Sharing Json Rpc request and response between client and server

Since both the json rpc client and server will use Request and Response struct, we should refactor existing code base to make these struct shared.

The current json rpc client's response does not have json rpc error format constraint which were specified in the spec. We can try the following proposal:

use serde_json::{Value, RawValue};

struct Error {
  id: i32,
  message: String,
  data: Option<RawValue>,
}

struct Response {
  id: Value, // actually only Number, String and Null
  jsonrpc: String,
  result: Option<RawValue>,
  error: Option<Error>
}

impl <T: Deserialize> From<Response> for Result<T, Error> {
  ...
}

struct Request<T> {
  id: Value, // actually only Number, String and Null
  jsonrpc: String,
  method: String,
  params: Option<T>
}

Add more test cases for lotus json rpc client

Add more test cases to lotus json rpc client to have more test coverage.

Note that we might need to have a public test node endpoint for easier testing, since some methods to be tested are mpool_push_message and state_wait_msg, new_wallet. These methods are harder to test.

The other way to test is mock the json rpc client and their responses so that we don't rely on external apis.

Update to FVM SDK 3.0.0-alpha.22, shared 3.0.0-alpha.17, ipld_encoding 0.3.3

Some of the key projects over FVM are upgrading its dependencies, we should maybe consider upgrading ours here and in our actors eventually.
filecoin-project/builtin-actors#1095
filecoin-project/actors-utils#187

Persistent IPLD store

The ref-fvm repo has an fvm_ipld_blockstore abstraction which is implemented in Lotus and used via FFI; on its own there's no persistent implementation of an IPLD blockstore in ref-fvm, only in-memory ones, for testing. The Application will need a persistent solution to store the state.

Forest offers multiple block store implementations, notably on top of RocksDB and ParityDB. Out of these two, RocksDB seems to be more like what we want. A quick read of ParityDB suggests it's optimised for small values, that go into a Merkle trie; I think an IPLD block can be larger and more varied.

So, we're in luck, we should be able to just reference the forest_db crate.

Cross-msgs: Finalize design doc for cross-net messages

We are still missing the fine-grained design of cross-net messages. We decided to postpone this until we had at least the implementation of #480 and #481 so we have a better sense of the architecture and the interactions between the IPC Agent and eudico.

Self-contained transaction model

Self-contained transactions are what we call Message in Filecoin. They can be passed to the FVM for execution more or less as they are. The non-self-contained transactions will come in Milestone 2 where it will be possible to put transactions into that contain CIDs which need to be resolved first, which is where we'll make use of ABCI++. For Milestone 1, we stick to the vanilla use case.

Look at what a Message looks like in Forest and bring it over to this project, with all the related data model such as addresses and signatures.

Map events to ABCI

Follow up consensus-shipyard/fendermint#14 with the mapping of StampedEvent to ABCI events.

This requires either:

filecoin-project/ref-fvm#1635 to be merged and released (unrelated PR but makes the fields public)
a duplicate data structure on our end and reading the CID as an AMT

Documentation and getting started guides

For the release of M1 we should have all the documentation and getting started guides for Spacenet as well as all the repository and components of IPC for developers that want to dig deep into our tech.

Add READMEs and tutorials for the IPC Agent.
Update Spacenet documentation
Update actor READMEs and tutorial.

(This is used as the top-level issue to track the documentation task, additional issues will be correspondingly opened in the relevant repos)

Message format tests

The messages need to be serialized to a binary format, which in our case is IPLD.

We need tests that demonstrate that:

arbitrary messages can be serialized and deserialized with complete fidelity, that the format isn't lossy
save examples as test vectors or golden files that we can point to as reference and protect against regressions

At this point we only handle the signed messages, however, to make convincing tests for chain messages, it would be good to include the other two planned examples (just reject them in handlers), because some of the serde annotations tell it to use untagged types, and I'm not sure that will work if there are more than one possible types.

Add new convenient functions to Runtime

Propagate to the runtime the changes from filecoin-project/builtin-actors#1150

IPC Agent Daemon and CLI

The agent can be launched as a daemon that autonomously orchestrates the IPC activities of the accounts specified in the configuration. As part of the configuration the user should determine the subnets where it will behave as a validator so the right orchestrator processes for those subnets are spawned.

$ ./ipc-agent daemon [--config=path/to/config/file]

This mode also runs a JSON-RPC server for ad-hoc execution of user requests carrying IPC commands. The CLI for the agent interacts with this server to interact with IPC.

IPLD Resolver: Pre-publish messages

Part of #475

As a potential speedup, the child agents can preempt later Bitswap requests by publishing the CIDs they anticipate will be requested from them to the parent subnet via Gossipsub, so that the parent agents would already have them when they would normally ask them via #466 .

For we would need to extend #467 to not only track subnet memberships, but also to subscribe to pre-publishes from child subnets. For example parent subnet agents could subscribe to /ipc/subnet/prime/<subnet-id> if they are interested in getting these notifications, and any child subnet could send messages there. The agents would cache these messages up to a certain amount of time, with limits on how much they can keep in memory to avoid being DoS attacked. Or, instead of the parent subscribing to the child, it could subscribe to the parent topic itself, and all child subnets send to that single topic, so parents don't have to actively subscribe to children.

Add a method to #465 if we adopt this extension.

Declare SA storage variables and their corresponding structs and enums

Add forge CI pipelines

GW1: Implement registering SA in SCA

Persist subnet genesis in IPFS or use IPLD resolver

Related: consensus-shipyard/lotus#84

Background

Initially we were planning to store the genesis.car for the subnet actor to make it available for new peers joining the network, but this CAR is ~12MiB on average, which may be too big to store it on-chain. As a workaround we can store the genesis template for the subnet and all peers can deterministically generate the genesis.car from the template (1-2 MiB). While we are planning to go for M2 with this approach, it is still not the best.

Proposal

The proposal to fix this issue is to locally persist the genesis in IPFS and every peer that is part of the subnet. The subnet actor will only persist the cid of the genesis on-chain, and peers joining the subnet will be able to retrieve it either through the IPC Agent´s IPLD Resolver (a.k.a Content Resolution Protocol) or through IPC. The IPC agent should implement hooks and commands to perform this retrieval.

consensus-shipyard / ipc Goto Github PK

ipc's People

Contributors

Stargazers

Watchers

Forkers

ipc's Issues

Implementation

Background

Design

Alternatives

Gossipsub + Bitswap

Point-to-point + DHT or Gossipsub for peer discovery.

Background

Goal

Tasks

Background

Subnet Management Commands

Background

Implementation

Background

Subnet manager command

Background

Proposal

Recommend Projects

Recommend Topics

Recommend Org