
linera-protocol's People

Contributors

afck, andresilva91, ashu26jha, christos-h, colerar, duguorong009, fakefraud, firedpeanut, harnen, jvff, laurentmazare, ma2bd, martinkong1990, mathieudutsik, maxtori, nirajsah, papadritta, twey

linera-protocol's Issues

Make DynamoDb handle a key_prefix of length 0

Right now, DynamoDb does not allow a key_prefix of length 0, which I think is a pity since the other backends (memory / RocksDB) allow it.
The current workaround is to use the prefix vec![0], which I think is a hack that should be eliminated.

Avoid locking when handling requests

The current implementation of the server uses a Mutex to guard the MessageHandler instance used as its state. This prevents concurrent handling of requests and ends up degrading performance.
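
A minimal sketch of one direction, with hypothetical request/response types (the real handler and its state are more involved): if the handler takes &self and guards its interior state with finer-grained locks, independent requests can proceed concurrently instead of queuing on a single Mutex.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

// Hypothetical request/response types, for illustration only.
struct Request { key: String }
struct Response { value: Option<String> }

struct MessageHandler {
    // Finer-grained interior locking instead of one Mutex around the whole handler.
    state: RwLock<HashMap<String, String>>,
}

impl MessageHandler {
    // Taking `&self` (not `&mut self`) lets many tasks call the handler at once.
    fn handle(&self, request: Request) -> Response {
        // Reads only take a shared lock, so they don't block each other.
        let value = self.state.read().unwrap().get(&request.key).cloned();
        Response { value }
    }
}

fn main() {
    let handler = Arc::new(MessageHandler { state: RwLock::new(HashMap::new()) });
    // Each connection task holds its own `Arc` clone; no global Mutex is needed.
    let h = Arc::clone(&handler);
    std::thread::spawn(move || { h.handle(Request { key: "a".into() }); })
        .join()
        .unwrap();
}
```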

Make the computation of indices faster and allow an order-dependent `for_each_index`

The code for computing indices iterates over the entries in the database, then looks at the staged updates, and finally sorts the combined indices before use. This is suboptimal: the keys come from the database already sorted and the entries from the updates are also sorted, so the merge could be done in O(N) time.

The second point is about for_each_index. The whole point of this function is that we do not need to build the index set; instead, we iterate over all the entries and apply the function to each one. As a consequence, we are currently forced to require functions f that are order-independent. Building the O(N) algorithm above would resolve this problem.
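
A sketch of the O(N) merge with illustrative types (the real code walks database keys and staged updates rather than slices): since both inputs are already sorted, a single pass yields the indices in order, which also means an order-dependent f could be applied on the fly.

```rust
use std::cmp::Ordering;

/// Merge two already-sorted, deduplicated key sequences in O(N),
/// letting entries from `updates` shadow equal entries from `db`.
fn merge_indices(db: &[Vec<u8>], updates: &[Vec<u8>]) -> Vec<Vec<u8>> {
    let (mut i, mut j) = (0, 0);
    let mut out = Vec::with_capacity(db.len() + updates.len());
    while i < db.len() && j < updates.len() {
        match db[i].cmp(&updates[j]) {
            Ordering::Less => { out.push(db[i].clone()); i += 1; }
            Ordering::Greater => { out.push(updates[j].clone()); j += 1; }
            Ordering::Equal => { out.push(updates[j].clone()); i += 1; j += 1; }
        }
    }
    out.extend_from_slice(&db[i..]);
    out.extend_from_slice(&updates[j..]);
    out // already sorted: no extra sort pass required
}
```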

Implement `batch` for memory and `dynamo_db`

This would avoid many async operations, which would be a big gain. It could possibly hurt the performance of the memory backend, but it leaves room for the following two improvements (a batching sketch follows the list):

  • Implement low-level batching for dynamo_db.
  • Refactor the existing code to share more of it between memory, rocksdb and dynamo_db.
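
For illustration, a rough sketch of what batching the memory backend could look like; the names (WriteOperation, Batch, MemoryStore) are illustrative stand-ins, not the actual linera-views API. The point is that a single lock acquisition applies all staged operations at once.

```rust
use std::collections::BTreeMap;
use std::sync::Mutex;

// Illustrative write operations collected into a single batch.
enum WriteOperation {
    Put { key: Vec<u8>, value: Vec<u8> },
    Delete { key: Vec<u8> },
}

#[derive(Default)]
struct Batch {
    operations: Vec<WriteOperation>,
}

impl Batch {
    fn put(&mut self, key: Vec<u8>, value: Vec<u8>) {
        self.operations.push(WriteOperation::Put { key, value });
    }
    fn delete(&mut self, key: Vec<u8>) {
        self.operations.push(WriteOperation::Delete { key });
    }
}

struct MemoryStore {
    map: Mutex<BTreeMap<Vec<u8>, Vec<u8>>>,
}

impl MemoryStore {
    // One lock acquisition applies the whole batch, replacing N separate
    // async round-trips with a single call (shown synchronously here).
    fn write_batch(&self, batch: Batch) {
        let mut map = self.map.lock().unwrap();
        for op in batch.operations {
            match op {
                WriteOperation::Put { key, value } => { map.insert(key, value); }
                WriteOperation::Delete { key } => { map.remove(&key); }
            }
        }
    }
}
```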

Improve the APIs of linera-views

We discussed a few pieces of low-hanging fruit in linera-views:

  • generalize the logic so that (write_)commit and (write_)delete take &mut self (in other words, views remain usable after committing changes to storage)
  • not sure whether delete should be renamed commit_deletion or split into two parts: staging the deletion and committing it.
  • similarly, we don't support calling remove_entry and then load_entry on the same index in the same CollectionView instance. To solve this, we need a function reset_to_default that stages the fact that we are resetting the object.
  • As a result, maybe AppendOnlyLogView should be renamed LogView or VectorView.

Add S3 as a storage option

Motivation

Currently Linera does not support any storage that is shared between workers. Allowing a storage layer to be shared between workers will simplify shard reassignment. Otherwise, workers will have to coordinate to transfer shard information between them.

The first shared storage layer we aim to support is Amazon S3, because it's relatively simple and provides the reliability and atomicity that we require.

Replace usage of `failure` with `anyhow` or `thiserror`

Currently the failure crate is used for error handling, but anyhow and thiserror are more modern crates designed specifically for application-level error handling and for deriving Error trait implementations, respectively.

zef-service should use anyhow, while zef-core, zef-base and zef-storage should use thiserror.
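
For example, a sketch of the split (the error variants and file name are illustrative, not actual project code): a library crate like zef-storage derives a concrete error type with thiserror, while the zef-service binary wraps errors with anyhow and adds context.

```rust
use thiserror::Error;

// In a library crate (e.g. zef-storage): a concrete error type whose
// std::error::Error implementation is generated by the derive.
#[derive(Debug, Error)]
pub enum StorageError {
    #[error("I/O error: {0}")]
    Io(#[from] std::io::Error),
    #[error("missing key: {0:?}")]
    MissingKey(Vec<u8>),
}

// In the binary crate (zef-service): anyhow::Result absorbs any error type
// and attaches context for human-readable reports.
fn run() -> anyhow::Result<()> {
    use anyhow::Context as _;
    // "config.toml" is a placeholder path for illustration.
    let bytes = std::fs::read("config.toml").context("loading service config")?;
    println!("read {} bytes", bytes.len());
    Ok(())
}

fn main() -> anyhow::Result<()> {
    run()
}
```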

Improve creation of `S3Storage` instances in tests

Motivation

The current S3Storage unit tests constantly have to ignore the BucketStatus returned by the S3Storage constructors. This adds boilerplate to the tests without providing any useful information.

Ideas

  1. Refactor the constructors so that some of them return a BucketStatus and some don't. The naming pattern to use still needs to be figured out (a possible shape is sketched after this list).
  2. Update LocalStackTestContext to have a create_s3_storage method that ignores the BucketStatus.
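
A possible shape for both ideas, with hypothetical names (with_bucket_status and create_s3_storage are placeholders; the final naming pattern is precisely what remains to be figured out):

```rust
// Illustrative stand-ins for the real types.
pub struct S3Storage;
pub enum BucketStatus { New, Existing }

impl S3Storage {
    /// Production constructor: the caller learns whether the bucket
    /// had to be created (hypothetical name).
    pub fn with_bucket_status() -> (S3Storage, BucketStatus) {
        // ... real code would create or connect to the bucket here ...
        (S3Storage, BucketStatus::Existing)
    }
}

pub struct LocalStackTestContext;

impl LocalStackTestContext {
    /// Test helper (idea 2): discards the BucketStatus so tests stay short.
    pub fn create_s3_storage(&self) -> S3Storage {
        let (storage, _status) = S3Storage::with_bucket_status();
        storage
    }
}
```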

Use generative tests for testing `S3Storage`

Motivation

Using generative tests should improve the readability of the tests (because unimportant dummy values don't have to be created manually) and their coverage (because many more inputs are exercised, which can reach situations hand-written cases miss).

Linera doesn't use any generative tests yet, but when I implemented the initial S3Storage tests I wrote them in that style. However, that required a lot of work, and the PR (#65) became larger than expected, making review more difficult. Therefore the generative tests were stripped from that PR, and a separate PR should be opened for them.

This also makes it easier to discuss the code changes needed to support generative tests.
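
For illustration, here is what the style looks like with the proptest crate; the property below is a toy in-memory round-trip, standing in for real S3Storage tests against localstack.

```rust
use proptest::prelude::*;

proptest! {
    // proptest generates many (key, value) pairs instead of one dummy value.
    #[test]
    fn put_then_get_round_trips(key in any::<Vec<u8>>(), value in any::<Vec<u8>>()) {
        let mut store = std::collections::BTreeMap::new();
        store.insert(key.clone(), value.clone());
        prop_assert_eq!(store.get(&key), Some(&value));
    }
}
```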

refactor: ArithmeticErrors

Currently we have linera_base::error::Error, which only contains overflow/underflow errors. These should be pulled into an ArithmeticError type with variants for the different domain types, i.e. BlockHeight, Round, etc.
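
A sketch of the shape this could take, assuming the thiserror crate (the variant names and the try_add_one helper are illustrative):

```rust
use thiserror::Error;

// One shared arithmetic error with variants per domain type, instead of
// scattering overflow/underflow cases through linera_base::error::Error.
#[derive(Debug, Error)]
pub enum ArithmeticError {
    #[error("block height overflow")]
    BlockHeightOverflow,
    #[error("block height underflow")]
    BlockHeightUnderflow,
    #[error("round number overflow")]
    RoundOverflow,
}

pub struct BlockHeight(pub u64);

impl BlockHeight {
    pub fn try_add_one(self) -> Result<BlockHeight, ArithmeticError> {
        self.0
            .checked_add(1)
            .map(BlockHeight)
            .ok_or(ArithmeticError::BlockHeightOverflow)
    }
}
```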

Make `flush` async free

The only obstacle is that MapView and CollectionView require flush to be async because of the access to the indices.
The proposal is: "Now I'm thinking this could be nicely optimized away with a new command WriteOperation::DeleteRange { prefix: Vec }", which would make this possible.
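
A sketch of the proposed variant and how the memory backend might apply it; the surrounding types are illustrative, and only the DeleteRange name comes from the quote above.

```rust
use std::collections::BTreeMap;

// Extending the batch operations with a range deletion.
enum WriteOperation {
    Put { key: Vec<u8>, value: Vec<u8> },
    Delete { key: Vec<u8> },
    // Proposed: delete every key starting with `prefix` in one operation,
    // so flush no longer needs to enumerate indices asynchronously first.
    DeleteRange { prefix: Vec<u8> },
}

// How the memory backend could apply it: BTreeMap keys are sorted,
// so all keys sharing a prefix form one contiguous range.
fn apply_delete_range(map: &mut BTreeMap<Vec<u8>, Vec<u8>>, prefix: &[u8]) {
    let doomed: Vec<Vec<u8>> = map
        .range(prefix.to_vec()..)
        .take_while(|(key, _)| key.starts_with(prefix))
        .map(|(key, _)| key.clone())
        .collect();
    for key in doomed {
        map.remove(&key);
    }
}
```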

Fix handling of retryable errors by clients

The client-side logic that I added early on to deal with server-side errors is broken: there are legitimate scenarios where honest validators don't agree on a single error, yet the error should be processed and the query retried later. (EDIT: in the case of missing messages, the code retried only once and without a delay!)

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Failed to communicate with a quorum of validators:
[
    MissingCrossChainUpdate {
        application_id: ApplicationId(
            0,
        ),
        origin: Origin {
            chain_id: f783f3b45d003c39,
            medium: Direct,
        },
        height: BlockHeight(
            1,
        ),
    },
    MissingCrossChainUpdate {
        application_id: ApplicationId(
            0,
        ),
        origin: Origin {
            chain_id: 3a5228a1b887e471,
            medium: Direct,
        },
        height: BlockHeight(
            1,
        ),
    },
]', linera-service/src/client.rs:704:74

Unify the `commit` and `flush` code paths

This requires cleaning up set(&mut self, batch: &mut Self::Batch, value: T) in RegisterView so that it takes the value as a &T.
That way we would store only references, so no potentially large clone operations would occur when flushing a RegisterView.
Right now, the set implementations for dynamo_db and rocksdb take a value and immediately take a reference to it, but that is not the case for memory; it would become possible after restructuring memory's batching code. Still, difficult lifetime issues would have to be addressed.
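
A simplified sketch of the new signature, using non-generic stand-ins for the real types and bcs-style serialization (an assumption for illustration):

```rust
use serde::Serialize;

// Illustrative stand-ins for the real Batch and RegisterView types.
#[derive(Default)]
struct Batch {
    puts: Vec<(Vec<u8>, Vec<u8>)>,
}

struct RegisterView {
    key: Vec<u8>,
}

impl RegisterView {
    // Taking `&T` lets the caller keep ownership: the view serializes the
    // value straight into the batch, so no potentially large clone occurs.
    fn set<T: Serialize>(&mut self, batch: &mut Batch, value: &T) {
        let bytes = bcs::to_bytes(value).expect("serialization should not fail");
        batch.puts.push((self.key.clone(), bytes));
    }
}
```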

Improve the RPC protocol

The current RPC protocol used in Zefchain is entirely homemade. While that was good for learning and might be good for an audit, it has a few limitations:

  • it might not be sustainable to maintain as complexity grows
  • unbounded message sizes
  • UDP doesn’t work with large blocks (or lossy channels)
  • it leads to sub-optimal locking, because a single write request can lock the whole worker's storage and block other requests
  • no support for user notifications
  • no encryption
  • not currently possible to reuse connections for multiple requests

The RPC protocol should be improved to address these limitations, possibly using existing crates like tokio-util and tower.

Split `linera_base::error::Error` into several types

We currently use linera_base::error::Error for different purposes:

  • as the error type in Result<T, Error> for most fallible functions in the codebase, regardless of the crate,
  • a data type that goes on the wire in case of server errors.

Recently (thanks @jvff), we started defining smaller error types locally in each crate. In the same spirit:

  • We should split linera_base::error::Error so that each crate has its own error defined locally.
  • There should be execution/chain errors.
  • We should define an error type linera_core::node::NodeError meant to be serialized and sent over the network.
  • There should be a linera_core::client::ClientError (see also #66).

Restructure tests of the linera-views

There are several problems:

  • Some tests use only the memory interface, while all tests should run against all three interfaces.
  • Some bugs were not detected by the existing tests.
  • Some tests are badly located in the tree structure.

[storage] support schema-less data hashing and deletion

Currently, hashing and deleting the content of a view depends on the Rust-defined type schema. This is notably true for CollectionViews for which we cannot even define a low-level delete operation because we don't know the internal structure of the "value" views.

The goal of this task is to define a low-level storage layout so that the set of keys present in a database can be accurately iterated without knowledge of the Rust types.

Accelerate the hash computation

The hash of structures like CollectionView is computed recursively. Could this be accelerated?
One possibility would be to cache the hash in a field of type Option<u256>. The problem is that if a hash is invalidated by a commit, we would also need to recursively invalidate the cached hashes that depend on it.
Whether the speedup is really needed remains to be seen, so some benchmarks are needed first.
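
A sketch of the caching idea with simplified types (u64 and DefaultHasher stand in for u256 and the real hasher; nested views would additionally need to propagate invalidation upward):

```rust
use std::collections::BTreeMap;

// A view that caches its hash and invalidates the cache on mutation.
struct HashedMap {
    entries: BTreeMap<String, String>,
    cached_hash: Option<u64>, // stand-in for the Option<u256> in the issue
}

impl HashedMap {
    fn insert(&mut self, key: String, value: String) {
        self.entries.insert(key, value);
        self.cached_hash = None; // any mutation invalidates the cache
    }

    fn hash(&mut self) -> u64 {
        if let Some(h) = self.cached_hash {
            return h; // unchanged since last computation: no recursion needed
        }
        use std::hash::{Hash, Hasher};
        let mut hasher = std::collections::hash_map::DefaultHasher::new();
        self.entries.hash(&mut hasher);
        let h = hasher.finish();
        self.cached_hash = Some(h);
        h
    }
}
```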

Improve client <- server synchronization of received certificates

Following #53, validators can no longer accept cross-chain requests from future epochs. In theory, those messages could be safely accepted in the inbox, but because of channels, the blocks in received_log would then no longer be sorted by increasing epoch, contradicting assumptions made by find_received_certificates in client.rs.

Prevent misuse of view APIs

Right now there are many different ways to misuse the View objects:

  • cloning and committing several times (FIXED),
  • batch/committing interior views.

For the last point, I believe we should just rename the current trait and tweak the new derive macros so that not every user-defined View has save.

Concretely, we could just do the following (an illustrative trait split follows the list):

  • rename ContainerView -> RootView
  • remove HashableContainerView
  • provide #[derive(View)], #[derive(RootView)], #[derive(HashableView)]
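
An illustrative trait split along those lines (the real change would also touch the derive macros):

```rust
/// Every view supports staging and rolling back in-memory changes.
trait View {
    fn rollback(&mut self);
}

/// Only the root of a view hierarchy can persist itself. Interior views
/// never get `save`, so they cannot be committed in isolation.
trait RootView: View {
    fn save(&mut self) -> Result<(), std::io::Error>;
}
```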

Allow configuring the S3 bucket name

The bucket name used in S3Storage is currently hard-coded, but it must be configurable so that different validators use different buckets.

Without this, in the integration test, the validators all use the same bucket and that may lead to an invalid state.

Use PartitionKey and SortKey for DynamoDb.

Currently, the DynamoDb code first downloads the keys matching a prefix and then deletes them one by one.
It appears that there is no way to do a range deletion in DynamoDb; what we have are sort keys and other related notions. In other words, we have to encode the prefix directly in the database layout in order to delete efficiently.

This actually matches the way we use delete_key_prefix: our one and only use case is batch.delete_key_prefix(context.base_key());. So we should lay out the data in DynamoDb in an orderly way so that it can be deleted in an orderly way.
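
A sketch of the layout idea with illustrative names: store the base_key as the partition key and the remainder as the sort key, so everything under a base_key lives in one partition that can be queried and deleted in order.

```rust
/// Illustrative mapping from a flat key to DynamoDB's two-part key.
struct DynamoKey {
    /// Partition key: the context's base_key. All items sharing it can be
    /// listed with a single Query on the partition.
    partition_key: Vec<u8>,
    /// Sort key: the remainder of the key, ordered within the partition.
    sort_key: Vec<u8>,
}

fn split_key(base_key: &[u8], full_key: &[u8]) -> Option<DynamoKey> {
    // Only keys under `base_key` belong to this context.
    let suffix = full_key.strip_prefix(base_key)?;
    Some(DynamoKey {
        partition_key: base_key.to_vec(),
        sort_key: suffix.to_vec(),
    })
}
```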

Clarify the routing of messages

When execution generates effects, the targeted remote inbox(es) are currently rather implicit. I quickly fixed a safety issue at the end of #69, but as a follow-up we should probably change the data structure so that message routing is fully explicit.

One idea is to change Vec<Effect> (inside Value) into Vec<Message> where struct Message { medium: Medium, effect: Effect } and enum Medium { Direct, Channel(String) }.
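
Spelled out, the proposed types would look like this (Effect is a stand-in for the real type):

```rust
// Stand-in for the real effect type.
struct Effect;

/// An effect together with the medium that routes it to the target inbox.
struct Message {
    medium: Medium,
    effect: Effect,
}

/// How a message travels: directly to one chain, or through a named channel.
enum Medium {
    Direct,
    Channel(String),
}
```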

[zef] tolerate crashed workers

Now that workers have persistent storage, we'd like to make sure that killing a worker process never results in lost computations (e.g. missing money on the receiver's side).

In the first iteration of this task, we'll make the following simplifications:

  1. Interrupted computations do not need to resume when the worker restarts (instead this can be done lazily when accounts are read again).
  2. We tolerate unbounded hot storage.

For 1., another option would be to aggressively resume computations when a worker restarts. However, this requires an accurate list of all the accounts (potentially stopped) in this shard.

For 2. eventually we should make sure that workers' memory is bounded. (The corresponding persistent data in storage is called "hot storage". The rest, "cold storage", is only needed for auditing and data validation after synchronization.)

[zef] provide better storage options

The current on-disk storage was fun to hack together, but using the filesystem is quite limiting (e.g. in the total number of keys).

  • For testing, RocksDB would be the easiest to add. Then we could remove the filesystem implementation altogether.
  • Later, we may also want to support a proper database engine such as Postgres.
  • For production, we need to support at least S3 (probably using localstack to test locally and in CI).

Finish migration to linera-storage2

After #98 there will likely be a number of things left:

  • recover the S3-related code that was deleted and port it to support DynamoDB in linera-service (@jvff)
  • #119
  • delete chain.rs in linera-base and linera-storage
  • #120
  • #121
  • fix other temporarily deactivated tests (may require #101 however) (@jvff)

sub-modules cannot be tested individually

Description

On main, cargo test runs the entire test suite just fine. However, if you run the tests for just one sub-module, they don't compile.

Steps to reproduce

git checkout main
cargo test -p linera-storage

S3 storage is not initialized in `make_storage`

When StorageConfig::make_storage is called, it needs to populate the storage with initial data if the storage doesn't yet exist. However, if it creates an S3Storage, there's currently no way for it to determine if the bucket was new and empty or if it was already initialized.

This should be fixed so that the S3 storage is properly initialized if needed.

Optimize storage updates for chain states

Currently the storage layer serializes the whole chain state every time it is updated. To make things worse, this includes non-constant-size data such as logs and queues.

We should split the chain state across multiple keys and use multi-key operations to maintain atomicity (important for crash-tolerance).
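
A sketch of the direction with illustrative key names: each component of the chain state lives under its own key, and an update batches writes only for the components that changed, applied atomically by the backend.

```rust
// Illustrative: component keys derived from the chain's base prefix.
fn component_key(chain_prefix: &[u8], component: &str) -> Vec<u8> {
    let mut key = chain_prefix.to_vec();
    key.extend_from_slice(component.as_bytes());
    key
}

// An update that only changes the next block height no longer rewrites the
// logs and queues; it stages a write for the one key that changed. The
// resulting batch would be applied as a single multi-key (atomic) write.
fn stage_height_update(chain_prefix: &[u8], height: u64) -> Vec<(Vec<u8>, Vec<u8>)> {
    vec![(
        component_key(chain_prefix, "/next_height"),
        height.to_be_bytes().to_vec(),
    )]
}
```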

Remove scripts used for Docker images

The scripts used in the Docker images are a little brittle. They're an extra thing to keep track of, and failing to update them can lead to CI failures that are hard to diagnose, like this one.

The scripts should be removed, but the binaries should be updated to handle the things that the scripts handle, like obtaining the validator configuration.

Move server states into a shared map

Currently, guards are used to protect concurrent access to chains. This is done using a ChainGuards type in linera-service. However, this is brittle, because it's still possible to forget to obtain a guard before changing a chain's state.

An API should be written that ensures there's no way to access or change a chain without obtaining a lock.

The refactor could also allow the Server type to not implement Clone.
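
A minimal sketch of such an API (synchronous and simplified; the real code would use async locks): the map hands out chain states only behind a lock, so there is no code path that can reach a chain's state without holding its guard.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// Illustrative chain state.
struct ChainState {
    next_height: u64,
}

#[derive(Default)]
struct ChainMap {
    chains: Mutex<HashMap<String, Arc<Mutex<ChainState>>>>,
}

impl ChainMap {
    // The only way to reach a ChainState is through the handle returned
    // here, so forgetting to take the guard is no longer possible.
    fn lock_chain(&self, chain_id: &str) -> Arc<Mutex<ChainState>> {
        let mut chains = self.chains.lock().unwrap();
        Arc::clone(
            chains
                .entry(chain_id.to_owned())
                .or_insert_with(|| Arc::new(Mutex::new(ChainState { next_height: 0 }))),
        )
    }
}

fn main() {
    let map = ChainMap::default();
    let chain = map.lock_chain("chain-1");
    let mut state = chain.lock().unwrap();
    state.next_height += 1;
}
```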

Investigate removal of lock guards inside `Context` implementations

Motivation

Current linera_views::views::Context implementations require an OwnedMutexGuard when they are constructed. This sometimes leads to situations where dummy locks are created.

As far as I can tell, the locks serve to prevent multiple access to chain states. If that's the only scenario, one possible solution is something like what was attempted in #112 with a SharedCollection type.

This should be investigated to confirm if there are any other usage scenarios and if the proposed solution would work.
