
linera-protocol's People

Contributors

afck, andresilva91, ashu26jha, christos-h, colerar, duguorong009, fakefraud, firedpeanut, harnen, jvff, laurentmazare, ma2bd, martinkong1990, mathieudutsik, maxtori, nirajsah, papadritta, twey

linera-protocol's Issues

Make DynamoDb handle a key_prefix of length 0

Right now, DynamoDb does not allow a key_prefix of length 0, which I think is a pity since the other backends (memory / RocksDB) allow it.
The current workaround is to use the prefix vec![0], which I think is a hack that should be eliminated.

Avoid locking when handling requests

The current implementation of the server uses a Mutex to guard the MessageHandler instance used as its state. This prevents concurrent handling of requests and ends up degrading performance.
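
A minimal sketch of one direction, with hypothetical request/response types (the real handler and its state are more involved): if the handler takes &self and guards its interior state with finer-grained locks, independent requests can proceed concurrently instead of queuing on a single Mutex.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

// Hypothetical request/response types, for illustration only.
struct Request { key: String }
struct Response { value: Option<String> }

struct MessageHandler {
    // Finer-grained interior locking instead of one Mutex around the whole handler.
    state: RwLock<HashMap<String, String>>,
}

impl MessageHandler {
    // Taking `&self` (not `&mut self`) lets many tasks call the handler at once.
    fn handle(&self, request: Request) -> Response {
        // Reads only take a shared lock, so they don't block each other.
        let value = self.state.read().unwrap().get(&request.key).cloned();
        Response { value }
    }
}

fn main() {
    let handler = Arc::new(MessageHandler { state: RwLock::new(HashMap::new()) });
    // Each connection task holds its own `Arc` clone; no global Mutex is needed.
    let h = Arc::clone(&handler);
    std::thread::spawn(move || { h.handle(Request { key: "a".into() }); })
        .join()
        .unwrap();
}
```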

Make the computation of indices faster and allow an order-dependent `for_each_index`

The code for computing indices iterates over the entries in the database, then looks at the staged updates, and finally sorts the combined indices before use. This is suboptimal: the keys come from the database already sorted and the entries from the updates are also sorted, so the merge could be done in O(N) time.

The second point is about for_each_index. The whole point of this function is that we do not need to build the index set; instead, we iterate over all the entries and apply the function to each one. As a consequence, we are currently forced to require functions f that are order-independent. Building the O(N) algorithm above would resolve this problem.
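
A sketch of the O(N) merge with illustrative types (the real code walks database keys and staged updates rather than slices): since both inputs are already sorted, a single pass yields the indices in order, which also means an order-dependent f could be applied on the fly.

```rust
use std::cmp::Ordering;

/// Merge two already-sorted, deduplicated key sequences in O(N),
/// letting entries from `updates` shadow equal entries from `db`.
fn merge_indices(db: &[Vec<u8>], updates: &[Vec<u8>]) -> Vec<Vec<u8>> {
    let (mut i, mut j) = (0, 0);
    let mut out = Vec::with_capacity(db.len() + updates.len());
    while i < db.len() && j < updates.len() {
        match db[i].cmp(&updates[j]) {
            Ordering::Less => { out.push(db[i].clone()); i += 1; }
            Ordering::Greater => { out.push(updates[j].clone()); j += 1; }
            Ordering::Equal => { out.push(updates[j].clone()); i += 1; j += 1; }
        }
    }
    out.extend_from_slice(&db[i..]);
    out.extend_from_slice(&updates[j..]);
    out // already sorted: no extra sort pass required
}
```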

Implement `batch` for memory and `dynamo_db`

This would avoid many async operations, which would be a big gain. It could possibly hurt the performance of the memory backend, but it leaves room for the following two improvements (a batching sketch follows the list):

  • Implement low-level batching for dynamo_db.
  • Refactor the existing code to share more of it between memory, rocksdb and dynamo_db.
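
For illustration, a rough sketch of what batching the memory backend could look like; the names (WriteOperation, Batch, MemoryStore) are illustrative stand-ins, not the actual linera-views API. The point is that a single lock acquisition applies all staged operations at once.

```rust
use std::collections::BTreeMap;
use std::sync::Mutex;

// Illustrative write operations collected into a single batch.
enum WriteOperation {
    Put { key: Vec<u8>, value: Vec<u8> },
    Delete { key: Vec<u8> },
}

#[derive(Default)]
struct Batch {
    operations: Vec<WriteOperation>,
}

impl Batch {
    fn put(&mut self, key: Vec<u8>, value: Vec<u8>) {
        self.operations.push(WriteOperation::Put { key, value });
    }
    fn delete(&mut self, key: Vec<u8>) {
        self.operations.push(WriteOperation::Delete { key });
    }
}

struct MemoryStore {
    map: Mutex<BTreeMap<Vec<u8>, Vec<u8>>>,
}

impl MemoryStore {
    // One lock acquisition applies the whole batch, replacing N separate
    // async round-trips with a single call (shown synchronously here).
    fn write_batch(&self, batch: Batch) {
        let mut map = self.map.lock().unwrap();
        for op in batch.operations {
            match op {
                WriteOperation::Put { key, value } => { map.insert(key, value); }
                WriteOperation::Delete { key } => { map.remove(&key); }
            }
        }
    }
}
```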

Improve the APIs of linera-views

We discussed a few pieces of low-hanging fruit in linera-views:

  • generalize the logic so that (write_)commit and (write_)delete take &mut self (in other words, views remain usable after committing changes to storage)
  • not sure whether delete should be renamed commit_deletion or split into two parts: staging the deletion and committing it.
  • similarly, we don't support calling remove_entry and then load_entry on the same index in the same CollectionView instance. To solve this, we need a function reset_to_default that stages the fact that we are resetting the object.
  • As a result, maybe AppendOnlyLogView should be renamed LogView or VectorView.

Add S3 as a storage option

Motivation

Currently Linera does not support any storage that is shared between workers. Allowing a storage layer to be shared between workers will simplify shard reassignment. Otherwise, workers will have to coordinate to transfer shard information between them.

The first shared storage layer we aim to support is Amazon S3, because it's relatively simple and provides the reliability and atomicity that we require.

Replace usage of `failure` with `anyhow` or `thiserror`

Currently the failure crate is used for error handling, but anyhow and thiserror are more modern crates designed specifically for application-level error handling and for deriving Error trait implementations, respectively.

zef-service should use anyhow, while zef-core, zef-base and zef-storage should use thiserror.
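
For example, a sketch of the split (the error variants and file name are illustrative, not actual project code): a library crate like zef-storage derives a concrete error type with thiserror, while the zef-service binary wraps errors with anyhow and adds context.

```rust
use thiserror::Error;

// In a library crate (e.g. zef-storage): a concrete error type whose
// std::error::Error implementation is generated by the derive.
#[derive(Debug, Error)]
pub enum StorageError {
    #[error("I/O error: {0}")]
    Io(#[from] std::io::Error),
    #[error("missing key: {0:?}")]
    MissingKey(Vec<u8>),
}

// In the binary crate (zef-service): anyhow::Result absorbs any error type
// and attaches context for human-readable reports.
fn run() -> anyhow::Result<()> {
    use anyhow::Context as _;
    // "config.toml" is a placeholder path for illustration.
    let bytes = std::fs::read("config.toml").context("loading service config")?;
    println!("read {} bytes", bytes.len());
    Ok(())
}

fn main() -> anyhow::Result<()> {
    run()
}
```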

Improve creation of `S3Storage` instances in tests

Motivation

The current S3Storage unit tests constantly have to ignore the BucketStatus returned by the S3Storage constructors. This adds boilerplate to the tests without providing any useful information.

Ideas

  1. Refactor the constructors so that some of them return a BucketStatus and some don't. The naming pattern to use still needs to be figured out (a possible shape is sketched after this list).
  2. Update LocalStackTestContext to have a create_s3_storage method that ignores the BucketStatus.
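
A possible shape for both ideas, with hypothetical names (with_bucket_status and create_s3_storage are placeholders; the final naming pattern is precisely what remains to be figured out):

```rust
// Illustrative stand-ins for the real types.
pub struct S3Storage;
pub enum BucketStatus { New, Existing }

impl S3Storage {
    /// Production constructor: the caller learns whether the bucket
    /// had to be created (hypothetical name).
    pub fn with_bucket_status() -> (S3Storage, BucketStatus) {
        // ... real code would create or connect to the bucket here ...
        (S3Storage, BucketStatus::Existing)
    }
}

pub struct LocalStackTestContext;

impl LocalStackTestContext {
    /// Test helper (idea 2): discards the BucketStatus so tests stay short.
    pub fn create_s3_storage(&self) -> S3Storage {
        let (storage, _status) = S3Storage::with_bucket_status();
        storage
    }
}
```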

Use generative tests for testing `S3Storage`

Motivation

Using generative tests should improve the readability of the tests (because unimportant dummy values don't have to be created manually) and their coverage (because many more inputs are exercised, which can reach situations hand-written cases miss).

Linera doesn't use any generative tests yet, but when I implemented the initial S3Storage tests I wrote them in that style. However, that required a lot of work, and the PR (#65) became larger than expected, making review more difficult. Therefore the generative tests were stripped from that PR, and a separate PR should be opened for them.

This also makes it easier to discuss the code changes needed to support generative tests.
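
For illustration, here is what the style looks like with the proptest crate; the property below is a toy in-memory round-trip, standing in for real S3Storage tests against localstack.

```rust
use proptest::prelude::*;

proptest! {
    // proptest generates many (key, value) pairs instead of one dummy value.
    #[test]
    fn put_then_get_round_trips(key in any::<Vec<u8>>(), value in any::<Vec<u8>>()) {
        let mut store = std::collections::BTreeMap::new();
        store.insert(key.clone(), value.clone());
        prop_assert_eq!(store.get(&key), Some(&value));
    }
}
```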

refactor: ArithmeticErrors

Currently we have linera_base::error::Error, which only contains overflow/underflow errors. These should be pulled into an ArithmeticError type with variants for the different domain types, i.e. BlockHeight, Round, etc.
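
A sketch of the shape this could take, assuming the thiserror crate (the variant names and the try_add_one helper are illustrative):

```rust
use thiserror::Error;

// One shared arithmetic error with variants per domain type, instead of
// scattering overflow/underflow cases through linera_base::error::Error.
#[derive(Debug, Error)]
pub enum ArithmeticError {
    #[error("block height overflow")]
    BlockHeightOverflow,
    #[error("block height underflow")]
    BlockHeightUnderflow,
    #[error("round number overflow")]
    RoundOverflow,
}

pub struct BlockHeight(pub u64);

impl BlockHeight {
    pub fn try_add_one(self) -> Result<BlockHeight, ArithmeticError> {
        self.0
            .checked_add(1)
            .map(BlockHeight)
            .ok_or(ArithmeticError::BlockHeightOverflow)
    }
}
```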

Make `flush` async free

The only obstacle is that MapView and CollectionView require flush to be async because of the access to the indices.
The proposal is: "Now I'm thinking this could be nicely optimized away with a new command WriteOperation::DeleteRange { prefix: Vec }", which would make this possible.
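
A sketch of the proposed variant and how the memory backend might apply it; the surrounding types are illustrative, and only the DeleteRange name comes from the quote above.

```rust
use std::collections::BTreeMap;

// Extending the batch operations with a range deletion.
enum WriteOperation {
    Put { key: Vec<u8>, value: Vec<u8> },
    Delete { key: Vec<u8> },
    // Proposed: delete every key starting with `prefix` in one operation,
    // so flush no longer needs to enumerate indices asynchronously first.
    DeleteRange { prefix: Vec<u8> },
}

// How the memory backend could apply it: BTreeMap keys are sorted,
// so all keys sharing a prefix form one contiguous range.
fn apply_delete_range(map: &mut BTreeMap<Vec<u8>, Vec<u8>>, prefix: &[u8]) {
    let doomed: Vec<Vec<u8>> = map
        .range(prefix.to_vec()..)
        .take_while(|(key, _)| key.starts_with(prefix))
        .map(|(key, _)| key.clone())
        .collect();
    for key in doomed {
        map.remove(&key);
    }
}
```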

Fix handling of retryable errors by clients

The client-side logic that I added early on to deal with server-side errors is broken: there are legitimate scenarios where honest validators don't agree on a single error, yet the error should be processed and the query retried later. (EDIT: in the case of missing messages, the code retried only once and without a delay!)

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Failed to communicate with a quorum of validators:
[
    MissingCrossChainUpdate {
        application_id: ApplicationId(
            0,
        ),
        origin: Origin {
            chain_id: f783f3b45d003c39,
            medium: Direct,
        },
        height: BlockHeight(
            1,
        ),
    },
    MissingCrossChainUpdate {
        application_id: ApplicationId(
            0,
        ),
        origin: Origin {
            chain_id: 3a5228a1b887e471,
            medium: Direct,
        },
        height: BlockHeight(
            1,
        ),
    },
]', linera-service/src/client.rs:704:74

Unify the `commit` and `flush` code paths

This requires cleaning up set(&mut self, batch: &mut Self::Batch, value: T) in RegisterView so that it takes the value as a &T.
That way we would store only references, so no potentially large clone operations would occur when flushing a RegisterView.
Right now, the set implementations for dynamo_db and rocksdb take a value and immediately take a reference to it, but that is not the case for memory; it would become possible after restructuring memory's batching code. Still, difficult lifetime issues would have to be addressed.
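
A simplified sketch of the new signature, using non-generic stand-ins for the real types and bcs-style serialization (an assumption for illustration):

```rust
use serde::Serialize;

// Illustrative stand-ins for the real Batch and RegisterView types.
#[derive(Default)]
struct Batch {
    puts: Vec<(Vec<u8>, Vec<u8>)>,
}

struct RegisterView {
    key: Vec<u8>,
}

impl RegisterView {
    // Taking `&T` lets the caller keep ownership: the view serializes the
    // value straight into the batch, so no potentially large clone occurs.
    fn set<T: Serialize>(&mut self, batch: &mut Batch, value: &T) {
        let bytes = bcs::to_bytes(value).expect("serialization should not fail");
        batch.puts.push((self.key.clone(), bytes));
    }
}
```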

Improve the RPC protocol

The current RPC protocol used in Zefchain is entirely homemade. While that was good for learning and might be good for an audit, it has a few limitations:

  • it might not be sustainable to maintain as complexity grows
  • unbounded message sizes
  • UDP doesn’t work with large blocks (or lossy channels)
  • it leads to sub-optimal locking, because a single write request can lock the whole worker's storage and block other requests
  • no support for user notifications
  • no encryption
  • not currently possible to reuse connections for multiple requests

The RPC protocol should be improved to address these limitations, possibly using existing crates like tokio-util and tower.

Split `linera_base::error::Error` into several types

We currently use linera_base::error::Error for different purposes:

  • as the error type in Result<T, Error> for most fallible functions in the codebase, regardless of the crate,
  • a data type that goes on the wire in case of server errors.

Recently (thanks @jvff), we started defining smaller error types locally in each crate. In the same spirit:

  • We should split linera_base::error::Error so that each crate has its own error defined locally.
  • There should be execution/chain errors.
  • We should define an error type linera_core::node::NodeError meant to be serialized and sent over the network.
  • There should be a linera_core::client::ClientError (see also #66).

Restructure tests of the linera-views

There are several problems:

  • Some tests use only the memory interface, while all tests should run against all three interfaces.
  • Some bugs were not detected by the existing tests.
  • Some tests are badly located in the tree structure.

[storage] support schema-less data hashing and deletion

Currently, hashing and deleting the content of a view depends on the Rust-defined type schema. This is notably true for CollectionViews for which we cannot even define a low-level delete operation because we don't know the internal structure of the "value" views.

The goal of this task is to define a low-level storage layout so that the set of keys present in a database can be accurately iterated without knowledge of the Rust types.

Accelerate the hash computation

The hash of structures like CollectionView is computed recursively. Could this be accelerated?
One possibility would be to cache the hash in a field of type Option<u256>. The problem is that if a hash is invalidated by a commit, we would also need to recursively invalidate the cached hashes that depend on it.
Whether the speedup is really needed remains to be seen, so some benchmarks are needed first.
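
A sketch of the caching idea with simplified types (u64 and DefaultHasher stand in for u256 and the real hasher; nested views would additionally need to propagate invalidation upward):

```rust
use std::collections::BTreeMap;

// A view that caches its hash and invalidates the cache on mutation.
struct HashedMap {
    entries: BTreeMap<String, String>,
    cached_hash: Option<u64>, // stand-in for the Option<u256> in the issue
}

impl HashedMap {
    fn insert(&mut self, key: String, value: String) {
        self.entries.insert(key, value);
        self.cached_hash = None; // any mutation invalidates the cache
    }

    fn hash(&mut self) -> u64 {
        if let Some(h) = self.cached_hash {
            return h; // unchanged since last computation: no recursion needed
        }
        use std::hash::{Hash, Hasher};
        let mut hasher = std::collections::hash_map::DefaultHasher::new();
        self.entries.hash(&mut hasher);
        let h = hasher.finish();
        self.cached_hash = Some(h);
        h
    }
}
```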

Improve client <- server synchronization of received certificates

Following #53, validators can no longer accept cross-chain requests from future epochs. In theory, those messages could be safely accepted in the inbox, but because of channels, the blocks in received_log would then no longer be sorted by increasing epoch, contradicting assumptions made by find_received_certificates in client.rs.

Prevent misuse of view APIs

Right now there are many different ways to misuse the View objects:

  • cloning and committing several times (FIXED),
  • batch/committing interior views.

For the last point, I believe we should just rename the current trait and tweak the new derive macros so that not every user-defined View has save.

Concretely, we could just do the following (an illustrative trait split follows the list):

  • rename ContainerView -> RootView
  • remove HashableContainerView
  • provide #[derive(View)], #[derive(RootView)], #[derive(HashableView)]
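
An illustrative trait split along those lines (the real change would also touch the derive macros):

```rust
/// Every view supports staging and rolling back in-memory changes.
trait View {
    fn rollback(&mut self);
}

/// Only the root of a view hierarchy can persist itself. Interior views
/// never get `save`, so they cannot be committed in isolation.
trait RootView: View {
    fn save(&mut self) -> Result<(), std::io::Error>;
}
```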

Allow configuring the S3 bucket name

The bucket name used in S3Storage is currently hard-coded, but it must be configurable so that different validators use different buckets.

Without this, in the integration test, the validators all use the same bucket and that may lead to an invalid state.

Use PartitionKey and SortKey for DynamoDb.

Currently, the DynamoDb code first downloads the keys matching a prefix and then deletes them one by one.
It appears that there is no way to do a range deletion in DynamoDb; what we have are sort keys and other related notions. In other words, we have to encode the prefix directly in the database layout in order to delete efficiently.

This actually matches the way we use delete_key_prefix: our one and only use case is batch.delete_key_prefix(context.base_key());. So we should lay out the data in DynamoDb in an orderly way so that it can be deleted in an orderly way.
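
A sketch of the layout idea with illustrative names: store the base_key as the partition key and the remainder as the sort key, so everything under a base_key lives in one partition that can be queried and deleted in order.

```rust
/// Illustrative mapping from a flat key to DynamoDB's two-part key.
struct DynamoKey {
    /// Partition key: the context's base_key. All items sharing it can be
    /// listed with a single Query on the partition.
    partition_key: Vec<u8>,
    /// Sort key: the remainder of the key, ordered within the partition.
    sort_key: Vec<u8>,
}

fn split_key(base_key: &[u8], full_key: &[u8]) -> Option<DynamoKey> {
    // Only keys under `base_key` belong to this context.
    let suffix = full_key.strip_prefix(base_key)?;
    Some(DynamoKey {
        partition_key: base_key.to_vec(),
        sort_key: suffix.to_vec(),
    })
}
```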

Clarify the routing of messages

When execution generates effects, the targeted remote inbox(es) are currently rather implicit. I quickly fixed a safety issue at the end of #69, but as a follow-up we should probably change the data structure so that message routing is fully explicit.

One idea is to change Vec<Effect> (inside Value) into Vec<Message> where struct Message { medium: Medium, effect: Effect } and enum Medium { Direct, Channel(String) }.
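
Spelled out, the proposed types would look like this (Effect is a stand-in for the real type):

```rust
// Stand-in for the real effect type.
struct Effect;

/// An effect together with the medium that routes it to the target inbox.
struct Message {
    medium: Medium,
    effect: Effect,
}

/// How a message travels: directly to one chain, or through a named channel.
enum Medium {
    Direct,
    Channel(String),
}
```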

[zef] tolerate crashed workers

Now that workers have persistent storage, we'd like to make sure that killing a worker process never results in lost computations (e.g. missing money on the receiver's side).

In the first iteration of this task, we'll make the following simplifications:

  1. Interrupted computations do not need to resume when the worker restarts (instead this can be done lazily when accounts are read again).
  2. We tolerate unbounded hot storage.

For 1., another option would be to aggressively resume computations when a worker restarts. However, this requires an accurate list of all the accounts (potentially stopped) in this shard.

For 2. eventually we should make sure that workers' memory is bounded. (The corresponding persistent data in storage is called "hot storage". The rest, "cold storage", is only needed for auditing and data validation after synchronization.)

[zef] provide better storage options

The current on-disk storage was fun to hack together, but using the filesystem is quite limiting (e.g. in the total number of keys).

  • For testing, RocksDB would be the easiest to add. Then we could remove the filesystem implementation altogether.
  • Later, we may also want to support a proper database engine such as Postgres.
  • For production, we need to support at least S3 (probably using localstack to test locally and in CI).

Finish migration to linera-storage2

After #98 there will likely be a number of things left:

  • recover the S3-related code that was deleted and port it to support DynamoDB in linera-service (@jvff)
  • #119
  • delete chain.rs in linera-base and linera-storage
  • #120
  • #121
  • fix other temporarily deactivated tests (may require #101 however) (@jvff)

sub-modules cannot be tested individually

Description

On main, cargo test runs the entire test suite just fine. However, if you run the tests for just one sub-module, they don't compile.

Steps to reproduce

git checkout main
cargo test -p linera-storage

S3 storage is not initialized in `make_storage`

When StorageConfig::make_storage is called, it needs to populate the storage with initial data if the storage doesn't yet exist. However, if it creates an S3Storage, there's currently no way for it to determine if the bucket was new and empty or if it was already initialized.

This should be fixed so that the S3 storage is properly initialized if needed.

Optimize storage updates for chain states

Currently the storage layer serializes the whole chain state every time it is updated. To make things worse, this includes non-constant-size data such as logs and queues.

We should split the chain state across multiple keys and use multi-key operations to maintain atomicity (important for crash-tolerance).
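
A sketch of the direction with illustrative key names: each component of the chain state lives under its own key, and an update batches writes only for the components that changed, applied atomically by the backend.

```rust
// Illustrative: component keys derived from the chain's base prefix.
fn component_key(chain_prefix: &[u8], component: &str) -> Vec<u8> {
    let mut key = chain_prefix.to_vec();
    key.extend_from_slice(component.as_bytes());
    key
}

// An update that only changes the next block height no longer rewrites the
// logs and queues; it stages a write for the one key that changed. The
// resulting batch would be applied as a single multi-key (atomic) write.
fn stage_height_update(chain_prefix: &[u8], height: u64) -> Vec<(Vec<u8>, Vec<u8>)> {
    vec![(
        component_key(chain_prefix, "/next_height"),
        height.to_be_bytes().to_vec(),
    )]
}
```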

Remove scripts used for Docker images

The scripts used in the Docker images are a little brittle. They're an extra thing to keep track of, and failing to update them can lead to CI failures that are hard to diagnose, like this one.

The scripts should be removed, but the binaries should be updated to handle the things that the scripts handle, like obtaining the validator configuration.

Move server states into a shared map

Currently, guards are used to protect concurrent access to chains. This is done using a ChainGuards type in linera-service. However, this is brittle, because it's still possible to forget to obtain a guard before changing a chain's state.

An API should be written that ensures there's no way to access or change a chain without obtaining a lock.

The refactor could also allow the Server type to not implement Clone.
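
A minimal sketch of such an API (synchronous and simplified; the real code would use async locks): the map hands out chain states only behind a lock, so there is no code path that can reach a chain's state without holding its guard.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// Illustrative chain state.
struct ChainState {
    next_height: u64,
}

#[derive(Default)]
struct ChainMap {
    chains: Mutex<HashMap<String, Arc<Mutex<ChainState>>>>,
}

impl ChainMap {
    // The only way to reach a ChainState is through the handle returned
    // here, so forgetting to take the guard is no longer possible.
    fn lock_chain(&self, chain_id: &str) -> Arc<Mutex<ChainState>> {
        let mut chains = self.chains.lock().unwrap();
        Arc::clone(
            chains
                .entry(chain_id.to_owned())
                .or_insert_with(|| Arc::new(Mutex::new(ChainState { next_height: 0 }))),
        )
    }
}

fn main() {
    let map = ChainMap::default();
    let chain = map.lock_chain("chain-1");
    let mut state = chain.lock().unwrap();
    state.next_height += 1;
}
```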

Investigate removal of lock guards inside `Context` implementations

Motivation

Current linera_views::views::Context implementations require an OwnedMutexGuard when they are constructed. This sometimes leads to situations where dummy locks are created.

As far as I can tell, the locks serve to prevent multiple access to chain states. If that's the only scenario, one possible solution is something like what was attempted in #112 with a SharedCollection type.

This should be investigated to confirm if there are any other usage scenarios and if the proposed solution would work.
