engula's Introduction

Engula is a distributed key-value store, used as a cache, database, and storage engine.

Architecture

topology

See design doc for more details.

Quick start

  1. Build
     make build
  2. Deploy a cluster
     bash scripts/bootstrap.sh setup
  3. Verify
     cargo run -- shell

Run and enjoy it.

Contributing

Thanks for your help in improving the project! We have a contributing guide to help you get involved in the Engula project.

More information

For informal discussions, please go to the forum.

engula's People

Contributors

huachaohuang, jackdrogon, kezhenxu94, leiysky, tisonkun, w41ter, zojw

engula's Issues

Reduce CI time cost

Engula uses several Rust tools in CI, such as cargo-udeps and cargo-audit. We should reduce the CI time cost to improve the contributor experience.

In particular, we could download prebuilt binaries rather than running cargo install in every job.

elaborate the contributing guide

As the project grows, I find myself answering similar questions again and again. Our current CONTRIBUTING.md lacks information and is not tailored to our project. Thus, I propose to elaborate the contributing guide to cover the following topics:

  • Get started
    • Prepare the development environment
    • Run examples
  • Communications (why communications are important, how we organize discussions)
    • Discussion forum
    • Zulip chat room
  • Contributions
    • Principles (reference)
    • Report issues
    • Review patches (it should be intuitive, but many (potential) contributors ask me about this topic)
    • Contribute code (it should also be intuitive, as long as our CI tasks pass)
    • Licenses (many contributors are confused about where a license header should be placed)

I'm going to prepare a PR this week, in time for the exposure of the v0.2 release. The points above are an initial outline. If things get complex (I hope not yet), the guide can be factored out into the docs/ folder.

Reference:

cc @huachaohuang

cc @Xuanwo @zojw @DCjanus any information you think there should be?

Roadmap v0.2 - Engine

Provide a simple hash engine that supports get/set/delete key-value pairs.
The engine should be able to use different kinds of kernels.

Support `delete` operation

#143 only supports get/set, we should support delete too. This is not as easy as it might sound. We need to add tombstones for deleted records.
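
To illustrate why a tombstone is needed, here is a minimal sketch. It assumes a two-layer layout in which new writes go to a mutable map that shadows older, immutable data; `HashEngine`, `Entry`, `base`, and `mem` are hypothetical names and not the engine's actual types.

```rust
use std::collections::HashMap;

/// A slot in the writable layer: `Put` holds live data, `Tombstone` marks a deletion.
#[derive(Clone, Debug, PartialEq)]
enum Entry {
    Put(Vec<u8>),
    Tombstone,
}

/// Hypothetical two-layer engine: `base` stands in for older, immutable data,
/// while `mem` receives all new writes.
#[derive(Default)]
struct HashEngine {
    base: HashMap<Vec<u8>, Vec<u8>>,
    mem: HashMap<Vec<u8>, Entry>,
}

impl HashEngine {
    fn set(&mut self, key: &[u8], value: &[u8]) {
        self.mem.insert(key.to_vec(), Entry::Put(value.to_vec()));
    }

    /// Deleting cannot simply drop the key from `mem`: the old value in
    /// `base` would become visible again. A tombstone shadows it instead.
    fn delete(&mut self, key: &[u8]) {
        self.mem.insert(key.to_vec(), Entry::Tombstone);
    }

    fn get(&self, key: &[u8]) -> Option<&[u8]> {
        match self.mem.get(key) {
            Some(Entry::Put(v)) => Some(v.as_slice()),
            Some(Entry::Tombstone) => None,
            None => self.base.get(key).map(|v| v.as_slice()),
        }
    }
}
```

In this sketch a tombstone can only be discarded once no older layer still holds the key, which is part of what makes delete harder than get/set.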

Roadmap v0.2 - Microunit

Non-goals:

  • Data persistence
  • Leader election and failover
  • Resource isolation and management

Tasks:

  • Design document
  • Commands
    • Start a node
    • Bootstrap a universe
    • Join a node to a universe

Can we use learned indexes to build a new file format?

There are a few papers about applying machine learning to construct indexes on sorted records.
These learned indexes can reduce memory usage significantly.
Maybe we can explore the possibilities to build a new file format with these learned indexes.

Specifically, the PGM index looks very interesting.
However, it seems only applicable to integers instead of arbitrary byte arrays.
Maybe we can apply some order-preserving minimal perfect hash functions here.

These are just some early thoughts; discussion is welcome.
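
To make the idea concrete, here is a minimal sketch of a learned index. It is not the PGM index: it fits a single straight line through the first and last key and records the worst prediction error, whereas PGM builds several such segments with a bounded error. It assumes sorted, distinct `u64` keys (the integer-only limitation noted above), and `LinearIndex` is a hypothetical name.

```rust
/// A one-segment learned index over sorted, distinct u64 keys.
/// The "model" is just a slope, an intercept, and an error bound.
struct LinearIndex {
    keys: Vec<u64>,
    slope: f64,
    intercept: f64,
    max_err: usize,
}

impl LinearIndex {
    fn build(keys: Vec<u64>) -> Self {
        let n = keys.len();
        let (slope, intercept) = if n < 2 {
            (0.0, 0.0)
        } else {
            // Line through (first key, 0) and (last key, n - 1).
            let first = keys[0] as f64;
            let last = keys[n - 1] as f64;
            let slope = (n - 1) as f64 / (last - first);
            (slope, -slope * first)
        };
        let mut index = LinearIndex { keys, slope, intercept, max_err: 0 };
        // Record the worst prediction error so lookups know how far to search.
        let max_err = (0..n)
            .map(|i| {
                let p = index.predict(index.keys[i]);
                if p > i { p - i } else { i - p }
            })
            .max()
            .unwrap_or(0);
        index.max_err = max_err;
        index
    }

    /// Predicted position of `key`, clamped to the valid index range.
    fn predict(&self, key: u64) -> usize {
        let pos = self.slope * (key as f64) + self.intercept;
        let last = self.keys.len().saturating_sub(1);
        (pos.max(0.0) as usize).min(last)
    }

    /// Binary search restricted to the window around the prediction.
    fn get(&self, key: u64) -> Option<usize> {
        if self.keys.is_empty() {
            return None;
        }
        let p = self.predict(key);
        let lo = p.saturating_sub(self.max_err);
        let hi = (p + self.max_err + 1).min(self.keys.len());
        self.keys[lo..hi].binary_search(&key).ok().map(|i| lo + i)
    }
}
```

The memory saving comes from replacing per-key index entries with two floats and an error bound; the price is a search window that widens when the keys are far from linear.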

Reference:

Roadmap v0.2 - Background

Tasks:

  • Design document
  • Background traits
  • A local background implementation
  • A remote background implementation
    • Background client
    • Background service
  • Integrate the background service with microunit

Example does not have output

Although the example does some work and writes under /tmp/engula_test/hello, it prints nothing. A user-friendly message would improve the experience.

For example, tell users that they can find the output under that directory, or just list the files. On failure, it seems the framework itself already gives an error message.

Adopt new chatroom Zulip

Discussed in #141

Originally posted by tisonkun November 30, 2021
TL;DR: Join Engula Zulip by https://engula.zulipchat.com.

In #33 we migrated our auxiliary instant messaging tool from Gitter to Discord. However, after using it for a while and discussing it directly with several contributors, we found several shortcomings that push us to find a replacement:

  1. Discord cannot be accessed everywhere. Connections to Discord are unstable or blocked in some regions, which likely prevents contributors in those regions from participating.
  2. Discord's app isn't available on some platforms in some regions either. In addition, Discord doesn't provide a web version on mobile. This makes it significantly inconvenient to join discussions on the go.
  3. One of the top reasons we chose Discord was that it provides a voice channel out of the box. However, it turned out to be unstable.
  4. Another reason we chose Discord was that the Rust community used it. But they migrated to Zulip years ago.

So I investigated Zulip, and here is why we can consider it as a replacement:

  1. Zulip can be accessed where Discord cannot, which addresses the points above. I, for one, can connect to the chatroom from anywhere.
  2. Zulip provides meeting support based on Zoom.
  3. Zulip renders messages with Markdown syntax, which is much more concise than posting links on Discord.
  4. As mentioned above, many Rust teams migrated to Zulip, and there are endorsements from developers. Discord, meanwhile, is designed for gamers to chat online.
  5. As a personal preference: because it is designed for gamers, Discord has a rich-text style, while Zulip is almost plain text. I like the latter.

This is not an urgent issue, but I hope some of you can help by trying out Zulip. You can join the organization on Zulip via this invite link ( https://engula.zulipchat.com/join/eicksrlhhokl4274ouijr7s6/ ). We still keep the source of truth on GitHub and recommend Discord as the auxiliary online chatroom in the README until Zulip turns out to be a better choice.

cc @huachaohuang @levy5307

Install PrometheusBuilder causes panic?

// This panic in some cases, haven't figure it out.

@huachaohuang I see this comment but don't know what "in some cases" means. Could you please turn it into a TODO comment and also comment under this issue with the error stack?

The doc of install already states that:

An error will be returned if there's an issue with creating the HTTP server or with installing the recorder as the global recorder.

Merge Bucket into Storage

We have three traits for the storage component now: Storage, Bucket, Object. This brings two problems:

  • The trait bound is a bit complicated, like Storage<Object, Bucket>
  • There are some corner cases dealing with individual buckets, for example, #86 (comment)

A simpler solution is to merge Bucket into Storage, so Storage will provide the following interfaces:

  • list buckets
  • create bucket
  • delete bucket
  • list objects in a bucket
  • upload an object to a bucket
  • delete an object from a bucket

We can still keep Object and ObjectUploader as usual. Note that only Object is performance-critical here, as users may read from an object frequently. Other storage operations don't need to care much about performance.
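
The merged interface could be sketched roughly as follows. This is a synchronous simplification under stated assumptions: the method names, `StorageError`, and the in-memory `MemStorage` are hypothetical, and object reads (which stay on Object) are left out.

```rust
use std::collections::BTreeMap;

#[derive(Debug, PartialEq)]
enum StorageError {
    BucketNotFound,
    BucketExists,
    ObjectNotFound,
}

/// Hypothetical merged trait: bucket operations live on `Storage` itself,
/// so there is no separate `Bucket` type parameter to thread through.
trait Storage {
    fn list_buckets(&self) -> Vec<String>;
    fn create_bucket(&mut self, bucket: &str) -> Result<(), StorageError>;
    fn delete_bucket(&mut self, bucket: &str) -> Result<(), StorageError>;
    fn list_objects(&self, bucket: &str) -> Result<Vec<String>, StorageError>;
    fn upload_object(&mut self, bucket: &str, object: &str, data: Vec<u8>)
        -> Result<(), StorageError>;
    fn delete_object(&mut self, bucket: &str, object: &str) -> Result<(), StorageError>;
}

/// In-memory stand-in used only to exercise the trait.
#[derive(Default)]
struct MemStorage {
    buckets: BTreeMap<String, BTreeMap<String, Vec<u8>>>,
}

impl Storage for MemStorage {
    fn list_buckets(&self) -> Vec<String> {
        self.buckets.keys().cloned().collect()
    }

    fn create_bucket(&mut self, bucket: &str) -> Result<(), StorageError> {
        if self.buckets.contains_key(bucket) {
            return Err(StorageError::BucketExists);
        }
        self.buckets.insert(bucket.to_owned(), BTreeMap::new());
        Ok(())
    }

    fn delete_bucket(&mut self, bucket: &str) -> Result<(), StorageError> {
        self.buckets.remove(bucket).map(|_| ()).ok_or(StorageError::BucketNotFound)
    }

    fn list_objects(&self, bucket: &str) -> Result<Vec<String>, StorageError> {
        let objects = self.buckets.get(bucket).ok_or(StorageError::BucketNotFound)?;
        Ok(objects.keys().cloned().collect())
    }

    fn upload_object(&mut self, bucket: &str, object: &str, data: Vec<u8>)
        -> Result<(), StorageError> {
        let objects = self.buckets.get_mut(bucket).ok_or(StorageError::BucketNotFound)?;
        objects.insert(object.to_owned(), data);
        Ok(())
    }

    fn delete_object(&mut self, bucket: &str, object: &str) -> Result<(), StorageError> {
        let objects = self.buckets.get_mut(bucket).ok_or(StorageError::BucketNotFound)?;
        objects.remove(object).map(|_| ()).ok_or(StorageError::ObjectNotFound)
    }
}
```

With a single trait, corner cases such as operating on a missing bucket surface as an error from one place.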

Investigate how to test with s3

ref #99 (comment)

We need a way to automatically test against AWS services like S3 in CI without leaking the access and secret keys. IMHO, we can approach this in two directions:

  • Unit test: mock the AWS SDK. There is an existing issue indicating that the SDK supports TestConnection to mock/record request-response pairs. We made some attempts, but it seems more encapsulation work is needed to make it easy to use.
  • Integration test: deploy a temporary S3-like service (for example https://min.io/) in the CI pipeline and run test cases against it.

Advice and code contributions are welcome 😄

journal - make timestamp generic

What do you mean by "make it generic"? Does it mean:

pub struct Event<TS> {
    pub ts: TS,
    pub data: Vec<u8>,
}

or Timestamp<T>? Is there similar stuff in other projects?

cc @huachaohuang

Originally posted by @tisonkun in #82 (comment)

The Event<TS> one, but it should be more complicated than that. The rationale is to let users customize the type of the timestamp. Depending on how users implement their databases, the usage of timestamps can be very different. For example, it can be a simple increasing counter, a logical clock, or a hybrid logical clock, which may need different representations.

Originally posted by @huachaohuang in #82 (comment)
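
As a rough sketch of the Event<TS> direction discussed above, the timestamp can be bounded by a small trait so that a plain counter and a hybrid logical clock both fit. The `Timestamp` trait, its bounds, `Hlc`, and `last_ts` are assumptions made for illustration, not the journal's actual API.

```rust
use std::fmt::Debug;

/// Hypothetical bound for journal timestamps: ordered, copyable, printable.
trait Timestamp: Ord + Copy + Debug {}
impl<T: Ord + Copy + Debug> Timestamp for T {}

#[derive(Debug, Clone, PartialEq)]
struct Event<TS: Timestamp> {
    ts: TS,
    data: Vec<u8>,
}

/// A hybrid logical clock value: physical time first, then a logical counter.
/// Deriving `Ord` compares the fields in declaration order.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Hlc {
    physical_ms: u64,
    logical: u32,
}

/// Returns the latest timestamp in a batch, for any timestamp type.
fn last_ts<TS: Timestamp>(events: &[Event<TS>]) -> Option<TS> {
    events.iter().map(|e| e.ts).max()
}
```

Here `Event<u64>` covers the simple increasing counter, while `Event<Hlc>` covers the hybrid case without changing any journal code.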

Apply license header to engula work

First of all, I must clarify that this project still seems to be at an early stage, so this issue is a suggestion to consider as the project grows.

According to the Apache License 2.0 (APL 2.0), it is common practice to apply the license explicitly:

To apply the Apache License to specific files in your work, attach the following boilerplate declaration, replacing the fields enclosed by brackets "[]" with your own identifying information. (Don't include the brackets!) Enclose the text in the appropriate comment syntax for the file format. We also recommend that you include a file or class name and description of purpose on the same "printed page" as the copyright notice for easier identification within third-party archives.

Copyright [yyyy] [name of copyright owner]

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

If we decide to stick to APL 2.0, we'll face this proposal sooner or later. Contributors may complain that a template is too heavy to carry while we're still at an early stage, when we don't care about how others make use of the demo. But once you think your work has taken shape, that is a signal to consider this proposal.

Also, deciding on the name of the "copyright owner" is essential for the project.

Last, without audit tools, license headers can easily be missed or contain typos. skywalking-eyes would be a great language-agnostic solution that integrates with GitHub Actions.

Find a practice to return asynchronous paged results

We have some APIs that need to return a list of values. For example, get a list of object names from a bucket. The simplest way is to return a Vec<T>. But if the list goes too long, it is not a good idea to return it all at once. A common solution for this kind of problem is paging. Rust provides a concept called Stream, which is like an asynchronous iterator that allows the caller to get one value at a time.

I think the most mature way to do that is to use Stream from the futures crate. But it is not a good idea for our public APIs to depend on a third-party trait that is not stable either. Fortunately, Rust is working on a built-in Stream trait (nightly), so we may consider using that in our public APIs once it is stabilized.
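
For comparison, plain paging needs no third-party trait at all. The sketch below is synchronous and uses a page token rather than Stream; `Page`, `Bucket`, `list_objects`, and the offset-based token are all hypothetical, chosen only to show the shape of such an API.

```rust
/// One page of results plus the token to request the next page, if any.
#[derive(Debug, PartialEq)]
struct Page {
    items: Vec<String>,
    next_token: Option<usize>,
}

/// Hypothetical bucket that hands out object names `limit` at a time.
struct Bucket {
    objects: Vec<String>,
}

impl Bucket {
    /// `token` is the offset to resume from; `None` starts at the beginning.
    /// A `limit` of zero is treated as one so that callers always make progress.
    fn list_objects(&self, token: Option<usize>, limit: usize) -> Page {
        let total = self.objects.len();
        let start = token.unwrap_or(0).min(total);
        let end = start.saturating_add(limit.max(1)).min(total);
        Page {
            items: self.objects[start..end].to_vec(),
            next_token: if end < total { Some(end) } else { None },
        }
    }
}

/// Drains every page; a caller could equally stop after the first one.
fn list_all(bucket: &Bucket, limit: usize) -> Vec<String> {
    let mut names = Vec::new();
    let mut token = None;
    loop {
        let page = bucket.list_objects(token, limit);
        names.extend(page.items);
        match page.next_token {
            Some(next) => token = Some(next),
            None => return names,
        }
    }
}
```

The trade-off is that the caller drives the loop explicitly, whereas a Stream hides the token behind an iterator-like interface.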

Settle down the project layout

Rust crates:

  • engula: the integral library and binary
    • engula-engine: contains different kinds of storage engines
    • engula-kernel: contains the kernel abstraction and some built-in implementations
    • engula-journal: contains the journal abstraction and some built-in implementations
    • engula-storage: contains the storage abstraction and some built-in implementations

TODO:

  • Improve descriptions in Cargo.toml
    • engula
    • engula-engine
    • engula-kernel
    • engula-journal
    • engula-storage

Manifest abstraction to store kernel metadata

Kernel needs a metadata store for its metadata persistence. We have two choices:

  • Add a general metadata abstraction at the same level as Journal and Storage
  • Add a sub-module inside Kernel that only handles kernel-specific metadata

For v0.2, we can choose the second solution for simplicity. We can still move to the first solution in the future if the second one proves to be inadequate.

Roadmap v0.2 - Storage

Tasks:

  • Design document
  • Storage abstraction
  • A mem storage implementation
  • A file storage implementation
  • A grpc storage implementation

Most tasks have been done, but we still need to refactor some implementations before releasing v0.2.

Consider separate executable and library

In #78 we simplify the project layout into:

├── Cargo.lock
├── Cargo.toml
├── src
│   ├── api
│   ├── background
│   ├── hello_unit.rs
│   ├── main.rs
│   ├── manifest
│   ├── microunit
│   ├── node.rs
│   └── storage

However, the root Cargo config file defines both the executable and the library, while a consumer of the library doesn't want the executable part. We may consider releasing the executable and the library separately. @zojw suggested learning from how tokio does its releases.

cc @huachaohuang @PsiACE

Find a tool and add a GitHub Action to check toml files

We have two styles in our toml files now. For example, this uses the style:

tokio = {version = "1.13", features = ["full"]}

But this uses the style:

tokio = { version = "1.13", features = ["full"] }

Note the whitespace inside the {}. It would be nice to keep toml files consistent for all contributors.

Roadmap for demo 1

The project has just started and is still in the demo stage.
The primary goal in this stage is to explore the possibility of our designs.

We plan to release the first demo at the end of Sep 2021.
We are going to achieve the following objectives:

  • A working demo with simple read/write APIs on AWS
  • A demo report about what we did and the lessons learned

Tasks:

  • Cache
  • HybridStorage
  • SstableStorage
  • ParquetStorage
  • RemoteJournal
  • RemoteManifest
  • RemoteCompaction
  • Support S3 file system and SELECT
  • A purpose-built benchmark tool

However, we are short of hands.
If you think we are behind schedule, please push us with a 👍

Update: this demo has ended, please check the report for more details.

Write down the first version CONTRIBUTING.md

So as not to disturb your prototyping and rapid development, I'm glad to write this file and propose a PR, with your help in collecting the necessary information.

The draft looks like this:


How to Contribute

I'm really glad you're reading this, because we need volunteer developers to help this project come to fruition.

If you haven't already, come find us on Gitter. We want you working on things you're excited about.

You are welcome to review our design or participate in discussions about the roadmap!

Get Started

We develop Engula with the Rust stable toolchain.

You can get started with Engula in three steps:

  1. Set up the environment with rustup.
  2. Build Engula via cargo build.
  3. Run the example via cargo run --example hello.

Report an Issue

If you think you have found an issue in Engula, you can report it to the issue tracker.

Before filing an issue report, check whether the problem has already been reported. You can use the search bar to search existing issues. This doesn't always work, and sometimes it's hard to know what to search for, so consider this extra credit. We won't mind if you accidentally file a duplicate report. Don't blame yourself if your issue is closed as a duplicate.

If the problem you're reporting is not already in the issue tracker, you can open a GitHub issue with your GitHub account.

Submitting a Pull Request

Please send a GitHub Pull Request to Engula with a clear list of what you've done (read more about pull requests). When you send a pull request, we look forward to an expressive description, clear commit messages, and more test coverage if it is a code contribution.

Before submitting the pull request, please make sure all tests pass locally:

cargo build --release
cargo test
cargo clippy -- -D warnings
cargo fmt --all -- --check

Thank you for your participation!


The questions are:

  1. Do we protect main and only modify it by PR?
  2. Which merge strategy do we use: merge commit, rebase and merge, or squash and merge? I highly recommend the latter two, since merge commits make history hard to read. However, I'm not deeply involved in your project, so it's your choice.
  3. Shall we adopt a code of conduct? If so, I'd suggest Contributor Covenant Code of Conduct and the project should provide a contact method.
  4. Any other concern on the draft above?

Port mini-redis to store data in the hash engine

From Zulip:

Hmm, maybe we can let mini-redis run on our hash engine. I think the job should not be hard since mini-redis stores everything in a HashMap. It will be an interesting way to demonstrate the usage of our hash engine in v0.2.
We can consider working on that once we get #143 landed

This is not required for v0.2. But if anyone is interested in this job, we can add it to the list 🤓

A file journal implementation

We have mem and grpc implementations now. But most applications will need persistent journal storage. A sophisticated implementation is not easy, so we can start with a naive implementation for now. Check RocksDB LogWriter for an example.
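
A naive starting point might look like the sketch below: an append-only file of length-prefixed records. It is far simpler than the RocksDB log format (no checksums, no block framing), and `FileJournal`, `append`, and `read_all` are hypothetical names rather than the journal trait's real methods.

```rust
use std::fs::{File, OpenOptions};
use std::io::{self, BufReader, Read, Write};
use std::path::Path;

/// Appends records as `[u32 little-endian length][payload]`.
struct FileJournal {
    file: File,
}

impl FileJournal {
    fn open(path: &Path) -> io::Result<Self> {
        let file = OpenOptions::new().create(true).append(true).open(path)?;
        Ok(FileJournal { file })
    }

    fn append(&mut self, record: &[u8]) -> io::Result<()> {
        if record.len() > u32::MAX as usize {
            return Err(io::Error::new(io::ErrorKind::InvalidInput, "record too large"));
        }
        let len = record.len() as u32;
        self.file.write_all(&len.to_le_bytes())?;
        self.file.write_all(record)?;
        // Sync before acknowledging the write; a real journal would batch this.
        self.file.sync_data()
    }
}

/// Replays all records from the start of the file.
fn read_all(path: &Path) -> io::Result<Vec<Vec<u8>>> {
    let mut reader = BufReader::new(File::open(path)?);
    let mut records = Vec::new();
    loop {
        let mut len_buf = [0u8; 4];
        match reader.read_exact(&mut len_buf) {
            Ok(()) => {}
            // End of file (or a torn length prefix) ends the replay.
            Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => return Ok(records),
            Err(e) => return Err(e),
        }
        // A torn payload, by contrast, is reported as an error.
        let mut record = vec![0u8; u32::from_le_bytes(len_buf) as usize];
        reader.read_exact(&mut record)?;
        records.push(record);
    }
}
```

Syncing on every append keeps the sketch simple but is slow; batching writes and adding checksums would be natural next steps.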

Check cargo build is stable

With the current main (027b59a), cargo build generates a diff in the Cargo.lock file.

To avoid leaving an unclean repo, where users find an annoying diff soon after checking out the repo, I propose adding a check that cargo build generates a consistent lock file.
