toshi-search / toshi
A full-text search engine in Rust
License: MIT License
This is just a tracking issue for additional config items that need to be added:
Steps taken:
cargo build --release
target/release/toshi
INFO toshi > Base data path data/ does not exist, creating it...
INFO toshi > Clustering disabled...
INFO toshi::index > Indexes: []
______ __ _ ____ __
/_ __/__ ___ / / (_) / __/__ ___ _________/ /
/ / / _ \(_-</ _ \/ / _\ \/ -_) _ `/ __/ __/ _ \
/_/ \___/___/_//_/_/ /___/\__/\_,_/_/ \__/_//_/
Such Relevance, Much Index, Many Search, Wow
INFO gotham::start > Gotham listening on http://[::1]:8080
curl -X GET http://localhost:8080/ -v
Note: Unnecessary use of -X or --request, GET is already inferred.
* Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 8080 (#0)
> GET / HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/7.62.0
> Accept: */*
>
< HTTP/1.1 404 Not Found
< x-request-id: be615a31-a504-430a-a06a-d2017a1a3b81
< content-length: 0
< date: Fri, 23 Nov 2018 19:56:47 GMT
<
* Connection #0 to host localhost left intact
I am running Toshi with the default config file.
Dependabot can't resolve your Rust dependency files.
As a result, Dependabot couldn't update your dependencies.
The error Dependabot encountered was:
Updating git repository `https://github.com/tower-rs/tower`
Updating crates.io index
Updating git repository `https://github.com/carllerche/better-future`
Updating git repository `https://github.com/toshi-search/systemstat`
Updating git repository `https://github.com/carllerche/tokio-connect`
Updating git repository `https://github.com/LucioFranco/tower-consul`
Updating git repository `https://github.com/tower-rs/tower-grpc`
Updating git repository `https://github.com/tower-rs/tower-h2`
Updating git repository `https://github.com/tower-rs/tower-http`
error: no matching package named `tower-direct-service` found
location searched: https://github.com/tower-rs/tower
required by package `toshi v0.1.1 (/home/dependabot/dependabot-updater/dependabot_tmp_dir
If you think the above is an error on Dependabot's side please don't hesitate to get in touch - we'll do whatever we can to fix it.
You can mention @dependabot in the comments below to contact the Dependabot team.
Right now, the `node_id` in `ConsulInterface` is public so that it can be assigned in `main.rs`. This whole section could be refactored to load or generate both the Consul and node IDs, then create the `ConsulInterface` once we have all the needed info.
When a node comes up, it should try to register with the specified Consul cluster
The consul client should be split into a `consul::Builder` and a `consul::Consul`. Currently, the building of a consul client and the client itself live in the same impl block on the same struct. Ideally, there should be a builder phase that collects the items needed to create the client. This would also make room for the spawning phase required to build a `tower_buffer`, since it needs to spawn the actual service to drive it in the background. This is why we have to wrap `Consul::default()` in a `future::lazy`: it needs to get the `DefaultExecutor::current()` that is stored in a thread-local variable.
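A rough sketch of what that split could look like (the shape and all field names here are assumptions, not the current code):

// Hypothetical builder/client split; only build() has to run inside the
// runtime (e.g. under `future::lazy`), since constructing the buffered
// service requires `DefaultExecutor::current()`.
pub struct Builder {
    address: String,
    node_id: Option<String>,
}

impl Builder {
    pub fn new(address: impl Into<String>) -> Self {
        Builder { address: address.into(), node_id: None }
    }

    pub fn node_id(mut self, id: impl Into<String>) -> Self {
        self.node_id = Some(id.into());
        self
    }

    // Collect everything first, then spawn the service in the background
    // and hand back the finished client.
    pub fn build(self) -> Consul {
        // ...wrap the raw client in a tower_buffer here...
        Consul { address: self.address, node_id: self.node_id }
    }
}

pub struct Consul {
    address: String,
    node_id: Option<String>,
}

This keeps all the fallible collection of config in the builder and makes the one point that needs an executor explicit.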
Generally speaking, capnp- and gRPC-based libraries accomplish the same thing, in the sense that they provide a language-agnostic way of expressing the API and service. Unlike those, tarpc uses Rust structs and macros to define the components of the RPC. Because of this, I do not see its advantage over something that is language-agnostic.
Cap'n Proto seems like a decent choice because it is all pointer-based and doesn't actually do any parsing, but because of this, the actual Rust implementation has a lot of unsafe code, which in my opinion undercuts the reason we use Rust in the first place.
This leaves the three implementations of gRPC. I find gRPC to be a pretty good fit due to its HTTP/2 backing being very fast and low-overhead. It has support for many languages and is a specification supported by Google. Currently, `pingcap/tikv` and `linkerd2-proxy` use a gRPC implementation.
`grpc-rust` was the first gRPC implementation, using the rust-protobuf wrapper around the `protoc` rust plugin. This means that any project using it must depend on the `protoc` binary being in the path. While this is not a horrible route, it's not perfect. The biggest problem with grpc-rust is that its README has a TODO item that says "Fix performance", which is not very promising.
This leaves us with `grpc-rs` and `tower-grpc`. Let's start with `grpc-rs`: it is the library created by PingCAP for use within TiKV. The basis for this library is FFI bindings to the gRPC C core. That being said, this library is by far the most feature-complete, since it gets all of its features from a previous implementation. However, since the library is pretty much FFI calls, its API is not very ergonomic and is quite rough around the edges.
This brings us to `tower-grpc`, the library created by the people behind `tokio`. It builds on the `tower-service` trait. This library is by far the newest of the five; that said, it actually has more repositories on GitHub using it than `capnp-rpc`. `tower-grpc` uses `prost` under the hood to generate the Rust code, which means there are zero external dependencies that are not in Rust. This is quite powerful to me, since we are already a pure-Rust project. The ergonomics of the API are also very nice: there are a lot of similarities with tokio, so people who are used to working with `tokio` should have an easy time with `tower-grpc`. The other powerful benefit of using `tower` and its accompanying crates is the ecosystem of middleware around them, covering everything from load balancing to timeouts and more. The drawback of this library is that it is currently not released on crates.io and still has a somewhat unstable API. That being said, I got in contact with the `tower` people, and they said that the actual public interface for `tower-grpc` is not going to change and that a `0.1` will be released on crates.io soon.
All this said, I personally find that `tower-grpc` is the right choice, mostly because it seems to be the direction the community is heading. It is by far the most flexible library and is already pretty stable (from my usage).
I would like to hear more thoughts on this, cc @fhaynes @hntd187
This is related to #14
ES uses a protocol called `zen` to discover cluster members. Further, ES has multiple node types, one of which is `master`. Clusters and masters have the following characteristics:
As a quicker path to clustering, we can use an external tool such as Consul to handle node registration and leader election. This is similar to how many Apache projects use ZooKeeper. The flow would look something like this:
A different approach is to not have leaders or central coordination. We could use consistent hashing to decide which node data is placed on. This would require virtual nodes to deal with replicas, as otherwise the algorithm would place the replicas on the same node as the primary.
This is the approach Cassandra and Riak use. http://docs.basho.com/riak/kv/2.2.3/learn/concepts/vnodes/ is a good introduction to the concept.
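For illustration only, here is a minimal sketch of a consistent-hash ring with virtual nodes (all names and the vnode count are assumptions, not a design proposal):

use std::collections::BTreeMap;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

fn hash<T: Hash>(t: &T) -> u64 {
    let mut s = DefaultHasher::new();
    t.hash(&mut s);
    s.finish()
}

// Each physical node occupies many points (vnodes) on the ring, so replicas
// that walk the ring forward land on different physical nodes.
struct HashRing {
    ring: BTreeMap<u64, String>, // ring position -> physical node
}

impl HashRing {
    fn new(nodes: &[&str], vnodes: usize) -> Self {
        let mut ring = BTreeMap::new();
        for node in nodes {
            for v in 0..vnodes {
                ring.insert(hash(&format!("{}-{}", node, v)), node.to_string());
            }
        }
        HashRing { ring }
    }

    // The primary for a key is the first vnode clockwise from the key's
    // hash; wrap around to the start of the ring if we fall off the end.
    fn node_for(&self, key: &str) -> &str {
        let h = hash(&key);
        self.ring
            .range(h..)
            .next()
            .or_else(|| self.ring.iter().next())
            .map(|(_, node)| node.as_str())
            .unwrap()
    }
}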
Toshi should accept a Consul address as a CLI flag. Related to #15
Hey,
Assuming you want to build a better/faster Elasticsearch alternative, what do you think about building this on top of `tikv`? This way you get replication, sharding, commitlog, transactions, raft, gRPC, backups, multi-datacenter, native local functions, etc. for ~free.
At a minimum, you can replace the `sstable` of `rocksdb` to store a tantivy `segment`. Doing CRUD, you check the commitlog and then the `segment`, while doing search you just check the sstables. You can replace the `memtable` with another tantivy `segment` to enable real-time querying without a refresh (refreshing would translate and persist the `memtable` memory-segment to an `sstable` disk-segment).
Using gRPC may even be better than HTTP/JSON.
Does that make sense, compared to doing it yourself?
ps: the best/extreme scenario would be to build it on top of the Seastar framework, but that's probably too much work compared to the above.
Issue received in tantivy-cli.
We should no longer be using `use some_crate::module::*` and should switch to explicit imports.
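For example (tantivy's schema module is used here purely as an illustration):

// Prefer explicit imports:
use tantivy::schema::{Schema, SchemaBuilder};
// over glob imports:
// use tantivy::schema::*;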
This issue tracks implementation of clustering into Toshi.
I've laid out some of the groundwork for this with tower-grpc, so I should be able to get rolling with this today.
Things I'd like to do by end of the year:
- Teach `IndexHandle` that it might have to do an RPC call to fulfill a query
- Refactor `IndexHandle` into a `LocalIndexHandle`
- Move index operations into `IndexHandle` and out of `IndexCatalog`
- Extend `IndexCatalog` to include both `LocalIndexHandle` and `RemoteIndexHandle`
@LucioFranco is there a more "rustique" way to write this?
let node_id: String;
if let Ok(nid) = cluster::read_node_id(&settings.path) {
info!("Node ID is: {}", nid);
node_id = nid;
} else {
// If no file exists containing the node ID, generate a new one and write it
let random_id = uuid::Uuid::new_v4().to_hyphenated().to_string();
info!("No Node ID found. Creating new one: {}", random_id);
node_id = random_id.clone();
if let Err(err) = cluster::write_node_id(random_id, &settings.path) {
error!("{:?}", err);
std::process::exit(1);
}
}
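One more idiomatic possibility (a sketch, assuming `read_node_id` and `write_node_id` keep their current signatures) is to fold the fallback into `unwrap_or_else`:

let node_id = cluster::read_node_id(&settings.path).unwrap_or_else(|_| {
    // No file with a node ID exists; generate a new one and persist it
    let random_id = uuid::Uuid::new_v4().to_hyphenated().to_string();
    info!("No Node ID found. Creating new one: {}", random_id);
    if let Err(err) = cluster::write_node_id(random_id.clone(), &settings.path) {
        error!("{:?}", err);
        std::process::exit(1);
    }
    random_id
});
info!("Node ID is: {}", node_id);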
Accidentally omitting document content returns `500 Internal Server Error` with a body of `{"message":"Internal error","uri":"/new_index"}`.
Emitting some kind of diagnostic here would help. Also, in my experience, when the client receives a 500 response, there is usually something informative on the server side; but in this case, the server emits the same message that the client receives, which isn't helpful.
This bug is actually just the worst offender in a whole class of bugs where, if something doesn't go Toshi's way, it just gives back a raspberry. Getting a 500 for an empty document is pretty far up the list for me, though.
Assuming you create an index based on the `cargo test` schema, then send in an indexing request of the form:
$ echo '{}' | curl ... -X PUT -d @- 127.0.0.1:9200/new_index
This issue tracks the implementation of the various query types. The ones Tantivy natively supports are highest priority for implementation, followed by the ones it does not directly support.
I totally get that refactoring to be agnostic to discovery mechanisms would be a significant time investment. On that front, I'd be happy to contribute the kubernetes part if you decide to go that route.
With that said, it's fairly straightforward to use the Kubernetes API. An HTTP request is made to `https://kubernetes.default.svc.cluster.local/api/v1/namespaces/<namespace>/endpoints?labelSelector=<name-defined-in-k8s-config>`. The response is something like this, assuming serde for (de)serialization:
#[derive(Serialize, Deserialize, Debug)]
struct Addresses {
ip: String,
#[serde(rename = "nodeName")]
node_name: String,
#[serde(rename = "targetRef")]
target_ref: TargetRef,
}
#[derive(Serialize, Deserialize, Debug)]
struct Items {
metadata: Metadata1,
subsets: Vec<Subsets>,
}
#[derive(Serialize, Deserialize, Debug)]
struct Labels {
app: String,
}
#[derive(Serialize, Deserialize, Debug)]
struct Metadata {
#[serde(rename = "selfLink")]
self_link: String,
#[serde(rename = "resourceVersion")]
resource_version: String,
}
#[derive(Serialize, Deserialize, Debug)]
struct Metadata1 {
name: String,
namespace: String,
#[serde(rename = "selfLink")]
self_link: String,
uid: String,
#[serde(rename = "resourceVersion")]
resource_version: String,
#[serde(rename = "creationTimestamp")]
creation_timestamp: String,
labels: Labels,
}
#[derive(Serialize, Deserialize, Debug)]
struct Ports {
name: String,
port: i64,
protocol: String,
}
#[derive(Serialize, Deserialize, Debug)]
struct K8sEndpoint {
kind: String,
#[serde(rename = "apiVersion")]
api_version: String,
metadata: Metadata,
items: Vec<Items>,
}
#[derive(Serialize, Deserialize, Debug)]
struct Subsets {
addresses: Vec<Addresses>,
ports: Vec<Ports>,
}
#[derive(Serialize, Deserialize, Debug)]
struct TargetRef {
kind: String,
namespace: String,
name: String,
uid: String,
#[serde(rename = "resourceVersion")]
resource_version: String,
}
Retrieving the IP addresses is as simple as:
let mut list_of_nodes = Vec::new();
for item in endpoints.items {
for subset in item.subsets {
for address in subset.addresses {
list_of_nodes.push(address.ip);
}
}
}
Per #19, if leader election needs to be done, Kubernetes has a unique number tied to each API object called `resourceVersion`. Here, each `Addresses` entry has a `TargetRef` field, which in turn has a `resource_version` field. The leader can be chosen via the min/max of the resource version associated with it. Kubernetes can also expose the pod name to the container via an environment variable, so any Toshi node can know its Kubernetes identifier.
Related to #39
Nodes should update their metadata in Consul on a regular basis.
Related to #39.
This is the name under which all nodes will register themselves in Consul. Related to #14
Related to #39
Toshi looks awesome. We'd love to evaluate using Toshi in our product, so I'm wondering if there is a roadmap detailing major milestones toward changing the project tagline from "Note: This is far from production ready" to something like "Here are the benchmarks comparing Toshi, Elasticsearch, and Solr."
Are there any existing benchmarks? Or more importantly, a guide to how we might be able to contribute?
Thanks for any guidance, and sorry if I missed some obvious roadmap documentation somewhere.
Also, is there a collaboration channel like IRC where Toshi developers hang out?
pub struct RangeResult {
key: String,
#[serde(skip_serializing_if = "Option::is_none")]
to: Option<String>,
#[serde(skip_serializing_if = "Option::is_none")]
from: Option<String>,
num_docs: u64,
}
The code above in `RangeResult` is throwing an unused warning during `cargo test`.
Index creation (schema from the README) fails; the CLI output is:
INFO toshi::index > Indexes: []
______ __ _ ____ __
/_ __/__ ___ / / (_) / __/__ ___ _________/ /
/ / / _ \(_-</ _ \/ / _\ \/ -_) _ `/ __/ __/ _ \
/_/ \___/___/_//_/_/ /___/\__/\_,_/_/ \__/_//_/
Such Relevance, Much Index, Many Search, Wow
INFO toshi > "GET / HTTP/1.1" 200 15.365ยตs
INFO toshi::router > Error { kind: "ErrorKind::Internal" }
INFO toshi > "PUT /test_index HTTP/1.1" 500 56.23ยตs
No files are created in the data directory.
A `cargo build` seems to leave these directories lying around:
logs/
new_index/
Should we clean them up? Or add them to .gitignore?
The current queries "work", but to get better parity, as @LucioFranco rightly pointed out, it's probably good to break them out and start thinking about how to make something more extensible.
One annoying example from ES's query language is the bool query (and this comes up in lots of places): it can accept one or more queries of type `T`, and rather than making an array with one element, it flattens the array out (see the sketch after the example below).
I think we can create a lot of parity; it's just that this is one kind of query, and there will be a lot of boilerplate to cover it all. Unless there are some better ideas?
{
"query":{
"bool":{
"must":[
{
"term":{
"user":"kimchy"
}
},
{
"range":{
"age":{
"gte":-10,
"lte":99999999999999
}
}
}
],
"filter":[
{
"term":{
"user":"kimchy"
}
},
{
"range":{
"age":{
"gte":10.5,
"lte":-20.3333333
}
}
}
],
"must_not":[
{
"term":{
"user":"kimchy"
}
},
{
"range":{
"age":{
"gte":10,
"lte":20
}
}
}
],
"should":[
{
"term":{
"user":"kimchy"
}
},
{
"range":{
"age":{
"gte":10,
"lte":20
}
}
}
],
"minimum_should_match":1,
"boost":1.0
}
}
}
Request {
aggs: None,
query: Some(
Boolean {
bool: Bool {
must: [
Exact(
ExactTerm {
term: {
"user": "kimchy"
}
}
),
Range {
range: {
"age": I64Range {
gte: Some(
-10
),
lte: Some(
99999999999999
),
lt: None,
gt: None
}
}
}
],
filter: [
Exact(
ExactTerm {
term: {
"user": "kimchy"
}
}
),
Range {
range: {
"age": F32Range {
gte: Some(
10.5
),
lte: Some(
-20.333334
),
lt: None,
gt: None
}
}
}
],
must_not: [
Exact(
ExactTerm {
term: {
"user": "kimchy"
}
}
),
Range {
range: {
"age": U64Range {
gte: Some(
10
),
lte: Some(
20
),
lt: None,
gt: None
}
}
}
],
should: [
Exact(
ExactTerm {
term: {
"user": "kimchy"
}
}
),
Range {
range: {
"age": U64Range {
gte: Some(
10
),
lte: Some(
20
),
lt: None,
gt: None
}
}
}
],
minimum_should_match: 1,
boost: 1.0
}
}
)
}
#[test]
fn test_enum() {
use std::collections::HashMap;
macro_rules! type_range {
($($n:ident $t:ty),*) => {
#[derive(Deserialize, Debug, PartialEq)]
#[serde(untagged)]
pub enum Ranges {
$($n {
gte: Option<$t>,
lte: Option<$t>,
lt: Option<$t>,
gt: Option<$t>
},)*
}
};
}
type_range!(U64Range u64, I64Range i64, U8Range u8, F32Range f32);
#[derive(Deserialize, Debug, PartialEq)]
struct Bool {
#[serde(default = "Vec::new")]
must: Vec<TermQueries>,
#[serde(default = "Vec::new")]
filter: Vec<TermQueries>,
#[serde(default = "Vec::new")]
must_not: Vec<TermQueries>,
#[serde(default = "Vec::new")]
should: Vec<TermQueries>,
minimum_should_match: u64,
boost: f64,
}
#[derive(Deserialize, Debug, PartialEq)]
struct ExactTerm {
term: HashMap<String, String>,
}
#[derive(Deserialize, Debug, PartialEq)]
struct FuzzyTerm {
value: String,
#[serde(default)]
distance: u8,
#[serde(default)]
transposition: bool,
}
#[derive(Deserialize, Debug, PartialEq)]
#[serde(untagged)]
enum TermQueries {
Fuzzy { fuzzy: HashMap<String, FuzzyTerm> },
Exact(ExactTerm),
Range { range: HashMap<String, Ranges> },
}
#[derive(Deserialize, Debug, PartialEq)]
#[serde(untagged)]
enum Query {
Boolean { bool: Bool },
}
#[derive(Deserialize, Debug)]
pub struct Request {
query: Option<Query>,
}
let j3 = r#"{"query":{"bool":{"must":[{"term":{"user":"kimchy"}}],"filter":[{"fuzzy":{"user":{"value":"kimchy"}}},{"range":{"age":{"gte":10.5,"lte":-20.3333333}}}],"must_not":[{"term":{"user":"kimchy"}},{"range":{"age":{"gte":10,"lte":20}}}],"should":[{"term":{"user":"kimchy"}},{"range":{"age":{"gte":10,"lte":20}}}],"minimum_should_match":1,"boost":1.0}}}"#;
let result: Request = serde_json::from_str(j3).unwrap();
println!("{:#?}", result);
}
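On the one-or-many flattening called out above, one option is an untagged enum that normalizes to a Vec after parsing. A sketch (assuming the same serde-derive setup as the test above; `OneOrMany` is a hypothetical helper, not existing code):

#[derive(Deserialize, Debug)]
#[serde(untagged)]
enum OneOrMany<T> {
    // A bare object, as ES emits for a single-element clause
    One(T),
    // The usual array form
    Many(Vec<T>),
}

impl<T> From<OneOrMany<T>> for Vec<T> {
    fn from(v: OneOrMany<T>) -> Vec<T> {
        match v {
            OneOrMany::One(one) => vec![one],
            OneOrMany::Many(many) => many,
        }
    }
}

Fields like `must` could then be declared as `OneOrMany<TermQueries>` and converted to `Vec<TermQueries>` right after deserialization, so payloads that flatten single-element arrays still parse.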
Dependabot can't resolve your Rust dependency files.
As a result, Dependabot couldn't update your dependencies.
The error Dependabot encountered was:
Updating git repository `https://github.com/tower-rs/tower`
Updating crates.io index
error: no matching package named `tower-direct-service` found
location searched: https://github.com/tower-rs/tower
required by package `toshi v0.1.1 (/home/dependabot/dependabot-updater/dependabot_tmp_dir
If you think the above is an error on Dependabot's side please don't hesitate to get in touch - we'll do whatever we can to fix it.
You can mention @dependabot in the comments below to contact the Dependabot team.
Related to #39
A MIME type containing the `;charset=` qualifier caused Toshi to respond with `400 Bad Request` to an index creation request. The index would otherwise have been created. Many, many HTTP libraries will send along the charset if they know it, because it's polite.
Assuming you have run `cargo test`, there will be a `new_index` directory in the current directory containing the tantivy index and its schema; thus:
$ jq .schema new_index/meta.json | \
curl --compress -vH 'content-type: application/json;charset=utf-8' -X PUT -d @- 127.0.0.1:9200/foo/_create
produces:
> PUT /foo/_create HTTP/1.1
> Host: 127.0.0.1:9200
> User-Agent: curl/7.54.0
> Accept: */*
> Accept-Encoding: deflate, gzip
> content-type: application/json;charset=utf-8
>
< HTTP/1.1 400 Bad Request
< content-type: application/json
< content-encoding: deflate
< transfer-encoding: chunked
< date: Sun, 20 Jan 2019 20:58:04 GMT
<
{"message":"Bad request","uri":"/foo/_create"}
but the unqualified `content-type` header creates the index as expected:
$ jq .schema new_index/meta.json | \
curl --compress -vH 'content-type: application/json' -X PUT -d @- 127.0.0.1:9200/foo/_create
`SIGTERM` and `SIGINT` should also be captured to trigger a graceful shutdown.
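A minimal sketch of trapping both signals with the signal-hook crate (the crate choice is an assumption; Toshi's tokio runtime may call for a tokio-native mechanism instead):

use signal_hook::consts::{SIGINT, SIGTERM};
use signal_hook::iterator::Signals;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut signals = Signals::new(&[SIGINT, SIGTERM])?;
    // ...start the server in the background...
    if let Some(signal) = signals.forever().next() {
        println!("received signal {}, shutting down gracefully", signal);
        // ...flush indexes, deregister from Consul, stop accepting requests...
    }
    Ok(())
}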
After issuing the command `curl -X GET http://localhost:8080 --output -`, the following response is returned by Toshi:
){"name":"Toshi Search","version":"0.1.1"}
As you can see, this is invalid JSON: there's an additional character at the start of the response, and it varies between `-`, `+`, `,`, `)`, etc. I don't know why it's returning this; it seems like garbage to me. Removing the Deflate middleware somehow solves the problem. If you change the `name` in `ToshiInfo`, the initial byte changes too, which is pretty strange.
Related to #39
Dependabot can't resolve your Rust dependency files.
As a result, Dependabot couldn't update your dependencies.
The error Dependabot encountered was:
Updating git repository `https://github.com/tower-rs/tower-grpc`
Updating crates.io index
Updating git repository `https://github.com/carllerche/tokio-connect`
Updating git repository `https://github.com/tower-rs/tower`
Updating git repository `https://github.com/tower-rs/tower-h2`
Updating git repository `https://github.com/tower-rs/tower-http`
error: no matching package named `tower-direct-service` found
location searched: https://github.com/tower-rs/tower
required by package `toshi v0.1.1 (/home/dependabot/dependabot-updater/dependabot_tmp_dir
If you think the above is an error on Dependabot's side please don't hesitate to get in touch - we'll do whatever we can to fix it.
You can mention @dependabot in the comments below to contact the Dependabot team.
It would be great to be able to `cargo install toshi`.
This is more to open a discussion on the possibility of integrating bors.
I use this mainly on my other projects, amethyst/amethyst and amethyst/laminar, but it may be useful for this project as well.
After a successful build of the release target (Ubuntu), while running
$ RUST_BACKTRACE=1 ./target/release/toshi
I keep getting:
INFO toshi > Settings { host: "127.0.0.1", port: 8080, path: "data/", place_addr: "0.0.0.0:8082", log_level: "info", writer_memory: 200000000, json_parsing_threads: 4, auto_commit_duration: 10, bulk_buffer_size: 10000, merge_policy: ConfigMergePolicy { kind: "log", min_merge_size: Some(8), min_layer_size: Some(10000), level_log_size: Some(0.75) }, consul_addr: "127.0.0.1:8500", cluster_name: "kitsune", enable_clustering: true, master: true, nodes: ["127.0.0.1:8081", "127.0.0.1:8082"] }
______ __ _ ____ __
/_ __/__ ___ / / (_) / __/__ ___ _________/ /
/ / / _ \(_-</ _ \/ / _\ \/ -_) _ `/ __/ __/ _ \
/_/ \___/___/_//_/_/ /___/\__/\_,_/_/ \__/_//_/
Such Relevance, Much Index, Many Search, Wow
ERROR toshi > Error: Failed registering Node: Inner(Inner(Error { kind: Connect, cause: Os { code: 111, kind: ConnectionRefused, message: "Connection refused" } }))
thread 'main' panicked at 'internal error: entered unreachable code: Shutdown signal channel should not error, This is a bug.', src/bin/toshi.rs:68:22
stack backtrace:
0: std::sys::unix::backtrace::tracing::imp::unwind_backtrace
at src/libstd/sys/unix/backtrace/tracing/gcc_s.rs:49
1: std::panicking::default_hook::{{closure}}
at src/libstd/sys_common/backtrace.rs:71
at src/libstd/sys_common/backtrace.rs:59
at src/libstd/panicking.rs:211
2: std::panicking::rust_panic_with_hook
at src/libstd/panicking.rs:227
at src/libstd/panicking.rs:491
3: std::panicking::continue_panic_fmt
at src/libstd/panicking.rs:398
4: std::panicking::begin_panic_fmt
at src/libstd/panicking.rs:353
5: toshi::main::{{closure}}
6: <futures::task_impl::Spawn<T>>::enter::{{closure}}
7: toshi::main
8: std::rt::lang_start::{{closure}}
9: main
10: __libc_start_main
11: _start
netstat shows me that 8080 isn't in use by another process, and running the command with sudo doesn't change anything. The message clearly states that this is a bug. So... is there a solution or not?
Replace the crossbeam channel currently used with the new tokio-sync crate.
This causes Toshi to fail to start. @hntd187, should we create it if it doesn't exist?
There should be a function that creates a Shard (either primary or replica) and the underlying tantivy index, and registers it in Consul.
Related to #39.
To start with:
These should be reported every N minutes and could also serve as a heartbeat.
I wanted to add tantivy-cli to the project binaries as a tool for admins to work with indexes on disk in an easier and more direct way than through the REST interface alone. Since tantivy indexes are self-contained units, this should be fine.
https://github.com/tantivy-search/tantivy-cli
The CLI doesn't have a license in it. @fulmicoton, is it safe to assume it's MIT like tantivy?
So, using Consul's KV store, we can do more in terms of orchestration than just leader election (if we even want to go that route), but basically:
/service/toshi/leader true
Whichever node gets the session lock on this key is the leader. Nodes can then join the cluster and announce their address, perhaps via something like:
/service/toshi/cluster/tn1 hostname:port
/service/toshi/cluster/tn2 hostname:port
And so on. I don't have any particular attachment to this method; it's just something to get started with.
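For reference, acquiring that lock goes through Consul's session and KV HTTP API. A rough sketch (assuming reqwest's blocking client with the json feature plus serde_json; error handling and session TTL/renewal elided):

// Sketch of grabbing /service/toshi/leader with a Consul session lock.
// The endpoints are Consul's documented session/KV HTTP API.
fn try_acquire_leadership(consul: &str, node: &str) -> Result<bool, reqwest::Error> {
    let client = reqwest::blocking::Client::new();

    // 1. Create a session tied to this node; Consul replies {"ID": "..."}.
    let resp: serde_json::Value = client
        .put(format!("{}/v1/session/create", consul))
        .json(&serde_json::json!({ "Name": node }))
        .send()?
        .json()?;
    let session = resp["ID"].as_str().unwrap_or_default().to_owned();

    // 2. Try to take the leader key; the response body is `true` or `false`.
    client
        .put(format!("{}/v1/kv/service/toshi/leader?acquire={}", consul, session))
        .body(node.to_owned())
        .send()?
        .json()
}

Whichever node gets `true` back holds the lock until its session is invalidated, at which point the others can retry the acquire.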
This relates to issue #14.
This describes the basics of ES sharding.
In ES, an index is created with a certain number of Primary shards (5 is common). These Primary shards are writable, and when data is written to an index, a Primary shard is chosen to receive it.
The number of primary shards for an index cannot be changed after creation.
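That restriction falls out of the routing rule ES uses to pick a shard. A sketch of the idea (ES actually uses murmur3, not Rust's default hasher):

use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// shard = hash(routing_key) % number_of_primary_shards, so changing the
// primary count would silently re-map existing documents to other shards.
fn shard_for(routing_key: &str, num_primary_shards: u64) -> u64 {
    let mut hasher = DefaultHasher::new();
    routing_key.hash(&mut hasher);
    hasher.finish() % num_primary_shards
}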
Each Primary shard has 0 or more Replica shards. After a Primary shard writes data, it is replicated to the Replica shards belonging to that Primary shard. Replica shards can be used to handle read requests, but not write. Ideally, all Replica shards are located on a different node/server.
Rebalancing means redistributing data amongst the cluster members. It is usually done in two situations.
If more nodes are added to a cluster, shards should be rebalanced to evenly distribute the load. This can be done in many ways. With consistent hashing, the algorithm itself will tell you what needs to be moved where.
With a leader architecture, a process will need to choose which shards to move where. This can be based on CPU, user-defined tags, memory usage or any other metadata we have about the state of the system.
It is common for a situation to arise where one node is overloaded because it holds shards for popular indices. This may be a temporary situation or a longer-term issue. In this scenario, it is desirable to redistribute the hot shards to machines with less load.
I suggest using Consul to track shard assignments along with leader election.