
blast's Introduction

This project has been taken over by Phalanx.

This project has not been maintained for a long time.

Blast

Blast is a full-text search and indexing server written in Go, built on top of Bleve.
It provides its functions through gRPC (HTTP/2 + Protocol Buffers) or a traditional RESTful API (HTTP/1.1 + JSON).
Blast implements the Raft consensus algorithm via hashicorp/raft. It achieves consensus across all nodes, ensuring that every change made to the system is applied to a quorum of nodes, or none at all. Blast makes it easy for programmers to develop search applications with advanced features.

Features

  • Full-text search/indexing
  • Faceted search
  • Spatial/Geospatial search
  • Search result highlighting
  • Index replication
  • Easy cluster bring-up
  • Easy-to-use HTTP API
  • CLI available
  • Docker container image available

Install build dependencies

Blast requires some C/C++ libraries if you need to enable cld2, icu, libstemmer or leveldb. The following sections give instructions for installing these dependencies on particular platforms.

Ubuntu 18.10

$ sudo apt-get update
$ sudo apt-get install -y \
    libicu-dev \
    libstemmer-dev \
    libleveldb-dev \
    gcc-4.8 \
    g++-4.8 \
    build-essential

$ sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-8 80
$ sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-8 80
$ sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.8 90
$ sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-4.8 90

$ export GOPATH=${HOME}/go
$ mkdir -p ${GOPATH}/src/github.com/blevesearch
$ cd ${GOPATH}/src/github.com/blevesearch
$ git clone https://github.com/blevesearch/cld2.git
$ cd ${GOPATH}/src/github.com/blevesearch/cld2
$ git clone https://github.com/CLD2Owners/cld2.git
$ cd cld2/internal
$ ./compile_libs.sh
$ sudo cp *.so /usr/local/lib

macOS High Sierra Version 10.13.6

$ brew install \
    icu4c \
    leveldb

$ export GOPATH=${HOME}/go
$ go get -u -v github.com/blevesearch/cld2
$ cd ${GOPATH}/src/github.com/blevesearch/cld2
$ git clone https://github.com/CLD2Owners/cld2.git
$ cd cld2/internal
$ perl -p -i -e 's/soname=/install_name,/' compile_libs.sh
$ ./compile_libs.sh
$ sudo cp *.so /usr/local/lib

Build

Build Blast as follows:

$ mkdir -p ${GOPATH}/src/github.com/mosuka
$ cd ${GOPATH}/src/github.com/mosuka
$ git clone https://github.com/mosuka/blast.git
$ cd blast
$ make

If you omit GOOS or GOARCH, the binary is built for the platform you are building on.
To target another platform, set the GOOS and GOARCH environment variables.

Linux

$ make GOOS=linux build

macOS

$ make GOOS=darwin build

Windows

$ make GOOS=windows build

Build with extensions

Blast supports some Bleve Extensions (blevex). To build with them, set CGO_LDFLAGS, CGO_CFLAGS, CGO_ENABLED and BUILD_TAGS as needed. For example, to make the ICU tokenizer available:

$ make GOOS=linux \
       BUILD_TAGS=icu \
       CGO_ENABLED=1 \
       build

Linux

$ make GOOS=linux \
       BUILD_TAGS="kagome icu libstemmer cld2" \
       CGO_ENABLED=1 \
       build

macOS

$ make GOOS=darwin \
       BUILD_TAGS="kagome icu libstemmer cld2" \
       CGO_ENABLED=1 \
       CGO_LDFLAGS="-L/usr/local/opt/icu4c/lib" \
       CGO_CFLAGS="-I/usr/local/opt/icu4c/include" \
       build

Build flags

Refer to the following table for the build flags of the supported Bleve extensions:

BUILD_TAGS   CGO_ENABLED   Description
cld2         1             Enable Compact Language Detector
kagome       0             Enable Japanese Language Analyser
icu          1             Enable ICU Tokenizer, Thai Language Analyser
libstemmer   1             Enable Language Stemmer (Danish, German, English, Spanish, Finnish, French, Hungarian, Italian, Dutch, Norwegian, Portuguese, Romanian, Russian, Swedish, Turkish)

To enable a feature whose CGO_ENABLED is 1, install its build dependencies as described in the Install build dependencies section above.

Binary

After a successful build, you can find the binary like so:

$ ls ./bin
blast

Test

To test your changes, run the following command:

$ make test

If you want to specify the target platform, set GOOS and GOARCH environment variables in the same way as the build.

Package

To create a distribution package, run the following command:

$ make dist

Configure

Blast's startup options can be set via configuration files, environment variables, and command-line arguments.
Refer to the following table for the options that can be configured.

CLI Flag                 Environment variable         Configuration File     Description
--config-file            -                            -                      config file; if omitted, blast.yaml in /etc and the home directory will be searched
--id                     BLAST_ID                     id                     node ID
--raft-address           BLAST_RAFT_ADDRESS           raft_address           Raft server listen address
--grpc-address           BLAST_GRPC_ADDRESS           grpc_address           gRPC server listen address
--http-address           BLAST_HTTP_ADDRESS           http_address           HTTP server listen address
--data-directory         BLAST_DATA_DIRECTORY         data_directory         data directory that stores the index and Raft logs
--mapping-file           BLAST_MAPPING_FILE           mapping_file           path to the index mapping file
--peer-grpc-address      BLAST_PEER_GRPC_ADDRESS      peer_grpc_address      gRPC listen address of an existing node in the cluster to join
--certificate-file       BLAST_CERTIFICATE_FILE       certificate_file       path to the client/server TLS certificate file
--key-file               BLAST_KEY_FILE               key_file               path to the client/server TLS key file
--common-name            BLAST_COMMON_NAME            common_name            certificate common name
--cors-allowed-methods   BLAST_CORS_ALLOWED_METHODS   cors_allowed_methods   CORS allowed methods (e.g. GET,PUT,DELETE,POST)
--cors-allowed-origins   BLAST_CORS_ALLOWED_ORIGINS   cors_allowed_origins   CORS allowed origins (e.g. http://localhost:8080,http://localhost:80)
--cors-allowed-headers   BLAST_CORS_ALLOWED_HEADERS   cors_allowed_headers   CORS allowed headers (e.g. content-type,x-some-key)
--log-level              BLAST_LOG_LEVEL              log_level              log level
--log-file               BLAST_LOG_FILE               log_file               log file
--log-max-size           BLAST_LOG_MAX_SIZE           log_max_size           max size of a log file in megabytes
--log-max-backups        BLAST_LOG_MAX_BACKUPS        log_max_backups        max backup count of log files
--log-max-age            BLAST_LOG_MAX_AGE            log_max_age            max age of a log file in days
--log-compress           BLAST_LOG_COMPRESS           log_compress           compress rotated log files
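
As an illustrative sketch (these values are examples, not shipped defaults), a blast.yaml combining the keys from the Configuration File column might look like:

```yaml
# blast.yaml -- illustrative values only, not defaults
id: node1
raft_address: ":7000"
grpc_address: ":9000"
http_address: ":8000"
data_directory: /tmp/blast/node1
mapping_file: ./examples/example_mapping.json
log_level: INFO
```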

Start

Start a server as follows:

$ ./bin/blast start \
              --id=node1 \
              --raft-address=:7000 \
              --http-address=:8000 \
              --grpc-address=:9000 \
              --data-directory=/tmp/blast/node1 \
              --mapping-file=./examples/example_mapping.json

You can get the node information with the following command:

$ ./bin/blast node | jq .

or the following URL:

$ curl -X GET http://localhost:8000/v1/node | jq .

The result of the above command is:

{
  "node": {
    "raft_address": ":7000",
    "metadata": {
      "grpc_address": ":9000",
      "http_address": ":8000"
    },
    "state": "Leader"
  }
}

Health check

You can check the health status of the node.

$ ./bin/blast healthcheck | jq .

The following REST APIs are also provided.

Liveness probe

This endpoint always returns 200 and should be used to check that the server process is alive.

$ curl -X GET http://localhost:8000/v1/liveness_check | jq .

Readiness probe

This endpoint returns 200 when the server is ready to serve traffic (i.e. respond to queries).

$ curl -X GET http://localhost:8000/v1/readiness_check | jq .

Put a document

To put a document, execute the following command:

$ ./bin/blast set 1 '
{
  "fields": {
    "title": "Search engine (computing)",
    "text": "A search engine is an information retrieval system designed to help find information stored on a computer system. The search results are usually presented in a list and are commonly called hits. Search engines help to minimize the time required to find information and the amount of information which must be consulted, akin to other techniques for managing information overload. The most public, visible form of a search engine is a Web search engine which searches for information on the World Wide Web.",
    "timestamp": "2018-07-04T05:41:00Z",
    "_type": "example"
  }
}
' | jq .

or, you can use the RESTful API as follows:

$ curl -X PUT 'http://127.0.0.1:8000/v1/documents/1' --data-binary '
{
  "fields": {
    "title": "Search engine (computing)",
    "text": "A search engine is an information retrieval system designed to help find information stored on a computer system. The search results are usually presented in a list and are commonly called hits. Search engines help to minimize the time required to find information and the amount of information which must be consulted, akin to other techniques for managing information overload. The most public, visible form of a search engine is a Web search engine which searches for information on the World Wide Web.",
    "timestamp": "2018-07-04T05:41:00Z",
    "_type": "example"
  }
}
' | jq .

or

$ curl -X PUT 'http://127.0.0.1:8000/v1/documents/1' -H "Content-Type: application/json" --data-binary @./examples/example_doc_1.json

Get a document

To get a document, execute the following command:

$ ./bin/blast get 1 | jq .

or, you can use the RESTful API as follows:

$ curl -X GET 'http://127.0.0.1:8000/v1/documents/1' | jq .

The result of the above command is:

{
  "fields": {
    "_type": "example",
    "text": "A search engine is an information retrieval system designed to help find information stored on a computer system. The search results are usually presented in a list and are commonly called hits. Search engines help to minimize the time required to find information and the amount of information which must be consulted, akin to other techniques for managing information overload. The most public, visible form of a search engine is a Web search engine which searches for information on the World Wide Web.",
    "timestamp": "2018-07-04T05:41:00Z",
    "title": "Search engine (computing)"
  }
}

Search documents

To search documents, execute the following command:

$ ./bin/blast search '
{
  "search_request": {
    "query": {
      "query": "+_all:search"
    },
    "size": 10,
    "from": 0,
    "fields": [
      "*"
    ],
    "sort": [
      "-_score"
    ]
  }
}
' | jq .

or, you can use the RESTful API as follows:

$ curl -X POST 'http://127.0.0.1:8000/v1/search' --data-binary '
{
  "search_request": {
    "query": {
      "query": "+_all:search"
    },
    "size": 10,
    "from": 0,
    "fields": [
      "*"
    ],
    "sort": [
      "-_score"
    ]
  }
}
' | jq .

The result of the above command is:

{
  "search_result": {
    "facets": null,
    "hits": [
      {
        "fields": {
          "_type": "example",
          "text": "A search engine is an information retrieval system designed to help find information stored on a computer system. The search results are usually presented in a list and are commonly called hits. Search engines help to minimize the time required to find information and the amount of information which must be consulted, akin to other techniques for managing information overload. The most public, visible form of a search engine is a Web search engine which searches for information on the World Wide Web.",
          "timestamp": "2018-07-04T05:41:00Z",
          "title": "Search engine (computing)"
        },
        "id": "1",
        "index": "/tmp/blast/node1/index",
        "score": 0.09703538256409851,
        "sort": [
          "_score"
        ]
      }
    ],
    "max_score": 0.09703538256409851,
    "request": {
      "explain": false,
      "facets": null,
      "fields": [
        "*"
      ],
      "from": 0,
      "highlight": null,
      "includeLocations": false,
      "query": {
        "query": "+_all:search"
      },
      "search_after": null,
      "search_before": null,
      "size": 10,
      "sort": [
        "-_score"
      ]
    },
    "status": {
      "failed": 0,
      "successful": 1,
      "total": 1
    },
    "took": 171880,
    "total_hits": 1
  }
}
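
The query object is passed through to Bleve, so other Bleve query types should also work. As a sketch (not verified against this server version), a prefix query on the title field might look like:

```json
{
  "search_request": {
    "query": {
      "prefix": "sear",
      "field": "title"
    },
    "size": 10,
    "from": 0,
    "fields": [
      "*"
    ]
  }
}
```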

Delete a document

To delete a document, execute the following command:

$ ./bin/blast delete 1

or, you can use the RESTful API as follows:

$ curl -X DELETE 'http://127.0.0.1:8000/v1/documents/1'

Index documents in bulk

To index documents in bulk, execute the following command:

$ ./bin/blast bulk-index --file ./examples/example_bulk_index.json

or, you can use the RESTful API as follows:

$ curl -X PUT 'http://127.0.0.1:8000/v1/documents' -H "Content-Type: application/x-ndjson" --data-binary @./examples/example_bulk_index.json
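
The bulk file is NDJSON: one JSON object per line, each carrying an id and a fields object (the same document shape used elsewhere in this README). An illustrative sketch:

```json
{"id": "1", "fields": {"title": "Search engine (computing)", "_type": "example"}}
{"id": "2", "fields": {"title": "Full-text search", "_type": "example"}}
```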

Delete documents in bulk

To delete documents in bulk, execute the following command:

$ ./bin/blast bulk-delete --file ./examples/example_bulk_delete.txt

or, you can use the RESTful API as follows:

$ curl -X DELETE 'http://127.0.0.1:8000/v1/documents' -H "Content-Type: text/plain" --data-binary @./examples/example_bulk_delete.txt
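
The bulk delete file is assumed to be plain text with one document ID per line, matching the text/plain Content-Type above. An illustrative sketch:

```text
1
2
3
```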

Bringing up a cluster

Blast makes it easy to bring up a cluster. The node started above is already running, but it is not fault-tolerant. If you need to increase fault tolerance, bring up two more data nodes like so:

$ ./bin/blast start \
              --id=node2 \
              --raft-address=:7001 \
              --http-address=:8001 \
              --grpc-address=:9001 \
              --peer-grpc-address=:9000 \
              --data-directory=/tmp/blast/node2 \
              --mapping-file=./examples/example_mapping.json
$ ./bin/blast start \
              --id=node3 \
              --raft-address=:7002 \
              --http-address=:8002 \
              --grpc-address=:9002 \
              --peer-grpc-address=:9000 \
              --data-directory=/tmp/blast/node3 \
              --mapping-file=./examples/example_mapping.json

The above example runs each Blast node on the same host, so each node must listen on different ports. This would not be necessary if each node ran on a different host.

This instructs each new node to join the existing node; each node recognizes the cluster on startup. You now have a 3-node cluster that can tolerate the failure of one node. You can check the cluster with the following command:

$ ./bin/blast cluster | jq .

or, you can use the RESTful API as follows:

$ curl -X GET 'http://127.0.0.1:8000/v1/cluster' | jq .

The result of the above command is:

{
  "cluster": {
    "nodes": {
      "node1": {
        "raft_address": ":7000",
        "metadata": {
          "grpc_address": ":9000",
          "http_address": ":8000"
        },
        "state": "Leader"
      },
      "node2": {
        "raft_address": ":7001",
        "metadata": {
          "grpc_address": ":9001",
          "http_address": ":8001"
        },
        "state": "Follower"
      },
      "node3": {
        "raft_address": ":7002",
        "metadata": {
          "grpc_address": ":9002",
          "http_address": ":8002"
        },
        "state": "Follower"
      }
    },
    "leader": "node1"
  }
}

An odd number of nodes (3 or more) is recommended for a cluster. A single-node deployment cannot avoid data loss in failure scenarios, so avoid deploying single nodes.

In the example above, the nodes join the cluster at startup, but you can also join a node that was started in standalone mode to the cluster later, as follows:

$ ./bin/blast join --grpc-address=:9000 node2 127.0.0.1:9001

or, you can use the RESTful API as follows:

$ curl -X PUT 'http://127.0.0.1:8000/v1/cluster/node2' --data-binary '
{
  "raft_address": ":7001",
  "metadata": {
    "grpc_address": ":9001",
    "http_address": ":8001"
  }
}
'

To remove a node from the cluster, execute the following command:

$ ./bin/blast leave --grpc-address=:9000 node2

or, you can use the RESTful API as follows:

$ curl -X DELETE 'http://127.0.0.1:8000/v1/cluster/node2'

The following command indexes documents to any node in the cluster:

$ ./bin/blast set 1 '
{
  "fields": {
    "title": "Search engine (computing)",
    "text": "A search engine is an information retrieval system designed to help find information stored on a computer system. The search results are usually presented in a list and are commonly called hits. Search engines help to minimize the time required to find information and the amount of information which must be consulted, akin to other techniques for managing information overload. The most public, visible form of a search engine is a Web search engine which searches for information on the World Wide Web.",
    "timestamp": "2018-07-04T05:41:00Z",
    "_type": "example"
  }
}
' --grpc-address=:9000 | jq .

You can then get the document from the node specified in the above command as follows:

$ ./bin/blast get 1 --grpc-address=:9000 | jq .

You can also get the same document from other nodes in the cluster as follows:

$ ./bin/blast get 1 --grpc-address=:9001 | jq .
$ ./bin/blast get 1 --grpc-address=:9002 | jq .

The result of the above commands is:

{
  "fields": {
    "_type": "example",
    "text": "A search engine is an information retrieval system designed to help find information stored on a computer system. The search results are usually presented in a list and are commonly called hits. Search engines help to minimize the time required to find information and the amount of information which must be consulted, akin to other techniques for managing information overload. The most public, visible form of a search engine is a Web search engine which searches for information on the World Wide Web.",
    "timestamp": "2018-07-04T05:41:00Z",
    "title": "Search engine (computing)"
  }
}

Docker

Build Docker container image

You can build the Docker container image like so:

$ make docker-build

Pull Docker container image from docker.io

You can also use the Docker container image already registered in docker.io like so:

$ docker pull mosuka/blast:latest

See https://hub.docker.com/r/mosuka/blast/tags/

Start on Docker

To run a Blast node on Docker, start it like so:

$ docker run --rm --name blast-node1 \
    -p 7000:7000 \
    -p 8000:8000 \
    -p 9000:9000 \
    -v $(pwd)/etc/blast_mapping.json:/etc/blast_mapping.json \
    mosuka/blast:latest start \
      --id=node1 \
      --raft-address=:7000 \
      --http-address=:8000 \
      --grpc-address=:9000 \
      --data-directory=/tmp/blast/node1 \
      --mapping-file=/etc/blast_mapping.json

You can execute commands inside the Docker container as follows:

$ docker exec -it blast-node1 blast node --grpc-address=:9000

Securing Blast

Blast supports HTTPS access, ensuring that all communication between clients and a cluster is encrypted.

Generating a certificate and private key

One way to generate the necessary resources is via openssl. For example:

$ openssl req -x509 -nodes -newkey rsa:4096 -keyout ./etc/blast_key.pem -out ./etc/blast_cert.pem -days 365 -subj '/CN=localhost'
Generating a 4096 bit RSA private key
............................++
........++
writing new private key to './etc/blast_key.pem'

Secure cluster example

Start each node with HTTPS enabled and node-to-node encryption. It is assumed the X.509 certificate and key are at ./etc/blast_cert.pem and ./etc/blast_key.pem respectively.

$ ./bin/blast start \
             --id=node1 \
             --raft-address=:7000 \
             --http-address=:8000 \
             --grpc-address=:9000 \
             --data-directory=/tmp/blast/node1 \
             --mapping-file=./etc/blast_mapping.json \
             --certificate-file=./etc/blast_cert.pem \
             --key-file=./etc/blast_key.pem \
             --common-name=localhost
$ ./bin/blast start \
             --id=node2 \
             --raft-address=:7001 \
             --http-address=:8001 \
             --grpc-address=:9001 \
             --peer-grpc-address=:9000 \
             --data-directory=/tmp/blast/node2 \
             --mapping-file=./etc/blast_mapping.json \
             --certificate-file=./etc/blast_cert.pem \
             --key-file=./etc/blast_key.pem \
             --common-name=localhost
$ ./bin/blast start \
             --id=node3 \
             --raft-address=:7002 \
             --http-address=:8002 \
             --grpc-address=:9002 \
             --peer-grpc-address=:9000 \
             --data-directory=/tmp/blast/node3 \
             --mapping-file=./etc/blast_mapping.json \
             --certificate-file=./etc/blast_cert.pem \
             --key-file=./etc/blast_key.pem \
             --common-name=localhost

You can access the cluster by adding the TLS-related flags, for example:

$ ./bin/blast cluster --grpc-address=:9000 --certificate-file=./etc/blast_cert.pem --common-name=localhost | jq .

or

$ curl -X GET https://localhost:8000/v1/cluster --cacert ./etc/blast_cert.pem | jq .

blast's People

Contributors

mosuka, pablocastellano, radutopala

blast's Issues

Error getting document by ID over rest API

I get an error when issuing a very simple query over rest endpoint:

flaviostutz-Mac:fess flaviostutz$ curl -vv --location --request GET 'localhost:6000/v1/documents/a2b'
Note: Unnecessary use of -X or --request, GET is already inferred.
*   Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 6000 (#0)
> GET /v1/documents/a2b HTTP/1.1
> Host: localhost:6000
> User-Agent: curl/7.64.1
> Accept: */*
>
< HTTP/1.1 500 Internal Server Error
< Content-Type: application/json
< Grpc-Metadata-Content-Type: application/grpc
< Grpc-Metadata-Content-Type: application/grpc
< Date: Fri, 17 Jan 2020 21:27:59 GMT
< Content-Length: 130
<
* Connection #0 to host localhost left intact
{"error":"unknown message type \"map[string]interface {}\"","code":2,"message":"unknown message type \"map[string]interface {}\""}* Closing connection 0

Seems like a Golang cast issue. I could not find where this is handled in the code. Could anyone help?

Indexing content / faceted search

Hey,

Hope you are all well !

I wanted to extend the following golang project, https://github.com/hoop33/limo, which also uses bleve, with blast. I forked and updated limo to automatically fetch the repo topics in order to pre-fill limo's tags index.

But I would like to create a web UI with some facets to explore my starred repositories. So after a couple of searches, I found your project wrapping the bleve package.

Can you provide an example or more explanation about how to index content?

Questions:

Thanks in advance.

Cheers,
Richard

Possible to build a Blind index using blast?

We have security requirements that require the use of encryption at rest. We would like to be able to build an index of the unencrypted plain text, and then ship the index and the encrypted data files, without leaking the plain text. Is this something that is possible with blast/bleve?

Search with prefix

Hi,

I was wondering if I’m doing anything wrong, but other than doing a match query I can’t seem to perform a prefix query at all. According to the bleve documentation I should be able to do that by using “prefix”: “searchterm”. However it doesn’t yield any results. Do you have an example search query for prefix?

Thanks!

More Query String Query Examples

Currently only a few query examples (simple and prefix) are given, which seems not enough for a quick start with more complicated query logic.

Query String Query supports all kinds of queries, such as phrases, field scoping, boolean queries, numeric ranges, fuzzy search and so on.

If more examples were given, it would be friendlier to beginners.

How to search documents with id prefix?

I have two kinds of documents, A and B, and I index them in bulk with different document ID prefixes, basically like:

{"fields":{"code":"xxx","name":"xxx"},"id":"path/A/10"}
{"fields":{"code":"xxx","name":"xxx"},"id":"path/A/11"}
{"fields":{"code":"xxx","name":"xxx"},"id":"path/B/20"}
{"fields":{"code":"xxx","name":"xxx"},"id":"path/B/21"}

Now I only want to search documents with id prefix "path/A/". How to do this?

A usable web client for blast just like Kibana for Elasticsearch?

To use blast easily, a web client for accessing blast seems like a good approach.

Is there a usable web client for blast, just like Kibana for Elasticsearch?

Or, is there an open source project for quickly building a web client for querying?

I have created a simple Go web UI for querying terms and want it to support more query methods like numeric range, prefix, date and so on.

Do you have any suggestions? Thanks.

REST structure

So I want to load into Blast using my own go RESTful client code.

In your example

$ cat ./example/document_1.json | xargs -0 ./bin/blast put document --request

I see we are doing a PUT to 0.0.0.0:8000/rest/ID where ID is the ID of the document.

However, how is the JSON document sent in that PUT command? Is this body-only, or is there an association that needs to take place? I have used the HTTPie package with

http -v -j PUT 0.0.0.0:8000/rest/1 @sodataset.json

but it comes back 400 bad request. If we are using HTTPie or curl, what would a PUT command look like?

I'm trying to load 50K JSON-LD documents (schema.org) from Minio into Blast.

Thanks!
Doug

Feature idea: KVS pub sub

The KVS is a generic bucket store.
I was thinking of extending it to publish changes over GRPC.

Use case?
When you build clients using this system, it's very useful to know when anything has changed and the nature of the change. Then, as a subscriber, I can update my many microservices or even GUI clients. It keeps them all in sync, basically.

The event would be like:

  • namespace: the bucket namespace
  • eventtype: Create/Read/Update/Delete
  • data: protobuf or json.

It should also have the ability to turn off the Read event type, because no one normally needs that, but it can be useful for dynamically knowing who is reading where.

Implementation:
GRPC Middleware might be the perfect fit !
https://github.com/grpc-ecosystem/go-grpc-middleware
Also great for adding other things like security etc.

Because it's gRPC, it will be easy to then receive it and put it onto a NATS message broker later as a second piece of work.

Index.
I also thought about the index, but I think it's not worth the effort. What you can do is make each index query output to a cache in the KVS store; then you can use pub/sub from that. Almost like a materialized view with pub/sub on top of it. It also gives you caching for free, to a degree.

GEO searches

Is it possible to use blast for geo-localized searches? It is an important feature in many projects nowadays.
If yes, would it be possible to include an example at some point? (mapping and search request)

I have put a very basic example of Geo search with bleve there : https://github.com/hubyhuby/bleve-search-example/blob/444d18d810064302fef76693f86f07c533552897/main.go#L138

There are two main searches I see:
  • geo box search around a point
  • geo sort search by distance to a point

I understand it is a WIP, but I am really looking forward to this project.
Thanks for this nice project !

Possible to run as embedded service?

Is it possible to run blast as an embedded service into an existing application cluster? That is, if there is already a service running 2+ instances and it wants to add indexing, could it run a blast cluster using the existing service nodes? I'm thinking of this like the way Nats.io server can be embedded instead of having to run standalone gnatsd processes.

segfault when port 8080 is already bound

Hello,

Thanks again for your last bugfix. Here is another issue:

./bin/blast-index start --node-id=index1 --data-dir=/tmp/blast/index1 --bind-addr=:6060 --grpc-addr=:5050 --http-addr=:8080 --index-mapping-file ./example/index_mapping.json
2019/03/14 16:26:27.068886 github.com/mosuka/blast/index/server.go:131 [ERR] listen tcp :8080: bind: address already in use
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x48 pc=0xb51108]

goroutine 39 [running]:
github.com/mosuka/blast/index.(*Server).Start(0x0)
/home/rbastic/go-src/src/github.com/mosuka/blast/index/server.go:147 +0x68
created by main.execStart
/home/rbastic/go-src/src/github.com/mosuka/blast/cmd/blast-index/start.go:78 +0x802

docker no http beyond root

I get 404 for any query except /

curl -X GET http://localhost:10002/
{
  "status": 200,
  "version": "v0.8.1"
}

failure

$ curl -X GET http://localhost:10002/v1/liveness_check 
404 page not found
  blast:
     image: mosuka/blast:v0.8.1
     ports:
       - 10000:10000
       - 10001:10001
       - 10002:10002
     ulimits:
       nofile:
         soft: "65536"
         hard: "65536"
#     env_file:
#       - ./example.env
     environment:
       - SERVICE_PORTS=10000, 10001, 10002
     volumes:
       - "/data/volumes/blast:/data"
#     networks:
#       - web
# 0.3.0
#     command: ["start", "--bind-addr=:10000", "--grpc-addr=:10001", "--http-addr=:10002", "--node-id=node1"]
# v0.8.1
     command: ["blast", "indexer","start","--data-dir=/data", "--node-address=:10000", "--grpc-address=:10001", "--http-address=:10002", "--node-id=node1"]

mac build not up to date

Doing the full mac build, I think it yells because the framework files were updated in the latest macOS.
I had exactly the same problem using OpenGL from golang.

It does build and does run.

The source of the bug is here: golang/go#26073

x-MacBook-Pro:blast apple$ make build
## mac with all ext
cd /Users/apple/workspace/go/src/github.com/mosuka/blast &&  GOOS=darwin \
    CGO_LDFLAGS="-L/usr/local/opt/icu4c/lib -L/usr/local/opt/rocksdb/lib -lrocksdb -lstdc++ -lm -lz -lbz2 -lsnappy -llz4 -lzstd" \
    CGO_CFLAGS="-I/usr/local/opt/icu4c/include -I/usr/local/opt/rocksdb/include" \
    CGO_ENABLED=1 \
    BUILD_TAGS="full" \
    make build
>> building binaries
   VERSION     = 0.4.0
   GOOS        = darwin
   GOARCH      = amd64
   CGO_ENABLED = 1
   CGO_CFLAGS  = -I/usr/local/opt/icu4c/include -I/usr/local/opt/rocksdb/include
   CGO_LDFLAGS = -L/usr/local/opt/icu4c/lib -L/usr/local/opt/rocksdb/lib -lrocksdb -lstdc++ -lm -lz -lbz2 -lsnappy -llz4 -lzstd
   BUILD_TAGS  = full
./cmd/blast
# crypto/x509
ld: warning: text-based stub file /System/Library/Frameworks//CoreFoundation.framework/CoreFoundation.tbd and library file /System/Library/Frameworks//CoreFoundation.framework/CoreFoundation are out of sync. Falling back to library file for linking.
ld: warning: text-based stub file /System/Library/Frameworks//Security.framework/Security.tbd and library file /System/Library/Frameworks//Security.framework/Security are out of sync. Falling back to library file for linking.
./cmd/blastd
# github.com/mosuka/blast/cmd/blastd
ld: warning: text-based stub file /System/Library/Frameworks//CoreFoundation.framework/CoreFoundation.tbd and library file /System/Library/Frameworks//CoreFoundation.framework/CoreFoundation are out of sync. Falling back to library file for linking.
ld: warning: text-based stub file /System/Library/Frameworks//Security.framework/Security.tbd and library file /System/Library/Frameworks//Security.framework/Security are out of sync. Falling back to library file for linking.

Management of federated cluster

What's the suggested plan for management of the federated cluster?

Try to be DRY and use the standard tracing, logging, and metrics solutions already out there in the Go ecosystem.
This lets a developer bring Blast up and have that insight without learning new tools.

I am wondering whether we should also build a web-based dashboard, so that a developer (rather than an SRE) can play with Blast, and so that there is a single management pane separate from the normal SRE tooling.

basic example

Any chance you could make a Go example that uses the gRPC API and the test data in the "examples" directory?

It would make it much easier to get going, and would help with fixing things too.

re-index / update an individual document

Sorry if this has been addressed before, but I could not find it anywhere. It is not an issue but rather a feature question. Say I have a collection of documents in MongoDB that is indexed with Blast. Now I update an individual document in the MongoDB database; how can I reindex that document with Blast?

Is there any way to retrieve the indexed ID of the document so that I can delete and recreate its index entry? Or is there a better solution?

segfault on ubuntu

Hello,

I ran a fresh build off the latest master, and then a segfault happened when I tried to start up Blast.

rbastic@asgard:~/go-src/src/github.com/mosuka/blast$ ./bin/blastd data \
>     --raft-addr=127.0.0.1:10000 \
>     --grpc-addr=127.0.0.1:10001 \
>     --http-addr=127.0.0.1:10002 \
>     --raft-node-id=node1 \
>     --raft-dir=/tmp/blast/node1/raft \
>     --store-dir=/tmp/blast/node1/store \
>     --index-dir=/tmp/blast/node1/index \
>     --index-mapping-file=./etc/index_mapping.json

    ____   __              __ 
   / __ ) / /____ _ _____ / /_
  / __ \ / // __ '// ___// __/  The lightweight distributed
 / /_/ // // /_/ /(__  )/ /_    indexing and search server.
/_.___//_/ \__,_//____/ \__/    version 0.4.0

2019/03/14 02:04:44.592113 github.com/mosuka/blast/node/data/service/service.go:104 [ERR] no analyzer with name or type 'ja' registered
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x9b2706]

goroutine 1 [running]:
github.com/mosuka/blast/index.(*Index).Close(0x0, 0xd71c18, 0xc0001f96c8)
	/home/rbastic/go-src/src/github.com/mosuka/blast/index/index.go:175 +0x26
github.com/mosuka/blast/node/data/service.(*Service).Stop(0xc0000a70e0, 0x1, 0xc0001c1640)
	/home/rbastic/go-src/src/github.com/mosuka/blast/node/data/service/service.go:166 +0x33
main.data(0xc0000b14a0, 0xe56c20, 0xc0003a01c0)
	/home/rbastic/go-src/src/github.com/mosuka/blast/cmd/blastd/data.go:135 +0x1559
github.com/mosuka/blast/vendor/github.com/urfave/cli.HandleAction(0xc011c0, 0xd73868, 0xc0000b14a0, 0xc0000a6f00, 0x0)
	/home/rbastic/go-src/src/github.com/mosuka/blast/vendor/github.com/urfave/cli/app.go:490 +0xc8
github.com/mosuka/blast/vendor/github.com/urfave/cli.Command.Run(0xd3d58b, 0x4, 0x0, 0x0, 0x0, 0x0, 0x0, 0xd47611, 0x11, 0x0, ...)
	/home/rbastic/go-src/src/github.com/mosuka/blast/vendor/github.com/urfave/cli/command.go:210 +0x92f
github.com/mosuka/blast/vendor/github.com/urfave/cli.(*App).Run(0xc0000adba0, 0xc0000aa000, 0xa, 0xa, 0x0, 0x0)
	/home/rbastic/go-src/src/github.com/mosuka/blast/vendor/github.com/urfave/cli/app.go:255 +0x69b
main.main()
	/home/rbastic/go-src/src/github.com/mosuka/blast/cmd/blastd/main.go:51 +0x2b2

Looks like a very exciting project; curious to know if I might be building it wrong somehow (Ubuntu 18.10).

Failing docker containers with "No help topic for 'blastd'"

summary

The Docker containers seem to be broken. Instead of starting up properly, they fail and exit.

steps to reproduce

docker pull mosuka/blast:latest
docker-compose up

expected result

A Blast cluster should be running and available over the network.

actual result

The terminal output is "No help topic for 'blastd'" for every node, and the nodes are restarted. They exit with blast_blast1_1 exited with code 3.

Make the API easier to work with

Have you used https://github.com/googleapis/gnostic ?

It would make a lot of the code you wrote redundant and make it much easier to extend.
Also far fewer bugs.

To explain:
from gRPC it can generate all of your OpenAPI-based REST!
OpenAPI is the main way to do REST; Swagger is dead.

It can also do the opposite, amazingly: OpenAPI to gRPC.
But I think using gRPC as the source of truth is best for Blast.

Have a think about it.
I can help if you're interested...

guidance on generating index mapping?

Thanks for the previous help... I've come back to playing with this and have a question or two if you have time.

  1. I've generated a simple Dockerfile and docker-ized blast https://github.com/earthcubearchitecture-project418/garden/tree/master/newindex/Blast

  2. I've not included any of the config files in this container, though I don't know yet what the defaults are in the Go code (I did go through it). I've been using Bleve as well in my code at https://github.com/earthcubearchitecture-project418/gleaner, but not with your level of sophistication. :)

  3. I've been playing with loading schema.org JSON-LD (type Dataset) into Blast and trying to search (docs at https://github.com/earthcubearchitecture-project418/garden/tree/master/newindex/Blast/examples ) where sodataset.json is a JSON-LD doc wrapper with the

{
    "id": "1",
    "fields": {

I think they need ??

My question is this:

If one wanted to leverage Blast for other JSON documents, what are the basic steps needed?
I was curious why my test failed, since I thought the Bleve instance in Blast would simply use

indexMapping := bleve.NewIndexMapping()

as the default and give me a simple default index of the JSON structure. My plan was to build out a more focused mapping from there. However, that does not seem to work: when I load the document and search for exact matches of known words in the document, I get nothing. I am wondering if it is trying to force my document into a mapping that it does not fit, resulting in no search results.

In the process

cat example/sodataset.json | xargs -0 ./bin/blast put document --request

./bin/blast get document --id 1

cat search_requestv2.json | xargs -0 ~/src/git/blast/bin/blast search --request > ../searchoutput.json

The first two work fine; I can load and retrieve the document. However, I am not able to structure a valid search with either of the search request documents inside https://github.com/earthcubearchitecture-project418/garden/tree/master/newindex/Blast/examples

Any guidance appreciated!

Thanks
Doug

key issue

Trying to get started with Blast...

I tried the command below and got
"key path not found":

fils@xps:~/src/git/blast/bin$ cat ../doc1.json  | xargs -0 ./blast put document --request
Error: Key path not found
Usage:
  blast put document [flags]

Flags:
      --grpc-server-address string   Blast server to connect to using gRPC (default "0.0.0.0:5000")
      --dial-timeout int             dial timeout (default 5000)
      --request-timeout int          request timeout (default 5000)
      --id string                    document id
      --fields string                document fields
      --request string               request file
  -h, --help                         help for document

Global Flags:
      --output-format string   output format (default "json")
  -v, --version                show version number

with document

{
	"document": {
		"id": "1",
		"fields": {
			"name": "Bleve",
			"description": "Bleve is a full-text search and indexing library for Go.",
			"category": "Library",
			"popularity": 3.0,
			"release": "2014-04-18T00:00:00Z",
			"type": "document"
		}
	}
}

Any ideas what I am doing wrong?

Custom header?

Hi,

I need to add a custom header for environment selection behind a load balancer. I noticed that I can't append to the outgoing context of the gRPC client, since it's private.

I am using tag 0.7.1. Is adding custom headers implemented in newer versions?

Thanks

ignore fields for index

Hi,

in an attempt to reduce the index size, I preprocessed the data. However, when I use the API I want to get the human-readable data back so I can show it to the user. Is there a way to exclude fields when building an index? I tried using "x" for the preprocessed field and "_x" for the original text. Unfortunately, this increased the index size by a lot, so I believe the field starting with "_" was not excluded. Is there a way to do this? My only other idea is to build a wrapper API which maps each ID to its stored text in a dictionary and returns that. But this seems like something that should already be supported.

Mongodb example

Looks like this project could be a pretty good alternative to Elasticsearch.
I am considering this project as an alternative to Elasticsearch, which is a very memory-hungry search engine. Can this be an alternative for a medium- or large-scale project?
The problem is that my main DB is MongoDB. Should I extract JSON from MongoDB periodically and send it to Blast to build indexes?
What is the best option for my situation?
I need an example MongoDB connector that communicates with Blast via gRPC to build the index in real time, the way Elasticsearch does.

One more question: is it a good idea to interact with the Blast server directly from end-user clients?
My situation is that I want to let users search/filter items directly in the browser. How about grpc-web? (I know the grpc-web project is immature.) What about HTTP/2 + JSON (REST)?

Consensus Protocol Implement method?

I know from the README that the cluster is built on the Raft consensus algorithm.

But when I tried cluster mode and killed the leader node, re-election didn't happen.

Does Blast support leader re-election?

Also, for consensus, write operations (indexing, PUT) should only happen on the leader node. I sent an HTTP indexing request to a follower node after the leader had been killed, and it still worked, so I am a little confused. Can write operations work on followers?

If write operations can work on followers, then when different write operations happen at the same time on different nodes, consensus and ordering may not be guaranteed.

Elasticsearch backward

Is there any chance of Elasticsearch compatibility?
The major problem with Elasticsearch is its system requirements; if Blast offered the same API as Elasticsearch, many developers would replace it.
32 GB for one node 😢
