
blast's Introduction

This project has been taken over by Phalanx.

This project has not been maintained for a long time.

Blast

Blast is a full-text search and indexing server written in Go, built on top of Bleve.
It provides its functions through gRPC (HTTP/2 + Protocol Buffers) or a traditional RESTful API (HTTP/1.1 + JSON).
Blast implements the Raft consensus algorithm via hashicorp/raft. It achieves consensus across all nodes, ensuring that every change made to the system is applied to a quorum of nodes, or none at all. Blast makes it easy for programmers to develop search applications with advanced features.

Features

  • Full-text search/indexing
  • Faceted search
  • Spatial/Geospatial search
  • Search result highlighting
  • Index replication
  • Easy cluster bring-up
  • Easy-to-use HTTP API
  • CLI available
  • Docker container image available

Install build dependencies

Blast requires some C/C++ libraries if you need to enable cld2, icu, libstemmer or leveldb. The following sections give instructions for installing these dependencies on particular platforms.

Ubuntu 18.10

$ sudo apt-get update
$ sudo apt-get install -y \
    libicu-dev \
    libstemmer-dev \
    libleveldb-dev \
    gcc-4.8 \
    g++-4.8 \
    build-essential

$ sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-8 80
$ sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-8 80
$ sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.8 90
$ sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-4.8 90

$ export GOPATH=${HOME}/go
$ mkdir -p ${GOPATH}/src/github.com/blevesearch
$ cd ${GOPATH}/src/github.com/blevesearch
$ git clone https://github.com/blevesearch/cld2.git
$ cd ${GOPATH}/src/github.com/blevesearch/cld2
$ git clone https://github.com/CLD2Owners/cld2.git
$ cd cld2/internal
$ ./compile_libs.sh
$ sudo cp *.so /usr/local/lib

macOS High Sierra Version 10.13.6

$ brew install \
    icu4c \
    leveldb

$ export GOPATH=${HOME}/go
$ go get -u -v github.com/blevesearch/cld2
$ cd ${GOPATH}/src/github.com/blevesearch/cld2
$ git clone https://github.com/CLD2Owners/cld2.git
$ cd cld2/internal
$ perl -p -i -e 's/soname=/install_name,/' compile_libs.sh
$ ./compile_libs.sh
$ sudo cp *.so /usr/local/lib

Build

Build Blast as follows:

$ mkdir -p ${GOPATH}/src/github.com/mosuka
$ cd ${GOPATH}/src/github.com/mosuka
$ git clone https://github.com/mosuka/blast.git
$ cd blast
$ make

If you omit GOOS or GOARCH, the binary is built for the platform you are building on.
To target another platform, set the GOOS and GOARCH environment variables.

Linux

$ make GOOS=linux build

macOS

$ make GOOS=darwin build

Windows

$ make GOOS=windows build

Build with extensions

Blast supports some Bleve Extensions (blevex). To build with them, set CGO_LDFLAGS, CGO_CFLAGS, CGO_ENABLED and BUILD_TAGS as needed. For example, to make the ICU tokenizer available:

$ make GOOS=linux \
       BUILD_TAGS=icu \
       CGO_ENABLED=1 \
       build

Linux

$ make GOOS=linux \
       BUILD_TAGS="kagome icu libstemmer cld2" \
       CGO_ENABLED=1 \
       build

macOS

$ make GOOS=darwin \
       BUILD_TAGS="kagome icu libstemmer cld2" \
       CGO_ENABLED=1 \
       CGO_LDFLAGS="-L/usr/local/opt/icu4c/lib" \
       CGO_CFLAGS="-I/usr/local/opt/icu4c/include" \
       build

Build flags

Refer to the following table for the build flags of the supported Bleve extensions:

BUILD_TAGS   CGO_ENABLED   Description
cld2         1             Enable Compact Language Detector
kagome       0             Enable Japanese Language Analyser
icu          1             Enable ICU Tokenizer, Thai Language Analyser
libstemmer   1             Enable Language Stemmer (Danish, German, English, Spanish, Finnish, French, Hungarian, Italian, Dutch, Norwegian, Portuguese, Romanian, Russian, Swedish, Turkish)

To enable a feature whose CGO_ENABLED is 1, install its build dependencies as described in the Install build dependencies section above.

Binary

After a successful build, you can find the binary like so:

$ ls ./bin
blast

Test

To test your changes, run the following command:

$ make test

If you want to specify the target platform, set GOOS and GOARCH environment variables in the same way as the build.

Package

To create a distribution package, run the following command:

$ make dist

Configure

Blast's startup options can be set via configuration files, environment variables, and command-line arguments.
Refer to the following table for the options that can be configured.

CLI Flag                 Environment variable         Configuration File     Description
--config-file            -                            -                      config file; if omitted, blast.yaml in /etc and the home directory will be searched
--id                     BLAST_ID                     id                     node ID
--raft-address           BLAST_RAFT_ADDRESS           raft_address           Raft server listen address
--grpc-address           BLAST_GRPC_ADDRESS           grpc_address           gRPC server listen address
--http-address           BLAST_HTTP_ADDRESS           http_address           HTTP server listen address
--data-directory         BLAST_DATA_DIRECTORY         data_directory         data directory that stores the index and Raft logs
--mapping-file           BLAST_MAPPING_FILE           mapping_file           path to the index mapping file
--peer-grpc-address      BLAST_PEER_GRPC_ADDRESS      peer_grpc_address      gRPC listen address of an existing node in the cluster to join
--certificate-file       BLAST_CERTIFICATE_FILE       certificate_file       path to the client/server TLS certificate file
--key-file               BLAST_KEY_FILE               key_file               path to the client/server TLS key file
--common-name            BLAST_COMMON_NAME            common_name            certificate common name
--cors-allowed-methods   BLAST_CORS_ALLOWED_METHODS   cors_allowed_methods   CORS allowed methods (e.g. GET,PUT,DELETE,POST)
--cors-allowed-origins   BLAST_CORS_ALLOWED_ORIGINS   cors_allowed_origins   CORS allowed origins (e.g. http://localhost:8080,http://localhost:80)
--cors-allowed-headers   BLAST_CORS_ALLOWED_HEADERS   cors_allowed_headers   CORS allowed headers (e.g. content-type,x-some-key)
--log-level              BLAST_LOG_LEVEL              log_level              log level
--log-file               BLAST_LOG_FILE               log_file               log file
--log-max-size           BLAST_LOG_MAX_SIZE           log_max_size           max size of a log file in megabytes
--log-max-backups        BLAST_LOG_MAX_BACKUPS        log_max_backups        max backup count of log files
--log-max-age            BLAST_LOG_MAX_AGE            log_max_age            max age of a log file in days
--log-compress           BLAST_LOG_COMPRESS           log_compress           compress rotated log files
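
As an illustrative sketch (these values are examples, not shipped defaults), a blast.yaml combining the keys from the Configuration File column might look like:

```yaml
# blast.yaml -- illustrative values only, not defaults
id: node1
raft_address: ":7000"
grpc_address: ":9000"
http_address: ":8000"
data_directory: /tmp/blast/node1
mapping_file: ./examples/example_mapping.json
log_level: INFO
```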

Start

Start a server as follows:

$ ./bin/blast start \
              --id=node1 \
              --raft-address=:7000 \
              --http-address=:8000 \
              --grpc-address=:9000 \
              --data-directory=/tmp/blast/node1 \
              --mapping-file=./examples/example_mapping.json

You can get the node information with the following command:

$ ./bin/blast node | jq .

or the following URL:

$ curl -X GET http://localhost:8000/v1/node | jq .

The result of the above command is:

{
  "node": {
    "raft_address": ":7000",
    "metadata": {
      "grpc_address": ":9000",
      "http_address": ":8000"
    },
    "state": "Leader"
  }
}

Health check

You can check the health status of the node.

$ ./bin/blast healthcheck | jq .

The following REST APIs are also provided.

Liveness probe

This endpoint always returns 200 and should be used to check that the server process is alive.

$ curl -X GET http://localhost:8000/v1/liveness_check | jq .

Readiness probe

This endpoint returns 200 when the server is ready to serve traffic (i.e. respond to queries).

$ curl -X GET http://localhost:8000/v1/readiness_check | jq .

Put a document

To put a document, execute the following command:

$ ./bin/blast set 1 '
{
  "fields": {
    "title": "Search engine (computing)",
    "text": "A search engine is an information retrieval system designed to help find information stored on a computer system. The search results are usually presented in a list and are commonly called hits. Search engines help to minimize the time required to find information and the amount of information which must be consulted, akin to other techniques for managing information overload. The most public, visible form of a search engine is a Web search engine which searches for information on the World Wide Web.",
    "timestamp": "2018-07-04T05:41:00Z",
    "_type": "example"
  }
}
' | jq .

or, you can use the RESTful API as follows:

$ curl -X PUT 'http://127.0.0.1:8000/v1/documents/1' --data-binary '
{
  "fields": {
    "title": "Search engine (computing)",
    "text": "A search engine is an information retrieval system designed to help find information stored on a computer system. The search results are usually presented in a list and are commonly called hits. Search engines help to minimize the time required to find information and the amount of information which must be consulted, akin to other techniques for managing information overload. The most public, visible form of a search engine is a Web search engine which searches for information on the World Wide Web.",
    "timestamp": "2018-07-04T05:41:00Z",
    "_type": "example"
  }
}
' | jq .

or

$ curl -X PUT 'http://127.0.0.1:8000/v1/documents/1' -H "Content-Type: application/json" --data-binary @./examples/example_doc_1.json

Get a document

To get a document, execute the following command:

$ ./bin/blast get 1 | jq .

or, you can use the RESTful API as follows:

$ curl -X GET 'http://127.0.0.1:8000/v1/documents/1' | jq .

The result of the above command is:

{
  "fields": {
    "_type": "example",
    "text": "A search engine is an information retrieval system designed to help find information stored on a computer system. The search results are usually presented in a list and are commonly called hits. Search engines help to minimize the time required to find information and the amount of information which must be consulted, akin to other techniques for managing information overload. The most public, visible form of a search engine is a Web search engine which searches for information on the World Wide Web.",
    "timestamp": "2018-07-04T05:41:00Z",
    "title": "Search engine (computing)"
  }
}

Search documents

To search documents, execute the following command:

$ ./bin/blast search '
{
  "search_request": {
    "query": {
      "query": "+_all:search"
    },
    "size": 10,
    "from": 0,
    "fields": [
      "*"
    ],
    "sort": [
      "-_score"
    ]
  }
}
' | jq .

or, you can use the RESTful API as follows:

$ curl -X POST 'http://127.0.0.1:8000/v1/search' --data-binary '
{
  "search_request": {
    "query": {
      "query": "+_all:search"
    },
    "size": 10,
    "from": 0,
    "fields": [
      "*"
    ],
    "sort": [
      "-_score"
    ]
  }
}
' | jq .

The result of the above command is:

{
  "search_result": {
    "facets": null,
    "hits": [
      {
        "fields": {
          "_type": "example",
          "text": "A search engine is an information retrieval system designed to help find information stored on a computer system. The search results are usually presented in a list and are commonly called hits. Search engines help to minimize the time required to find information and the amount of information which must be consulted, akin to other techniques for managing information overload. The most public, visible form of a search engine is a Web search engine which searches for information on the World Wide Web.",
          "timestamp": "2018-07-04T05:41:00Z",
          "title": "Search engine (computing)"
        },
        "id": "1",
        "index": "/tmp/blast/node1/index",
        "score": 0.09703538256409851,
        "sort": [
          "_score"
        ]
      }
    ],
    "max_score": 0.09703538256409851,
    "request": {
      "explain": false,
      "facets": null,
      "fields": [
        "*"
      ],
      "from": 0,
      "highlight": null,
      "includeLocations": false,
      "query": {
        "query": "+_all:search"
      },
      "search_after": null,
      "search_before": null,
      "size": 10,
      "sort": [
        "-_score"
      ]
    },
    "status": {
      "failed": 0,
      "successful": 1,
      "total": 1
    },
    "took": 171880,
    "total_hits": 1
  }
}
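
The query object is passed through to Bleve, so other Bleve query types should also work. As a sketch (not verified against this server version), a prefix query on the title field might look like:

```json
{
  "search_request": {
    "query": {
      "prefix": "sear",
      "field": "title"
    },
    "size": 10,
    "from": 0,
    "fields": [
      "*"
    ]
  }
}
```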

Delete a document

To delete a document, execute the following command:

$ ./bin/blast delete 1

or, you can use the RESTful API as follows:

$ curl -X DELETE 'http://127.0.0.1:8000/v1/documents/1'

Index documents in bulk

To index documents in bulk, execute the following command:

$ ./bin/blast bulk-index --file ./examples/example_bulk_index.json

or, you can use the RESTful API as follows:

$ curl -X PUT 'http://127.0.0.1:8000/v1/documents' -H "Content-Type: application/x-ndjson" --data-binary @./examples/example_bulk_index.json
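
The bulk file is NDJSON: one JSON object per line, each carrying an id and a fields object (the same document shape used elsewhere in this README). An illustrative sketch:

```json
{"id": "1", "fields": {"title": "Search engine (computing)", "_type": "example"}}
{"id": "2", "fields": {"title": "Full-text search", "_type": "example"}}
```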

Delete documents in bulk

To delete documents in bulk, execute the following command:

$ ./bin/blast bulk-delete --file ./examples/example_bulk_delete.txt

or, you can use the RESTful API as follows:

$ curl -X DELETE 'http://127.0.0.1:8000/v1/documents' -H "Content-Type: text/plain" --data-binary @./examples/example_bulk_delete.txt
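
The bulk delete file is assumed to be plain text with one document ID per line, matching the text/plain Content-Type above. An illustrative sketch:

```text
1
2
3
```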

Bringing up a cluster

Blast makes it easy to bring up a cluster. The node started above is already running, but it is not fault-tolerant. If you need to increase fault tolerance, bring up two more data nodes like so:

$ ./bin/blast start \
              --id=node2 \
              --raft-address=:7001 \
              --http-address=:8001 \
              --grpc-address=:9001 \
              --peer-grpc-address=:9000 \
              --data-directory=/tmp/blast/node2 \
              --mapping-file=./examples/example_mapping.json
$ ./bin/blast start \
              --id=node3 \
              --raft-address=:7002 \
              --http-address=:8002 \
              --grpc-address=:9002 \
              --peer-grpc-address=:9000 \
              --data-directory=/tmp/blast/node3 \
              --mapping-file=./examples/example_mapping.json

The above example runs each Blast node on the same host, so each node must listen on different ports. This would not be necessary if each node ran on a different host.

This instructs each new node to join the existing node; each node recognizes the cluster on startup. You now have a 3-node cluster that can tolerate the failure of one node. You can check the cluster with the following command:

$ ./bin/blast cluster | jq .

or, you can use the RESTful API as follows:

$ curl -X GET 'http://127.0.0.1:8000/v1/cluster' | jq .

The result of the above command is:

{
  "cluster": {
    "nodes": {
      "node1": {
        "raft_address": ":7000",
        "metadata": {
          "grpc_address": ":9000",
          "http_address": ":8000"
        },
        "state": "Leader"
      },
      "node2": {
        "raft_address": ":7001",
        "metadata": {
          "grpc_address": ":9001",
          "http_address": ":8001"
        },
        "state": "Follower"
      },
      "node3": {
        "raft_address": ":7002",
        "metadata": {
          "grpc_address": ":9002",
          "http_address": ":8002"
        },
        "state": "Follower"
      }
    },
    "leader": "node1"
  }
}

An odd number of nodes (3 or more) is recommended for a cluster. A single-node deployment cannot avoid data loss in failure scenarios, so avoid deploying single nodes.

In the example above, the nodes join the cluster at startup, but you can also join a node that was started in standalone mode to the cluster later, as follows:

$ ./bin/blast join --grpc-address=:9000 node2 127.0.0.1:9001

or, you can use the RESTful API as follows:

$ curl -X PUT 'http://127.0.0.1:8000/v1/cluster/node2' --data-binary '
{
  "raft_address": ":7001",
  "metadata": {
    "grpc_address": ":9001",
    "http_address": ":8001"
  }
}
'

To remove a node from the cluster, execute the following command:

$ ./bin/blast leave --grpc-address=:9000 node2

or, you can use the RESTful API as follows:

$ curl -X DELETE 'http://127.0.0.1:8000/v1/cluster/node2'

The following command indexes documents to any node in the cluster:

$ ./bin/blast set 1 '
{
  "fields": {
    "title": "Search engine (computing)",
    "text": "A search engine is an information retrieval system designed to help find information stored on a computer system. The search results are usually presented in a list and are commonly called hits. Search engines help to minimize the time required to find information and the amount of information which must be consulted, akin to other techniques for managing information overload. The most public, visible form of a search engine is a Web search engine which searches for information on the World Wide Web.",
    "timestamp": "2018-07-04T05:41:00Z",
    "_type": "example"
  }
}
' --grpc-address=:9000 | jq .

You can then get the document from the node specified in the above command as follows:

$ ./bin/blast get 1 --grpc-address=:9000 | jq .

You can also get the same document from other nodes in the cluster as follows:

$ ./bin/blast get 1 --grpc-address=:9001 | jq .
$ ./bin/blast get 1 --grpc-address=:9002 | jq .

The result of the above commands is:

{
  "fields": {
    "_type": "example",
    "text": "A search engine is an information retrieval system designed to help find information stored on a computer system. The search results are usually presented in a list and are commonly called hits. Search engines help to minimize the time required to find information and the amount of information which must be consulted, akin to other techniques for managing information overload. The most public, visible form of a search engine is a Web search engine which searches for information on the World Wide Web.",
    "timestamp": "2018-07-04T05:41:00Z",
    "title": "Search engine (computing)"
  }
}

Docker

Build Docker container image

You can build the Docker container image like so:

$ make docker-build

Pull Docker container image from docker.io

You can also use the Docker container image already registered in docker.io like so:

$ docker pull mosuka/blast:latest

See https://hub.docker.com/r/mosuka/blast/tags/

Start on Docker

To run a Blast node on Docker, start it like so:

$ docker run --rm --name blast-node1 \
    -p 7000:7000 \
    -p 8000:8000 \
    -p 9000:9000 \
    -v $(pwd)/etc/blast_mapping.json:/etc/blast_mapping.json \
    mosuka/blast:latest start \
      --id=node1 \
      --raft-address=:7000 \
      --http-address=:8000 \
      --grpc-address=:9000 \
      --data-directory=/tmp/blast/node1 \
      --mapping-file=/etc/blast_mapping.json

You can execute commands inside the Docker container as follows:

$ docker exec -it blast-node1 blast node --grpc-address=:9000

Securing Blast

Blast supports HTTPS access, ensuring that all communication between clients and a cluster is encrypted.

Generating a certificate and private key

One way to generate the necessary resources is via openssl. For example:

$ openssl req -x509 -nodes -newkey rsa:4096 -keyout ./etc/blast_key.pem -out ./etc/blast_cert.pem -days 365 -subj '/CN=localhost'
Generating a 4096 bit RSA private key
............................++
........++
writing new private key to './etc/blast_key.pem'

Secure cluster example

Start each node with HTTPS enabled and node-to-node encryption. It is assumed the X.509 certificate and key are at ./etc/blast_cert.pem and ./etc/blast_key.pem respectively.

$ ./bin/blast start \
             --id=node1 \
             --raft-address=:7000 \
             --http-address=:8000 \
             --grpc-address=:9000 \
             --data-directory=/tmp/blast/node1 \
             --mapping-file=./etc/blast_mapping.json \
             --certificate-file=./etc/blast_cert.pem \
             --key-file=./etc/blast_key.pem \
             --common-name=localhost
$ ./bin/blast start \
             --id=node2 \
             --raft-address=:7001 \
             --http-address=:8001 \
             --grpc-address=:9001 \
             --peer-grpc-address=:9000 \
             --data-directory=/tmp/blast/node2 \
             --mapping-file=./etc/blast_mapping.json \
             --certificate-file=./etc/blast_cert.pem \
             --key-file=./etc/blast_key.pem \
             --common-name=localhost
$ ./bin/blast start \
             --id=node3 \
             --raft-address=:7002 \
             --http-address=:8002 \
             --grpc-address=:9002 \
             --peer-grpc-address=:9000 \
             --data-directory=/tmp/blast/node3 \
             --mapping-file=./etc/blast_mapping.json \
             --certificate-file=./etc/blast_cert.pem \
             --key-file=./etc/blast_key.pem \
             --common-name=localhost

You can access the cluster by adding the TLS-related flags, for example:

$ ./bin/blast cluster --grpc-address=:9000 --certificate-file=./etc/blast_cert.pem --common-name=localhost | jq .

or

$ curl -X GET https://localhost:8000/v1/cluster --cacert ./etc/blast_cert.pem | jq .

blast's People

Contributors

mosuka, pablocastellano, radutopala

blast's Issues

Error getting document by ID over rest API

I get an error when issuing a very simple query over rest endpoint:

flaviostutz-Mac:fess flaviostutz$ curl -vv --location --request GET 'localhost:6000/v1/documents/a2b'
Note: Unnecessary use of -X or --request, GET is already inferred.
*   Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 6000 (#0)
> GET /v1/documents/a2b HTTP/1.1
> Host: localhost:6000
> User-Agent: curl/7.64.1
> Accept: */*
>
< HTTP/1.1 500 Internal Server Error
< Content-Type: application/json
< Grpc-Metadata-Content-Type: application/grpc
< Grpc-Metadata-Content-Type: application/grpc
< Date: Fri, 17 Jan 2020 21:27:59 GMT
< Content-Length: 130
<
* Connection #0 to host localhost left intact
{"error":"unknown message type \"map[string]interface {}\"","code":2,"message":"unknown message type \"map[string]interface {}\""}* Closing connection 0

Seems like a Golang cast issue. I could not find where this is handled in the code. Could anyone help?

Indexing content / faceted search

Hey,

Hope you are all well !

I wanted to extend the following golang project, https://github.com/hoop33/limo, which also uses bleve, with blast. I forked and updated limo to automatically fetch the repo topics in order to pre-fill limo's tags index.

But I would like to create a web UI with some facets to explore my starred repositories. So after a couple of searches, I found your project wrapping the bleve package.

Can you provide an example or more explanation about how to index content?

Questions:

Thanks in advance.

Cheers,
Richard

Possible to build a Blind index using blast?

We have security requirements that require the use of encryption at rest. We would like to be able to build an index of the unencrypted plain text, and then ship the index and the encrypted data files, without leaking the plain text. Is this something that is possible with blast/bleve?

Search with prefix

Hi,

I was wondering if I’m doing anything wrong, but other than doing a match query I can’t seem to perform a prefix query at all. According to the bleve documentation I should be able to do that by using “prefix”: “searchterm”. However it doesn’t yield any results. Do you have an example search query for prefix?

Thanks!

More Query String Query Examples

Currently only a few query examples (simple and prefix) are given, which seems not enough for a quick start with more complicated query logic.

Query String Query supports all kinds of queries, such as phrases, field scoping, boolean queries, numeric ranges, fuzzy search and so on.

If more examples were given, it would be friendlier to beginners.

How to search documents with id prefix?

I have two kinds of documents, A and B, and I index them in bulk with different document ID prefixes, basically like:

{"fields":{"code":"xxx","name":"xxx"},"id":"path/A/10"}
{"fields":{"code":"xxx","name":"xxx"},"id":"path/A/11"}
{"fields":{"code":"xxx","name":"xxx"},"id":"path/B/20"}
{"fields":{"code":"xxx","name":"xxx"},"id":"path/B/21"}

Now I only want to search documents with id prefix "path/A/". How to do this?

A usable web client for blast just like Kibana for Elasticsearch?

To use blast easily, a web client for accessing blast seems like a good approach.

Is there a usable web client for blast, just like Kibana for Elasticsearch?

Or, is there an open source project for quickly building a web client for querying?

I have created a simple Go web UI for querying terms and want it to support more query methods like numeric range, prefix, date and so on.

Do you have any suggestions? Thanks.

REST structure

So I want to load into Blast using my own go RESTful client code.

In your example

$ cat ./example/document_1.json | xargs -0 ./bin/blast put document --request

I see we are doing a PUT to 0.0.0.0:8000/rest/ID where ID is the ID of the document.

However, how is the JSON document sent in that PUT command? Is this body-only, or is there an association that needs to take place? I have used the HTTPie package with

http -v -j PUT 0.0.0.0:8000/rest/1 @sodataset.json

but it comes back 400 bad request. If we are using HTTPie or curl, what would a PUT command look like?

I'm trying to load 50K JSON-LD documents (schema.org) from Minio into Blast.

Thanks!
Doug

Feature idea: KVS pub sub

The KVS is a generic bucket store.
I was thinking of extending it to publish changes over GRPC.

Use case?
When you build clients using this system, it's very useful to know when anything has changed and the nature of the change. Then, as a subscriber, I can update my many microservices or even GUI clients. It keeps them all in sync, basically.

The event would be like:

  • namespace: the bucket namespace
  • eventtype: Create/Read/Update/Delete
  • data: protobuf or json.

It should also have the ability to turn off the Read event type, because no one normally needs that, but it can be useful for dynamically knowing who is reading where.

Implementation:
GRPC Middleware might be the perfect fit !
https://github.com/grpc-ecosystem/go-grpc-middleware
Also great for adding other things like security etc.

Because it's gRPC, it will be easy to then receive it and put it onto a NATS message broker later as a second piece of work.

Index.
I also thought about the index, but I think it's not worth the effort. What you can do is make each index query output to a cache in the KVS store; then you can use pub/sub from that. Almost like a materialized view with pub/sub on top of it. It also gives you caching for free, to a degree.

GEO searches

Is it possible to use blast for geo-localized searches? It is an important feature in many projects nowadays.
If yes, would it be possible to include an example at some point? (mapping and search request)

I have put a very basic example of Geo search with bleve there : https://github.com/hubyhuby/bleve-search-example/blob/444d18d810064302fef76693f86f07c533552897/main.go#L138

There are two main searches I see:
  • geo box search around a point
  • geo sort search by distance to a point

I understand it is a WIP, but I am really looking forward to this project.
Thanks for this nice project !

Possible to run as embedded service?

Is it possible to run blast as an embedded service into an existing application cluster? That is, if there is already a service running 2+ instances and it wants to add indexing, could it run a blast cluster using the existing service nodes? I'm thinking of this like the way Nats.io server can be embedded instead of having to run standalone gnatsd processes.

segfault when port 8080 is already bound

Hello,

Thanks again for your last bugfix. Here is another issue:

./bin/blast-index start --node-id=index1 --data-dir=/tmp/blast/index1 --bind-addr=:6060 --grpc-addr=:5050 --http-addr=:8080 --index-mapping-file ./example/index_mapping.json
2019/03/14 16:26:27.068886 github.com/mosuka/blast/index/server.go:131 [ERR] listen tcp :8080: bind: address already in use
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x48 pc=0xb51108]

goroutine 39 [running]:
github.com/mosuka/blast/index.(*Server).Start(0x0)
/home/rbastic/go-src/src/github.com/mosuka/blast/index/server.go:147 +0x68
created by main.execStart
/home/rbastic/go-src/src/github.com/mosuka/blast/cmd/blast-index/start.go:78 +0x802

docker no http beyond root

I get 404 for any query except /

curl -X GET http://localhost:10002/
{
  "status": 200,
  "version": "v0.8.1"
}

failure

$ curl -X GET http://localhost:10002/v1/liveness_check 
404 page not found
  blast:
     image: mosuka/blast:v0.8.1
     ports:
       - 10000:10000
       - 10001:10001
       - 10002:10002
     ulimits:
       nofile:
         soft: "65536"
         hard: "65536"
#     env_file:
#       - ./example.env
     environment:
       - SERVICE_PORTS=10000, 10001, 10002
     volumes:
       - "/data/volumes/blast:/data"
#     networks:
#       - web
# 0.3.0
#     command: ["start", "--bind-addr=:10000", "--grpc-addr=:10001", "--http-addr=:10002", "--node-id=node1"]
# v0.8.1
     command: ["blast", "indexer","start","--data-dir=/data", "--node-address=:10000", "--grpc-address=:10001", "--http-address=:10002", "--node-id=node1"]

mac build not up to date

Doing the full mac build, I think it yells because the framework files were updated in the latest macOS.
I had exactly the same problem using OpenGL from golang.

It does build and does run.

The source of the bug is here: golang/go#26073

x-MacBook-Pro:blast apple$ make build
## mac with all ext
cd /Users/apple/workspace/go/src/github.com/mosuka/blast &&  GOOS=darwin \
    CGO_LDFLAGS="-L/usr/local/opt/icu4c/lib -L/usr/local/opt/rocksdb/lib -lrocksdb -lstdc++ -lm -lz -lbz2 -lsnappy -llz4 -lzstd" \
    CGO_CFLAGS="-I/usr/local/opt/icu4c/include -I/usr/local/opt/rocksdb/include" \
    CGO_ENABLED=1 \
    BUILD_TAGS="full" \
    make build
>> building binaries
   VERSION     = 0.4.0
   GOOS        = darwin
   GOARCH      = amd64
   CGO_ENABLED = 1
   CGO_CFLAGS  = -I/usr/local/opt/icu4c/include -I/usr/local/opt/rocksdb/include
   CGO_LDFLAGS = -L/usr/local/opt/icu4c/lib -L/usr/local/opt/rocksdb/lib -lrocksdb -lstdc++ -lm -lz -lbz2 -lsnappy -llz4 -lzstd
   BUILD_TAGS  = full
./cmd/blast
# crypto/x509
ld: warning: text-based stub file /System/Library/Frameworks//CoreFoundation.framework/CoreFoundation.tbd and library file /System/Library/Frameworks//CoreFoundation.framework/CoreFoundation are out of sync. Falling back to library file for linking.
ld: warning: text-based stub file /System/Library/Frameworks//Security.framework/Security.tbd and library file /System/Library/Frameworks//Security.framework/Security are out of sync. Falling back to library file for linking.
./cmd/blastd
# github.com/mosuka/blast/cmd/blastd
ld: warning: text-based stub file /System/Library/Frameworks//CoreFoundation.framework/CoreFoundation.tbd and library file /System/Library/Frameworks//CoreFoundation.framework/CoreFoundation are out of sync. Falling back to library file for linking.
ld: warning: text-based stub file /System/Library/Frameworks//Security.framework/Security.tbd and library file /System/Library/Frameworks//Security.framework/Security are out of sync. Falling back to library file for linking.

Management of federated cluster

What's the suggested plan for management of the federated cluster?

Try to be DRY and use the standard tracing, logging, and metrics solutions already out there in the Go ecosystem.
This lets a developer bring Blast up and have that insight without learning new tools.

I am wondering whether we should also build a web-based dashboard, so that a developer (rather than an SRE) can play with Blast, and so that there is a single management pane separate from the normal SRE tooling.

basic example

Any chance you could make a Go example that uses the gRPC API and the test data in the "examples" directory?

It would make it much easier to get going, and would help with fixing things too.

re-index / update an individual document

Sorry if this has been addressed before, but I could not find it anywhere. It is not an issue but rather a feature question. Say I have a collection of documents in MongoDB that is indexed with Blast. Now I update an individual document in the MongoDB database; how can I reindex that document with Blast?

Is there any way to retrieve the indexed ID of the document so that I can delete and recreate its index entry? Or is there a better solution?

segfault on ubuntu

Hello,

I ran a fresh build off the latest master, and then a segfault happened when I tried to start up Blast.

rbastic@asgard:~/go-src/src/github.com/mosuka/blast$ ./bin/blastd data \
>     --raft-addr=127.0.0.1:10000 \
>     --grpc-addr=127.0.0.1:10001 \
>     --http-addr=127.0.0.1:10002 \
>     --raft-node-id=node1 \
>     --raft-dir=/tmp/blast/node1/raft \
>     --store-dir=/tmp/blast/node1/store \
>     --index-dir=/tmp/blast/node1/index \
>     --index-mapping-file=./etc/index_mapping.json

    ____   __              __ 
   / __ ) / /____ _ _____ / /_
  / __ \ / // __ '// ___// __/  The lightweight distributed
 / /_/ // // /_/ /(__  )/ /_    indexing and search server.
/_.___//_/ \__,_//____/ \__/    version 0.4.0

2019/03/14 02:04:44.592113 github.com/mosuka/blast/node/data/service/service.go:104 [ERR] no analyzer with name or type 'ja' registered
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x9b2706]

goroutine 1 [running]:
github.com/mosuka/blast/index.(*Index).Close(0x0, 0xd71c18, 0xc0001f96c8)
	/home/rbastic/go-src/src/github.com/mosuka/blast/index/index.go:175 +0x26
github.com/mosuka/blast/node/data/service.(*Service).Stop(0xc0000a70e0, 0x1, 0xc0001c1640)
	/home/rbastic/go-src/src/github.com/mosuka/blast/node/data/service/service.go:166 +0x33
main.data(0xc0000b14a0, 0xe56c20, 0xc0003a01c0)
	/home/rbastic/go-src/src/github.com/mosuka/blast/cmd/blastd/data.go:135 +0x1559
github.com/mosuka/blast/vendor/github.com/urfave/cli.HandleAction(0xc011c0, 0xd73868, 0xc0000b14a0, 0xc0000a6f00, 0x0)
	/home/rbastic/go-src/src/github.com/mosuka/blast/vendor/github.com/urfave/cli/app.go:490 +0xc8
github.com/mosuka/blast/vendor/github.com/urfave/cli.Command.Run(0xd3d58b, 0x4, 0x0, 0x0, 0x0, 0x0, 0x0, 0xd47611, 0x11, 0x0, ...)
	/home/rbastic/go-src/src/github.com/mosuka/blast/vendor/github.com/urfave/cli/command.go:210 +0x92f
github.com/mosuka/blast/vendor/github.com/urfave/cli.(*App).Run(0xc0000adba0, 0xc0000aa000, 0xa, 0xa, 0x0, 0x0)
	/home/rbastic/go-src/src/github.com/mosuka/blast/vendor/github.com/urfave/cli/app.go:255 +0x69b
main.main()
	/home/rbastic/go-src/src/github.com/mosuka/blast/cmd/blastd/main.go:51 +0x2b2

Looks like a very exciting project; curious to know if I might be building it wrong somehow (Ubuntu 18.10).

Failing docker containers with "No help topic for 'blastd'"

summary

The Docker containers seem to be broken. Instead of starting up properly, they fail and exit.

steps to reproduce

docker pull mosuka/blast:latest
docker-compose up

expected result

A Blast cluster should be running and available over the network.

actual result

The terminal output is "No help topic for 'blastd'" for every node, and the nodes are restarted. They exit with blast_blast1_1 exited with code 3.

Make the API easier to work with

Have you used https://github.com/googleapis/gnostic ?

It would make a lot of the code you wrote redundant and make it much easier to extend.
Also far fewer bugs.

To explain:
from gRPC it can generate all of your OpenAPI-based REST!
OpenAPI is the main way to do REST; Swagger is dead.

It can also do the opposite, amazingly: OpenAPI to gRPC.
But I think using gRPC as the source of truth is best for Blast.

Have a think about it.
I can help if you're interested...

guidance on generating index mapping?

Thanks for the previous help... I've come back to playing with this and have a question or two if you have time.

  1. I've generated a simple Dockerfile and docker-ized blast https://github.com/earthcubearchitecture-project418/garden/tree/master/newindex/Blast

  2. I've not included any of the config files in this container, though I don't know yet what the defaults are in the Go code (I did go through it). I've been using Bleve as well in my code at https://github.com/earthcubearchitecture-project418/gleaner, but not with your level of sophistication. :)

  3. I've been playing with loading schema.org JSON-LD (type Dataset) into Blast and trying to search (docs at https://github.com/earthcubearchitecture-project418/garden/tree/master/newindex/Blast/examples ) where sodataset.json is a JSON-LD doc wrapper with the

{
    "id": "1",
    "fields": {

I think they need ??

My question is this:

If one wanted to leverage Blast for other JSON documents, what are the basic steps needed?
I was curious why my test failed, since I thought the Bleve instance in Blast would simply use

indexMapping := bleve.NewIndexMapping()

as the default and give me a simple default index of the JSON structure. My plan was to build out a more focused mapping from there. However, that does not seem to work: when I load the document and search for exact matches of known words in the document, I get nothing. I am wondering if it is trying to force my document into a mapping that it does not fit, resulting in no search results.

In the process

cat example/sodataset.json | xargs -0 ./bin/blast put document --request

./bin/blast get document --id 1

cat search_requestv2.json | xargs -0 ~/src/git/blast/bin/blast search --request > ../searchoutput.json

The first two work fine; I can load and retrieve the document. However, I am not able to structure a valid search with either of the search request documents inside https://github.com/earthcubearchitecture-project418/garden/tree/master/newindex/Blast/examples

Any guidance appreciated!

Thanks
Doug

key issue

Trying to get started with Blast...

I tried the command below and got
"key path not found":

fils@xps:~/src/git/blast/bin$ cat ../doc1.json  | xargs -0 ./blast put document --request
Error: Key path not found
Usage:
  blast put document [flags]

Flags:
      --grpc-server-address string   Blast server to connect to using gRPC (default "0.0.0.0:5000")
      --dial-timeout int             dial timeout (default 5000)
      --request-timeout int          request timeout (default 5000)
      --id string                    document id
      --fields string                document fields
      --request string               request file
  -h, --help                         help for document

Global Flags:
      --output-format string   output format (default "json")
  -v, --version                show version number

with document

{
	"document": {
		"id": "1",
		"fields": {
			"name": "Bleve",
			"description": "Bleve is a full-text search and indexing library for Go.",
			"category": "Library",
			"popularity": 3.0,
			"release": "2014-04-18T00:00:00Z",
			"type": "document"
		}
	}
}

Any ideas what I am doing wrong?

Custom header?

Hi,

I need to add a custom header for environment selection behind a load balancer. I noticed that I can't append to the outgoing context of the gRPC client, since it's private.

I am using tag 0.7.1. Is adding custom headers implemented in newer versions?

Thanks

ignore fields for index

Hi,

in an attempt to reduce the index size, I preprocessed the data. However, when I use the API I want to get the human-readable data back so I can show it to the user. Is there a way to exclude fields when building an index? I tried using "x" for the preprocessed field and "_x" for the original text. Unfortunately, this increased the index size by a lot, so I believe the field starting with "_" was not excluded. Is there a way to do this? My only other idea is to build a wrapper API which maps each ID to its stored text in a dictionary and returns that. But this seems like something that should already be supported.

Mongodb example

Looks like this project could be a pretty good alternative to Elasticsearch.
I am considering this project as an alternative to Elasticsearch, which is a very memory-hungry search engine. Can this be an alternative for a medium- or large-scale project?
The problem is that my main DB is MongoDB. Should I extract JSON from MongoDB periodically and send it to Blast to build indexes?
What is the best option for my situation?
I need an example MongoDB connector that communicates with Blast via gRPC to build the index in real time, the way Elasticsearch does.

One more question: is it a good idea to interact with the Blast server directly from end-user clients?
My situation is that I want to let users search/filter items directly in the browser. How about grpc-web? (I know the grpc-web project is immature.) What about HTTP/2 + JSON (REST)?

Consensus Protocol Implement method?

I know from the README that the cluster is built on the Raft consensus algorithm.

But when I tried cluster mode and killed the leader node, re-election didn't happen.

Does Blast support leader re-election?

Also, for consensus, write operations (indexing, PUT) should only happen on the leader node. I sent an HTTP indexing request to a follower node after the leader had been killed, and it still worked, so I am a little confused. Can write operations work on followers?

If write operations can work on followers, then when different write operations happen at the same time on different nodes, consensus and ordering may not be guaranteed.

Elasticsearch backward

Is there any chance of Elasticsearch compatibility?
The major problem with Elasticsearch is its system requirements; if Blast offered the same API as Elasticsearch, many developers would replace it.
32 GB for one node 😢
