danielberkompas / elasticsearch-elixir

No-nonsense Elasticsearch library for Elixir

License: MIT License

Elixir 96.17% Shell 3.03% Ruby 0.51% Dockerfile 0.29%

elasticsearch-elixir's Issues

AWS Elasticsearch Service signed requests

I’m currently using elasticsearch-elixir and have extended Elasticsearch.API to work with AWS Elasticsearch Service using signed requests. Is this a feature you’d like to include in this project? I could open a PR if you're interested.

Reloading Indexes on New Record Insertion in DB

I'm building the indexes with the mix task when the application starts.

How can we reload the index as soon as a new record is inserted into the DB?
Do we have to stop the application and rebuild the indexes?

It would be great to have some automation for this; unfortunately, I did not find anything other than the mix tasks.

Thanks :)

Is HTTPoison configuration setup required for this app?

In your README, the configuration section specifies an HTTPoison configuration. Is this required for the app to run?

Handle Get API document not found error

When a document is not found via the Get API the following exception is raised -

iex(1)> Elasticsearch.get(MyApp.ElasticsearchCluster, "/index-name/_doc/123")

** (FunctionClauseError) no function clause matching in Elasticsearch.Exception.build/2

    The following arguments were given to Elasticsearch.Exception.build/2:

        # 1
        %{
          "_id" => "123",
          "_index" => "index-name",
          "_type" => "_doc",
          "found" => false
        }

        # 2
        nil

    Attempted function clauses (showing 4 out of 4):

        defp build(%{"error" => error} = response, query) when is_map(error)
        defp build(%{"error" => error}, query) when is_binary(error)
        defp build(%{"result" => type}, query)
        defp build(error, query) when is_binary(error)

    (elasticsearch) lib/elasticsearch/exception.ex:35: Elasticsearch.Exception.build/2
    (elasticsearch) lib/elasticsearch/exception.ex:22: Elasticsearch.Exception.exception/1
    (elasticsearch) lib/elasticsearch.ex:389: Elasticsearch.format/1
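
A minimal sketch of how a caller could classify a Get API body like the one in the trace above before anything tries to build an exception from it. The module and function names are hypothetical and not part of the library; only the response keys ("found", "_id", "_source") come from the output shown in this issue and elsewhere on this page.

```elixir
defmodule GetResultSketch do
  # Hypothetical helper: turns a raw Get API body into a tagged tuple.
  def classify(%{"found" => false} = body), do: {:error, {:not_found, body["_id"]}}
  def classify(%{"found" => true, "_source" => source}), do: {:ok, source}
  def classify(other), do: {:error, {:unexpected, other}}
end

body = %{"_id" => "123", "_index" => "index-name", "_type" => "_doc", "found" => false}
IO.inspect(GetResultSketch.classify(body))
# prints {:error, {:not_found, "123"}}
```

A clause of this shape in the exception builder would avoid the FunctionClauseError; whether the library should return a not-found error or an :ok tuple here is a design decision for the maintainer.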

Using elasticsearch as an executable gives error

I'm setting the executable in the application supervision tree:

worker(Elasticsearch.Executable, [
        "Elasticsearch",
        "./vendor/elasticsearch/bin/elasticsearch",
        9200
      ], id: :elasticsearch),

and it's giving the following error:

sh: ~/myapp/_build/dev/lib/elasticsearch/priv/bin/wrap: No such file or directory
sh: line 0: exec: ~/myapp/_build/dev/lib/elasticsearch/priv/bin/wrap: cannot execute: No such file or directory
2018-04-24T18:01:18.863760Z [info] Application re exited: Re.Application.start(:normal, []) returned an error: shutdown: failed to start child: :elasticsearch
    ** (EXIT) an exception was raised:
        ** (MatchError) no match of right hand side value: nil
            lib/elasticsearch/executable.ex:33: Elasticsearch.Executable.init/1
            (stdlib) gen_server.erl:365: :gen_server.init_it/2
            (stdlib) gen_server.erl:333: :gen_server.init_it/6
            (stdlib) proc_lib.erl:247: :proc_lib.init_p_do_apply/3

Inspecting the deps folder, it seems that priv/bin/wrap is not shipped with the package.
Manually copying it from GitHub seems to solve the problem.

Maximum amount of index saved

Hello people.

Is it possible to set the number of index versions that are kept when I build? I have several versions and would like the old ones to be deleted automatically when I run the build.

Missing Elasticsearch.Exception.t causes dialyzer error

Elasticsearch.ex specifies a response type:
@type response :: {:ok, map} | {:error, Elasticsearch.Exception.t()}
but there is no type attribute in Elasticsearch.Exception.
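
A sketch of what the missing type could look like, assuming the exception struct has the fields that appear in error output elsewhere on this page (status, line, col, message, type, query, raw). The module names are stand-ins, and the field types are guesses for illustration only.

```elixir
defmodule ExceptionTypeSketch do
  # Field names mirror the struct printed in error output; the field
  # types below are assumptions, not the library's definitions.
  defexception [:status, :line, :col, :message, :type, :query, :raw]

  @type t :: %__MODULE__{
          status: integer | nil,
          line: integer | nil,
          col: integer | nil,
          message: String.t() | nil,
          type: String.t() | nil,
          query: any,
          raw: map | nil
        }
end

defmodule ResponseTypeSketch do
  # With t/0 defined, this response type resolves for Dialyzer.
  @type response :: {:ok, map} | {:error, ExceptionTypeSketch.t()}

  @spec describe(response) :: String.t()
  def describe({:ok, _body}), do: "ok"
  def describe({:error, %ExceptionTypeSketch{message: message}}), do: "error: " <> message
end

IO.puts(ResponseTypeSketch.describe({:error, %ExceptionTypeSketch{message: "boom"}}))
# prints error: boom
```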

Deleting with a body

It is unclear to me whether it's possible to delete with a body using the current API. Elasticsearch.post/4 allows setting a body, but Elasticsearch.delete/3 does not.

My use-case is deleting a search context (https://www.elastic.co/guide/en/elasticsearch/reference/6.8/search-request-scroll.html#_clear_scroll_api).

Failed to run cli app

** (ArgumentError) argument error
(stdlib 3.8) :ets.lookup(Util.ElasticsearchCluster.Config, :config)
(elasticsearch 1.0.1) lib/elasticsearch/cluster/cluster.ex:218: Elasticsearch.Cluster.read_config/1
(elasticsearch 1.0.1) lib/elasticsearch.ex:286: Elasticsearch.put/4
(util 0.1.0) lib/cli.ex:6: Util.CLI.main/1
(elixir 1.11.3) lib/kernel/cli.ex:124: anonymous fn/3 in Kernel.CLI.exec_fun/2

---config.exs

use Mix.Config

config :util, Util.ElasticsearchCluster,
url: "http://localhost:9200"
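
For context on the trace above: `:ets.lookup/2` raises ArgumentError when the named table does not exist. One plausible reading, stated here as an assumption, is that the cluster process that owns the config table was never started in the CLI's runtime. The snippet below only reproduces the failure mode with a made-up table name.

```elixir
# Looking up a named ETS table that was never created raises ArgumentError,
# which matches the first line of the trace in this issue.
result =
  try do
    :ets.lookup(:table_that_was_never_created, :config)
  rescue
    ArgumentError -> :table_missing
  end

IO.inspect(result)
# prints :table_missing
```

If that reading is right, starting the cluster (or the whole application) before calling Elasticsearch.put/4 would be the thing to check first.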

Make the mix task runnable through iex

Currently, elasticsearch.build is only a mix task and is not available through iex (the mix dir is not in elixirc_paths/1 in mix.exs).

It would be useful to extract it into a module that can be called from iex.

Remove dependency on maybe package

I'm using the towel package, which has a Maybe module, just like the maybe package.
As a result, generating a release fails:

     cmd: MIX_ENV=prod mix release --quiet
  stdout: ==> Release failed, during .boot generation:
                  Duplicated modules:
              	'Elixir.Maybe' specified in towel and maybe

It looks like maybe is used in elasticsearch-elixir only once:

https://github.com/infinitered/elasticsearch-elixir/blob/master/lib/mix/elasticsearch.build.ex#L101

So it can be safely removed and replaced with, say, Kernel.get_in/2.

I understand that this is a rare case and that it's unlikely anyone else will run into it, but IMO it's always a good idea to remove a dependency, especially if it can be easily replaced with functions from the standard library.
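
To illustrate the suggestion, here is how `get_in/2` behaves on a nested structure when an intermediate key is missing. The config shape is made up for the example and is not the actual code at the linked line.

```elixir
# Hypothetical nested config, shaped loosely like an index configuration.
config = %{indexes: %{posts: %{settings: "priv/elasticsearch/posts.json"}}}

# Present path: returns the nested value.
IO.inspect(get_in(config, [:indexes, :posts, :settings]))
# prints "priv/elasticsearch/posts.json"

# Missing intermediate key: returns nil instead of raising.
IO.inspect(get_in(config, [:indexes, :comments, :settings]))
# prints nil
```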

Sample "/priv/elasticsearch/posts.json" file

As I transition from Elasticsearch 6.3.2 to 7.x, I am having trouble building an index with "mix elasticsearch.build posts --cluster MyApp.ElasticsearchCluster". Could you post a sample file or direct me to one? Thanks.

Add --append option to `mix elasticsearch.build`

This option will append data to an existing index rather than rebuilding it from scratch. Needs a spec.

Repo.stream no longer supports preload

In version 0.4.0 it was possible to use a load function, as can be seen in the link below, but this option has been removed. I believe we need the load function back; I explain why below.
https://github.com/infinitered/elasticsearch-elixir/blob/58948d14a5806d76aa469702c39eef8643738ac4/guides/upgrading/0.4.x_to_0.5.x.md

In Ecto 3.0, support for preload together with Repo.stream was removed, as can be seen in these links:

https://elixirforum.com/t/repo-stream-with-preload-new-warning/17043

elixir-ecto/ecto@6655a9a#diff-122d0a4bbce6a65cc1523584a00193aaR138

Without the option to preload alongside the stream, the only option is to preload inside Elasticsearch.Document, which is bad because it would run a preload for every single record; that is, we would lose batch preloading.

I think that by bringing back the load option, we could go back to doing things like:

  # Assumes `import Ecto.Query` in the enclosing module.
  def load(schema, offset, limit) do
    schema
    |> offset(^offset)
    |> limit(^limit)
    |> Repo.all()
    |> Repo.preload([:ad, :address, :medias])
  end

I'm not sure I explained this well. If anything is unclear, I can give more examples.
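
An alternative that keeps the stream, sketched under assumptions: chunk the stream and preload once per chunk. `preload_fun` is a stand-in for something like `&Repo.preload(&1, [:ad, :address, :medias])`, and the rows below are fake so the example runs without a database.

```elixir
defmodule BatchPreloadSketch do
  # `preload_fun` receives a list of records and must return the same
  # records with their associations loaded (one query per batch).
  def stream_with_preload(stream, batch_size, preload_fun) do
    stream
    |> Stream.chunk_every(batch_size)
    |> Stream.flat_map(preload_fun)
  end
end

# Fake rows and a fake preload, standing in for Repo.stream and Repo.preload.
rows = Stream.map(1..5, &%{id: &1, address: nil})
fake_preload = fn batch -> Enum.map(batch, &%{&1 | address: "loaded"}) end

rows
|> BatchPreloadSketch.stream_with_preload(2, fake_preload)
|> Enum.to_list()
|> IO.inspect()
```

With a real Repo.stream, the whole pipeline would have to run inside a transaction, since Ecto only allows streams to be enumerated there.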

Consistent memory usage when bulk indexing?

Hello all,

I am using elasticsearch-elixir to bulk insert a database from a CSV file. Everything works, except that memory keeps growing until the BEAM crashes due to insufficient free memory. If I use a similar stream in Elixir without elasticsearch-elixir, I am able to go through the whole collection (and count something, for example). I would think that the old CSV data (already indexed) would not have to be kept in memory, but for some reason this is what is happening.

Is there a way to bulk insert half of the data and then continue (e.g. without creating a new version of the index)? Or is this perhaps a bug in elasticsearch-elixir?

Document mapping type name can't start with '_'

When using put_document I received the following error:

** (MatchError) no match of right hand side value: {:error, %Elasticsearch.Exception{col: nil, line: nil, message: "Document mapping type name can't start with '_', found: [_doc]", query: nil, raw: %{"error" => %{"reason" => "Document mapping type name can't start with '_', found: [_doc]", "root_cause" => [%{"reason" => "Document mapping type name can't start with '_', found: [_doc]", "type" => "invalid_type_name_exception"}], "type" => "invalid_type_name_exception"}, "status" => 400}, status: 400, type: "invalid_type_name_exception"}}

My Elasticsearch version is 6.5.2.

I checked that document_url uses "_doc" by default, and it seems that type names can no longer start with "_". Is there a way to solve this without changing the code?

https://github.com/infinitered/elasticsearch-elixir/blob/5c7c2dd186217cb7d3c6252275e4ae47b193fe4b/lib/elasticsearch.ex#L127

Let hot_swap work also when there are documents with errors

There may be documents that do not pass ES validation, for example self-intersecting polygons.
This makes hot_swap fail, since every step needs to pass successfully:

with :ok <- create_from_file(config, name, settings_file),
         :ok <- Bulk.upload(config, name, index_config),
         :ok <- __MODULE__.alias(config, name, alias),
         :ok <- clean_starting_with(config, alias, 2),
         :ok <- refresh(config, name) do
      :ok

I cannot guarantee that all documents will always be valid, but I would like to index the valid ones and report at the end that some documents could not be inserted.
I had to rewrite the bulk upload in my code to work around this limitation:

with :ok <- Index.create_from_file(config, name, settings_file),
         bulk_upload(config, name, index_config),
         :ok <- Index.alias(config, name, alias),
         :ok <- Index.clean_starting_with(config, alias, 2),
         :ok <- Index.refresh(config, name) do
      :ok
.....
defp bulk_upload(config, name, index_config) do
    case Bulk.upload(config, name, index_config) do
      :ok ->
        :ok

      {:error, errors} = err ->
        Bugsnag.report(
          ElasticsearchError.exception("Errors encountered indexing restaurants"),
          severity: "warn",
          metadata: %{errors: errors}
        )

        err
    end
  end

My question: can we make the bulk upload step not fail when some documents have errors?

SSL Errors with HTTPoison

I'm getting the following error:

$ Elasticsearch.get("/_cat/health")

13:51:37.261 [info]  ['TLS', 32, 'client', 58, 32, 73, 110, 32, 115, 116, 97, 116, 101, 32, 'certify', 32, 'at ssl_handshake.erl:1624 generated CLIENT ALERT: Fatal - Unknown CA', 10]
{:error, %HTTPoison.Error{id: nil, reason: {:tls_alert, 'unknown ca'}}}

When I run the following request with HTTPoison directly, it succeeds:

$ HTTPoison.get("https://servername:9200/_cat/health", [], [ ssl: [{:versions, [:'tlsv1.2']}] ])

Can we set an env var or a config value to solve this issue?
I tried the following in config.exs, without success:

config :ssl, protocol_version: :"tlsv1.2"
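
One thing that might be worth trying, sketched under a clearly labeled assumption: cluster configs on this page show a `default_options` key, and if those options are forwarded to HTTPoison on every request, the `ssl:` options from the working HTTPoison.get call could be set there. The app name and cluster module below are placeholders.

```elixir
import Config

# Assumption: `default_options` is passed through to HTTPoison, so the
# `ssl:` options that worked in the direct HTTPoison.get call apply here.
config :my_app, MyApp.ElasticsearchCluster,
  url: "https://servername:9200",
  default_options: [ssl: [versions: [:"tlsv1.2"]]]
```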

Increase test coverage to >90%

Error when call build task with distillery

When I try to run the build task through a Distillery command, I get:

Building elasticsearch indexes for app
** (exit) exited in: GenServer.call(Mix.ProjectStack, {:get, #Function<11.107724793/1 in Mix.ProjectStack.peek/0>}, 30000)
    ** (EXIT) no process: the process is not alive or there's no process currently associated with the given name, possibly because its application isn't started
    (elixir) lib/gen_server.ex:914: GenServer.call/3
    lib/mix/project.ex:155: Mix.Project.get/0
    lib/mix/task.ex:274: Mix.Task.run/2
    (elasticsearch) lib/mix/elasticsearch.build.ex:49: Mix.Tasks.Elasticsearch.Build.run/1
    (app) lib/mix/tasks/app.task.ex:8: Mix.Tasks.App.Task.run/1
    lib/mix/lib/releases/runtime/control.ex:717: Mix.Releases.Runtime.Control.eval/2
    lib/entry.ex:44: Mix.Releases.Runtime.Control.main/1
    (stdlib) erl_eval.erl:677: :erl_eval.do_apply/6

I think the problem is Mix.Task.run("app.start", []). Maybe elasticsearch.build.ex could contain only the code to build/index; then we could create our own task (with our own start_app) that calls Mix.Tasks.Elasticsearch.Build.run.

Any ideas? Thanks!

Compilation error in file lib/mix/elasticsearch.build.ex

When compiling my Elixir (1.6.5) project with elasticsearch-elixir installed, I run into this error:

Compiling 15 files (.ex)

== Compilation error in file lib/mix/elasticsearch.build.ex ==
** (CompileError) lib/mix/elasticsearch.build.ex:92: undefined function maybe/2
    (stdlib) lists.erl:1338: :lists.foreach/2
    (stdlib) erl_eval.erl:670: :erl_eval.do_apply/6
could not compile dependency :elasticsearch, "mix compile" failed. You can recompile this dependency with "mix deps.compile elasticsearch", update it with "mix deps.update elasticsearch" or clean it with "mix deps.clean elasticsearch"

Also I love this project!!!!

elasticsearch deprecated type=string

The doc blocks for Index.create_from_file/2 include a sample Elasticsearch mapping that uses type=string, which fails on recent versions, since it was deprecated almost two years ago in 5.x:

With the release of Elasticsearch 5.0 coming closer, it is time to introduce one of the release highlights of this upcoming release: the removal of the string type.

Since 5.1.1 is explicitly referenced in bin/setup, I'm hesitant to "just change" type=string to type=text in the doc block (which works in 6.2.3) without also upgrading the setup script.

WDYT @danielberkompas ?

Use Repo pattern for configuration

Some applications might want to talk to multiple separate Elasticsearch stores. The current configuration strategy assumes that you only have one Elasticsearch endpoint.

We should take a page from Ecto's book and allow you to configure multiple Elasticsearch "Repos", something like this:

defmodule MyApp.Elasticsearch do
  use Elasticsearch, otp_app: :my_app
end

You'd then configure MyApp.Elasticsearch like we do now. This would allow a single app (or separate apps within Umbrella apps) to talk to multiple endpoints.

Error with the elasticsearch.build task

Library version: 0.5.1

I got an error while running the elasticsearch.build task for the first time in a new project, against a server from Elastic Cloud.

The error is:

** (Mix) Index resources could not be created. 

%Elasticsearch.Exception{col: nil, line: nil, 
  message: "forcemerge takes arguments in query parameters, not in the request body", 
  query: nil, 
  raw: %{
    "error" => %{
      "reason" => "forcemerge takes arguments in query parameters, not in the request body", 
      "root_cause" => [%{"reason" => "forcemerge takes arguments in query parameters, not in the request body", "type" => "illegal_argument_exception"}], 
      "type" => "illegal_argument_exception"
    }, 
    "status" => 400
  }, 
  status: 400, 
  type: "illegal_argument_exception"
}

I don't have much experience with ElasticSearch, but it seems to me that the culprit is this call:

https://github.com/infinitered/elasticsearch-elixir/blob/4776a30214229ad95570395fcb400bef75860013/lib/elasticsearch/indexing/index.ex#L145-L149

It seems to pass an empty JSON document as the request body instead of passing no body at all, but I'm not 100% sure.

FunctionClauseError: No function clause matching

In one of our production systems, I am seeing the following in the logs hundreds of times. Should this exception be handled by the Elasticsearch Elixir client? The original function call is Elasticsearch.put_document.

(elixir) lib/enum.ex:1899: Enum."-reduce/3-lists^foldl/2-0-"/3

 ** (FunctionClauseError) no function clause matching in Elasticsearch.Exception.build/2

 (elasticsearch) lib/elasticsearch/exception.ex:35: Elasticsearch.Exception.build(%{"message" => nil}, nil)

 (elasticsearch) lib/elasticsearch/exception.ex:22: Elasticsearch.Exception.exception/1

 (elasticsearch) lib/elasticsearch.ex:389: Elasticsearch.format/1

mix elasticsearch.install does not work for elasticsearch >= 7

mix elasticsearch.install vendor --version 7.6.0

It tries to curl https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.6.0.tar.gz

But the file is in:

If the library supports only Elasticsearch 6, that should be stated explicitly in the README.

Error installing on ubuntu linux

I'm trying to install on Ubuntu 18, but this bug is appearing when installing Kibana. Elasticsearch is installed successfully.

mix elasticsearch.install . --version 6.2.4

% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 27.7M 100 27.7M 0 0 1392k 0 0:00:20 0:00:20 --:--:-- 1976k

** (Mix) Unsupported system for Kibana: {:unix, :linux}

Indexing fails on second Elasticsearch.Store load?

So I have implemented the Store, the Document protocol, and a mapping file, and I've nearly got things working with your great library.

Unfortunately, it seems to call the Elasticsearch.Store load function twice, even though there are fewer than 5000 entries for the first index. I would have thought it would never try to load again when the number of entries returned is less than bulk_insert_size?

Returned from Elasticsearch.Store:

[
    %MyApp.MyThing{....etc...},
    %MyApp.MyThing{....etc...},
    %MyApp.MyThing{....etc...}
]

It then gives me an argument error when my Repo returns an empty list:

SELECT .... FROM tasks AS i0 LIMIT $1 OFFSET $2 [5000, 5000]
** (ArgumentError) argument error
    :erlang.apply([], :load, [])
    (elasticsearch) lib/elasticsearch/storage/data_stream.ex:57: Elasticsearch.DataStream.load_page/4
    (elixir) lib/stream.ex:1361: Stream.do_resource/5
    (elixir) lib/stream.ex:1536: Enumerable.Stream.do_each/4
    (elixir) lib/enum.ex:1911: Enum.reduce/3
    (elasticsearch) lib/elasticsearch/indexing/bulk.ex:81: Elasticsearch.Index.Bulk.upload/4
    (elasticsearch) lib/elasticsearch/indexing/index.ex:33: Elasticsearch.Index.hot_swap/4
    (elasticsearch) lib/mix/elasticsearch.build.ex:57: Mix.Tasks.Elasticsearch.Build.build/3
    (elasticsearch) lib/mix/elasticsearch.build.ex:39: anonymous fn/3 in Mix.Tasks.Elasticsearch.Build.run/1
    (elixir) lib/enum.ex:1911: anonymous fn/3 in Enum.reduce/3
    (elixir) lib/enum.ex:3251: Enumerable.List.reduce/3
    (elixir) lib/enum.ex:1911: Enum.reduce/3
    (elasticsearch) lib/mix/elasticsearch.build.ex:37: Mix.Tasks.Elasticsearch.Build.run/1
    (mix) lib/mix/task.ex:314: Mix.Task.run_task/3
    (mix) lib/mix/cli.ex:80: Mix.CLI.run_task/2
    (elixir) lib/code.ex:677: Code.require_file/2

I am using your library inside an umbrella project, so that might have something to do with it? I am a bit confused about where to go from here! Ta!

Issue with dialyzer.

I'm new to elixir, so I might be wrong, but I think this line:

https://github.com/infinitered/elasticsearch-elixir/blob/master/lib/elasticsearch.ex#L20

is wrong. Dialyzer complains about code like this:

{:ok, score} = Elasticsearch.post(ES.ElasticsearchCluster, "/user_score/score/#{userId}", scoreQuery(value))

With this error:

lib/.../user_state.ex:40:pattern_match
The pattern
{:ok, score}

can never match the type
{:error, _}

Timeout while fetching cluster config

exit: ** (exit) exited in: GenServer.call(MyApp.ElasticsearchCluster, :config, 5000)
    ** (EXIT) time out
  File "lib/gen_server.ex", line 836, in GenServer.call/3
  File "lib/elasticsearch.ex", line 307, in Elasticsearch.post/4

We get this error randomly in our app while making GET/POST requests to Elasticsearch.

Elasticsearch.post/4 calls Elasticsearch.Cluster.Config.get(cluster).

And the cluster config is found via this - https://github.com/danielberkompas/elasticsearch-elixir/blob/master/lib/elasticsearch/cluster/cluster.ex#L172

Since the cluster is a GenServer, this GenServer.call/3 can time out, because a single process can get overloaded under heavy load.

Could we move :api out of the cluster config and let the user pass the Elasticsearch.API behaviour module to functions like Elasticsearch.get, Elasticsearch.post, etc.?

@danielberkompas Let me know your thoughts.
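
Another option, sketched here as an assumption rather than a description of what the library does: keep the config in a named ETS table so reads happen in the caller's process and never go through a GenServer call. The module name, table name, and the `MyApp.FakeAPI` value are all hypothetical.

```elixir
defmodule EtsConfigSketch do
  @table :es_config_sketch

  # Create a named, public table once, e.g. from the owning process's init.
  def init(config) do
    :ets.new(@table, [:named_table, :public, read_concurrency: true])
    :ets.insert(@table, {:config, config})
    :ok
  end

  # Reads run in the caller's process: no message passing, no call timeout.
  def get do
    [{:config, config}] = :ets.lookup(@table, :config)
    config
  end
end

EtsConfigSketch.init(%{url: "http://localhost:9200", api: MyApp.FakeAPI})
IO.inspect(EtsConfigSketch.get().url)
# prints "http://localhost:9200"
```

The trade-off is that the table lives only as long as its owning process, so lookups raise if that process has not been started.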

Settings should maybe look in the current application's directory

This is more of an enhancement to help those who use Distillery. The problem is that the mappings in priv/elasticsearch won't resolve to the right priv path when released into production. Something like:

indexes: %{
  albums: %{
    settings: "priv/elasticsearch/albums.json"
    # ...
  }
}

would not search in the currently running release directory.

I implemented a naive solution in andrewvy@b2bed7d which uses Application.app_dir/2 to get the running application's priv folder, but this removes the ability to point to paths outside the application. Let me know if this is something worth adding! (Separate option, fallback lookup?)
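
A rough sketch of the fallback idea: leave absolute paths alone and resolve relative ones against the application's directory. The module name is hypothetical, and `:elixir` is used as the app only so the example runs anywhere; a real call would pass the host application's name.

```elixir
defmodule SettingsPathSketch do
  # Absolute paths are used as given; relative ones are resolved against
  # the application's directory, which is where priv/ lives in a release.
  def resolve(app, path) do
    case Path.type(path) do
      :absolute -> path
      _ -> Application.app_dir(app, path)
    end
  end
end

IO.puts(SettingsPathSketch.resolve(:elixir, "priv/elasticsearch/albums.json"))
IO.puts(SettingsPathSketch.resolve(:elixir, "/etc/es/albums.json"))
```

This keeps paths outside the application working, at the cost of changing what a relative path means (relative to the app directory rather than the current working directory).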

Elasticsearch.put_document does not save to database

I can't get data to show in Kibana.

I tried to save my data like this; I'm not sure I'm even doing it correctly:
Elasticsearch.put_document(MyApp.ElasticsearchCluster, struct(User, attrs), "users")

I'm using elasticsearch and kibana as docker services on docker-compose.

My configuration is as follows:


  elasticsearch:
    image: 'elasticsearch:6.5.4'
    ports:
      - "9200:9200"
    volumes:
      - elasticsearch-data:/usr/share/elasticsearch/data
  kibana:
    image: 'kibana:6.5.4'
    ports:
      - "5601:5601"
    environment:
      - ELASTICSEARCH_URL=http://elasticsearch:9200

UPDATE:
I'm getting this error when I execute the command above
"Incorrect HTTP method for uri [/users/_doc] and method [PUT], allowed: [POST]"

UPDATE 2:
Did Elasticsearch.put_document(MyApp.ElasticsearchCluster, struct(User, attrs), "users/_doc")
This saves to the index, but every subsequent put_document call just updates that one document. How do I add new documents with different ids?

UPDATE 3:
Did Elasticsearch.put_document(MyApp.ElasticsearchCluster, %User{id: 1, ...}, "users")
Saves multiple documents and solved the problem.
The question is: why does this struct require an id? I thought that if the struct did not contain an id, Elasticsearch would generate one automatically.

AWS Request Signature mismatch only for post query

My Config:

config :app, App.Elasticsearch.Cluster,
  url: "https://redacted.us-east-1.es.amazonaws.com",
  api: Elasticsearch.API.AWS,
  default_options: [
    aws: [
      region: "us-east-1",
      service: "es",
      access_key: "redacted",
      secret: "redacted"
    ]
  ],
  json_library: Jason

Getting a document:

IO.inspect(
      Elasticsearch.get(
        App.Elasticsearch.Cluster,
        "/movies/_doc/5"
      )
    )

{:ok,
 %{
   "_id" => "5",
   "_index" => "movies",
   "_primary_term" => 1,
   "_seq_no" => 0,
   "_source" => %{
     "director" => "Bennett Miller",
     "title" => "Moneyball",
     "year" => "2011"
   },
   "_type" => "_doc",
   "_version" => 1,
   "found" => true
 }}

My Query that does not work

index_pattern = "/employees-*/_search"

query = %{
  "suggest" => %{
    "employee-suggest" => %{
      "prefix" => "An",
      "completion" => %{
        "field" => "name_suggest",
        "size" => 10,
        "skip_duplicates" => true,
        "fuzzy" => %{
          "fuzziness" => 1
        }
      }
    }
  }
}

IO.inspect(
  Elasticsearch.post(
    App.Elasticsearch.Cluster,
    index_pattern,
    query
  )
)

{:error,
 %Elasticsearch.Exception{
   col: nil,
   line: nil,
   message: "The request signature we calculated does not match the signature you provided. Check your AWS Secret Access Key and signing method. Consult the service documentation for details.",
   query: nil,
   raw: %{
     "message" => "The request signature we calculated does not match the signature you provided. Check your AWS Secret Access Key and signing method. Consult the service documentation for details."
   },
   status: nil,
   type: nil
 }}

Doing the same query with Postman does not fail like it does in the Elixir application. What is going on here?

Bulk upload a custom store / Store with params

I have a use case where I need to reindex a large part of the documents, but not all of them: for example, all restaurants from a channel. Since there can be thousands, it's very slow to send documents one by one.
I would like to use bulk upload with a custom stream built at runtime.
There is currently no way to pass params to the store when building the stream.
Can we add an option to pass params to Bulk.upload, to be forwarded to the store?
Or at least expose a put_bulk function in the Elasticsearch module that receives a list of items and performs a bulk upload?

Documentation references execute/1 which doesn't seem to exist

Deploying with releases

Elixir 1.9's releases are a built-in replacement for Distillery. Like Distillery, you can't run Mix tasks against a release.

I tried following the deployment guide in the docs but run into an error:

Starting dependencies...
Starting repos...
Starting clusters...
Building indexes...
** (exit) exited in: GenServer.call(Backend.Elasticsearch.Cluster, :config, 5000)
    ** (EXIT) no process: the process is not alive or there's no process currently associated with the given name, possibly because its application isn't started
    (elixir) lib/gen_server.ex:1000: GenServer.call/3
    (elasticsearch) lib/elasticsearch/indexing/index.ex:32: Elasticsearch.Index.hot_swap/2
    (elixir) lib/enum.ex:783: Enum."-each/2-lists^foreach/1-0-"/2
    (elixir) lib/enum.ex:783: Enum.each/2
    lib/backend/release.ex:29: Backend.Release.build_elasticsearch_indexes/0
    (stdlib) erl_eval.erl:680: :erl_eval.do_apply/6
    (elixir) lib/code.ex:240: Code.eval_string/3

This is the release module that I'm running:

defmodule Backend.Release do
  @app :backend
  @start_apps [
    :crypto,
    :ssl,
    :postgrex,
    :ecto,
    :elasticsearch
  ]

  # Ecto repos to start, if any
  @repos Application.get_env(:backend, :ecto_repos, [])
  # Elasticsearch clusters to start
  @clusters [Backend.Elasticsearch.Cluster]
  # Elasticsearch indexes to build
  @indexes [:instances]

  def build_elasticsearch_indexes() do
    start_services()
    IO.puts("Building indexes...")
    Enum.each(@indexes, &Elasticsearch.Index.hot_swap(Backend.Elasticsearch.Cluster, &1))
    stop_services()
  end

  # Ensure that all OTP apps, repos used by your Elasticsearch store,
  # and your Elasticsearch Cluster(s) are started
  defp start_services do
    IO.puts("Starting dependencies...")
    Enum.each(@start_apps, &Application.ensure_all_started/1)
    IO.puts("Starting repos...")
    Enum.each(@repos, & &1.start_link(pool_size: 1))
    IO.puts("Starting clusters...")
    Enum.each(@clusters, & &1.start_link())
  end

  defp stop_services do
    :init.stop()
  end
end

If I replace the contents of start_services() with a single line that calls Application.ensure_all_started(@app) then things work fine, but this starts my entire app which I'd prefer to avoid.

Does anyone know if there's a major difference between Distillery and Elixir releases that could be causing this? It seems like Enum.each(@clusters, & &1.start_link()) is not starting the cluster as it should.

Make clean_starting_with -> num_to_keep configurable

It is hardcoded to 2. This means that whenever I recreate the index, I always have two copies of the index, and therefore every document twice. This can cause problems with a large dataset. Also, many shared ES hosting solutions limit the number of indexes available.
Could you make it optional whether the old version is kept?
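
A sketch of what a configurable num_to_keep could do, assuming index names share a prefix and end in a sortable timestamp (as in names like listings-1525371175 seen elsewhere on this page). The module and function are hypothetical, and plain string sorting only works while the timestamps have equal length.

```elixir
defmodule IndexCleanupSketch do
  # Return the index names that fall outside the newest `num_to_keep`.
  def to_delete(index_names, num_to_keep) when num_to_keep >= 0 do
    sorted = Enum.sort(index_names)
    Enum.take(sorted, max(length(sorted) - num_to_keep, 0))
  end
end

names = ["listings-1525371175", "listings-1525300000", "listings-1525400000"]
IO.inspect(IndexCleanupSketch.to_delete(names, 1))
# prints ["listings-1525300000", "listings-1525371175"]
```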

Release 1.0.1

The last version released to Hex has been 1.0.0 from March 2019. Five PRs have been merged since; would it make sense to release these fixes to Hex?

Problem building

Hi everyone, I'm running into this problem:

I've already tried using Elasticsearch with both Jason and Poison, and the error persists. Do you know what may be happening?

I completely cleared Elasticsearch, thinking there might be some duplication, but it did not help.

The command I use is this:
mix elasticsearch.build dev-profiles --cluster V2V.ElasticsearchCluster

Error when elasticsearch url times out

Hi, thanks for this useful library!

I'm trying to use a url for a remote elasticsearch instance, and for whatever reason, the connection appears to be timing out.

When I call Elasticsearch.get for that configuration

I expect: it to return a status tuple like {:error, "connection timed out"}

What happens: exception was raised

    ** (Protocol.UndefinedError) protocol Enumerable not implemented for %HTTPoison.Error{id: nil, reason: :connect_timeout} of type HTTPoison.Error (a struct). This protocol is implemented for the following type(s): Ecto.Adapters.SQL.Stream, Postgrex.Stream, DBConnection.Stream, DBConnection.PrepareStream, HashSet, Range, Map, Function, List, Stream, Date.Range, HashDict, GenEvent.Stream, MapSet, File.Stream, IO.Stream
        (elixir) lib/enum.ex:1: Enumerable.impl_for!/1
        (elixir) lib/enum.ex:141: Enumerable.reduce/3
        (elixir) lib/enum.ex:3023: Enum.reverse/1
        (elixir) lib/enum.ex:2668: Enum.to_list/1
        (absinthe) lib/absinthe/phase/document/execution/resolution.ex:351: Absinthe.Phase.Document.Execution.Resolution.split_error_value/1
        (absinthe) lib/absinthe/phase/document/execution/resolution.ex:341: Absinthe.Phase.Document.Execution.Resolution.put_result_error_value/5
        (elixir) lib/enum.ex:1948: Enum."-reduce/3-lists^foldl/2-0-"/3
        (absinthe) lib/absinthe/phase/document/execution/resolution.ex:256: Absinthe.Phase.Document.Execution.Resolution.build_result/4
        (absinthe) lib/absinthe/phase/document/execution/resolution.ex:153: Absinthe.Phase.Document.Execution.Resolution.do_resolve_fields/6
        (absinthe) lib/absinthe/phase/document/execution/resolution.ex:72: Absinthe.Phase.Document.Execution.Resolution.walk_result/5
        (absinthe) lib/absinthe/phase/document/execution/resolution.ex:53: Absinthe.Phase.Document.Execution.Resolution.perform_resolution/3
        (absinthe) lib/absinthe/phase/document/execution/resolution.ex:24: Absinthe.Phase.Document.Execution.Resolution.resolve_current/3
        (absinthe) lib/absinthe/pipeline.ex:274: Absinthe.Pipeline.run_phase/3
        (absinthe_plug) lib/absinthe/plug.ex:421: Absinthe.Plug.run_query/4
        (absinthe_plug) lib/absinthe/plug.ex:247: Absinthe.Plug.call/2
        (phoenix) lib/phoenix/router/route.ex:40: Phoenix.Router.Route.call/2
        (phoenix) lib/phoenix/router.ex:288: Phoenix.Router.__call__/2
        (ostraka) lib/ostraka_web/endpoint.ex:1: OstrakaWeb.Endpoint.plug_builder_call/2
        (ostraka) lib/ostraka_web/endpoint.ex:1: OstrakaWeb.Endpoint.call/2
        (phoenix) lib/phoenix/endpoint/cowboy2_handler.ex:42: Phoenix.Endpoint.Cowboy2Handler.init/4
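
In the trace above, Absinthe appears to be trying to enumerate the error value returned from the resolver, so one workaround on the calling side is to turn any error struct into a string before returning it. This is a sketch with a hypothetical helper; RuntimeError stands in for HTTPoison.Error so the example runs without dependencies.

```elixir
defmodule ErrorMessageSketch do
  # Convert {:error, exception_struct} into {:error, binary}; pass :ok through.
  def normalize({:ok, _} = ok), do: ok
  def normalize({:error, %{__exception__: true} = e}), do: {:error, Exception.message(e)}
  def normalize({:error, other}), do: {:error, inspect(other)}
end

IO.inspect(ErrorMessageSketch.normalize({:error, %RuntimeError{message: "connect_timeout"}}))
# prints {:error, "connect_timeout"}
```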

"elasticsearch": {:hex, :elasticsearch, "1.0.0", "626d3fb8e7554d9c93eb18817ae2a3d22c2a4191cc903c4644b1334469b15374", [:mix], [{:httpoison, ">= 0.0.0", [hex: :httpoison, repo: "hexpm", optional: false]}, {:poison, ">= 0.0.0", [hex: :poison, repo: "hexpm", optional: true]}, {:sigaws, "~> 0.7", [hex: :sigaws, repo: "hexpm", optional: true]}, {:vex, "~> 0.6.0", [hex: :vex, repo: "hexpm", optional: false]}], "hexpm"},

Elasticsearch.StreamingStore behaviour or something alike

The reason is that we could use Repo.stream inside Repo.transaction with a timeout of :infinity.

LIMIT + OFFSET is linear: to get the last 100 rows of a 1 million row table, the database has to go through the first 999,900. Using a cursor or a stream with a timeout of :infinity can help in this case.

Right now I avoid long queries (waiting for the offset to reach 999,900) by doing something like this:

        User
        |> select([:name, :email, :phone, :id])
        |> Repo.stream()
        |> Stream.drop(offset)
        |> Enum.take(limit)

But streaming to the end in a single pass would be much preferred.
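To illustrate the single-pass idea, here is a minimal sketch in plain Elixir with no Ecto or Elasticsearch calls. `StreamingSketch`, `rows/1`, and `bulk_pages/2` are hypothetical names invented for this example; the lazy range merely stands in for `Repo.stream/1`, which in a real application must be consumed inside `Repo.transaction/2`.

```elixir
defmodule StreamingSketch do
  # Hypothetical stand-in for Repo.stream/1: a lazy stream of rows.
  def rows(count) do
    Stream.map(1..count, fn id -> %{id: id, name: "user-#{id}"} end)
  end

  # Lazily group rows into bulk-sized pages. Each row is produced once,
  # so reaching the last page does not re-scan the earlier ones.
  def bulk_pages(rows, page_size) do
    Stream.chunk_every(rows, page_size)
  end
end

page_sizes =
  StreamingSketch.rows(12)
  |> StreamingSketch.bulk_pages(5)
  |> Enum.map(&length/1)

IO.inspect(page_sizes)
# prints [5, 5, 2]
```

Each page could then be sent as one bulk request, without ever issuing an OFFSET query.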

bulk_wait_interval is unused

I can't see any reference to where bulk_wait_interval is being used. Is there a planned future use of it? If not, would you accept a PR to update the README?

Deleting a document twice raises an exception

The following exception is raised when a document is deleted from an index a second time:

** (FunctionClauseError) no function clause matching in Elasticsearch.Exception.build/2
    lib/elasticsearch/exception.ex:35: Elasticsearch.Exception.build(%{"_id" => "54", "_index" => "listings-1525371175", "_primary_term" => 1, "_seq_no" => 42, "_shards" => %{"failed" => 0, "successful" => 1, "total" => 2}, "_type" => "_doc", "_version" => 14, "result" => "not_found"}, nil)
    lib/elasticsearch/exception.ex:22: Elasticsearch.Exception.exception/1
    lib/elasticsearch.ex:389: Elasticsearch.format/1
    (re) lib/re_web/search/server.ex:57: ReWeb.Search.Server.handle_cast/2
    (stdlib) gen_server.erl:616: :gen_server.try_dispatch/4
    (stdlib) gen_server.erl:686: :gen_server.handle_msg/6
    (stdlib) proc_lib.erl:247: :proc_lib.init_p_do_apply/3

It seems that Elasticsearch.Exception.build/2 does not expect "result" => "not_found" in the response.
Elasticsearch version: 6.2.4
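One way to picture the missing case is a function with an explicit clause for the "not_found" result. This is only a sketch under assumptions: `ResponseSketch` and `classify/1` are hypothetical names, not part of the library, and the returned tuples are an illustrative choice rather than the library's actual return values.

```elixir
defmodule ResponseSketch do
  # Hypothetical helper, not part of the library: turn a decoded
  # Elasticsearch response body into an ok/error tuple.
  def classify(%{"error" => error}), do: {:error, error}

  # Delete API response for a document that no longer exists.
  def classify(%{"result" => "not_found"} = body), do: {:error, {:not_found, body["_id"]}}

  # Get API response for a missing document.
  def classify(%{"found" => false} = body), do: {:error, {:not_found, body["_id"]}}

  # Anything else is treated as a success.
  def classify(body) when is_map(body), do: {:ok, body}
end

response = %{"_id" => "54", "_index" => "listings-1525371175", "result" => "not_found"}
IO.inspect(ResponseSketch.classify(response))
# prints {:error, {:not_found, "54"}}
```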

question: Is custom index aliasing and searching possible?

Hi, I am thinking of using this library for my ES integration.

I am going to split my indexes by region, as posts-{region}, to scope them and search within those scopes.

posts-americas -> would host the posts for the Americas region and be searchable for it
posts-europe -> similarly, just for Europe posts

I could then delete a whole index to get rid of a region, since the set of regions is dynamic.

Looking at the documentation I see this

  # You should configure each index which you maintain in Elasticsearch here.
  # This configuration will be read by the `mix elasticsearch.build` task,
  # described below.
  indexes: %{
    # This is the base name of the Elasticsearch index. Each index will be
    # built with a timestamp included in the name, like "posts-5902341238".
    # It will then be aliased to "posts" for easy querying.
    posts: %{

First: is this possible with the library as it is, or would I have to modify the task itself?

Second: would the following return a %MyApp.Post{} struct, or a raw ES response that also includes the meta-information about the results?

Elasticsearch.post(MyApp.ElasticsearchCluster, "/posts-americas/_doc/_search", '{"query": {"match_all": {}}}')

Thank you in advance for the library!

Allow bulk configs to be overridable via mix tasks

config :recommender, Recommender.Elasticsearch.Cluster,
  # When indexing data using the `mix elasticsearch.build` task,
  # control the data ingestion rate by raising or lowering the number
  # of items to send in each bulk request.
  bulk_page_size: 5000,

  # Likewise, wait a given period between posting pages to give
  # Elasticsearch time to catch up.
  # 15 seconds
  bulk_wait_interval: 15_000,

Something like this:

mix elasticsearch.build users --cluster Recommender.Elasticsearch.Cluster --wait-interval 15000 --page-size 5000

On top of that, allowing these to be configured per index would also be nice.
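The command-line override could be parsed along these lines. This is a sketch only: `BuildOptionsSketch` and `parse/2` are hypothetical names, the flag names are taken from the command shown above, and nothing here reflects how the actual `mix elasticsearch.build` task handles its arguments.

```elixir
defmodule BuildOptionsSketch do
  # Hypothetical switches, mirroring the flags proposed above.
  @switches [cluster: :string, wait_interval: :integer, page_size: :integer]

  # Returns {index_names, cluster_name, bulk_settings}, where values given on
  # the command line take precedence over the configured defaults.
  def parse(args, config_defaults) do
    {opts, indexes, _invalid} = OptionParser.parse(args, strict: @switches)

    overrides =
      opts
      |> Keyword.take([:wait_interval, :page_size])
      |> Enum.map(fn
        {:wait_interval, value} -> {:bulk_wait_interval, value}
        {:page_size, value} -> {:bulk_page_size, value}
      end)

    {indexes, opts[:cluster], Keyword.merge(config_defaults, overrides)}
  end
end

args = ["users", "--cluster", "Recommender.Elasticsearch.Cluster", "--wait-interval", "15000"]
defaults = [bulk_page_size: 5000, bulk_wait_interval: 0]

{indexes, cluster, settings} = BuildOptionsSketch.parse(args, defaults)
IO.inspect({indexes, cluster, settings[:bulk_page_size], settings[:bulk_wait_interval]})
```

Here `--wait-interval` overrides the configured value while `bulk_page_size` falls back to the default, because no `--page-size` flag was passed.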

Bulk load performance and timeout, HTTPoison options, HTTP request timeout option

When bulk loading a large number of documents, it can be tricky to get bulk_page_size and bulk_wait_interval right without significantly slowing down an index build.

Also, in some cases the HTTP request takes longer than the default of 8000ms resulting in a timeout.

Digging into the code, there is a default_opts: configuration option to pass options to HTTPoison.request, but this doesn't seem to be documented. Setting this to [recv_timeout: 20000] (or something large) and setting bulk_wait_interval to zero gives much better performance, since it's waiting at most 20 seconds but only as long as the request needs. For me at least this seemed to eliminate the need to wait a fixed time between requests.

It would be good to at least document the default_opts: option, or maybe even make configuration of the HTTP request timeout a first-class configuration option.

Example of working app

Could you add a full working Phoenix example with small Ecto model having some relations? I can't figure out how to use this library.

Support Elasticsearch 6.x+

https://www.elastic.co/guide/en/elasticsearch/reference/6.2/removal-of-types.html

Get document on a given index

%{"error" => %{"code" => 404, "message" => "elastic: Error 404 (Not Found)", "status" => "Not Found"}}

I get this error when trying to delete a nonexistent document from an index using delete_document. I expect this function to return an error tuple, {:error, %Elasticsearch.Exception{...}}, but it does not do so in production (locally and in the staging environment it works).

As a workaround, I am thinking of checking whether a document exists before doing anything. Does this package have a way to check that, like a get_document function? Thanks!

Also, how do you get the raw data of an index?

Comparison with elastix?

Thanks for this library. I'm evaluating elasticsearch-elixir and elastix for a production app and was wondering if you have any thoughts, however brief, on how these libraries might compare? Might even be worth adding to README at some point.
