danielberkompas / elasticsearch-elixir
No-nonsense Elasticsearch library for Elixir
License: MIT License
I’m currently using elasticsearch-elixir, and I extended Elasticsearch.API to use AWS Elasticsearch Service with signed requests. I wonder if this is a feature you’d like to include in this project. I could make a PR if you're interested. What do you think?
Here, I'm building the indexes when the application starts, using a mix task.
How can we reload the index as soon as a new record is inserted into the DB?
Do we have to stop the application and rebuild the indexes?
It would be great to have some automation here; unfortunately I did not find anything other than the tasks.
Thanks :)
In your README, the configuration section specifies an HTTPoison configuration. Is this required for the app to run?
When a document is not found via the Get API, the following exception is raised:
iex(1)> Elasticsearch.get(MyApp.ElasticsearchCluster, "/index-name/_doc/123")
** (FunctionClauseError) no function clause matching in Elasticsearch.Exception.build/2
The following arguments were given to Elasticsearch.Exception.build/2:
# 1
%{
"_id" => "123",
"_index" => "index-name",
"_type" => "_doc",
"found" => false
}
# 2
nil
Attempted function clauses (showing 4 out of 4):
defp build(%{"error" => error} = response, query) when is_map(error)
defp build(%{"error" => error}, query) when is_binary(error)
defp build(%{"result" => type}, query)
defp build(error, query) when is_binary(error)
(elasticsearch) lib/elasticsearch/exception.ex:35: Elasticsearch.Exception.build/2
(elasticsearch) lib/elasticsearch/exception.ex:22: Elasticsearch.Exception.exception/1
(elasticsearch) lib/elasticsearch.ex:389: Elasticsearch.format/1
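One way to avoid this kind of FunctionClauseError is a catch-all clause placed after the specific ones. The sketch below is not the library's code: ExceptionSketch and the shape of the returned map are hypothetical stand-ins for Elasticsearch.Exception.build/2, used only to illustrate the pattern.

```elixir
defmodule ExceptionSketch do
  # Hypothetical stand-in for Elasticsearch.Exception.build/2.
  # Returns a plain map instead of the real exception struct.

  def build(%{"error" => error} = response, query) when is_map(error) do
    %{type: error["type"], message: error["reason"], raw: response, query: query}
  end

  def build(%{"error" => error} = response, query) when is_binary(error) do
    %{type: nil, message: error, raw: response, query: query}
  end

  def build(%{"result" => type} = response, query) do
    %{type: type, message: "result: " <> to_string(type), raw: response, query: query}
  end

  def build(error, query) when is_binary(error) do
    %{type: nil, message: error, raw: nil, query: query}
  end

  # Catch-all: any other response shape (for example "found" => false)
  # still produces a value instead of raising FunctionClauseError.
  def build(response, query) do
    %{type: "unknown", message: inspect(response), raw: response, query: query}
  end
end

not_found = %{"_id" => "123", "_index" => "index-name", "_type" => "_doc", "found" => false}
IO.inspect(ExceptionSketch.build(not_found, nil).type)
```

Whether a not-found document should be an error at all is a separate design question; the catch-all only guarantees that an unexpected response shape does not crash the caller.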
I'm setting the executable in the application supervision tree:
worker(Elasticsearch.Executable, [
  "Elasticsearch",
  "./vendor/elasticsearch/bin/elasticsearch",
  9200
], id: :elasticsearch),
and it's giving the following error:
sh: ~/myapp/_build/dev/lib/elasticsearch/priv/bin/wrap: No such file or directory
sh: line 0: exec: ~/myapp/_build/dev/lib/elasticsearch/priv/bin/wrap: cannot execute: No such file or directory
2018-04-24T18:01:18.863760Z [info] Application re exited: Re.Application.start(:normal, []) returned an error: shutdown: failed to start child: :elasticsearch
** (EXIT) an exception was raised:
** (MatchError) no match of right hand side value: nil
lib/elasticsearch/executable.ex:33: Elasticsearch.Executable.init/1
(stdlib) gen_server.erl:365: :gen_server.init_it/2
(stdlib) gen_server.erl:333: :gen_server.init_it/6
(stdlib) proc_lib.erl:247: :proc_lib.init_p_do_apply/3
Inspecting the deps folder, it seems like priv/bin/wrap is not "shipped".
Manually adding it by copying it from GitHub seems to solve the problem.
Hello people.
Is it possible to set the number of index versions that are kept when I build? I have several versions and would like to remove the old ones automatically when I run the build.
Elasticsearch.ex specifies a response type:
@type response :: {:ok, map} | {:error, Elasticsearch.Exception.t()}
but there is no type attribute in Elasticsearch.Exception.
It is unclear to me whether it's possible to send a delete request with a body using the current API. Elasticsearch.post/4 allows setting a body, but Elasticsearch.delete/3 does not.
My use-case is deleting a search context (https://www.elastic.co/guide/en/elasticsearch/reference/6.8/search-request-scroll.html#_clear_scroll_api).
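A possible workaround that needs no request body: the Clear Scroll API also accepts scroll IDs in the URL, as comma-separated values after /_search/scroll/. The sketch below only builds that path; ClearScrollSketch is a hypothetical helper, and the final call to Elasticsearch.delete is shown as a comment because it needs a running cluster.

```elixir
defmodule ClearScrollSketch do
  # Builds the path for `DELETE /_search/scroll/<id>[,<id>...]`.
  # Each ID is percent-encoded so characters such as "=" or "/" are safe in a path.
  def path(scroll_ids) when is_list(scroll_ids) and scroll_ids != [] do
    encoded =
      Enum.map(scroll_ids, fn id ->
        URI.encode(id, &URI.char_unreserved?/1)
      end)

    "/_search/scroll/" <> Enum.join(encoded, ",")
  end
end

path = ClearScrollSketch.path(["abc123", "def456"])
IO.puts(path)

# With the library (not run here, cluster name is an assumption):
# Elasticsearch.delete(MyApp.ElasticsearchCluster, path)
```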
** (ArgumentError) argument error
(stdlib 3.8) :ets.lookup(Util.ElasticsearchCluster.Config, :config)
(elasticsearch 1.0.1) lib/elasticsearch/cluster/cluster.ex:218: Elasticsearch.Cluster.read_config/1
(elasticsearch 1.0.1) lib/elasticsearch.ex:286: Elasticsearch.put/4
(util 0.1.0) lib/cli.ex:6: Util.CLI.main/1
(elixir 1.11.3) lib/kernel/cli.ex:124: anonymous fn/3 in Kernel.CLI.exec_fun/2
---config.exs
use Mix.Config
config :util, Util.ElasticsearchCluster,
url: "http://localhost:9200"
Currently, elasticsearch.build is only a mix task and is not available through iex (the mix dir is not in elixirc_paths/1 in mix.exs).
It would be interesting to extract it into a module so that it can be called from iex.
I'm using the towel package, which has a Maybe module, just like the maybe package.
As a result generating release fails:
cmd: MIX_ENV=prod mix release --quiet
stdout: ==> Release failed, during .boot generation:
Duplicated modules:
'Elixir.Maybe' specified in towel and maybe
It looks like maybe is used in elasticsearch-elixir only once:
https://github.com/infinitered/elasticsearch-elixir/blob/master/lib/mix/elasticsearch.build.ex#L101
So it can be safely removed and replaced, say, with Kernel.get_in.
I understand that it's a rare case and it's unlikely that anyone else will run into this issue but still IMO it's always a good idea to remove dependency - especially if it can be easily replaced with functions from standard library.
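For illustration, this is how Kernel.get_in/2 handles a nested lookup where a key may be missing. The response map is made-up data, not taken from the library.

```elixir
# A nested map similar in shape to an Elasticsearch response (made-up data).
response = %{"hits" => %{"total" => 3}}

# get_in/2 returns nil when any key along the path is missing,
# so no extra "maybe" helper is needed for the nil case.
total = get_in(response, ["hits", "total"])
missing = get_in(response, ["aggregations", "by_type"])

IO.inspect({total, missing})
```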
As I am trying to transition to Elasticsearch versions 7.x from 6.3.2, I am having trouble building an index with "mix elasticsearch.build posts --cluster MyApp.ElasticsearchCluster". Could you post a sample file or direct me to one ? Thanks.
This option will append data to an existing index rather than rebuilding it from scratch. Needs a spec.
In version 0.4.0 it was possible to use a load function, as can be seen in the link below. But this option has been removed. I believe we will need to have the load function again. I'll explain below.
https://github.com/infinitered/elasticsearch-elixir/blob/58948d14a5806d76aa469702c39eef8643738ac4/guides/upgrading/0.4.x_to_0.5.x.md
In Ecto 3.0, support for preload along with Repo.stream has been removed as can be seen on these links:
https://elixirforum.com/t/repo-stream-with-preload-new-warning/17043
elixir-ecto/ecto@6655a9a#diff-122d0a4bbce6a65cc1523584a00193aaR138
Without the option to preload alongside the stream, the only remaining option is to preload within Elasticsearch.Document, which is bad because it would have to preload for every record individually; that is, we would lose batch preloading.
I think that by bringing back the load option, we could go back to doing things like:
def load(schema, offset, limit) do
  schema
  |> offset(^offset)
  |> limit(^limit)
  |> Repo.all()
  |> Repo.preload([:ad, :address, :medias])
end
I am not sure I explained this clearly. If you have any doubts, I can try to give you more examples.
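To make the idea concrete, here is a minimal sketch of how a page-based load function can be turned into a lazy stream that stops at the first empty page. It uses an in-memory list instead of Ecto so it can run on its own; PagedLoadSketch and load_fun are hypothetical names, not part of the library.

```elixir
defmodule PagedLoadSketch do
  # Builds a lazy stream from `load_fun`, a hypothetical 2-arity function
  # taking (offset, limit) and returning the list of records for that page.
  # With Ecto, this is where a whole page could be preloaded in one batch.
  def stream(load_fun, page_size) when is_function(load_fun, 2) and page_size > 0 do
    Stream.resource(
      # Start at offset 0.
      fn -> 0 end,
      # Load one page per step; an empty page ends the stream.
      fn offset ->
        case load_fun.(offset, page_size) do
          [] -> {:halt, offset}
          page -> {page, offset + page_size}
        end
      end,
      # Nothing to clean up.
      fn _offset -> :ok end
    )
  end
end

# Usage with an in-memory list standing in for a database table.
records = Enum.to_list(1..12)
load = fn offset, limit -> Enum.slice(records, offset, limit) end

load
|> PagedLoadSketch.stream(5)
|> Enum.to_list()
|> IO.inspect()
```

Because each page is requested only when the consumer asks for more, at most one page of records is held by the stream at a time.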
Hello all,
I am using elasticsearch-elixir to bulk insert a database from a CSV file. Everything works, except that memory keeps growing until the BEAM crashes due to insufficient free memory. If I use a similar stream in Elixir without elasticsearch-elixir, I am able to go through the whole collection (and count something, for example). I would think that the old CSV data (already indexed) would not have to be kept in memory, but for some reason this is what is happening.
Is there a way to bulk insert half of the data and then continue (e.g. without creating a new version of the index)? Or is this perhaps a bug in elasticsearch-elixir?
When using put_document I received the following error:
** (MatchError) no match of right hand side value: {:error, %Elasticsearch.Exception{col: nil, line: nil, message: "Document mapping type name can't start with '_', found: [_doc]", query: nil, raw: %{"error" => %{"reason" => "Document mapping type name can't start with '_', found: [_doc]", "root_cause" => [%{"reason" => "Document mapping type name can't start with '_', found: [_doc]", "type" => "invalid_type_name_exception"}], "type" => "invalid_type_name_exception"}, "status" => 400}, status: 400, type: "invalid_type_name_exception"}}
My Elasticsearch version is 6.5.2.
I checked, and by default document_url uses "_doc"; it seems that a type name can no longer start with "_". Is there a way to solve this without changing the code?
There may be documents that do not pass ES validation, for example self-intersecting polygons.
This leads to hot_swap failing, since every step needs to pass successfully:
with :ok <- create_from_file(config, name, settings_file),
     :ok <- Bulk.upload(config, name, index_config),
     :ok <- __MODULE__.alias(config, name, alias),
     :ok <- clean_starting_with(config, alias, 2),
     :ok <- refresh(config, name) do
  :ok
end
I cannot guarantee that all the documents will always be valid, but I would like to reindex the valid ones and report at the end that some documents could not be inserted.
I had to rewrite the bulk upload in my code to get around this limitation:
with :ok <- Index.create_from_file(config, name, settings_file),
     bulk_upload(config, name, index_config),
     :ok <- Index.alias(config, name, alias),
     :ok <- Index.clean_starting_with(config, alias, 2),
     :ok <- Index.refresh(config, name) do
  :ok
.....
defp bulk_upload(config, name, index_config) do
  case Bulk.upload(config, name, index_config) do
    :ok ->
      :ok

    {:error, errors} = err ->
      Bugsnag.report(
        ElasticsearchError.exception("Errors encountered indexing restaurants"),
        severity: "warn",
        metadata: %{errors: errors}
      )

      err
  end
end
My question is: can we make this bulk upload step not fail the whole hot swap when some documents have errors?
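One possible shape for such a tolerant step is sketched below: the upload result is normalized so that errors are carried along instead of halting the `with` chain, and they are returned at the end. TolerantBulkSketch, upload_fun and finish_fun are hypothetical stand-ins (for Bulk.upload/3 and the remaining alias, clean and refresh steps), so the example runs without a cluster.

```elixir
defmodule TolerantBulkSketch do
  # Turns a failing upload result into a success that carries the errors,
  # so a surrounding `with` chain can continue.
  def normalize(:ok), do: {:ok, []}
  def normalize({:error, errors}), do: {:ok, List.wrap(errors)}

  # `upload_fun` and `finish_fun` are hypothetical zero-arity functions.
  # Any failure in `finish_fun` still aborts the chain and is returned as-is.
  def hot_swap(upload_fun, finish_fun) do
    with {:ok, errors} <- normalize(upload_fun.()),
         :ok <- finish_fun.() do
      {:ok, errors}
    end
  end
end

result =
  TolerantBulkSketch.hot_swap(
    fn -> {:error, ["doc 7: self-intersecting polygon"]} end,
    fn -> :ok end
  )

IO.inspect(result)
```

The caller can then decide what to do with a non-empty error list, for example report it once after the new index has been aliased.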
I'm getting the following error:
$ Elasticsearch.get("/_cat/health")
13:51:37.261 [info] ['TLS', 32, 'client', 58, 32, 73, 110, 32, 115, 116, 97, 116, 101, 32, 'certify', 32, 'at ssl_handshake.erl:1624 generated CLIENT ALERT: Fatal - Unknown CA', 10]
{:error, %HTTPoison.Error{id: nil, reason: {:tls_alert, 'unknown ca'}}}
When I run the following query, it runs successfully. Ref
$ HTTPoison.get("https://servername:9200/_cat/health", [], [ ssl: [{:versions, [:'tlsv1.2']}] ])
Can we set some env var or config value to solve this issue?
I tried the following in the config.exs file, with no success.
config :ssl, protocol_version: :"tlsv1.2"
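A hedged sketch of what a per-request option could look like in configuration. It assumes a library version with per-cluster configuration and assumes that options under default_options are forwarded to HTTPoison; :my_app and MyApp.ElasticsearchCluster are placeholder names. The ssl option mirrors the HTTPoison call that worked above.

```elixir
import Config

# Assumption: everything under :default_options is passed on to HTTPoison,
# so this is equivalent to the `ssl:` option in the direct HTTPoison.get call.
config :my_app, MyApp.ElasticsearchCluster,
  url: "https://servername:9200",
  default_options: [ssl: [versions: [:"tlsv1.2"]]]
```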
When I try to run with distillery command:
Building elasticsearch indexes for app
** (exit) exited in: GenServer.call(Mix.ProjectStack, {:get, #Function<11.107724793/1 in Mix.ProjectStack.peek/0>}, 30000)
** (EXIT) no process: the process is not alive or there's no process currently associated with the given name, possibly because its application isn't started
(elixir) lib/gen_server.ex:914: GenServer.call/3
lib/mix/project.ex:155: Mix.Project.get/0
lib/mix/task.ex:274: Mix.Task.run/2
(elasticsearch) lib/mix/elasticsearch.build.ex:49: Mix.Tasks.Elasticsearch.Build.run/1
(app) lib/mix/tasks/app.task.ex:8: Mix.Tasks.App.Task.run/1
lib/mix/lib/releases/runtime/control.ex:717: Mix.Releases.Runtime.Control.eval/2
lib/entry.ex:44: Mix.Releases.Runtime.Control.main/1
(stdlib) erl_eval.erl:677: :erl_eval.do_apply/6
I think the problem is Mix.Task.run("app.start", []). Maybe elasticsearch.build.ex could contain only the code to build/index. Then we could create our own task (with our own start_app) that calls Mix.Tasks.Elasticsearch.Build.run.
Any ideas? Thanks!
When compiling my Elixir (1.6.5) project with elasticsearch-elixir installed, I run into this error:
Compiling 15 files (.ex)
== Compilation error in file lib/mix/elasticsearch.build.ex ==
** (CompileError) lib/mix/elasticsearch.build.ex:92: undefined function maybe/2
(stdlib) lists.erl:1338: :lists.foreach/2
(stdlib) erl_eval.erl:670: :erl_eval.do_apply/6
could not compile dependency :elasticsearch, "mix compile" failed. You can recompile this dependency with "mix deps.compile elasticsearch", update it with "mix deps.update elasticsearch" or clean it with "mix deps.clean elasticsearch"
Also I love this project!!!!
The doc blocks for Index.create_from_file/2 include a sample mapping for Elasticsearch that uses type=string, which fails on recent versions, since it was deprecated almost two years ago in 5.x:
With the release of Elasticsearch 5.0 coming closer, it is time to introduce one of the release highlights of this upcoming release: the removal of the string type.
Since 5.1.1 is explicitly referenced in bin/setup, I'm hesitant to "just change" type=string to type=text in the doc block (which works in 6.2.3) without actually upgrading the setup script as well.
WDYT @danielberkompas ?
Some applications might want to talk to multiple separate Elasticsearch stores. The current configuration strategy assumes that you only have one Elasticsearch endpoint.
We should take a page from Ecto's book and allow you to configure multiple Elasticsearch "Repos", something like this:
defmodule MyApp.Elasticsearch do
  use Elasticsearch, otp_app: :my_app
end
You'd then configure MyApp.Elasticsearch like we do now. This would allow a single app (or separate apps within Umbrella apps) to talk to multiple endpoints.
Library version: 0.5.1
I am running the elasticsearch.build task for the first time in a new project, against a server from Elastic Cloud.
The error is:
** (Mix) Index resources could not be created.
%Elasticsearch.Exception{col: nil, line: nil,
message: "forcemerge takes arguments in query parameters, not in the request body",
query: nil,
raw: %{
"error" => %{
"reason" => "forcemerge takes arguments in query parameters, not in the request body",
"root_cause" => [%{"reason" => "forcemerge takes arguments in query parameters, not in the request body", "type" => "illegal_argument_exception"}],
"type" => "illegal_argument_exception"
},
"status" => 400
},
status: 400,
type: "illegal_argument_exception"
}
I don't have much experience with ElasticSearch, but it seems to me that the culprit is this call:
Where it passes an empty JSON document as argument instead of passing no body at all, but I'm not 100% sure.
In one of our production systems, I am seeing the following in the logs hundreds of times. Should this exception be handled by the Elasticsearch Elixir client? The original function call is Elasticsearch.put_document.
(elixir) lib/enum.ex:1899: Enum."-reduce/3-lists^foldl/2-0-"/3
** (FunctionClauseError) no function clause matching in Elasticsearch.Exception.build/2
(elasticsearch) lib/elasticsearch/exception.ex:35: Elasticsearch.Exception.build(%{"message" => nil}, nil)
(elasticsearch) lib/elasticsearch/exception.ex:22: Elasticsearch.Exception.exception/1
(elasticsearch) lib/elasticsearch.ex:389: Elasticsearch.format/1
mix elasticsearch.install vendor --version 7.6.0
It tries to curl https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.6.0.tar.gz
But the file is in:
If the library supports only elasticsearch 6, it should be explicit in the README
I'm trying to install on Ubuntu 18, but this bug is appearing when installing Kibana. Elasticsearch is installed successfully.
mix elasticsearch.install . --version 6.2.4
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 27.7M 100 27.7M 0 0 1392k 0 0:00:20 0:00:20 --:--:-- 1976k
** (Mix) Unsupported system for Kibana: {:unix, :linux}
So I have implemented the Store, Document Protocol and a mapping file and I'm nearly getting things working with your great library.
Unfortunately, it seems to be calling the Elasticsearch.Store load function twice, even though there are fewer than 5000 entries for the first index. I would have thought it would not try to load again when the number of entries returned is less than bulk_insert_size?
Returned from Elasticsearch.Store:
[
%MyApp.MyThing{....etc...},
%MyApp.MyThing{....etc...},
%MyApp.MyThing{....etc...}
]
It then gives me argument error when my Repo returns an empty array:
SELECT .... FROM tasks AS i0 LIMIT $1 OFFSET $2 [5000, 5000]
** (ArgumentError) argument error
:erlang.apply([], :load, [])
(elasticsearch) lib/elasticsearch/storage/data_stream.ex:57: Elasticsearch.DataStream.load_page/4
(elixir) lib/stream.ex:1361: Stream.do_resource/5
(elixir) lib/stream.ex:1536: Enumerable.Stream.do_each/4
(elixir) lib/enum.ex:1911: Enum.reduce/3
(elasticsearch) lib/elasticsearch/indexing/bulk.ex:81: Elasticsearch.Index.Bulk.upload/4
(elasticsearch) lib/elasticsearch/indexing/index.ex:33: Elasticsearch.Index.hot_swap/4
(elasticsearch) lib/mix/elasticsearch.build.ex:57: Mix.Tasks.Elasticsearch.Build.build/3
(elasticsearch) lib/mix/elasticsearch.build.ex:39: anonymous fn/3 in Mix.Tasks.Elasticsearch.Build.run/1
(elixir) lib/enum.ex:1911: anonymous fn/3 in Enum.reduce/3
(elixir) lib/enum.ex:3251: Enumerable.List.reduce/3
(elixir) lib/enum.ex:1911: Enum.reduce/3
(elasticsearch) lib/mix/elasticsearch.build.ex:37: Mix.Tasks.Elasticsearch.Build.run/1
(mix) lib/mix/task.ex:314: Mix.Task.run_task/3
(mix) lib/mix/cli.ex:80: Mix.CLI.run_task/2
(elixir) lib/code.ex:677: Code.require_file/2
I am using your library inside an umbrella project so that might be something to do with it? I am a bit confused where to go from here! Ta!
I'm new to elixir, so I might be wrong, but I think this line:
https://github.com/infinitered/elasticsearch-elixir/blob/master/lib/elasticsearch.ex#L20
is wrong. Dialyzer complains about code like this:
{:ok, score} = Elasticsearch.post(ES.ElasticsearchCluster, "/user_score/score/#{userId}", scoreQuery(value))
With this error:
lib/.../user_state.ex:40:pattern_match
The pattern
{:ok, score}
can never match the type
{:error, _}
exit: ** (exit) exited in: GenServer.call(MyApp.ElasticsearchCluster, :config, 5000)
** (EXIT) time out
File "lib/gen_server.ex", line 836, in GenServer.call/3
File "lib/elasticsearch.ex", line 307, in Elasticsearch.post/4
We get this error randomly in our app while making GET/POST requests to Elasticsearch.
Elasticsearch.post/4 calls Elasticsearch.Cluster.Config.get(cluster), and the cluster config is looked up here: https://github.com/danielberkompas/elasticsearch-elixir/blob/master/lib/elasticsearch/cluster/cluster.ex#L172
Given that the cluster is a GenServer, this GenServer.call/3 can time out, since a single process can get overloaded under heavy load.
Can we move the :api out of the cluster config, so that the user can pass the Elasticsearch.API behaviour module to functions like Elasticsearch.get, Elasticsearch.post, etc.?
@danielberkompas Let me know your thoughts.
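A different way to take the GenServer out of the read path is to keep the config in shared read-only storage. The sketch below uses :persistent_term (OTP 21.2 or later) purely as an illustration; ConfigCacheSketch is a hypothetical module, not how the library stores its config.

```elixir
defmodule ConfigCacheSketch do
  # Stores a cluster's config in :persistent_term, so reads do not go
  # through a single process mailbox and cannot hit a call timeout.
  def put(cluster, config) when is_atom(cluster) and is_map(config) do
    :persistent_term.put({__MODULE__, cluster}, config)
  end

  # Raises ArgumentError if nothing was stored for this cluster.
  def get(cluster) when is_atom(cluster) do
    :persistent_term.get({__MODULE__, cluster})
  end
end

ConfigCacheSketch.put(MyApp.ElasticsearchCluster, %{url: "http://localhost:9200"})
IO.inspect(ConfigCacheSketch.get(MyApp.ElasticsearchCluster))
```

The trade-off is that updating a persistent term is expensive, so this only suits config that is written once at startup and rarely changed.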
This is more of an enhancement to make things easier for those who use Distillery. The problem is that the mappings in priv/elasticsearch won't match the right priv path when released into production, as something like:
indexes: %{
  albums: %{
    settings: "priv/elasticsearch/albums.json"
    # ...
  }
}
would not search in the currently running release directory.
I implemented a naive solution in andrewvy@b2bed7d which uses Application.app_dir/2 to get the running application's priv folder, but this loses the ability for those who may be pointing to paths outside of their application. Let me know if this is something worth adding! (Separate option, fallback lookup?)
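One way to keep both behaviours is to resolve only relative paths against the application directory and leave absolute paths untouched. This is a sketch of that idea, not the code from the commit above; SettingsPathSketch is a hypothetical helper.

```elixir
defmodule SettingsPathSketch do
  # Absolute paths are returned unchanged; relative paths are resolved
  # against the given OTP application's directory (which also works in a release).
  def resolve(app, path) when is_atom(app) and is_binary(path) do
    case Path.type(path) do
      :absolute -> path
      _relative -> Application.app_dir(app, path)
    end
  end
end

# :elixir is used here only because it is always loaded;
# a real project would pass its own application name.
IO.puts(SettingsPathSketch.resolve(:elixir, "priv/elasticsearch/albums.json"))
IO.puts(SettingsPathSketch.resolve(:elixir, "/etc/elasticsearch/albums.json"))
```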
I can't get data to show up in Kibana.
I tried to save my data like this, though I'm not sure I'm doing it correctly:
Elasticsearch.put_document(MyApp.ElasticsearchCluster, struct(User, attrs), "users")
I'm running Elasticsearch and Kibana as Docker services with docker-compose.
My configuration is as follows:
elasticsearch:
  image: 'elasticsearch:6.5.4'
  ports:
    - "9200:9200"
  volumes:
    - elasticsearch-data:/usr/share/elasticsearch/data
kibana:
  image: 'kibana:6.5.4'
  ports:
    - "5601:5601"
  environment:
    - ELASTICSEARCH_URL=http://elasticsearch:9200
UPDATE:
I'm getting this error when I execute the command above
"Incorrect HTTP method for uri [/users/_doc] and method [PUT], allowed: [POST]"
UPDATE 2:
Did Elasticsearch.put_document(MyApp.ElasticsearchCluster, struct(User, attrs), "users/_doc")
It saves to the database, but every subsequent put_document call just updates that one document in the index. How do I add new documents with different ids?
UPDATE 3:
Did Elasticsearch.put_document(MyApp.ElasticsearchCluster, %User{id: 1, ...}, "users")
Saves multiple documents and solved the problem.
Question is why does this struct require an id? I thought if the struct did not contain an id, elasticsearch will automatically generate one?
My Config:
config :app, App.Elasticsearch.Cluster,
  url: "https://redacted.us-east-1.es.amazonaws.com",
  api: Elasticsearch.API.AWS,
  default_options: [
    aws: [
      region: "us-east-1",
      service: "es",
      access_key: "redacted",
      secret: "redacted"
    ]
  ],
  json_library: Jason
Getting a document:
IO.inspect(
Elasticsearch.get(
App.Elasticsearch.Cluster,
"/movies/_doc/5"
)
)
{:ok,
%{
"_id" => "5",
"_index" => "movies",
"_primary_term" => 1,
"_seq_no" => 0,
"_source" => %{
"director" => "Bennett Miller",
"title" => "Moneyball",
"year" => "2011"
},
"_type" => "_doc",
"_version" => 1,
"found" => true
}}
My Query that does not work
index_pattern = "/employees-*/_search"
query = %{
"suggest" => %{
"employee-suggest" => %{
"prefix" => "An",
"completion" => %{
"field" => "name_suggest",
"size" => 10,
"skip_duplicates" => true,
"fuzzy" => %{
"fuzziness" => 1
}
}
}
}
}
IO.inspect(
Elasticsearch.post(
App.Elasticsearch.Cluster,
index_pattern,
query
)
)
{:error,
%Elasticsearch.Exception{
col: nil,
line: nil,
message: "The request signature we calculated does not match the signature you provided. Check your AWS Secret Access Key and signing method. Consult the service documentation for details.",
query: nil,
raw: %{
"message" => "The request signature we calculated does not match the signature you provided. Check your AWS Secret Access Key and signing method. Consult the service documentation for details."
},
status: nil,
type: nil
}}
Doing the same query with Postman does not fail like it does in the Elixir application. What is going on here?
I have a use case where I need to reindex a large part of the documents, but not all of them. For example, all restaurants from a channel. Since there can be thousands of them, it's very slow to send the documents one by one.
I need to use bulk upload, but with a custom stream built at runtime.
There is currently no way to send params to the store when building the stream.
Can we add an option to pass params to Bulk.upload, which would then be passed on to the store?
Or at least expose a put_bulk function in the Elasticsearch module that receives a list of items and performs a bulk upload?
Elixir 1.9's releases are a built-in replacement for Distillery. Like Distillery, you can't run Mix tasks against a release.
I tried following the deployment guide in the docs but run into an error:
Starting dependencies...
Starting repos...
Starting clusters...
Building indexes...
** (exit) exited in: GenServer.call(Backend.Elasticsearch.Cluster, :config, 5000)
** (EXIT) no process: the process is not alive or there's no process currently associated with the given name, possibly because its application isn't started
(elixir) lib/gen_server.ex:1000: GenServer.call/3
(elasticsearch) lib/elasticsearch/indexing/index.ex:32: Elasticsearch.Index.hot_swap/2
(elixir) lib/enum.ex:783: Enum."-each/2-lists^foreach/1-0-"/2
(elixir) lib/enum.ex:783: Enum.each/2
lib/backend/release.ex:29: Backend.Release.build_elasticsearch_indexes/0
(stdlib) erl_eval.erl:680: :erl_eval.do_apply/6
(elixir) lib/code.ex:240: Code.eval_string/3
This is the release module that I'm running:
defmodule Backend.Release do
  @app :backend

  @start_apps [
    :crypto,
    :ssl,
    :postgrex,
    :ecto,
    :elasticsearch
  ]

  # Ecto repos to start, if any
  @repos Application.get_env(:backend, :ecto_repos, [])

  # Elasticsearch clusters to start
  @clusters [Backend.Elasticsearch.Cluster]

  # Elasticsearch indexes to build
  @indexes [:instances]

  def build_elasticsearch_indexes() do
    start_services()
    IO.puts("Building indexes...")
    Enum.each(@indexes, &Elasticsearch.Index.hot_swap(Backend.Elasticsearch.Cluster, &1))
    stop_services()
  end

  # Ensure that all OTP apps, repos used by your Elasticsearch store,
  # and your Elasticsearch Cluster(s) are started
  defp start_services do
    IO.puts("Starting dependencies...")
    Enum.each(@start_apps, &Application.ensure_all_started/1)
    IO.puts("Starting repos...")
    Enum.each(@repos, & &1.start_link(pool_size: 1))
    IO.puts("Starting clusters...")
    Enum.each(@clusters, & &1.start_link())
  end

  defp stop_services do
    :init.stop()
  end
end
If I replace the contents of start_services() with a single line that calls Application.ensure_all_started(@app), then things work fine, but this starts my entire app, which I'd prefer to avoid.
Does anyone know if there's a major difference between Distillery and Elixir releases that could be causing this? It seems like Enum.each(@clusters, & &1.start_link()) is not starting the cluster as it should.
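Since Enum.each/2 discards the return value of start_link, a failed start would go unnoticed. The sketch below does not identify the root cause; it only shows a way to surface such failures. StartCheckSketch and FakeCluster are hypothetical, with FakeCluster standing in for a real cluster module so the example runs on its own.

```elixir
defmodule StartCheckSketch do
  # Starts each module and raises if one fails, instead of silently
  # discarding the result of start_link/0.
  def start_all!(modules) do
    Enum.each(modules, fn mod ->
      case mod.start_link() do
        {:ok, _pid} -> :ok
        {:error, {:already_started, _pid}} -> :ok
        other -> raise "could not start #{inspect(mod)}: #{inspect(other)}"
      end
    end)
  end
end

defmodule FakeCluster do
  # Hypothetical stand-in for a cluster module; it only registers a named Agent.
  def start_link, do: Agent.start_link(fn -> %{} end, name: __MODULE__)
end

:ok = StartCheckSketch.start_all!([FakeCluster])
IO.inspect(is_pid(Process.whereis(FakeCluster)))
```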
It is hardcoded to 2. This means that whenever I recreate the index I always have two copies of the index and automatically all the documents twice. This can lead to problems if we have a large dataset. Also many shared ES hosting solutions have limited number of indexes available.
Can you make it optional whether the old version is kept as well?
The last version released to Hex has been 1.0.0 from March 2019. Five PRs have been merged since; would it make sense to release these fixes to Hex?
Hi everyone, I'm running into this problem:
I've already tried using Elasticsearch with both Jason and Poison, and the error continues. Do you know what may be happening?
I completely cleared Elasticsearch, thinking there might be some duplication, but it did not help.
The command I use is this:
mix elasticsearch.build dev-profiles --cluster V2V.ElasticsearchCluster
Hi, thanks for this useful library!
I'm trying to use a url for a remote elasticsearch instance, and for whatever reason, the connection appears to be timing out.
When I call Elasticsearch.get for that configuration:
I expect: it to return a status tuple like {:error, "connection timed out"}
What happens: an exception is raised
** (Protocol.UndefinedError) protocol Enumerable not implemented for %HTTPoison.Error{id: nil, reason: :connect_timeout} of type HTTPoison.Error (a struct). This protocol is implemented for the following type(s): Ecto.Adapters.SQL.Stream, Postgrex.Stream, DBConnection.Stream, DBConnection.PrepareStream, HashSet, Range, Map, Function, List, Stream, Date.Range, HashDict, GenEvent.Stream, MapSet, File.Stream, IO.Stream
(elixir) lib/enum.ex:1: Enumerable.impl_for!/1
(elixir) lib/enum.ex:141: Enumerable.reduce/3
(elixir) lib/enum.ex:3023: Enum.reverse/1
(elixir) lib/enum.ex:2668: Enum.to_list/1
(absinthe) lib/absinthe/phase/document/execution/resolution.ex:351: Absinthe.Phase.Document.Execution.Resolution.split_error_value/1
(absinthe) lib/absinthe/phase/document/execution/resolution.ex:341: Absinthe.Phase.Document.Execution.Resolution.put_result_error_value/5
(elixir) lib/enum.ex:1948: Enum."-reduce/3-lists^foldl/2-0-"/3
(absinthe) lib/absinthe/phase/document/execution/resolution.ex:256: Absinthe.Phase.Document.Execution.Resolution.build_result/4
(absinthe) lib/absinthe/phase/document/execution/resolution.ex:153: Absinthe.Phase.Document.Execution.Resolution.do_resolve_fields/6
(absinthe) lib/absinthe/phase/document/execution/resolution.ex:72: Absinthe.Phase.Document.Execution.Resolution.walk_result/5
(absinthe) lib/absinthe/phase/document/execution/resolution.ex:53: Absinthe.Phase.Document.Execution.Resolution.perform_resolution/3
(absinthe) lib/absinthe/phase/document/execution/resolution.ex:24: Absinthe.Phase.Document.Execution.Resolution.resolve_current/3
(absinthe) lib/absinthe/pipeline.ex:274: Absinthe.Pipeline.run_phase/3
(absinthe_plug) lib/absinthe/plug.ex:421: Absinthe.Plug.run_query/4
(absinthe_plug) lib/absinthe/plug.ex:247: Absinthe.Plug.call/2
(phoenix) lib/phoenix/router/route.ex:40: Phoenix.Router.Route.call/2
(phoenix) lib/phoenix/router.ex:288: Phoenix.Router.__call__/2
(ostraka) lib/ostraka_web/endpoint.ex:1: OstrakaWeb.Endpoint.plug_builder_call/2
(ostraka) lib/ostraka_web/endpoint.ex:1: OstrakaWeb.Endpoint.call/2
(phoenix) lib/phoenix/endpoint/cowboy2_handler.ex:42: Phoenix.Endpoint.Cowboy2Handler.init/4
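Judging by the stack trace, the exception is raised inside Absinthe while it processes the error value, which suggests the {:error, %HTTPoison.Error{}} tuple was returned from a resolver as-is. A minimal sketch of converting such a tuple to a plain message first; ResolverErrorSketch is a hypothetical helper, and RuntimeError stands in for HTTPoison.Error so the example has no dependencies.

```elixir
defmodule ResolverErrorSketch do
  # Converts an exception struct inside an error tuple into a plain message string.
  def normalize({:ok, _value} = ok), do: ok

  def normalize({:error, %{__exception__: true} = error}) do
    {:error, Exception.message(error)}
  end

  # Any other error value is passed through unchanged.
  def normalize({:error, _reason} = other), do: other
end

IO.inspect(ResolverErrorSketch.normalize({:error, %RuntimeError{message: "connect_timeout"}}))
```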
"elasticsearch": {:hex, :elasticsearch, "1.0.0", "626d3fb8e7554d9c93eb18817ae2a3d22c2a4191cc903c4644b1334469b15374", [:mix], [{:httpoison, ">= 0.0.0", [hex: :httpoison, repo: "hexpm", optional: false]}, {:poison, ">= 0.0.0", [hex: :poison, repo: "hexpm", optional: true]}, {:sigaws, "~> 0.7", [hex: :sigaws, repo: "hexpm", optional: true]}, {:vex, "~> 0.6.0", [hex: :vex, repo: "hexpm", optional: false]}], "hexpm"},
The reason is that we could use Repo.stream with Repo.transaction and a timeout of infinity.
LIMIT + OFFSET is linear: to get the last 100 rows of a 1 million row table, the database has to go through the first 999,900. Using a cursor or a stream with a timeout of infinity can help in this case.
Right now I avoid long queries (waiting for the offset to reach 999,900) by doing something like this:
User
|> select([:name, :email, :phone, :id])
|> Repo.stream()
|> Stream.drop(offset)
|> Enum.take(limit)
But streaming to the end in one shot would be much much preferred.
I can't see any reference to where bulk_wait_interval is being used. Is there a planned future use of it? If not, would you accept a PR to update the README?
The following exception is raised when deleting a document from an index twice:
** (FunctionClauseError) no function clause matching in Elasticsearch.Exception.build/2
lib/elasticsearch/exception.ex:35: Elasticsearch.Exception.build(%{"_id" => "54", "_index" => "listings-1525371175", "_primary_term" => 1, "_seq_no" => 42, "_shards" => %{"failed" => 0, "successful" => 1, "total" => 2}, "_type" => "_doc", "_version" => 14, "result" => "not_found"}, nil)
lib/elasticsearch/exception.ex:22: Elasticsearch.Exception.exception/1
lib/elasticsearch.ex:389: Elasticsearch.format/1
(re) lib/re_web/search/server.ex:57: ReWeb.Search.Server.handle_cast/2
(stdlib) gen_server.erl:616: :gen_server.try_dispatch/4
(stdlib) gen_server.erl:686: :gen_server.handle_msg/6
(stdlib) proc_lib.erl:247: :proc_lib.init_p_do_apply/3
It seems like Elasticsearch.Exception.build/2 is not expecting "result" => "not_found" as a result.
Elasticsearch version: 6.2.4
Hi, I am thinking of using this library for my ES integration.
I am going to group my indexes as posts-{region} to scope them and search within those scopes, i.e.
posts-americas -> would host and be searchable for the Americas region
posts-europe -> similarly, just for Europe posts
and then I could delete a whole index to get rid of a (more or less dynamic) region.
Looking at the documentation I see this
# You should configure each index which you maintain in Elasticsearch here.
# This configuration will be read by the `mix elasticsearch.build` task,
# described below.
indexes: %{
# This is the base name of the Elasticsearch index. Each index will be
# built with a timestamp included in the name, like "posts-5902341238".
# It will then be aliased to "posts" for easy querying.
posts: %{
First: is this possible from the usage of this library or would I have to modify the task itself?
Second: would the following return a %MyApp.Post{} struct or a raw ES response with the meta-information about the results too?
Elasticsearch.post(MyApp.ElasticsearchCluster, "/posts-americas/_doc/_search", '{"query": {"match_all": {}}}')
Thank you in advance for the library!
config :recommender, Recommender.Elasticsearch.Cluster,
  # When indexing data using the `mix elasticsearch.build` task,
  # control the data ingestion rate by raising or lowering the number
  # of items to send in each bulk request.
  bulk_page_size: 5000,
  # Likewise, wait a given period between posting pages to give
  # Elasticsearch time to catch up.
  # 15 seconds
  bulk_wait_interval: 15_000,
Something like this:
mix elasticsearch.build users --cluster Recommender.Elasticsearch.Cluster --wait-interval 15000 --page-size 5000
On top of that, allowing it to be configured per index would also be nice.
When bulk loading a large number of documents, it can be tricky to get bulk_page_size and bulk_wait_interval right without significantly slowing down an index build.
Also, in some cases the HTTP request takes longer than the default of 8000ms, resulting in a timeout.
Digging into the code, there is a default_opts: configuration option to pass options to HTTPoison.request, but this doesn't seem to be documented. Setting this to [recv_timeout: 20000] (or something large) and setting bulk_wait_interval to zero gives much better performance, since it waits at most 20 seconds but only as long as the request needs. For me at least, this seemed to eliminate the need to wait a fixed time between requests.
It would be good to at least document the default_opts: option, or maybe even make the HTTP request timeout a first-class configuration option.
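For reference, a sketch of the configuration described above. The app and cluster names are placeholders, and the option key is an assumption: it is written as default_options here, matching the AWS configuration quoted elsewhere on this page, while the text above calls it default_opts, so check which name your version of the library reads.

```elixir
import Config

config :my_app, MyApp.ElasticsearchCluster,
  url: "http://localhost:9200",
  bulk_page_size: 5000,
  # No fixed pause between pages; rely on the request timeout instead.
  bulk_wait_interval: 0,
  # Assumption: forwarded to HTTPoison.request; wait up to 20 seconds for a response.
  default_options: [recv_timeout: 20_000]
```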
Could you add a full working Phoenix example with small Ecto model having some relations? I can't figure out how to use this library.
%{"error" => %{"code" => 404, "message" => "elastic: Error 404 (Not Found)", "status" => "Not Found"}}
I'm getting this error when trying to delete a nonexistent document from an index using delete_document. I expect this function to return an error tuple, {:error, %Elasticsearch.Exception{...}}, but it does not do so in production (locally and in the staging environment it works).
As a workaround, I am thinking I should check whether a document exists before doing anything. Does this package have a way to check that, like a get_document function? Thanks!
Also, how do you get the raw data of an index?
Thanks for this library. I'm evaluating elasticsearch-elixir and elastix for a production app and was wondering if you have any thoughts, however brief, on how these libraries might compare? Might even be worth adding to README at some point.