ex_elasticlunr's Introduction

Elasticlunr

Elasticlunr is a small, full-text search library for use in the Elixir environment. It indexes JSON documents and provides a friendly search interface to retrieve documents.

Why

The library is built for web applications that do not need the deployment complexity of popular search engines and that want to take advantage of the BEAM's capabilities.

Imagine how much is gained when the search functionality of your application resides in the same environment (the BEAM VM) as your business logic: searches resolve faster, and there are fewer services (Elasticsearch, Solr, and so on) to monitor.

Installation

The library can be installed by adding elasticlunr to your list of dependencies in mix.exs:

def deps do
  [
    {:elasticlunr, "~> 0.6"}
  ]
end

Documentation can be found at hexdocs.pm. See the blog post "Introduction to Elasticlunr" and the Livebook for examples.

Features

  1. Query-time boosting: you don't need to set boosting weights when building the index, so you can try different boosting schemes at query time.
  2. More rational scoring mechanism: Elasticlunr uses a scoring mechanism similar to the one used by Elasticsearch, which is also used by Lucene.
  3. Field search: you can choose which fields to index and which fields to search.
  4. Boolean model: you can set which field to search and the boolean model for each query token, such as "OR" and "AND".
  5. Combined models: the Boolean model, the TF/IDF model, and the vector space model are combined to make result ranking more reliable.
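
To make the idea of query-time boosting concrete, here is a minimal, library-independent sketch. It assumes per-field relevance scores have already been computed for a document and only shows how boosts supplied with the query could re-weight them. The BoostSketch module, its combine/2 function, and the sample scores are hypothetical; they are not part of Elasticlunr's API and do not describe its internal scoring.

```elixir
defmodule BoostSketch do
  # Hypothetical helper, not part of Elasticlunr: combines per-field scores
  # using boosts supplied at query time. Fields without a boost default to 1.0.
  def combine(field_scores, boosts) do
    field_scores
    |> Enum.map(fn {field, score} -> score * Map.get(boosts, field, 1.0) end)
    |> Enum.sum()
  end
end

# Per-field scores for one document (made-up numbers for illustration).
scores = %{"title" => 2.0, "body" => 4.0}

# The same stored scores can be re-weighted per query without reindexing.
IO.inspect(BoostSketch.combine(scores, %{"title" => 1.0, "body" => 0.5}))
IO.inspect(BoostSketch.combine(scores, %{"title" => 3.0}))
```

Because the boosts are only an argument to the combining step, changing them requires no change to the indexed data, which is the flexibility the first feature describes.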

Token Expansion

Sometimes users want to expand a query token to increase recall. For example, suppose the query token is "micro" and both "microwave" and "microscope" are in the index. If the user chooses to expand the query token "micro", both "microwave" and "microscope" are searched for in the index and returned. Results that come from expanded tokens are penalized because they are not the same as the query token.
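
The behaviour described above can be sketched in plain Elixir. This is an illustration only: the ExpansionSketch module, the prefix-matching rule, and the 0.5 penalty are assumptions made for the example and are not taken from Elasticlunr's implementation.

```elixir
defmodule ExpansionSketch do
  # Hypothetical helper, not Elasticlunr's implementation.
  # Expands a query token to every indexed token that starts with it.
  # An exact match keeps weight 1.0; expanded matches get a lower weight
  # (the 0.5 default penalty is an arbitrary value chosen for illustration).
  def expand(query_token, indexed_tokens, penalty \\ 0.5) do
    indexed_tokens
    |> Enum.filter(&String.starts_with?(&1, query_token))
    |> Enum.map(fn
      ^query_token -> {query_token, 1.0}
      token -> {token, penalty}
    end)
  end
end

# "macro" does not start with "micro", so it is not part of the expansion.
IO.inspect(ExpansionSketch.expand("micro", ["microwave", "microscope", "macro"]))
```

Each returned pair is a token to look up plus the weight that would be applied to its matches, so expanded tokens rank below exact ones.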

Livebook

The repository includes a Livebook file that you can run using livebook.dev.

Storage

Elasticlunr allows you to write your indexes to whatever storage provider you want. You don't need to access the Elasticlunr.Storage module directly; it is used by the Elasticlunr.IndexManager. See available providers below:

To configure what provider to use:

config :elasticlunr,
  storage: Elasticlunr.Storage.S3

Note that all indexes in storage are preloaded on application startup. To see the configuration available for a provider, refer to its module.

License

Elasticlunr is released under the MIT License - see the LICENSE file.

ex_elasticlunr's People

Contributors

heywhy, jdewar

ex_elasticlunr's Issues

[Important] Community Opinion Requested

I created this issue because I believe the project can meet your application's search needs and requirements.

I'm here to request your suggestions on what direction the project should take, or what features you think are important to help you improve the search functionality or behavior of your application. Let me know what you think in the comments.

We already have a couple of items that will be focused on in the coming weeks:

  1. Performance: as we all know, performance is an important part of everything we build.
  2. Distributed system support: a couple of engineers have suggested this, and I believe it's something nice to have considering that the BEAM VM provides the necessary components to achieve it.

Please let me know what your opinions are and what features you would like to see in the comments below.

Warning message on compile

After every compile I get the message at the end of this post. Maybe I'm a little OCD, but I can't stand these kinds of messages :P

The way I see it, there are 3 possible solutions:

  1. Document in the installation process that crypto should be added to extra_applications.
  2. Add crypto as a dep of ex_elasticlunr.
  3. Modify the code so that crypto is not required.

The preferred scenario to me would be #3, which begs the question: Where is it being used?

Is it here?

name: Keyword.get_lazy(opts, :name, &UUID.uuid4/0),

warning: :crypto.hash/2 defined in application :crypto is used by the current application but the current application does not depend on :crypto. To fix this, you must do one of:

  1. If :crypto is part of Erlang/Elixir, you must include it under :extra_applications inside "def application" in your mix.exs

  2. If :crypto is a dependency, make sure it is listed under "def deps" in your mix.exs

  3. In case you don't want to add a requirement to :crypto, you may optionally skip this warning by adding [xref: [exclude: [:crypto]]] to your "def project" in mix.exs

Invalid call found at 2 locations:
  lib/uuid.ex:498: UUID.namebased_uuid/2
  lib/uuid.ex:502: UUID.namebased_uuid/2

warning: :crypto.strong_rand_bytes/1 defined in application :crypto is used by the current application but the current application does not depend on :crypto. To fix this, you must do one of:

  1. If :crypto is part of Erlang/Elixir, you must include it under :extra_applications inside "def application" in your mix.exs

  2. If :crypto is a dependency, make sure it is listed under "def deps" in your mix.exs

  3. In case you don't want to add a requirement to :crypto, you may optionally skip this warning by adding [xref: [exclude: [:crypto]]] to your "def project" in mix.exs

Invalid call found at 3 locations:
  lib/uuid.ex:340: UUID.uuid4/1
  lib/uuid.ex:469: UUID.uuid1_clockseq/0
  lib/uuid.ex:492: UUID.uuid1_node/1
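
For reference, the first option named in the warning (and solution 1 above) would look roughly like the following mix.exs. This is a sketch under assumptions: the module name MyApp.MixProject, the app name :my_app, and the version are placeholders. Because the reported call sites are in lib/uuid.ex, the warning may be raised while that dependency is compiled, in which case this change alone might not silence it.

```elixir
defmodule MyApp.MixProject do
  use Mix.Project

  # :my_app and the version are placeholder values for illustration.
  def project do
    [app: :my_app, version: "0.1.0", deps: deps()]
  end

  def application do
    [
      # :crypto ships with Erlang/OTP, so it is listed here rather than in deps.
      extra_applications: [:logger, :crypto]
    ]
  end

  defp deps do
    [{:elasticlunr, "~> 0.6"}]
  end
end
```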

Wildcard search?

I'm not familiar with elasticsearch's searching, but I can't seem to find a way to do wildcard searching with ex_elasticlunr. I'm powering a simple type-ahead documentation search field.

Official Docs Contribution

I will be happy to have anyone contribute to the official docs for the project. Having comprehensive docs at hex.pm is desired, so modules need to be updated.

Failure deserializing a saved index?

After adding fields, saving, and updating a persisted index, the application crashes on restart via iex -S mix with the following error message:

** (Mix) Could not start application elasticlunr: exited in: Elasticlunr.Application.start(:normal, [])
    ** (EXIT) an exception was raised:
        ** (FunctionClauseError) no function clause matching in String.split/3
            (elixir 1.13.4) String.split(["text|{\"nil\":\"~\\n\\n\\n\\noe Do\\n\\n\\n\\n3.3(b)(1)\\n\\n\\n\\nse .. 3.5(c) .\\n\\nete) 3,3(b)(1) .\\n\\nCAICh = ISL (S91 -\\n\\nBasilio Arturo Ignacio LAM zO ARGENTINA +\\n\\n(Pnonetic: LAHmee . . 43\\n\\n\\n\\nApproved for Release: 2018/10/02 C06628363\\n\\n\\n\\nDOHSO) 4\\n\\n\\n\\nCommander in Chief\\n\\n\\n\\nForce; Member, Ruling\\n\\nJunta (since 17 December OFFICE OF\\n\\n1981) CENTRAL REFERENCE\\n\\n\\n\\nAddressed as:\\n\\nGeneral Lami Dozo\\n\\n\\n\\nMaj. Gen. Basilio\\n\\nLami Dozo was secretary\\n\\ngeneral of the Air Force\\n\\nfor over three years and\\n\\nchief of air operations\\n\\nfor one year before as-\\n\\nsuming his present posts.\\n\\nA politician known for\\n\\nhis ability, intelligence, and frankness, he is ex-\\n\\npected to become an,important, spokesman for the rul-\\n\\ning junta, while displaying 4. flexible yet pragmatic\\n\\norientation within", "the group A highly political\\n\\ngeneral, he is comEortable with the give and take of\\n\\npolitics, and he has an impressive network of civil-\\n\\nian contacts. Using his effective, low-key approach, 3.5(C)\\n\\n\\n\\nLami Dozo will probably push for accommodatio\\n\\nthe various political forces in Argentina.\\n\\n\\n\\nLami Dozo is anti-Communist, anti-Peronist, and\\n\\nhighly nationalistic. As an influential member of\\n\\nthe government hierarchy, for the past several years\\n\\nhe has played an active role in negotiations between\\n\\nChile and Argentina over the sovereignty of the\\n\\nBeagle Channel. He has been open and friendly with\\n\\nUS officials in Argentina\\n\\n\\n\\nHe has traveled\\n\\nto this country several times and speaks with fond-\\n\\nness of these trips. ft 3.5(c)\\n\\n\\n\\nA 1950 graduate of the Military Aviation\\n\\nSchool, Lami Dozo subsequently served for 14 years\\n\\nat the Palomar Air Force Base,.in Buenos Aires. In\\n\\n1966 he trained at McGuire Air Force Base on C-130\\n\\naircraft. 
During 1972-73 he Was stationed in Canada\\n\\nas a delegate to the International Civil Aviation\\n\\nOrganization. Lami Dozo, 52; speaks English and\\n\\nFrench. Married, he has two sons and three daugh-\\n\\n\\n\\nters.\\n\\n3.5(c)\\n\\n\\n\\nCR M 81-15983\\n\\n\\n\\noO \\\"5446030 a |\\n\\n\\n\\nApproved for Release: 2018/10/02 C06628363\\n\\n\\n\"}"], "|", [])
            (elasticlunr 0.6.6) lib/elasticlunr/deserializer.ex:67: Elasticlunr.Deserializer.Parser.parse/3
            (elasticlunr 0.6.6) lib/elasticlunr/deserializer.ex:16: anonymous fn/2 in Elasticlunr.Deserializer.Parser.process/1
            (elixir 1.13.4) lib/enum.ex:4144: anonymous fn/3 in Enum.reduce/3
            (elixir 1.13.4) lib/stream.ex:1559: Stream.do_element_resource/6
            (elixir 1.13.4) lib/enum.ex:4144: Enum.reduce/3
            (elixir 1.13.4) lib/stream.ex:572: anonymous fn/4 in Stream.map/2
            (elixir 1.13.4) lib/enum.ex:4475: Enumerable.List.reduce/3

If necessary, I can upload the source that is doing the data ingestion into the index.

Can't get query-time boosting to work

Hey, I like the promise of this library, but I can barely get anything to work, and the lack of documentation makes it very hard to figure out how to use it properly. I do not have previous Elasticsearch/Lucene experience, so working out queries by trial and error is very difficult.

Here's my scenario, I have an index:

pipeline = Pipeline.new(Pipeline.default_runners())

[pipeline: pipeline]
|> Index.new()
|> Index.add_field("name")
|> Index.add_field("author_names")
|> Index.add_field("author_person_names")
|> Index.add_field("narrator_names")
|> Index.add_field("narrator_person_names")
|> Index.add_field("series_names")

I want to search for a query string in all the fields, but with various weights. Here are two patterns I've tried, neither of which works:

# in this example, "expand" is being honored, but "boost" is not:
Elasticlunr.Index.search(index, %{
  "query" => %{
    "bool" => %{
      "should" => [
        %{"match" => %{"name" => %{"boost" => 1.0, "query" => query, "expand" => true}}},
        %{"match" => %{"author_names" => %{"boost" => 0.5, "query" => query, "expand" => true}}},
        %{"match" => %{"author_person_names" => %{"boost" => 0.25, "query" => query, "expand" => true}}},
        %{"match" => %{"narrator_names" => %{"boost" => 0.5, "query" => query, "expand" => true}}},
        %{"match" => %{"narrator_person_names" => %{"boost" => 0.25, "query" => query, "expand" => true}}},
        %{"match" => %{"series_names" => %{"boost" => 0.5, "query" => query, "expand" => true}}}
      ]
    }
  }
})

# This example seems to be what the library expects based on reading the source code but there are bugs preventing this from working:
Elasticlunr.Index.search(index, query, %{
  "fields" => %{
    "name" => %{"boost" => 1.0},
    "author_names" => %{"boost" => 0.5},
    "author_person_names" => %{"boost" => 0.25},
    "narrator_names" => %{"boost" => 0.5},
    "narrator_person_names" => %{"boost" => 1.0},
    "series_names" => %{"boost" => 0.25},
  }
})

Regarding the second example above, there are some obvious errors in the code. See https://github.com/heywhy/ex_elasticlunr/blob/master/lib/elasticlunr/core/index.ex#L203: the code iterates over a map there but expects the field name string to be passed to the callback, when in fact each element is a {field_name, value} tuple. It seems there are no high-level tests in the library to catch this error.
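
The point about iterating over a map can be reproduced without the library. The snippet below is a standalone illustration of how Enum functions treat maps; the fields variable mirrors the shape of the options in the second example and is not code from Elasticlunr.

```elixir
fields = %{"name" => %{"boost" => 1.0}, "author_names" => %{"boost" => 0.5}}

# Enumerating a map yields {key, value} tuples, not bare keys.
as_iterated = Enum.map(fields, fn entry -> entry end)
IO.inspect(as_iterated)

# Destructuring the tuple in the callback gives access to both the field
# name and its options (a missing "boost" falls back to 1.0 here).
boosts = Enum.map(fields, fn {field, opts} -> {field, Map.get(opts, "boost", 1.0)} end)
IO.inspect(boosts)
```

A callback written as fn field -> ... end and then used as a string would therefore receive a tuple, which matches the failure described above.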

Please let me know if this is even something that should be possible, or if I'm barking up the wrong tree.

Livebook issues

When running the last statement under the section Nested document attributes, the following error appears:

search_query = IO.gets("Search users")
Index.search(users_index, search_query)
** (FunctionClauseError) no function clause matching in Elasticlunr.Index.search/3

The following arguments were given to Elasticlunr.Index.search/3:

    # 1
    %Elasticlunr.Index{documents_size: 4, fields: %{"address" => (...) , ref: "id", store_documents: true, store_positions: true}

    # 2
    {:error, :enotsup}

    # 3
    nil

Attempted function clauses (showing 6 out of 6):

    def search(%Elasticlunr.Index{}, -nil-, _opts)
    def search(%Elasticlunr.Index{ref: ref} = index, query, nil) when -is_binary(query)-
    def search(%Elasticlunr.Index{ref: ref} = index, query, -%{"fields" => fields}-) when -is_binary(query)-
    def search(%Elasticlunr.Index{} = index, -%{"query" => _} = query-, _opts)
    def search(%Elasticlunr.Index{} = index, query, nil) when -is_map(query)-
    def search(%Elasticlunr.Index{} = index, -%{} = query-, options)

    (elasticlunr 0.6.4) lib/elasticlunr/core/index.ex:182: Elasticlunr.Index.search/3

I think this is connected to a warning Livebook gave me about not using Kino.input.

When I modify the code like below, everything works fine:

search_query = "find something"
Index.search(users_index, search_query)
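
The arguments in the error above show that IO.gets/1 returned {:error, :enotsup} instead of a string, and none of the Index.search/3 clauses accept that value. A minimal sketch of guarding against this is shown below; the InputSketch module and its normalize/2 function are hypothetical helpers written for this example, not part of Elasticlunr or Livebook.

```elixir
defmodule InputSketch do
  # Hypothetical helper: turns the result of IO.gets/1 into a search string,
  # falling back to a default when no input could be read.
  def normalize(input, default \\ "") do
    case input do
      text when is_binary(text) -> String.trim(text)
      text when is_list(text) -> text |> List.to_string() |> String.trim()
      {:error, _reason} -> default
      :eof -> default
    end
  end
end

# A normal line of input has its trailing newline removed.
IO.inspect(InputSketch.normalize("find something\n"))

# The value seen in the error report falls back to the default.
IO.inspect(InputSketch.normalize({:error, :enotsup}, "find something"))
```

The normalized string could then be passed to Index.search(users_index, search_query) as in the working snippet above.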

Software Versions:
Erlang/OTP 24 [erts-12.1.5] [source] [64-bit] [smp:6:6] [ds:6:6:10] [async-threads:1] [jit] [dtrace]
Elixir 1.13.0 (compiled with Erlang/OTP 24)
Livebook 0.4.1

Search fails in livebook instance

A search for the word "me" fails inconsistently when executed in Livebook, while every other word seems to pass without any problem. Running the same notebook as a script on my machine also fails, even though the IO.gets statement was removed (I was wrong to think it was the issue). The most interesting part is that the same code works when I run it in the test environment.

(Three screenshots, dated 2021-11-06, were attached to the original issue.)
