
sequel-elasticsearch's Introduction

Sequel::Elasticsearch

Sequel::Elasticsearch allows you to transparently mirror your database, or specific tables, to Elasticsearch. It's especially useful if you want the power of search through Elasticsearch while keeping the sanity and structure of a relational database.


Installation

Add this line to your application's Gemfile:

gem 'sequel-elasticsearch'

And then execute:

$ bundle

Or install it yourself as:

$ gem install sequel-elasticsearch

Usage

Require the gem with:

require 'sequel/plugins/elasticsearch'

You'll need an Elasticsearch cluster to sync your data to. By default the gem will try to connect to http://localhost:9200. Set the ELASTICSEARCH_URL ENV variable to the URL of your cluster.
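
For example (a minimal sketch; the URL below is hypothetical), you can either set the variable before your application boots, or pass the URL straight through to the Elasticsearch Ruby client via the plugin options described further down:

ENV['ELASTICSEARCH_URL'] = 'http://search.example.com:9200' # hypothetical cluster URL

# Or, since the :elasticsearch options are forwarded to the Elasticsearch Ruby client:
Sequel::Model.plugin :elasticsearch, elasticsearch: { url: 'http://search.example.com:9200' }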

This is a Sequel plugin, so you can enable it database-wide:

Sequel::Model.plugin :elasticsearch

Or per model:

Document.plugin Sequel::Elasticsearch

# or

class Document < Sequel::Model
  plugin :elasticsearch
end

There are a couple of options you can set:

Sequel::Model.plugin :elasticsearch,
  elasticsearch: { log: true }, # Options to pass to the Elasticsearch Ruby client
  index: 'all-my-data', # The index in which the data should be stored. Defaults to the table name associated with the model
  type: 'is-mine' # The type in which the data should be stored.

And that's it! Just transact as you normally would, and your records will be created and updated in the Elasticsearch cluster.
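
For example (a short sketch; the Document model and its columns are illustrative), ordinary Sequel operations are mirrored to the cluster by the plugin's hooks:

doc = Document.create(title: 'Sequel', body: 'Hello') # indexed in Elasticsearch on create
doc.update(body: 'Hello, Elasticsearch')              # re-indexed on update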

Indexing

Ensure that you create the index mappings for your data before using this plugin, otherwise you might get some weird results.
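
As a hedged sketch of doing that up front (the index name and fields are illustrative, mapping creation happens through the Elasticsearch Ruby client rather than this plugin, and the typeless mapping below assumes Elasticsearch 7+):

require 'elasticsearch'

client = Elasticsearch::Client.new(url: ENV['ELASTICSEARCH_URL'])
client.indices.create(
  index: 'documents',
  body: {
    mappings: {
      properties: {
        title:      { type: 'text' },
        body:       { type: 'text' },
        created_at: { type: 'date' }
      }
    }
  }
)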

The records will by default be indexed using the values call of the model. Should you need to customize what's indexed, you can define an indexed_values method (or an as_indexed_json method if you prefer the Rails way).
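
For example (a sketch; the model and columns are illustrative):

class Document < Sequel::Model
  plugin :elasticsearch

  # Index only a subset of columns, plus one derived field.
  def indexed_values
    values.slice(:id, :title, :body).merge(word_count: body.to_s.split.size)
  end
end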

Searching

Your model is now searchable through Elasticsearch. Just pass down a string that's parsable as a query string query.

Document.es('title:Sequel')
Document.es('title:Sequel AND body:Elasticsearch')

The result from the es method is an enumerable containing Sequel::Model instances of your model:

results = Document.es('title:Sequel')
results.each { |e| p e }
# Outputs
# #<Document @values={:id=>1, :title=>"Sequel", :body=>"Document 1"}>
# #<Document @values={:id=>2, :title=>"Sequel", :body=>"Document 2"}>

The result also contains the meta info about the Elasticsearch query result:

results = Document.es('title:Sequel')
p results.count # The number of documents included in this result
p results.total # The total number of documents in the index that matches the search
p results.timed_out # If the search timed out or not
p results.took # How long, in milliseconds, the search took

You can also use the scroll API to search and fetch large datasets:

# Get a dataset that will stay consistent for 5 minutes and extend that time by 1 minute on every iteration
scroll = Document.es('test', scroll: '5m')
p scroll.scroll_id # Outputs the scroll_id for this specific scrolling snapshot
puts "Found #{scroll.count} of #{scroll.total} documents"
scroll.each { |e| p e }
while (scroll = Document.es(scroll, scroll: '1m')) && !scroll.empty?
  puts "Found #{scroll.count} of #{scroll.total} documents"
  scroll.each { |e| p e }
end

Import

You can import the whole dataset, or specify a dataset to be imported. This will create a new, timestamped index for your dataset, and import all the records from that dataset into the index. An alias will be created (or updated) to point to the newly created index.

Document.import! # Import all the Document records. Use the default settings.

Document.import!(dataset: Document.where(active: true)) # Import all the active Document records

Document.import!(
    index: 'active-documents', # Use the active-documents index
    dataset: Document.where(active: true), # Only index active documents
    batch_size: 20 # Send documents to Elasticsearch in batches of 20 records
)

Development

After checking out the repo, run bin/setup to install dependencies. Then, run rake spec to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment.

To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and tags, and push the .gem file to rubygems.org.

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/jrgns/sequel-elasticsearch.

Features that need to be built:

  • An es method to search through the data on the cluster.
  • Let es return an enumerator of Sequel::Model instances.
  • A rake task to create or suggest mappings for a table.

License

The gem is available as open source under the terms of the MIT License.

sequel-elasticsearch's People

Contributors: jrgns, ttilberg

sequel-elasticsearch's Issues

[types removal] Specifying types in bulk requests is deprecated.

When I try to bulk import via:

Mailbox::Email.import! dataset: Mailbox::Email.where { ...conditions }

It does the job, but I get the following warning. Elasticsearch seems to have deprecated this feature and will probably remove it in upcoming versions.

warning: 299 Elasticsearch-7.10.0-51e9d6f22758d0374a0f3f5c6e8f3a7997850f96 "[types removal] Specifying types in bulk requests is deprecated."

What are our options here?

Usage query

Hey, if I'm using the plugin in this way:

class Document < Sequel::Model
  plugin :elasticsearch
end

How can I specify the config for Elasticsearch, such as a username/password?
I saw you wrote:

Sequel::Model.plugin :elasticsearch,
  elasticsearch: { log: true }, # Options to pass to the Elasticsearch Ruby client
  index: 'all-my-data', # The index in which the data should be stored. Defaults to the table name associated with the model
  type: 'is-mine' # The type in which the data should be stored.

but where should I put this?

Batch update on Elasticsearch based on ids

I update models in batch like this:

Email.where(user_id: user.id, guid: guids).update(...)

So, hooks are not called. Is there a way to invoke an update for the affected ids/guids (which are available in ES)? For example:

Email.update_es!(dataset: Email.where(guid: guids))

This would force an update of all emails in the given dataset.
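
Until something like that exists, one possible workaround (a sketch only, assuming the plugin's per-record #index_document hook mentioned in a later issue is callable from outside the plugin) would be:

# Sketch: re-run indexing per affected record, since dataset-level
# updates bypass model hooks. Assumes #index_document is public.
Email.where(user_id: user.id, guid: guids).each(&:index_document)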

Timestamp columns aren't seen as timestamps

I've found that when I load records from Sequel into Elasticsearch, I always have to remap the DateTime columns with #iso8601 for them to get picked up as timestamps in ES.

I use SQL Server and Sequel/TinyTds and I'm not sure if folks on Postgres and others experience the same issues.

For example, when running a_model_object.db_schema.map{|col, v| [col, v[:db_type] ]}, I can see each of the data types, and I know that datetime columns need special attention in ES before I send the data.

While learning about ES, I've been loading my data using the bulk method with methods that look like this:

def format_record(values)
  values[:started_at] = values[:started_at] && values[:started_at].iso8601
  values[:completed_at] = values[:completed_at] && values[:completed_at].iso8601
  [
    { index: { _index: 'jobs', _type: 'job', _id: values[:id] } },
    values
  ]
end

In my research, it seems to be an issue with the default string format produced by Ruby's Time#to_s.

The above example could obviously be refactored to be a bit more dynamic, perhaps something along the lines of values.each { |k, v| values[k] = v.iso8601 if v.is_a?(Time) }.
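
Tied back to this plugin, a hedged sketch of that refactor as an indexed_values override (per the Indexing section of the README above; the model and its columns are illustrative):

require 'time' # for Time#iso8601

class Job < Sequel::Model
  plugin :elasticsearch

  # Sketch: serialize every Time value as ISO 8601 so Elasticsearch
  # recognizes it as a date without per-column handling.
  def indexed_values
    values.transform_values { |v| v.is_a?(Time) ? v.iso8601 : v }
  end
end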

Trying to force the mapping explicitly in elasticsearch as date yields messages like:

"Invalid format: \"2018-02-06 04:38:47 -0600\" is malformed at \" 04:38:47 -0600\""

Essentially, as far as I can tell, all DateTime columns / Ruby Time objects should be converted with #iso8601 before being loaded for ES to see them correctly. But I'm very new to ES, so I may be missing something.

Need flexible way to modify/extend indexed values list

Hello,

I'm working on parity with elasticsearch-rails, which adds an overridable instance method for the indexed values (#as_indexed_json) to ActiveRecord models.

So, for Sequel::Elasticsearch, I'm thinking of adding InstanceMethods#indexed_values to lib/sequel/plugins/elasticsearch.rb, whose default implementation simply returns Sequel::Model#values, and then modifying #index_document to use #indexed_values.
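
A hedged sketch of that plan (module nesting follows the file path named above; the actual plugin internals may differ):

# lib/sequel/plugins/elasticsearch.rb (sketch, per the proposal above)
module Sequel
  module Plugins
    module Elasticsearch
      module InstanceMethods
        # Default to the model's raw values; individual models
        # override this to customize what gets indexed.
        def indexed_values
          values
        end
      end
    end
  end
end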

I'll be working on this sometime in the next couple of weeks. Just to let you know, this issue is on my radar. I've forked the project for my dev.

Kindest Regards,
Peter Fitzgibbons

index-name parity with Elasticsearch::Model gem

Hello,
I am writing a worker-component with sequel-elasticsearch whose index must also be read by an equivalent ActiveRecord model with Elasticsearch::Model plugin.
I am not seeing the same results between the two.

I am seeing this for the Elasticsearch::Model index name:

# class Forum < ActiveRecord::Base
#       include Elasticsearch::Model
#      index_name "throtl-admin-forum-#{Rails.env}"
Forum.__elasticsearch__.index_name
=> "throtl-admin-forum-development"

And this for sequel-elasticsearch index name:

#    class Forum < Sequel::Model
#      env = ENV['RAILS_ENV'] || 'development'
#      plugin :elasticsearch,
#             index: "throtl-admin-forum-#{env}".to_sym
Scraper::DB::Forum.elasticsearch_index
 => :"throtl-admin-forum-development"

So:
"throtl-admin-forum-development" # Elasticsearch::Model
vs
:"throtl-admin-forum-development" # sequel-elasticsearch

I think the issue is that the sequel-elasticsearch index name somehow has the quotes captured in the symbolized index name? I even tried to force-symbolize the key in my code above, without effect.

Could you tell me if you have any ideas on how to do this correctly?

Thanks and Kindest Regards,
Peter Fitzgibbons
