
soulmate's Introduction

Soulmate is no longer being actively maintained. For a similar project that is still actively developed, check out soulheart

Soulmate

Soulmate is a tool to help solve the common problem of developing a fast autocomplete feature. It uses Redis's sorted sets to build an index of partially completed words and the corresponding top matching items, and provides a simple sinatra app to query them. Soulmate finishes your sentences.

Soulmate was designed to be simple and fast, and offers the following:

  • Provide suggestions for multiple types of items in a single query (at SeatGeek we're autocompleting for performers, events, and venues)
  • Results are ordered by a user-specified score
  • Arbitrary metadata for each item (at SeatGeek we're storing both a url and a subtitle)

An item is a simple JSON object that looks like:

{
  "id": 3,
  "term": "Citi Field",
  "score": 81,
  "data": {
    "url": "/citi-field-tickets/",
    "subtitle": "Flushing, NY"
  }
}

Where id is a unique identifier (within the specific type), term is the phrase you wish to provide completions for, score is a user-specified ranking metric (redis will order things lexicographically for items with the same score), and data is an optional container for metadata you'd like to return when this item is matched (at SeatGeek we're including a url for the item as well as a subtitle for when we present it in an autocomplete dropdown).
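To make the indexing idea concrete, here is a minimal in-memory sketch in plain Ruby. It is not Soulmate's implementation: the TinyIndex class, the MIN_PREFIX value, the single-word query restriction, and the tie-break by term are all assumptions made for illustration, and a Hash stands in for Redis sorted sets.

```ruby
# Illustrative only: a Hash of prefix => items, standing in for Redis sorted sets.
class TinyIndex
  MIN_PREFIX = 2 # assumed minimum prefix length for this sketch

  def initialize
    @by_prefix = Hash.new { |hash, key| hash[key] = [] }
  end

  # Index an item under every prefix of every word in its term.
  def add(item)
    words = item.fetch("term").downcase.scan(/[a-z0-9]+/)
    prefixes = words.flat_map { |w| (MIN_PREFIX..w.length).map { |n| w[0, n] } }
    prefixes.uniq.each { |prefix| @by_prefix[prefix] << item }
  end

  # Return the best-scoring items for a single-word query.
  # Ties are broken by term here, which is a simplification.
  def search(term, limit: 5)
    key = term.downcase.strip
    return [] unless @by_prefix.key?(key)

    @by_prefix[key].sort_by { |i| [-i.fetch("score"), i.fetch("term")] }.first(limit)
  end
end

index = TinyIndex.new
index.add("id" => 1, "term" => "Dodger Stadium", "score" => 85)
index.add("id" => 29, "term" => "Sun Life Stadium", "score" => 84)
puts index.search("stad").map { |item| item["term"] }.inspect
```

Because every prefix is materialized at load time, a lookup is a single key fetch; the cost is paid in memory and in load time.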

See Soulmate in action at SeatGeek.

Getting Started

As always, kick things off with a gem install:

gem install soulmate

Loading Items

You can load data into Soulmate by piping items in the JSON lines format into soulmate load TYPE.

Here's a sample venues.json (one JSON item per line):

{"id":1,"term":"Dodger Stadium","score":85,"data":{"url":"\/dodger-stadium-tickets\/","subtitle":"Los Angeles, CA"}}
{"id":28,"term":"Angel Stadium","score":85,"data":{"url":"\/angel-stadium-tickets\/","subtitle":"Anaheim, CA"}}
{"id":30,"term":"Chase Field ","score":85,"data":{"url":"\/chase-field-tickets\/","subtitle":"Phoenix, AZ"}}
{"id":29,"term":"Sun Life Stadium","score":84,"data":{"url":"\/sun-life-stadium-tickets\/","subtitle":"Miami, FL"}}
{"id":2,"term":"Turner Field","score":83,"data":{"url":"\/turner-field-tickets\/","subtitle":"Atlanta, GA"}}

And here's the load command (Soulmate assumes redis is running locally on the default port, or you can specify a redis connection string with the --redis argument):

$ soulmate load venue --redis=redis://localhost:6379/0 < venues.json

You can also provide an array of strings under the aliases key that will also be added to the index for this item.
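As a sketch of how such input could be produced, the snippet below writes items (one with an aliases array) in the JSON lines format using Ruby's standard json library. The to_json_lines helper, the SAMPLE_VENUES constant, and the alias value are hypothetical and only serve as an example.

```ruby
require 'json'

# Hypothetical sample data; the alias below is illustrative.
SAMPLE_VENUES = [
  { "id" => 1, "term" => "Dodger Stadium", "score" => 85,
    "data" => { "url" => "/dodger-stadium-tickets/", "subtitle" => "Los Angeles, CA" } },
  { "id" => 29, "term" => "Sun Life Stadium", "score" => 84,
    "aliases" => ["Dolphin Stadium"],
    "data" => { "url" => "/sun-life-stadium-tickets/", "subtitle" => "Miami, FL" } }
].freeze

# Serialize each item as one JSON document per line.
def to_json_lines(items)
  items.map { |item| JSON.generate(item) }.join("\n") + "\n"
end

puts to_json_lines(SAMPLE_VENUES)
```

The output can be redirected to a file such as venues.json and piped into the load command shown above.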

Querying for Data

Once it's loaded, we can query this data by starting soulmate-web:

$ soulmate-web --foreground --no-launch --redis=redis://localhost:6379/0

And viewing the service in your browser: http://localhost:5678/search?types[]=venue&term=stad. You should see something like:

{
  "term": "stad",
  "results": {
    "venue": [
      {
        "id": 28,
        "term": "Angel Stadium",
        "score": 85,
        "data": {
          "url": "/angel-stadium-tickets/",
          "subtitle": "Anaheim, CA"
        }
      },
      {
        "id": 1,
        "term": "Dodger Stadium",
        "score": 85,
        "data": {
          "url": "/dodger-stadium-tickets/",
          "subtitle": "Los Angeles, CA"
        }
      },
      {
        "id": 29,
        "term": "Sun Life Stadium",
        "score": 84,
        "data": {
          "url": "/sun-life-stadium-tickets/",
          "subtitle": "Miami, FL"
        }
      }
    ]
  }
}

The /search method supports multiple types as well as an optional limit. For example: http://localhost:5678/search?types[]=event&types[]=venue&types[]=performer&limit=3&term=yank. You can also add the callback parameter to enable JSONP output.
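The same query can be issued from Ruby. The following is a rough sketch using only the standard library; the search_uri and search helper names are made up for this example, and the base URL assumes soulmate-web is running locally as shown above.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Build the /search URL, repeating types[] once per requested type.
def search_uri(base, term:, types:, limit: nil)
  params = types.map { |type| ["types[]", type] }
  params << ["limit", limit] if limit
  params << ["term", term]

  uri = URI.join(base, "search")
  uri.query = URI.encode_www_form(params)
  uri
end

# Fetch and parse the JSON response. Requires a reachable server.
def search(base, **options)
  response = Net::HTTP.get_response(search_uri(base, **options))
  raise "search failed: HTTP #{response.code}" unless response.is_a?(Net::HTTPSuccess)

  JSON.parse(response.body)
end

puts search_uri("http://localhost:5678/", term: "yank",
                types: %w[event venue performer], limit: 3)
```

Calling search("http://localhost:5678/", term: "stad", types: ["venue"]) should return a Hash shaped like the response above, provided the server is up and the data is loaded.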

Mounting soulmate into a rails app

If you are integrating Soulmate into a rails app, an alternative to launching a separate 'soulmate-web' server is to mount the sinatra app inside of rails.

Add this to routes.rb:

mount Soulmate::Server, :at => "/sm"

Add this to your Gemfile:

gem 'rack-contrib'
gem 'soulmate', :require => 'soulmate/server'

Then you can query soulmate at the /sm url, for example: http://localhost:3000/sm/search?types[]=venues&limit=6&term=kitten

You can also configure your Redis instance:

# config/initializers/soulmate.rb

Soulmate.redis = 'redis://127.0.0.1:6379/0'
# or you can assign an existing instance of Redis, Redis::Namespace, etc.
# Soulmate.redis = $redis

Rendering an autocompleter

Soulmate doesn't include any client-side code necessary to render an autocompleter, but Mitch Crowe put together a pretty cool looking jquery plugin designed for exactly that: soulmate.js.

Contributing to soulmate

  • Check out the latest master to make sure the feature hasn't been implemented or the bug hasn't been fixed yet
  • Check out the issue tracker to make sure someone hasn't already requested it and/or contributed it
  • Fork the project
  • Start a feature/bugfix branch
  • Commit and push until you are happy with your contribution
  • Please try not to mess with the Rakefile, version, or history. If you want to have your own version, or it is otherwise necessary, that is fine, but please isolate the change in its own commit so I can cherry-pick around it.

Copyright (c) 2011 Eric Waller. See LICENSE.txt for further details.

soulmate's People

Contributors

erwaller, hungyuhei, jgadbois, josegonzalez, rixth, theghostwhoforks, willcosgrove


soulmate's Issues

Querying soulmate-web through separate app?

Hi, we currently have Soulmate set up in our Rails app, and we are starting to hit enough traffic that we want to separate Soulmate out into its own process (to prevent queries from being blocked behind unicorn).

Starting soulmate-web is simple enough but I cannot figure out how to query the separate Sinatra server from within our Rails app. Is this currently possible?

I have removed the `require: 'soulmate/server'` dependency from the Gemfile but the Soulmate::Server middleware seems to still be present. Is there a way of configuring an external soulmate-web server rather than mounting it within the app?

Rake aborted. LoadError: cannot load such file -- rack/showexceptions

After running bundle install with these gems in the Gemfile of my Rails app (5.0.1):

gem 'rack-contrib'
gem 'soulmate', :require => 'soulmate/server'

I keep getting this error when running rake db:migrate:

LoadError: cannot load such file -- rack/showexceptions
/var/lib/gems/2.3.0/gems/sinatra-1.0/lib/sinatra/showexceptions.rb:1:in `require'
/var/lib/gems/2.3.0/gems/sinatra-1.0/lib/sinatra/showexceptions.rb:1:in `<top (required)>'
/var/lib/gems/2.3.0/gems/sinatra-1.0/lib/sinatra/base.rb:6:in `require'
/var/lib/gems/2.3.0/gems/sinatra-1.0/lib/sinatra/base.rb:6:in `<top (required)>'
/var/lib/gems/2.3.0/gems/soulmate-1.1.0/lib/soulmate/server.rb:1:in `require'
/var/lib/gems/2.3.0/gems/soulmate-1.1.0/lib/soulmate/server.rb:1:in `<top (required)>'
/home/shivraj/Projects/fast-autocomplete/config/application.rb:7:in `<top (required)>'
/home/shivraj/Projects/fast-autocomplete/Rakefile:4:in `require_relative'
/home/shivraj/Projects/fast-autocomplete/Rakefile:4:in `<top (required)>'
/var/lib/gems/2.3.0/gems/rake-12.0.0/exe/rake:27:in `<top (required)>'
(See full trace by running task with --trace)

Could anybody please tell me what the issue is?

Broken

This is broken... using autocomplete gets 400 error:
Failed to load resource: the server responded with a status of 400 (HTTP/2.0 400)

{"message":"missing required parameter: q","meta":{"status":400},"status":400}

Probably the word 'term' needs to be changed to 'q'

use multi_json?

After looking at the source for OmniAuth and later looking at the source of this project and noticing the serialization to JSON, I thought maybe multi_json would be a good choice for this project. It enables using yajl-ruby which IMO is in the spirit of redis, as its core is implemented in C, but will still work on JRuby because it falls back to using another library when yajl-ruby isn't found. I also noticed that ripple, the ruby client for riak, uses multijson.

Umlaute not supported

The matcher uses a normalize function which only allows letters from a to z. Hence any umlauts like ü, ö, ä, etc. are omitted.

Thanks for the otherwise great gem,
Paul
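One possible direction, sketched in plain Ruby: keep any Unicode letter or digit instead of only a to z. Both functions below are illustrative. ascii_normalize is a guess at the behavior described in this issue, not a copy of Soulmate's code, and unicode_normalize_term is a hypothetical replacement. Full Unicode case mapping in String#downcase requires Ruby 2.4 or later.

```ruby
# Assumed approximation of the ASCII-only behavior described in the issue.
def ascii_normalize(text)
  text.downcase.gsub(/[^a-z0-9 ]/, '').strip
end

# Hypothetical alternative: keep letters (\p{L}) and digits (\p{N}) from any script.
def unicode_normalize_term(text)
  text.downcase.gsub(/[^\p{L}\p{N} ]/, '').strip
end

puts ascii_normalize("Zürich!")        # the ü is dropped
puts unicode_normalize_term("Zürich!") # the ü survives
puts unicode_normalize_term("Москва")  # lowercased as well
```

Since the same normalization must be applied both when indexing and when querying, changing it would require reloading existing data.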

Custom Filter Search

Can we use a custom filter in a Soulmate search?

For example, if we add a city field to our data, such as { city_id: 1 }, can we filter Soulmate results by it?

Please let me know.

Add "drop" command for a type

We regenerate our term data nightly and update the soulmate db. Due to vagaries of our setup, it's hard to keep id constant for a given record from night to night, so "soulmate add TYPE" would lead to duplicate entries. It would be simplest to have a "soulmate drop TYPE" command which allowed us to easily blitz and reload the whole collection. ("soulmate remove TYPE" requires a list of ids).

Thanks.

support scoring shorter length results higher

I have a large database of topics that users are allowed to search (via soulmate). Each topic has reputation on the website so that translates nicely to a soulmate score. However, when a user types a short word often longer words that include this short word will show up first. For example if the user types 'band', there are many topics in the world with the word band included in their name. I'm not sure what the best way to implement this is, but would it be possible to allow weighting of shorter words higher? Such that if someone searches 'band' the topic 'band' will be more likely to show up than 'dave matthew's band'? Maybe take into account the score as well.

I'm open to all ideas. Thanks!
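One idea, sketched under assumptions: since results are ordered by the score supplied at load time, a length penalty could be folded into the score before loading. The length_adjusted_score helper and the penalty of 0.5 per character are hypothetical and would need tuning against real data.

```ruby
# Hypothetical scoring tweak: subtract a small penalty per character of the term,
# so that shorter terms outrank longer ones with similar reputation.
def length_adjusted_score(score, term, penalty: 0.5)
  score - penalty * term.length
end

# Order items by the adjusted score, highest first.
def rank_by_adjusted_score(items, penalty: 0.5)
  items.sort_by { |item| -length_adjusted_score(item["score"], item["term"], penalty: penalty) }
end

topics = [
  { "term" => "dave matthew's band", "score" => 85 },
  { "term" => "band", "score" => 80 },
  { "term" => "marching band", "score" => 95 }
]
puts rank_by_adjusted_score(topics).map { |t| t["term"] }.inspect
```

With these numbers the short topic overtakes a slightly more reputable long one, while a much more reputable topic still ranks first. The trade-off is that the penalty applies to every query, not only to queries that match the short term exactly.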

Querying for cyrillic words is case sensitive

Searching for english words is not case sensitive, but searching for russian words is case sensitive, i.e when the search term is "москва" nothing's found, while the search term "Москва" returns the correct result.

Extremely terrible command-line performance

I have a large (8.25 million) JSON file that I want to import. This is what is causing the issue (bin/soulmate:44):

items = $stdin.read.split("\n").map { |l| MultiJson.decode(l) }

Needless to say this is very inefficient and creates a lot of ruby objects.

A better way would be to take a file argument and read the lines sequentially. I'll try to submit a pull request if I have time.
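A rough sketch of the sequential approach, assuming input in the JSON lines format: read one line at a time and hand items over in fixed-size batches, so that only one batch is held in memory. The each_item_batch helper and its batch_size parameter are hypothetical, and the standard json library is used here in place of MultiJson.

```ruby
require 'json'
require 'stringio'

# Yield parsed items in batches instead of reading the whole input at once.
def each_item_batch(io, batch_size: 1000)
  batch = []
  io.each_line do |line|
    line = line.strip
    next if line.empty? # tolerate blank lines

    batch << JSON.parse(line)
    if batch.size >= batch_size
      yield batch
      batch = []
    end
  end
  yield batch unless batch.empty? # flush the final partial batch
end

input = StringIO.new(%({"id":1,"term":"bun"}\n{"id":2,"term":"buy"}\n{"id":3,"term":"eat"}\n))
each_item_batch(input, batch_size: 2) do |items|
  puts "would load #{items.size} item(s)"
end
```

In a command-line tool, $stdin or an opened File could be passed in place of the StringIO.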

The exact match is not the first item of the results

When I have the following terms in Soulmate:
["Citrix", "Citrix SaaS", "Citrix UK User Group", "Citrix User Group Norway", "Citrix VDI-In-A-Box (Kaviza) User Group", "Dutch Citrix User Group", "Freelancer Citrix Certified Trainer/ Consultant", "Remote Citrix Experts", "SBC Solutions your Citrix solution", "Xenergix - Expertos en Citrix"]

When I search Soulmate using the term "Citrix", I get these terms:
["Citrix SaaS", "Freelancer Citrix Certified Trainer/ Consultant", "Remote Citrix Experts", "Citrix", "SBC Solutions your Citrix solution", "Citrix VDI-In-A-Box (Kaviza) User Group", "Dutch Citrix User Group", "Citrix User Group Norway", "Citrix UK User Group", "Xenergix - Expertos en Citrix"]

Why isn't the exact match the first in the result?

soulmate: command not found after running gem install soulmate

I'm on Rails 3.2.13 and tried installing soulmate and it doesn't work. It says it is installed but the soulmate command doesn't work. It also looks like putting
gem 'rack-contrib'
gem 'soulmate', :require => 'soulmate/server'
in my gemfile and bundling doesn't seem to add them to my app.

Incremental updates?

I'm trying to run Soulmate directly via the Soulmate module. How do I do incremental updates of my index? Soulmate::Loader.new seems to blow away the existing index every time it's called.

(Also, will Soulmate work if used directly like this?)

Cache utility?

I'm doing some stress tests of this gem to evaluate it for usage in my application. I don't know redis very well, so I don't fully understand the code in Matcher#matches_for_term. I see that it accepts an option named cache with true as default value, and I'm benchmarking its behavior.

I've integrated soulmate into a Contact class which produces the following hash for any given contact:

[21] pry(main)> Contact.completions 'ab'
=> [{"id"=>2085, "term"=>"Ms. Abe King", "score"=>9988},
 {"id"=>1762, "term"=>"Mr. Abner Bode", "score"=>9986},
 {"id"=>495, "term"=>"Ms. Hank Abbott", "score"=>9985},
 {"id"=>927, "term"=>"Mrs. Abbey Stehr", "score"=>9984},
 {"id"=>893, "term"=>"Mr. Tobin Abbott", "score"=>9984},
 {"id"=>787, "term"=>"Mrs. Otis Abbott", "score"=>9984},
 {"id"=>438, "term"=>"Mr. Abagail Kihn", "score"=>9984},
 {"id"=>199, "term"=>"Mr. Liana Abshire", "score"=>9983},
 {"id"=>1807, "term"=>"Dr. Frida Abshire", "score"=>9983},
 {"id"=>960, "term"=>"Miss Lucile Abbott", "score"=>9982}]

#completions is a simple wrapper for Soulmate::Matcher.new('contact').matches_for_term, and in my database there are about 2200 records.

Now I'm running the following benchmark:

require 'benchmark'

# Initial implementation
def random_string(l)
  ('a'..'z').to_a.shuffle[0,l].join
end

# Already populated redis instance
matcher = Soulmate::Matcher.new('contact')
# Match Procs
cache = proc { |l| matcher.matches_for_term random_string(l), cache: true }
no_cache = proc { |l| matcher.matches_for_term random_string(l), cache: false }

Benchmark.bmbm do |x| 
  x.report('With cache on (l = 2)') { 1000.times { cache.call(2) } }
  x.report('With cache off (l = 2)') { 1000.times { no_cache.call(2) } }
  x.report('With cache on (l = 5)') { 1000.times { cache.call(5) } }
  x.report('With cache off (l = 5)') { 1000.times { no_cache.call(5) } }
end

And I'm getting these results

# with 2118 contacts

Rehearsal ----------------------------------------------------------
With cache on (l = 2)    2.630000   0.080000   2.710000 (  2.723404)
With cache off (l = 2)   2.180000   0.060000   2.240000 (  2.236404)
With cache on (l = 5)    2.600000   0.060000   2.660000 (  2.657957)
With cache off (l = 5)   1.920000   0.050000   1.970000 (  1.966270)
------------------------------------------------- total: 9.580000sec

                             user     system      total        real
With cache on (l = 2)    2.530000   0.050000   2.580000 (  2.589633)
With cache off (l = 2)   2.100000   0.050000   2.150000 (  2.148585)
With cache on (l = 5)    2.500000   0.070000   2.570000 (  2.555852)
With cache off (l = 5)   1.810000   0.040000   1.850000 (  1.858565)

With cache on the completion engine is slower than with cache turned off. Since my benchmarks use a random string and most of them don't produce results I tried a different implementation of random_string method to ensure completion results. The new implementation is the following:

require 'set'

# Generate a set of existing prefixes
def prefix_set(l)
  @prefixes_set ||= {}
  @prefixes_set[l] ||= begin
    Set.new.tap do |set|
      Contact.all.map(&:full_name).each do |name|
        # Build an entry for each prefix found in full names
        set.merge(name.split(/\W+/).reject {|s| s.size < 2 }.map { |w| w.first(l) })
      end
    end
  end
end

# Generate prefixes of size l from existing data
def existing_prefixes(l)
  @prefixes ||= {}
  @prefixes[l] ||= prefix_set(l).to_a
end

def random_string(l)
  existing_prefixes(l).sample
end

# warm up existing prefixes and remove prefixes without completions
existing_prefixes(2).reject! { |p| Contact.completions(p).empty?  }
existing_prefixes(5).reject! { |p| Contact.completions(p).empty?  }

But even with this approach, performance with and without the cache is nearly identical:

Rehearsal ----------------------------------------------------------
With cache on (l = 2)    2.870000   0.080000   2.950000 (  2.947495)
With cache off (l = 2)   2.900000   0.070000   2.970000 (  2.986641)
With cache on (l = 5)    2.800000   0.060000   2.860000 (  2.849990)
With cache off (l = 5)   2.800000   0.060000   2.860000 (  2.865739)
------------------------------------------------ total: 11.640000sec

                             user     system      total        real
With cache on (l = 2)    2.790000   0.060000   2.850000 (  2.861272)
With cache off (l = 2)   2.810000   0.060000   2.870000 (  2.884098)
With cache on (l = 5)    2.750000   0.070000   2.820000 (  2.811039)
With cache off (l = 5)   2.730000   0.060000   2.790000 (  2.795882)

Is it worth keeping the cache functionality in the match method?

Update hosted Gem to 0.0.4 w/ support for JSONP

I spent the last hour or so trying to figure out why I couldn't send a cross-domain request to a Soulmate server, even though the docs + commits said JSONP was added in May.

I finally realized that the 0.0.3 gem didn't include the commits from 2 months ago and got it working once I manually built the gem from the github repo.

Can the hosted gem be updated to 0.0.4 to include the latest commits?

It will likely save other devs the trouble I went through.

Broken gemspec

I've noticed a couple of issues with the gemspec. The first is that soulmate depends on itself. That means that presently I cannot install soulmate using gem install soulmate. Also, jeweler is adding a lot of duplicate dependencies (in fact it adds them every time you run rake gemspec:generate).

I'm not sure what's gone on with the gemspec, but there's obviously some broken stuff there.

Use ZRANGEBYLEX to improve memory usage

Redis 2.8.9 introduced a new command called ZRANGEBYLEX that soulmate could use internally to improve performance and reduce memory usage significantly.

An example of implementation of autocomplete using the new command is here: http://autocomplete.redis.org/

Note: sometimes the example is down because I've not enough memory in the virtual machine running the example.
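The idea behind a lexicographic range query can be illustrated without Redis. The sketch below keeps terms in one sorted array, binary-searches for the first term that is not smaller than the prefix, and walks forward while terms still start with it. The complete helper is hypothetical; it models the concept only and is not how soulmate or Redis implements it.

```ruby
# In-memory model of a lexicographic prefix range query over sorted terms.
# With Redis, the comparable call would be a ZRANGEBYLEX over a range
# starting at the prefix (an assumption about how it would be wired up).
def complete(sorted_terms, prefix, limit: 10)
  start = sorted_terms.bsearch_index { |term| term >= prefix }
  return [] if start.nil?

  matches = []
  index = start
  while index < sorted_terms.length &&
        matches.length < limit &&
        sorted_terms[index].start_with?(prefix)
    matches << sorted_terms[index]
    index += 1
  end
  matches
end

terms = %w[stadium stage staple star station turner].sort
puts complete(terms, "sta", limit: 3).inspect
```

Only whole terms are stored, so memory grows with the number of terms rather than the number of prefixes. The trade-off is that results come back in lexicographic order, so ranking by score needs a separate step.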

Question: Is it possible to mount soulmate in a rails 3 app?

If so, how would one do it? I'd like to get the soulmate search at a url like /sm/search...

I've tried:

mount Soulmate::Server, :at => "/sm"

In my routes file with no luck. I also have this in an initializer:

require 'soulmate/server'

Thanks!

Seems like version should be updated

The released gem (v1.0.0) differs slightly from the master branch, which causes failures when assigning an existing instance of Redis on initialization.

UTF-8 issue

The autocomplete doesn't work with UTF-8.
For example, if I add the value "בדיקה", the autocomplete just won't work.

How can I solve this problem?

Thanks

Loader::remove can cause orphaned keys

Add some items to the index:

$ echo '{"id":1, "term":"bun"}' | soulmate add test
Adding items of type test...
Loaded a total of 1 items
$ echo '{"id":2, "term":"buy"}' | soulmate add test
Adding items of type test...
Loaded a total of 1 items

redis> KEYS *
1) "soulmate-data:test"
2) "soulmate-index:test:buy"
3) "soulmate-index:test:bun"
4) "soulmate-index:test"
5) "soulmate-index:test:bu"

Remove one of those items:

$ echo '{"id":2, "term":"buy"}' | soulmate remove test
Removing items of type test...
Removed a total of 1 items

redis> KEYS *
1) "soulmate-data:test"
2) "soulmate-index:test:bun"
3) "soulmate-index:test"
4) "soulmate-index:test:bu"

Recreate the collection:

$ echo '{"id":2, "term":"eat"}' | soulmate load test
Loading items of type test...
Loaded a total of 1 items

redis> KEYS *
1) "soulmate-data:test"
2) "soulmate-index:test:eat"
3) "soulmate-index:test:ea"
4) "soulmate-index:test"
5) "soulmate-index:test:bu"

The key soulmate-index:test:bu was orphaned by the removal of buy.
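One way to reason about this kind of bug is with a small in-memory model. The sketch below assumes that a registry of known prefixes is kept alongside the per-prefix buckets and that a full reload deletes only the registered prefixes. That structure is an assumption made for illustration, not something confirmed from Soulmate's source. The PrefixIndex class unregisters a prefix only once no item uses it any more, which keeps the registry and the buckets in sync.

```ruby
require 'set'

# Hypothetical in-memory model; names and structure are illustrative only.
class PrefixIndex
  MIN_PREFIX = 2 # assumed minimum prefix length

  attr_reader :registry, :buckets

  def initialize
    @registry = Set.new                                    # every prefix that has a bucket
    @buckets = Hash.new { |hash, key| hash[key] = Set.new } # prefix => item ids
  end

  def prefixes(term)
    term.downcase.split.flat_map do |word|
      (MIN_PREFIX..word.length).map { |n| word[0, n] }
    end.uniq
  end

  def add(id, term)
    prefixes(term).each do |prefix|
      @registry << prefix
      @buckets[prefix] << id
    end
  end

  # Unregister a prefix only when its bucket becomes empty.
  def remove(id, term)
    prefixes(term).each do |prefix|
      next unless @buckets.key?(prefix)

      @buckets[prefix].delete(id)
      next unless @buckets[prefix].empty?

      @buckets.delete(prefix)
      @registry.delete(prefix)
    end
  end

  # Drop everything the registry knows about, as a full reload would.
  def clear
    @registry.each { |prefix| @buckets.delete(prefix) }
    @registry.clear
  end
end

index = PrefixIndex.new
index.add(1, "bun")
index.add(2, "buy")
index.remove(2, "buy")
index.clear
puts index.buckets.keys.inspect
```

In this model the shared prefix bu stays registered after buy is removed, because bun still uses it, so a later clear leaves nothing behind.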

Is there a plan to support Rails 5?

Having massive trouble demangling all the rack, etc. to get this working with Rails 5. Anyone have a branch/fork that will offer this support?
