
picky's People

Contributors

albandiguer, andi, andykitchen, beatrichartz, danfarino, dbussink, djpowers, floere, imageoptimiser, joho, kschiess, maciejczyzewski, overbryd, prami, reactormonk, richo, rogerbraun, stanley, tilsammans, tonini


picky's Issues

Add dynamic or static hint to index definitions

When the index is static, it is advisable to use symbols internally.

However, when also removing entries from the index using index.remove(id), it is better to use Strings, as Symbols never get freed.

Consider adding a hint to the index definition as follows:

Picky::Index.new(:dynamic) do
  static
end

or

Picky::Index.new(:dynamic) do
  unchanging
end

or

Picky::Index.new(:dynamic) do
  static true
end

Also think about what the default should be. I tend towards dynamic indexes. And let the user add a helpful hint to optimize in case it's static.

It's theoretically impossible to determine at runtime what to do, so we need the user input.

Activesupport dependency

Hi,
if I understand correctly, activesupport is a picky-client dependency, and for this reason it should be specified in the picky-client.gemspec file. Because there is currently a "rescue LoadError" clause (https://github.com/floere/picky/blob/master/client/lib/picky-client/client.rb#L123, which I don't understand), it can cause a misleading "undefined method to_query" error if somebody uses Bundler, for example.
Also, there is no need to define the Hash#to_query method, since it is provided by requiring 'active_support/core_ext/object/to_query' (https://github.com/rails/rails/blob/master/activesupport/lib/active_support/core_ext/object/to_query.rb#L1).

Please take a look at my commit: Stanley@d10a182 and see if it makes any sense.

Best,
Stan

Add Postfix, Prefix to partial options.

Currently, Infix and Substring are available. Infix provides true infix capabilities, while Substring behaves like most substring implementations.

Postfix and Prefix would be special implementations of Substring:

Postfix: Substring with fixed to: -1
Prefix: Substring with fixed from: 1, and reverse "partializing".

Note: Prefix is delayed until somebody actually needs it.
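
A rough sketch of the Postfix idea, assuming that from and to are 1-based positions, that negative values count from the end of the token, and that the generated partials all start at the first character. The helper names are hypothetical and this is not Picky's implementation.

```ruby
# Hypothetical helper, not Picky's code.
# Returns all partials of the token whose length lies between `from` and `to`.
def partial_substrings(token, from: 1, to: -1)
  length   = token.length
  shortest = from < 0 ? length + from + 1 : from
  longest  = to   < 0 ? length + to   + 1 : to
  shortest = [shortest, 1].max
  longest  = [longest, length].min
  (shortest..longest).map { |len| token[0, len] }
end

# Postfix as described above: `to` is pinned to -1, only `from` can vary.
def postfix_partials(token, from: 1)
  partial_substrings(token, from: from, to: -1)
end

all_partials  = postfix_partials("picky")           # from: 1
last_three    = postfix_partials("picky", from: -3)
without_whole = partial_substrings("picky", to: -2)
```

Under these assumptions, from: 1 yields every partial from "p" up to "picky", from: -3 yields only the three longest ones, and to: -2 leaves out the full token.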

More similarity/partial possibilities, better docs for them

Currently we have for similarity:

Similarity::DoubleMetaphone.new(amount_similar)
Similarity::Metaphone.new(amount_similar)
Similarity::Soundex.new(amount_similar)

And for partial searches:

Partial::Substring.new(from: index_pos1, to: index_pos2)

Do we need more, what do you think? Do you have a need for more or even written one?

Add back/forward history to Picky Javascript

Add a browser back/forward feature to Picky's javascript client such that:

  • although it needs a jquery lib, it will still work normally (without back/forward) if this lib isn't available.
  • it will remember full (return pressed, with results) queries in the browser history
  • on back it will insert the last search query into the Picky search and execute
  • on forward it will insert the next search query into the Picky search and execute

Remove #check, #backup, #restore methods.

Throughout index/category/bundle/backend.

These are convenience methods that can be included again when someone needs them. At the moment, they represent maintenance effort for nothing.

Realtime Benchmarks. Pronto.

People love numbers. Without numbers to quote, they feel like garbage.

Let's give them numbers. Many many numbers. Grouped in groups. Compared to others. Multiplied, added, subtracted.

"We need numbers. Lots of numbers." -- Neo, in "The Actual Matrix"

"only_use" / "ignore" categories option in Searches

Create an option include/exclude:

Search.new(books) do
  ignore :isbn
end

In the first example, any token being matched to category :isbn would be ignored.

Search.new(advertisements) do
  only_use :city, :zipcode
  # perhaps call ignore_unless
end

In the second example, ONLY tokens being matched to category :city or category :zipcode would survive. Tokens matching e.g. category :name would be ignored.

Why is this necessary?
Sometimes users don't want all categories used in their searches, or want a broader search on certain indexes where some tokens are just ignored.

One example is an advertisement search that is coupled to an address search. Tokens matching the name are ignored, while Tokens matching a zipcode are used.
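
The filtering could be sketched roughly as follows. The Allocation struct and the CategoryFilter class are hypothetical stand-ins for however Picky pairs tokens with categories internally; only the option names ignore and only_use come from the proposal.

```ruby
# Each allocation pairs a query token with the category it matched.
Allocation = Struct.new(:token, :category)

# Hypothetical filter object; not part of Picky.
class CategoryFilter
  def initialize
    @ignored = []
    @only    = nil
  end

  # Tokens matched to any of these categories are dropped.
  def ignore(*categories)
    @ignored.concat(categories)
  end

  # Only tokens matched to these categories survive.
  def only_use(*categories)
    @only = categories
  end

  def apply(allocations)
    allocations.reject do |allocation|
      @ignored.include?(allocation.category) ||
        (@only && !@only.include?(allocation.category))
    end
  end
end

allocations = [
  Allocation.new("peter",  :name),
  Allocation.new("8000",   :zipcode),
  Allocation.new("zurich", :city)
]

filter = CategoryFilter.new
filter.only_use :city, :zipcode
surviving = filter.apply(allocations).map(&:token)
```

Here the token matched to :name is dropped, while the tokens matched to :zipcode and :city are kept, as in the advertisement example.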

What do you think?

MOAR performance tests

Use a matrix of:

  • little data, medium amount of data, large amount of data

and various backends

  • memory, redis, sqlite, file.

and queries that are

  • easy, medium, hard, combinatorial nightmares

Then automatically run 48 tests, one after another, to see how each of the backends performs.
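
As a sketch, the matrix can be generated with Array#product. The symbols are just labels for the dimensions listed above; the body of the loop is a placeholder where a real benchmark would build the index and time the searches.

```ruby
data_sizes = [:little, :medium, :large]
backends   = [:memory, :redis, :sqlite, :file]
queries    = [:easy, :medium, :hard, :combinatorial]

# Every combination of data size, backend and query type: 3 * 4 * 4 = 48.
test_matrix = data_sizes.product(backends, queries)

# Placeholder loop, shown for the first few combinations only.
test_matrix.first(3).each do |size, backend, query|
  puts "#{size} data / #{backend} backend / #{query} queries"
end
```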

Most raketasks don't work on standard install

After installing Picky 3.0, generating a classic_server and indexing, most rake tasks (analyze, for example) error out.

Example:

roger@roger-MS-7621:~/temp/test$ bundle exec rake analyze
Loaded picky with environment 'development' in /home/roger/temp/test on Ruby 1.9.2.
Application BookSearch loaded.
rake aborted!
uninitialized constant Object::Indexes

Tasks: TOP => analyze
(See full trace by running task with --trace)

It seems to be missing an "include Picky".

Expand configuration API

Expand the API such that this becomes possible:

  Index::Memory.new(:index_specific_indexing) do
    source   Sources::CSV.new(:title, file: 'data/books.csv')
    indexing removes_characters: /[^äöüd-zD-Z0-9\s\/\-\"\&\.]/i, # a-c, A-C are removed
             splits_text_on:     /[\s\/\-\"\&\/]/
    category :title,
             qualifiers: [:t, :title, :titre],
             partial:    Partial::Substring.new(from: 1),
             similarity: Similarity::DoubleMetaphone.new(2)
  end

Also, rename

default_indexing
default_querying

to

indexing
searching

.

Dup ids or be non-destructive on attributes?

The case:

  context 'fun cases' do
    it 'stopwords destroy ids (final finding: id referenced also on attribute)' do
      index = Picky::Index.new :stopwords do
        key_format :to_sym
        indexing stopwords: /and/
        category :name
      end

      referenced = "this and that"

      require 'ostruct'

      thing = OpenStruct.new id: referenced, name: referenced

      index.add thing

      try = Picky::Search.new index

      try.search("this").ids.should == ["this and that"] # Fails. It's ["this  that"].
    end
  end
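
The underlying effect can be reproduced without Picky. This sketch assumes that the stopword removal is destructive (shown here with gsub!, which is an assumption about the tokenizer) and that id and name point to the same String object, as in the spec above.

```ruby
require 'ostruct'

referenced = "this and that"
thing = OpenStruct.new(id: referenced, name: referenced)

# id and name are the very same String object.
shared = thing.id.equal?(thing.name)

# A destructive stopword removal on the name therefore also changes the id.
thing.name.gsub!(/and/, '')
mutated_id = thing.id

# Dup'ing before the destructive operation leaves the original untouched.
original = "this and that"
safe     = OpenStruct.new(id: original, name: original)
cleaned  = safe.name.dup
cleaned.gsub!(/and/, '')
```

Either dup'ing the attribute before tokenizing or using non-destructive methods would avoid the surprise; dup'ing the id instead only protects the id.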

Finalize JS, provide to users

I still need to finalize the JavaScript API and internals.

Provide them to the users?
picky-client install javascripts?
-> copy to public/javascripts by default? Or ask if the dir is not there?

Gemspec is invalid

I am running 1.9.2 with RVM, RubyGems is updated to 1.8.11 and I get the errors below when installing Picky. It seems to be a problem with Syck requiring "=" to be escaped, see igrigorik/em-websocket#65


WARNING:  #<ArgumentError: Illformed requirement ["#<Syck::DefaultKey:0x1f33b84> 3.3.2"]>
# -*- encoding: utf-8 -*-

Gem::Specification.new do |s|
  s.name = "picky"
  s.version = "3.3.2"

  s.required_rubygems_version = Gem::Requirement.new(">= 0") if s.respond_to? :required_rubygems_version=
  s.authors = ["Florian Hanke"]
  s.date = "2011-11-02"
  s.description = "Fast Ruby semantic text search engine with comfortable single field interface."
  s.email = "[email protected]"
  s.executables = ["picky"]
  s.extensions = ["lib/picky/ext/ruby19/extconf.rb"]
  s.files = ["bin/picky", "lib/picky/ext/ruby19/extconf.rb"]
  s.homepage = "http://florianhanke.com/picky"
  s.require_paths = ["lib"]
  s.rubyforge_project = "http://rubyforge.org/projects/picky"
  s.rubygems_version = "1.8.11"
  s.summary = "Picky: Semantic Search Engine. Clever Interface. Good Tools."

  if s.respond_to? :specification_version then
    s.specification_version = 3

    if Gem::Version.new(Gem::VERSION) >= Gem::Version.new('1.2.0') then
      s.add_development_dependency(%q<rspec>, [">= 0"])
      s.add_development_dependency(%q<picky-client>, ["#<Syck::DefaultKey:0x1f33b84> 3.3.2"])
      s.add_runtime_dependency(%q<rack>, [">= 0"])
      s.add_runtime_dependency(%q<rack_fast_escape>, [">= 0"])
      s.add_runtime_dependency(%q<text>, [">= 0"])
      s.add_runtime_dependency(%q<yajl-ruby>, [">= 0"])
      s.add_runtime_dependency(%q<activesupport>, ["~> 3.0"])
      s.add_runtime_dependency(%q<activerecord>, ["~> 3.0"])
      s.add_runtime_dependency(%q<unicorn>, [">= 0"])
      s.add_runtime_dependency(%q<sinatra>, [">= 0"])
      s.add_runtime_dependency(%q<redis>, [">= 0"])
      s.add_runtime_dependency(%q<mysql>, [">= 0"])
    else
      s.add_dependency(%q<rspec>, [">= 0"])
      s.add_dependency(%q<picky-client>, ["#<Syck::DefaultKey:0x1f33b84> 3.3.2"])
      s.add_dependency(%q<rack>, [">= 0"])
      s.add_dependency(%q<rack_fast_escape>, [">= 0"])
      s.add_dependency(%q<text>, [">= 0"])
      s.add_dependency(%q<yajl-ruby>, [">= 0"])
      s.add_dependency(%q<activesupport>, ["~> 3.0"])
      s.add_dependency(%q<activerecord>, ["~> 3.0"])
      s.add_dependency(%q<unicorn>, [">= 0"])
      s.add_dependency(%q<sinatra>, [">= 0"])
      s.add_dependency(%q<redis>, [">= 0"])
      s.add_dependency(%q<mysql>, [">= 0"])
    end
  else
    s.add_dependency(%q<rspec>, [">= 0"])
    s.add_dependency(%q<picky-client>, ["#<Syck::DefaultKey:0x1f33b84> 3.3.2"])
    s.add_dependency(%q<rack>, [">= 0"])
    s.add_dependency(%q<rack_fast_escape>, [">= 0"])
    s.add_dependency(%q<text>, [">= 0"])
    s.add_dependency(%q<yajl-ruby>, [">= 0"])
    s.add_dependency(%q<activesupport>, ["~> 3.0"])
    s.add_dependency(%q<activerecord>, ["~> 3.0"])
    s.add_dependency(%q<unicorn>, [">= 0"])
    s.add_dependency(%q<sinatra>, [">= 0"])
    s.add_dependency(%q<redis>, [">= 0"])
    s.add_dependency(%q<mysql>, [">= 0"])
  end
end

Make ask-backs customizable

Currently, when Picky asks back what the user was searching for, the look of the dialog is fixed, e.g. from gemsearch:
SINATRA # <= A "name" category is printed uppercase
sinatra (using)
sinatra (written by)
Although the "using" and "written by" can be customized it would be perfect to allow
"written by peter" and "using sinatra" or "peter living in england" to create a better flow.

An option in the frontend like category_format (or similar) would be good, looking like:
['name', 'dependency'] => "%s using %s",
['dependency', 'name'] => "%s used by %s"

Or even
['*', 'dependency'] => "%s using %s"

That would be perfect.
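
A sketch of how such a lookup might work, written in Ruby for illustration even though the option would live in the frontend. The constant, the helper name, the 'author' entry and the fallback of joining tokens with a space are all assumptions.

```ruby
# Hypothetical format table: category combination => format string.
CATEGORY_FORMATS = {
  ['name', 'dependency'] => "%s using %s",
  ['dependency', 'name'] => "%s used by %s",
  ['*', 'author']        => "%s written by %s"
}

# Looks up an exact match first, then a wildcard on the first category.
# Falls back to the tokens joined by spaces if nothing matches.
def format_ask_back(categories, tokens, formats = CATEGORY_FORMATS)
  template = formats[categories] || formats[['*', categories.last]]
  return tokens.join(' ') unless template
  template % tokens
end

exact    = format_ask_back(['name', 'dependency'], ['picky', 'sinatra'])
wildcard = format_ask_back(['name', 'author'], ['picky', 'peter'])
fallback = format_ask_back(['name', 'version'], ['picky', '3.3.2'])
```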

picky generate empty_unicorn_server doesn't work

I just updated Picky to 3.2.0

$ gem list picky

*** LOCAL GEMS ***

picky (3.2.0)
picky-client (3.2.0)
picky-generators (3.2.0)

When I run picky generate, I can see that the empty_unicorn_server option is available, but picky generate empty_unicorn_server contact_search doesn't work. Instead, it outputs the picky-generate usage:

$ picky generate empty_unicorn_server contact_search

Usage:
  picky-generate <project_type> [params]

Possible commands:
  picky-generate client <sinatra_client_name>
  picky-generate server <sinatra_server_name>
  picky-generate sinatra_client <sinatra_client_name>
  picky-generate classic_server <unicorn_server_name>
  picky-generate sinatra_server <sinatra_server_name>
  picky-generate all_in_one <directory_name (use e.g. for Heroku)>

rake index hangs in certain conditions

4 Picky users have reported that when using X indexes, X > 1, Picky hangs while doing the indexing, just after the "indexing using N processors" message.

Use strings for internal keys

When indexes are not static, but realtime, the fact that internal keys are symbols is very problematic: Picky runs out of memory in this case.

It's probably a good idea to use strings by default.

Also, if the data is very similar, and the index is static, then it makes sense to add a static index option to give Picky the possibility to optimize (see #37).

Check all integration cases with all backend types

Basically this:

# `case` is a reserved word in Ruby, so the block variable is named test_case.
cases.each do |test_case|
  backends.each do |backend|
    index = Picky::Index.new test_case, &test_case.index
    index.backend backend
    things = Search.new index, &test_case.search
    get test_case.url do
      results = things.search params[:query] # etc.
      results.to_json
    end
  end
end

Version 2.0.0

Get version 2.0.0 out of the door :)

  • Check if new history.js is working correctly.
  • Ask if new API is ok.

Make memory Indexes reloadable in running Picky server

Indexes in a running Picky system should be reloadable such that users of Picky don't have to restart the server.
(Yes, it works by restarting a Unicorn, but with a Thin server you're out of luck)

I suggest using a signal to tell the server to reload its indexes without hiccups, i.e. load a copy, then atomically replace the old one.
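
A minimal sketch of the load-a-copy-then-swap idea. The ReloadableIndexes class, the loader block and the choice of the USR1 signal are assumptions for illustration; a real implementation would load the index files instead of building a small Hash.

```ruby
# Hypothetical holder for the currently served indexes.
class ReloadableIndexes
  attr_reader :current

  def initialize(&loader)
    @loader  = loader
    @current = loader.call
  end

  # Build the complete copy first, then swap with a single assignment,
  # so searches see either the old or the new indexes, never a half-loaded one.
  def reload!
    fresh    = @loader.call
    @current = fresh
  end
end

generation = 0
indexes = ReloadableIndexes.new do
  generation += 1
  { generation: generation, "picky" => [1, 2, 3] }
end

# Reload when the process receives USR1 (where the platform supports it).
if Signal.list.key?("USR1")
  Signal.trap("USR1") { indexes.reload! }
end
```

With this in place, sending the signal to the server process (for example with kill -USR1 and the process id) would trigger a reload without a restart.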

Move towards Sinatra-like single file application, app.rb

This would just be a single file, like this:

# encoding: utf-8
#
require 'picky'

class BookSearch < Application

  # How text is indexed. Move to Index block to make it index specific.
  #
  indexing removes_characters: /[^a-zA-Z0-9\s\/\-\_\:\"\&\.]/i,
           stopwords:          /\b(and|the|of|it|in|for)\b/i,
           splits_text_on:     /[\s\/\-\_\:\"\&\/]/

  # How query text is preprocessed. Move to Search block to make it search specific.
  #
  searching removes_characters: /[^a-zA-Z0-9\s\/\-\_\,\&\.\"\~\*\:]/i, # Picky needs control chars *"~: to pass through.
            stopwords:          /\b(and|the|of|it|in|for)\b/i

  books_index = Index::Memory.new :books do
    source   Sources::CSV.new(:title, :author, :year, file: "data/#{PICKY_ENVIRONMENT}/library.csv")
    category :title,
             similarity: Similarity::DoubleMetaphone.new(3), # Default is no similarity.
             partial: Partial::Substring.new(from: 1) # Default is from: -3.
    category :author, partial: Partial::Substring.new(from: 1)
    category :year, partial: Partial::None.new
  end

  route %r{\A/books\Z} => Search.new(books_index)

end

# Logging
#
require 'logger'
PickyLog = Loggers::Search.new ::Logger.new(File.expand_path('log/search.log', PICKY_ROOT))

# Index, load and run.
#
Indexes.index
Indexes.load_from_cache

Question - how to enable rake tasks? Not possible without Rakefile, I assume.

"retry" option for Searches

Add a retry option that retries a search with new options IF no results have been found.

Search.new(books) do
  retry do
    ignore :author # When retrying, only uses tokens that match the title
  end
end

Perhaps even optional?

Search.new(some_index) do
  retry on: lambda { |results| results.total < 10 } do
    ignore :that_nonimportant_category
    searching split_on: /\s/
  end
end
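
The control flow behind such an option could be sketched as below. The search_with_retry helper and the two lambdas are hypothetical; they stand in for a normal search and for a search with the retry options applied (here: only tokens that match a title are used).

```ruby
# `search` and `fallback` take a query and return an Array of ids.
# `retry_on` decides whether the fallback search is run.
def search_with_retry(query, search:, fallback:,
                      retry_on: ->(results) { results.empty? })
  results = search.call(query)
  return results unless retry_on.call(results)
  fallback.call(query)
end

books = [
  { id: 1, title: "ruby basics",   author: "matz" },
  { id: 2, title: "rails recipes", author: "fowler" }
]

# Strict search: every token must match the title or the author.
strict = lambda do |query|
  books.select do |book|
    query.split.all? { |t| book[:title].include?(t) || book[:author].include?(t) }
  end.map { |book| book[:id] }
end

# Retry search: ignore the author, keep only tokens that match some title.
title_only = lambda do |query|
  tokens = query.split.select { |t| books.any? { |b| b[:title].include?(t) } }
  next [] if tokens.empty?
  books.select { |b| tokens.all? { |t| b[:title].include?(t) } }.map { |b| b[:id] }
end

retried = search_with_retry("ruby tolkien", search: strict, fallback: title_only)
direct  = search_with_retry("rails fowler", search: strict, fallback: title_only)
```

The retry_on parameter corresponds to the optional on: lambda above, for example ->(results) { results.size < 10 }.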

What do you think?

Add Redis index backend

Redis has proven to be very fast – up to 30% of the in-memory solution – for Picky in a few preliminary tests.

Indexes would be persistent and server startup times would be fantastically faster (around 1-2s). Also index reloading is basically built-in.

I will work towards adding it in 1.5.0.

"Contract" category headers

Currently, if you search for example for "em json connection" on gemsearch, the resulting header will display "gems using em and using json and using connection".

This is quirky. Joining same categories with a space would make sense in 99% of the cases:
"gems using em json connection".
(Of course there might be search engines using Picky where a space does not denote Token borders, but let's cross that bridge when we come to it)
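
A sketch of the contraction, assuming the header is built from pairs of category description and token, in query order. The helper name and the input format are hypothetical.

```ruby
# Joins consecutive tokens of the same category with a space,
# and different categories with " and ".
def contracted_header(pairs)
  pairs
    .chunk_while { |(left, _), (right, _)| left == right }
    .map { |group| "#{group.first.first} #{group.map(&:last).join(' ')}" }
    .join(' and ')
end

same  = contracted_header([["using", "em"], ["using", "json"], ["using", "connection"]])
mixed = contracted_header([["named", "picky"], ["using", "em"], ["using", "json"]])

header = "gems #{same}"
```

Only neighbouring tokens of the same category are contracted, so the order of the query is preserved.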

Dynamic weights

Currently the token weights are statically indexed. However, some search engines do not need the weight and would be happy with a constant weight of zero.

Add a Picky::Weights::None weights generator that does not generate weights, but instead always returns 0 as the weight.

Also, finalize the interface so that people can add e.g. a RandomWeight or similar.
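
One possible shape for such an interface, as a sketch only. The module name, the weight_for method and the logarithmic default are assumptions made for illustration and may differ from what Picky ends up with.

```ruby
module WeightsSketch
  # Assumed default: weight derived from the number of ids for a token.
  class Logarithmic
    def weight_for(amount_of_ids)
      return 0.0 if amount_of_ids < 1
      Math.log(amount_of_ids).round(3)
    end
  end

  # Generates no weights: always answers 0.
  class None
    def weight_for(_amount_of_ids)
      0
    end
  end

  # Example of a user-supplied generator.
  class RandomWeight
    def initialize(rng = ::Random.new)
      @rng = rng
    end

    def weight_for(_amount_of_ids)
      @rng.rand # Float in the range 0.0...1.0
    end
  end
end

generators = [
  WeightsSketch::Logarithmic.new,
  WeightsSketch::None.new,
  WeightsSketch::RandomWeight.new
]

weights = generators.map { |generator| generator.weight_for(1000) }
```

Any object that answers weight_for could then be plugged in, which keeps the interface small enough to finalize.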

Error message when running `picky generate`

I got an error message when running picky generate.

Here is the output from my console

$ picky generate
/home/william/.rvm/gems/ruby-1.9.2-p290@blog/gems/picky-generators-3.1.11/lib/picky-generators/generators/selector.rb:41:in `rescue in generator_for':  (Picky::Generators::NotFoundException)

Usage:
  picky-generate <project_type> [params]

Possible commands:
  picky-generate client <sinatra_client_name>
  picky-generate server <sinatra_server_name>
  picky-generate sinatra_client <sinatra_client_name>
  picky-generate classic_server <unicorn_server_name>
  picky-generate sinatra_server <sinatra_server_name>
  picky-generate all_in_one <directory_name (use e.g. for Heroku)>

    from /home/william/.rvm/gems/ruby-1.9.2-p290@blog/gems/picky-generators-3.1.11/lib/picky-generators/generators/selector.rb:37:in `generator_for'
    from /home/william/.rvm/gems/ruby-1.9.2-p290@blog/gems/picky-generators-3.1.11/lib/picky-generators/generators/selector.rb:30:in `generate'
    from /home/william/.rvm/gems/ruby-1.9.2-p290@blog/gems/picky-generators-3.1.11/bin/picky-generate:14:in `<top (required)>'
    from /home/william/.rvm/gems/ruby-1.9.2-p290@blog/bin/picky-generate:19:in `load'
    from /home/william/.rvm/gems/ruby-1.9.2-p290@blog/bin/picky-generate:19:in `<main>'
