floere / picky
Picky is an easy to use and fast Ruby semantic search engine that helps your users find what they are looking for.
Home Page: http://pickyrb.com
License: Other
When the index is static, it is advisable to use symbols internally.
However, when entries are also removed from the index using index.remove(id), it is better to use Strings, as Symbols never get freed.
Consider adding a hint to the index definition as follows:
Picky::Index.new(:dynamic) do
  static
end
or
Picky::Index.new(:dynamic) do
  unchanging
end
or
Picky::Index.new(:dynamic) do
  static true
end
Also think about what the default should be. I tend towards dynamic indexes. And let the user add a helpful hint to optimize in case it's static.
It's theoretically impossible to determine at runtime what to do, so we need the user input.
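To make the proposed hint more concrete, here is a minimal sketch. It assumes a hypothetical `KeyStore` class (not part of Picky) that interns keys as Symbols only when the user declares the index static, and uses Strings otherwise. Note that Ruby 2.2 and later can garbage-collect dynamically created Symbols, so the memory concern applies mainly to older Rubies.

```ruby
# Hypothetical sketch, not Picky's implementation: choose the internal
# key type from a user-supplied hint.
class KeyStore
  def initialize(static: false)
    # Dynamic (String keys) is the default, as suggested above.
    @convert = static ? :to_sym : :to_s
    @ids = {}
  end

  # Stores the id under the converted key.
  def add(key, id)
    (@ids[key.public_send(@convert)] ||= []) << id
  end

  # Removes the id everywhere and drops keys that no longer hold ids.
  def remove(id)
    @ids.each_value { |ids| ids.delete(id) }
    @ids.delete_if { |_key, ids| ids.empty? }
  end

  def keys
    @ids.keys
  end
end

dynamic = KeyStore.new
dynamic.add("ruby", 1)

static = KeyStore.new(static: true)
static.add("ruby", 1)
```

With String keys, removing the last id for a key leaves nothing behind that the garbage collector cannot reclaim.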
Hi,
if I understand correctly, activesupport is a picky-client dependency and should therefore be specified in the picky-client.gemspec file. Because there is a "rescue LoadError" clause (https://github.com/floere/picky/blob/master/client/lib/picky-client/client.rb#L123, which I don't understand), it can cause a misleading "undefined method to_query" error if somebody uses Bundler, for example.
Also, there is no need to define the Hash#to_query method, since it is already provided by requiring 'active_support/core_ext/object/to_query' (https://github.com/rails/rails/blob/master/activesupport/lib/active_support/core_ext/object/to_query.rb#L1).
Please take a look at my commit: Stanley@d10a182 and see if it makes any sense.
Best,
Stan
See
http://www.spoiledmilk.dk/blog/?p=1922
for more info.
Currently, Infix and Substring are available. Infix provides true infix capabilities, while Substring behaves similarly to most substring implementations.
Postfix and Prefix would be special implementations of Substring:
Postfix: Substring with fixed to: -1
Prefix: Substring with fixed from: 1, and reverse "partializing".
Note: Prefix is delayed until somebody actually needs it.
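A rough illustration of the from/to idea, assuming that Substring partials are the leading slices of a token and that negative positions count from the end of the token. The helper `leading_slices` is hypothetical and is not Picky's code.

```ruby
# Hypothetical helper, not Picky's code: list the leading slices of a
# token whose lengths run from `from` to `to` (negative values count
# from the end of the token, as with Ruby string indices).
def leading_slices(token, from: 1, to: -1)
  length   = token.length
  shortest = from < 0 ? length + from + 1 : from
  longest  = to < 0 ? length + to + 1 : to
  shortest = [shortest, 1].max
  longest  = [longest, length].min
  # Longest slice first; empty if the range is inverted.
  longest.downto(shortest).map { |size| token[0, size] }
end

all_slices   = leading_slices("picky", from: 1)
without_full = leading_slices("picky", from: 1, to: -2)
last_three   = leading_slices("picky", from: -3, to: -1)
```

With `to: -1` fixed, the full token is always among the slices, and only `from` decides how short the partials may get.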
Currently we have for similarity:
Similarity::DoubleMetaphone.new(amount_similar)
Similarity::Metaphone.new(amount_similar)
Similarity::Soundex.new(amount_similar)
And for partial searches:
Partial::Substring.new(from: index_pos1, to: index_pos2)
Do we need more, what do you think? Do you have a need for more or even written one?
Especially annoying in the generated example. Perhaps wait until history.js stabilizes and go back to an older version?
For example, searching for
summary:ios
results in a group header
pods with summary summary:ios*
being displayed.
It should display
pods with summary ios*
with the qualifier removed.
Move the terminal search to the client and call it from the picky binary.
Add a browser back/forward feature to Picky's javascript client such that:
Naw descwipshun gifen.
Throughout index/category/bundle/backend.
These are convenience methods that can be included again when someone needs them. At the moment, they represent maintenance effort for nothing.
People love numbers. Without numbers to quote, they feel like garbage.
Let's give them numbers. Many many numbers. Grouped in groups. Compared to others. Multiplied, added, subtracted.
"We need numbers. Lots of numbers." -- Neo, in "The Actual Matrix"
Create an option include/exclude:
Search.new(books) do
  ignore :isbn
end
In the first example, any token being matched to category :isbn would be ignored.
Search.new(advertisements) do
  only_use :city, :zipcode
  # perhaps call ignore_unless
end
In the second example, ONLY tokens being matched to category :city or category :zipcode would survive. Tokens matching e.g. category :name would be ignored.
Why is this necessary?
Sometimes users don't want all categories used in their searches, or want a broader search on certain indexes where some tokens are just ignored.
One example is an advertisement search that is coupled to an address search. Tokens matching the name are ignored, while Tokens matching a zipcode are used.
What do you think?
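To illustrate the intended semantics of both options, here is a small sketch outside of Picky. It assumes a hypothetical `CategoryFilter` class and represents each match as a plain [token, category] pair; Picky's internal objects look different.

```ruby
# Hypothetical sketch of the two proposed options. A "match" here is just
# a [token, category] pair.
class CategoryFilter
  def initialize(ignore: [], only_use: nil)
    @ignore   = ignore
    @only_use = only_use
  end

  # Drops matches in ignored categories and, if only_use is given,
  # every match whose category is not listed there.
  def apply(matches)
    matches.reject do |_token, category|
      @ignore.include?(category) ||
        (@only_use && !@only_use.include?(category))
    end
  end
end

matches = [["peter", :name], ["8000", :zipcode], ["zurich", :city]]

ignored_name = CategoryFilter.new(ignore: [:name]).apply(matches)
address_only = CategoryFilter.new(only_use: [:city, :zipcode]).apply(matches)
```

In this example both filters keep the zipcode and city matches and drop the name match.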
Use a matrix of:
and various backends
and queries that are
Then automatically run 48 tests, one after another, to see how each of the backends performs.
Code from that commit:
eval(ENV['CLASS'].to_s).import(params) do |total, done|
# I CAN HAZ PROGREZ BAR LIEK HOMEBRU!
percent = ( (done.to_f / total) * 100 ).to_i
STDOUT.print( ("#" * ( percent*((cols-4).to_f/100)).to_i )+" ")
STDOUT.print("\r"*cols+"#{percent}% ")
end
Note: Like the total, done block there.
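The total/done block could be sketched like this. The `import` method, its `items` argument and the bar width are assumptions made for illustration; they do not mirror the code in that commit.

```ruby
# Hypothetical importer that reports progress through a block.
def import(items)
  total = items.size
  items.each_with_index do |_item, index|
    # ... process the item here ...
    yield total, index + 1 if block_given?
  end
end

# Renders a fixed-width text bar for the given progress.
def progress_bar(total, done, width: 20)
  percent = (done.to_f / total * 100).to_i
  filled  = percent * width / 100
  "[#{'#' * filled}#{' ' * (width - filled)}] #{percent}%"
end

lines = []
import(%w[a b c d]) { |total, done| lines << progress_bar(total, done) }
```

Keeping the rendering in the caller's block means the importer itself does not need to know about terminals or column widths.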
After installing Picky 3.0, generating a classic_server and indexing, most rake tasks (analyze, for example) error out.
Example:
roger@roger-MS-7621:~/temp/test$ bundle exec rake analyze
Loaded picky with environment 'development' in /home/roger/temp/test on Ruby 1.9.2.
Application BookSearch loaded.
rake aborted!
uninitialized constant Object::IndexesTasks: TOP => analyze
(See full trace by running task with --trace)
It seems to miss an "include Picky".
Data: 3 W
Query: W
Problem: the item with the data doesn't show up
Works as expected with Partial::Substring.new(from: -1).
Expand the API such that this becomes possible:
Index::Memory.new(:index_specific_indexing) do
source Sources::CSV.new(:title, file: 'data/books.csv')
indexing removes_characters: /[^äöüd-zD-Z0-9\s\/\-\"\&\.]/i, # a-c, A-C are removed
splits_text_on: /[\s\/\-\"\&\/]/
category :title,
qualifiers: [:t, :title, :titre],
partial: Partial::Substring.new(from: 1),
similarity: Similarity::DoubleMetaphone.new(2)
end
Also, rename default_indexing and default_querying to indexing and searching.
The case:
context 'fun cases' do
  it 'stopwords destroy ids (final finding: id referenced also on attribute)' do
    index = Picky::Index.new :stopwords do
      key_format :to_sym
      indexing stopwords: /and/
      category :name
    end
    referenced = "this and that"
    require 'ostruct'
    thing = OpenStruct.new id: referenced, name: referenced
    index.add thing
    try = Picky::Search.new index
    try.search("this").ids.should == ["this and that"] # Fails. It's ["this that"].
  end
end
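The final finding can be reproduced without Picky. This is a minimal sketch that assumes the stopwords are removed destructively (for example with `gsub!`); `Thing` and `STOPWORDS` are illustrative names, not Picky's.

```ruby
# A shared String that is mutated in place changes for every holder of it.
Thing = Struct.new(:id, :name)

STOPWORDS = /\band\b\s*/

# Unary plus guarantees a mutable String, even with frozen literals.
referenced = +"this and that"
thing = Thing.new(referenced, referenced)

# Destructive variant: the id changes together with the name,
# because both attributes point at the same object.
thing.name.gsub!(STOPWORDS, "")
mutated_id = thing.id

# Non-destructive variant: work on a copy, so the id stays intact.
referenced = +"this and that"
thing = Thing.new(referenced, referenced)
tokens_text = thing.name.dup
tokens_text.gsub!(STOPWORDS, "")
intact_id = thing.id
```

Duplicating the text before tokenizing costs one extra String per attribute, but protects ids that happen to share an object with an indexed attribute.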
I still need to finalize the JavaScript API and internals.
Provide them to the users?
picky-client install javascripts?
-> copy to public/javascripts by default? Or ask if the dir is not there?
Issue entered by http://github.com/agrimm on gemsearch: "I half expected that clicking on the colored number would make it display the results."
More?
I am running 1.9.2 with RVM, RubyGems is updated to 1.8.11 and I get the errors below when installing Picky. It seems to be a problem with Syck requiring "=" to be escaped, see igrigorik/em-websocket#65
WARNING: #<ArgumentError: Illformed requirement ["#<Syck::DefaultKey:0x1f33b84> 3.3.2"]>
# -*- encoding: utf-8 -*-
Gem::Specification.new do |s|
s.name = "picky"
s.version = "3.3.2"
s.required_rubygems_version = Gem::Requirement.new(">= 0") if s.respond_to? :required_rubygems_version=
s.authors = ["Florian Hanke"]
s.date = "2011-11-02"
s.description = "Fast Ruby semantic text search engine with comfortable single field interface."
s.email = "[email protected]"
s.executables = ["picky"]
s.extensions = ["lib/picky/ext/ruby19/extconf.rb"]
s.files = ["bin/picky", "lib/picky/ext/ruby19/extconf.rb"]
s.homepage = "http://florianhanke.com/picky"
s.require_paths = ["lib"]
s.rubyforge_project = "http://rubyforge.org/projects/picky"
s.rubygems_version = "1.8.11"
s.summary = "Picky: Semantic Search Engine. Clever Interface. Good Tools."
if s.respond_to? :specification_version then
s.specification_version = 3
if Gem::Version.new(Gem::VERSION) >= Gem::Version.new('1.2.0') then
s.add_development_dependency(%q<rspec>, [">= 0"])
s.add_development_dependency(%q<picky-client>, ["#<Syck::DefaultKey:0x1f33b84> 3.3.2"])
s.add_runtime_dependency(%q<rack>, [">= 0"])
s.add_runtime_dependency(%q<rack_fast_escape>, [">= 0"])
s.add_runtime_dependency(%q<text>, [">= 0"])
s.add_runtime_dependency(%q<yajl-ruby>, [">= 0"])
s.add_runtime_dependency(%q<activesupport>, ["~> 3.0"])
s.add_runtime_dependency(%q<activerecord>, ["~> 3.0"])
s.add_runtime_dependency(%q<unicorn>, [">= 0"])
s.add_runtime_dependency(%q<sinatra>, [">= 0"])
s.add_runtime_dependency(%q<redis>, [">= 0"])
s.add_runtime_dependency(%q<mysql>, [">= 0"])
else
s.add_dependency(%q<rspec>, [">= 0"])
s.add_dependency(%q<picky-client>, ["#<Syck::DefaultKey:0x1f33b84> 3.3.2"])
s.add_dependency(%q<rack>, [">= 0"])
s.add_dependency(%q<rack_fast_escape>, [">= 0"])
s.add_dependency(%q<text>, [">= 0"])
s.add_dependency(%q<yajl-ruby>, [">= 0"])
s.add_dependency(%q<activesupport>, ["~> 3.0"])
s.add_dependency(%q<activerecord>, ["~> 3.0"])
s.add_dependency(%q<unicorn>, [">= 0"])
s.add_dependency(%q<sinatra>, [">= 0"])
s.add_dependency(%q<redis>, [">= 0"])
s.add_dependency(%q<mysql>, [">= 0"])
end
else
s.add_dependency(%q<rspec>, [">= 0"])
s.add_dependency(%q<picky-client>, ["#<Syck::DefaultKey:0x1f33b84> 3.3.2"])
s.add_dependency(%q<rack>, [">= 0"])
s.add_dependency(%q<rack_fast_escape>, [">= 0"])
s.add_dependency(%q<text>, [">= 0"])
s.add_dependency(%q<yajl-ruby>, [">= 0"])
s.add_dependency(%q<activesupport>, ["~> 3.0"])
s.add_dependency(%q<activerecord>, ["~> 3.0"])
s.add_dependency(%q<unicorn>, [">= 0"])
s.add_dependency(%q<sinatra>, [">= 0"])
s.add_dependency(%q<redis>, [">= 0"])
s.add_dependency(%q<mysql>, [">= 0"])
end
end
This "issue" is here to stay, where all can muse about where Picky is and should go.
When a Javascript file changes, how do we send it out to users?
index = Index::Memory.new :bla do
category :some
end
route %r{\A/indexing/bla\z} => index
Currently, when Picky asks back what the user was searching for, the wording of the dialog is fixed, e.g. from gemsearch:
SINATRA # <= A "name" category is printed uppercase
sinatra (using)
sinatra (written by)
Although the "using" and "written by" can be customized it would be perfect to allow
"written by peter" and "using sinatra" or "peter living in england" to create a better flow.
An option in the frontend like category_format (or similar) would be good, looking like:
['name', 'dependency'] => "%s using %s",
['dependency', 'name'] => "%s used by %s"
Or even
['*', 'dependency'] => "%s using %s"
That would be perfect.
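One possible shape for such a lookup, sketched in Ruby for illustration (the real option would live in the frontend). The constant `CATEGORY_FORMATS`, the helpers `format_for` and `phrase`, and the 'author' entry are assumptions.

```ruby
# Hypothetical lookup for the proposed category_format option. '*' is
# treated as a wildcard that matches any category name.
CATEGORY_FORMATS = {
  ['name', 'dependency'] => "%s using %s",
  ['dependency', 'name'] => "%s used by %s",
  ['*', 'author']        => "%s written by %s"
}

# Returns the first format whose pattern matches the category list.
def format_for(categories, formats = CATEGORY_FORMATS)
  _pattern, template = formats.find do |pattern, _template|
    pattern.size == categories.size &&
      pattern.zip(categories).all? { |want, got| want == '*' || want == got }
  end
  template
end

# pairs is a list of [category, text]; falls back to joining the texts.
def phrase(pairs)
  template = format_for(pairs.map(&:first))
  texts = pairs.map(&:last)
  template ? template % texts : texts.join(' ')
end

using    = phrase([['name', 'picky'], ['dependency', 'sinatra']])
written  = phrase([['name', 'sinatra'], ['author', 'peter']])
fallback = phrase([['city', 'london'], ['country', 'england']])
```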
Currently, the Redis index makes a roundtrip to get the results after a calculation. Using the new Lua scripting, we could do this in one request.
See the command and the blog post:
http://redis.io/commands/eval
http://antirez.com/post/short-term-redis-plans.html
Implement this as soon as Redis 2.6.0 is available.
If possible, check for the Redis version and choose the implementation transparently. Use http://redis.io/commands/info to get the version.
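The version check could look roughly like this. The sketch assumes the raw text reply of the INFO command is available as a String (it contains a `redis_version:` line); the helper names are hypothetical and no Redis client is used here.

```ruby
require "rubygems" # for Gem::Version, normally loaded already

# Extracts the server version from the text of an INFO reply.
def redis_version(info_text)
  line = info_text.each_line.find { |l| l.start_with?("redis_version:") }
  line && Gem::Version.new(line.split(":", 2).last.strip)
end

# True if the server is new enough for Lua scripting (EVAL).
def scripting_available?(info_text)
  version = redis_version(info_text)
  !version.nil? && version >= Gem::Version.new("2.6.0")
end

old_info = "# Server\r\nredis_version:2.4.17\r\nredis_mode:standalone\r\n"
new_info = "# Server\r\nredis_version:2.6.0\r\n"

use_script_old = scripting_available?(old_info)
use_script_new = scripting_available?(new_info)
```

Comparing `Gem::Version` objects avoids the pitfalls of comparing version Strings, where "2.10.0" would sort before "2.6.0".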
I just updated picky to 3.2.0
$ gem list picky
*** LOCAL GEMS ***
picky (3.2.0)
picky-client (3.2.0)
picky-generators (3.2.0)
When I run picky generate, I can see that the empty_unicorn_server option is available, but picky generate empty_unicorn_server contact_search doesn't work. Instead, it outputs the available picky-generate options:
$ picky generate empty_unicorn_server contact_search
Usage:
picky-generate <project_type> [params]
Possible commands:
picky-generate client <sinatra_client_name>
picky-generate server <sinatra_server_name>
picky-generate sinatra_client <sinatra_client_name>
picky-generate classic_server <unicorn_server_name>
picky-generate sinatra_server <sinatra_server_name>
picky-generate all_in_one <directory_name (use e.g. for Heroku)>
Searching for
title,author:a
results in choices.
When the user clicks on the author choice, the Picky frontend tries to search for
author:author,title:a
4 Picky users have reported that when using X indexes, X > 1, Picky hangs while doing the indexing, just after the "indexing using N processors" message.
When indexes are not static, but realtime, the fact that internal keys are symbols is very problematic: Picky runs out of memory in this case.
It's probably a good idea to use strings by default.
Also, if the data is very similar, and the index is static, then it makes sense to add a static index option to give Picky the possibility to optimize (see #37).
Basically this:
# Note: `case` is a reserved word in Ruby, so the block variable is named kase.
cases.each do |kase|
  backends.each do |backend|
    index = Picky::Index.new kase, &kase.index
    index.backend backend
    things = Search.new index, &kase.search
    get kase.url do
      results = things.search params[:query] # etc.
      results.to_json
    end
  end
end
Get version 2.0.0 out of the door :)
Indexes in a running Picky system should be reloadable such that users of Picky don't have to restart the server.
(Yes, it works by restarting a Unicorn, but with a Thin server you're out of luck)
I suggest using a signal to tell the server to reload its indexes without hiccups, i.e. loading a copy, then atomically replacing the old one.
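The load-a-copy-then-swap idea can be sketched as follows. `ReloadableIndex` is a hypothetical wrapper, not a Picky class, and the choice of USR1 in the comment is only an example.

```ruby
# Hypothetical sketch: build the replacement fully, then swap a single
# reference so that readers see either the old or the new index.
class ReloadableIndex
  def initialize(&loader)
    @loader  = loader
    @current = loader.call
  end

  def lookup(key)
    @current[key]
  end

  def reload!
    fresh = @loader.call # may be slow; the old index keeps serving
    @current = fresh     # a single assignment replaces the old index
    self
  end
end

data  = { "picky" => [1] }
index = ReloadableIndex.new { data.dup.freeze }

# In a server this could be wired to a signal, for example:
#   Signal.trap("USR1") { index.reload! }
data["ruby"] = [2]
before = index.lookup("ruby")
index.reload!
after = index.lookup("ruby")
```

Because the old index is only dropped after the new one has been built, a failing load leaves the running server untouched.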
This would just be a single file, like this:
# encoding: utf-8
#
require 'picky'
class BookSearch < Application
# How text is indexed. Move to Index block to make it index specific.
#
indexing removes_characters: /[^a-zA-Z0-9\s\/\-\_\:\"\&\.]/i,
stopwords: /\b(and|the|of|it|in|for)\b/i,
splits_text_on: /[\s\/\-\_\:\"\&\/]/
# How query text is preprocessed. Move to Search block to make it search specific.
#
searching removes_characters: /[^a-zA-Z0-9\s\/\-\_\,\&\.\"\~\*\:]/i, # Picky needs control chars *"~: to pass through.
stopwords: /\b(and|the|of|it|in|for)\b/i
books_index = Index::Memory.new :books do
source Sources::CSV.new(:title, :author, :year, file: "data/#{PICKY_ENVIRONMENT}/library.csv")
category :title,
similarity: Similarity::DoubleMetaphone.new(3), # Default is no similarity.
partial: Partial::Substring.new(from: 1) # Default is from: -3.
category :author, partial: Partial::Substring.new(from: 1)
category :year, partial: Partial::None.new
end
route %r{\A/books\Z} => Search.new(books_index)
end
# Logging
#
require 'logger'
PickyLog = Loggers::Search.new ::Logger.new(File.expand_path('log/search.log', PICKY_ROOT))
# Index, load and run.
#
Indexes.index
Indexes.load_from_cache
Question - how to enable rake tasks? Not possible without Rakefile, I assume.
Add a retry option that retries a search with new options IF no results have been found.
Search.new(books) do
retry do
ignore :author # When retrying, only uses tokens that match the title
end
end
Perhaps even optional?
Search.new(some_index) do
retry on: lambda { |results| results.total < 10 } do
ignore :that_nonimportant_category
searching split_on: /\s/
end
end
What do you think?
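A sketch of how the retry could behave, independent of Picky. Since `retry` is a reserved word in Ruby, the sketch uses the hypothetical names `search_with_retry`, `retry_options` and `retry_on`; the toy `searcher` lambda and `BOOKS` data are assumptions as well.

```ruby
# Runs the search once and, if retry_on says so, once more with the
# relaxed options. By default it retries only when nothing was found.
def search_with_retry(searcher, query, retry_options:,
                      retry_on: ->(results) { results.empty? })
  results = searcher.call(query, {})
  return results unless retry_on.call(results)
  searcher.call(query, retry_options)
end

# Toy searcher: every token must match, unless its category is ignored.
BOOKS = { 1 => { title: "ruby", author: "matz" } }

searcher = lambda do |query, options|
  ignored = options.fetch(:ignore, [])
  found = BOOKS.select do |_id, book|
    query.all? do |category, text|
      ignored.include?(category) || book[category] == text
    end
  end
  found.keys
end

query   = { title: "ruby", author: "unknown" }
strict  = searcher.call(query, {})
relaxed = search_with_retry(searcher, query,
                            retry_options: { ignore: [:author] })
```

Passing a different `retry_on` lambda, such as one that checks for fewer than ten results, would cover the optional condition from the second example.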
Redis has proven to be very fast – up to 30% of the in-memory solution – for Picky in a few preliminary tests.
Indexes would be persistent and server startup times would be fantastically faster (around 1-2s). Also index reloading is basically built-in.
I will work towards adding it in 1.5.0.
Currently, if you search for example for "em json connection" on gemsearch, the resulting header will display "gems using em and using json and using connection".
This is quirky. Joining same categories with a space would make sense in 99% of the cases:
"gems using em json connection".
(Of course there might be search engines using Picky where a space does not denote Token borders, but let's cross that bridge when we come to it)
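A minimal sketch of the joining rule, assuming the header is built from a list of [category text, token] pairs. `header_for` is a hypothetical helper, written in Ruby for illustration only.

```ruby
# Merges neighbouring tokens of the same category before building the
# header text; different categories are still joined with "and".
def header_for(pairs, noun: "gems")
  groups = pairs.chunk_while { |(left, _), (right, _)| left == right }
  parts = groups.map do |group|
    "#{group.first.first} #{group.map(&:last).join(' ')}"
  end
  "#{noun} #{parts.join(' and ')}"
end

same  = header_for([["using", "em"], ["using", "json"], ["using", "connection"]])
mixed = header_for([["named", "picky"], ["using", "rack"], ["using", "text"]])
```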
When I try to index Japanese, the indexing seems to work fine but yajl gives an error when loading. A gist with a test case and the error for this can be found at https://gist.github.com/1129296.
Cheers, Picky
Update: I CAN! OCTOHUGS! (From MacRuby 0.12 on)
Currently the token weights are statically indexed. However, some search engines do not need the weight and would be happy with a constant weight of zero.
Add a Picky::Weights::None weights generator that does not generate weights, but instead always returns 0 as the weight.
Also, finalize the interface so that people can add e.g. a RandomWeight or similar.
require 'picky'
raises "Bundler::GemfileNotFound: Could not locate Gemfile"
I got an error message when running picky generate. Here is the output from my console:
$ picky generate
/home/william/.rvm/gems/ruby-1.9.2-p290@blog/gems/picky-generators-3.1.11/lib/picky-generators/generators/selector.rb:41:in `rescue in generator_for': (Picky::Generators::NotFoundException)
Usage:
picky-generate <project_type> [params]
Possible commands:
picky-generate client <sinatra_client_name>
picky-generate server <sinatra_server_name>
picky-generate sinatra_client <sinatra_client_name>
picky-generate classic_server <unicorn_server_name>
picky-generate sinatra_server <sinatra_server_name>
picky-generate all_in_one <directory_name (use e.g. for Heroku)>
from /home/william/.rvm/gems/ruby-1.9.2-p290@blog/gems/picky-generators-3.1.11/lib/picky-generators/generators/selector.rb:37:in `generator_for'
from /home/william/.rvm/gems/ruby-1.9.2-p290@blog/gems/picky-generators-3.1.11/lib/picky-generators/generators/selector.rb:30:in `generate'
from /home/william/.rvm/gems/ruby-1.9.2-p290@blog/gems/picky-generators-3.1.11/bin/picky-generate:14:in `<top (required)>'
from /home/william/.rvm/gems/ruby-1.9.2-p290@blog/bin/picky-generate:19:in `load'
from /home/william/.rvm/gems/ruby-1.9.2-p290@blog/bin/picky-generate:19:in `<main>'
Add a to_msgpack option for the speed freaks?
http://msgpack.org/