clintongormley / elastic-model

Use ElasticSearch as a NoSQL database in Perl

Perl 90.16% Perl 6 9.84%

elastic-model's People

haarg felliott amiri ubu manwar diegok jiangliguo

elastic-model's Issues

Add a percolate API and an on_percolate callback

Preload related objects

When retrieving objects from a search, only the source for the object itself is retrieved. Often these objects refer to related objects (eg the user object pointed to by a comment object). If the user knows that they will want to access these, it'd be more efficient to do a multi-get rather than loading each object one by one.

Add Result support for matched_filters

Elastic::Model::Result should be able to return the names of the matched_filters

Can't get namespace from a model subclass

Hi!

I don't know if this is an expected behaviour: the "namespace" method called in a subclass of a model always returns "undef".

Here you can find a minimal script in order to reproduce:
https://gist.github.com/miquelruiz/8147660

Is there any workaround (or fix) in order to make this work?

Thanks in advance,
Miquel Ruiz

Integrate caching

Should be easy to cache individual documents or Results, Collections etc

Default values for wrapper/multi_wrapper in Iterator incorrect

The default values for wrapper() and multi_wrapper() are incorrect - they should return CODE refs

Expose id() and type() in doc classes

You can get the id etc from

$doc->uid->id

but it'd be convenient to expose id and type directly in the doc. Add them as ordinary methods to Elastic::Model::Role::Doc so that they can be renamed when importing

Doc nit-pick for Timestamp type

The Elastic::Model::Types Timestamp docs say:

A Timestamp is a Num which holds floating epoch seconds, with milliseconds as decimal places.

The term 'floating' by itself and the phrase 'with milliseconds as decimal places' seem confusing to me. Assuming I understand the intent correctly I'd suggest something like:

A Timestamp is a Num which holds floating point epoch seconds, with milliseconds resolution.

Undeclared dependencies

See http://analysis.cpantesters.org/solved?distv=Elastic-Model-0.52#qr%3A%28Can%27t%20locate%20%5CS%2Bpm%29 : it seems that Any::URI::Escape and Search::Elasticsearch::Client::1_0::Direct have to be declared as prerequisites.

Add a method to retrieve the indexed terms for a field

When developing, it can sometimes be difficult to figure out why a particular query isn't working. Often the reason is that the indexed terms for a particular field are different from what you think.

Add terms_indexed_for_field() to Elastic::Model::Role::Doc to aid debugging.

wrong return precedence [cpan #87300]

https://rt.cpan.org/Public/Bug/Display.html?id=87300

return binds more tightly than or, so the expression after or is never evaluated.
See https://rt.perl.org/rt3/Public/Bug/Display.html?id=59802

diff -bu ./lib/Elastic/Model/Meta/Class/Doc.pm~ ./lib/Elastic/Model/Meta/Class/Doc.pm
--- ./lib/Elastic/Model/Meta/Class/Doc.pm~  2013-05-08 14:30:06.000000000 -0500
+++ ./lib/Elastic/Model/Meta/Class/Doc.pm   2013-07-25 08:17:09.536402358 -0500
@@ -91,7 +91,7 @@
         . $self->_inline_generate_instance( '$instance',
         '"' . $self->name . '"' )
         . 'return $instance' . '}';
-    return eval($src) or croak $@;
+    return eval($src) || croak $@;
 }
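The precedence problem in the patch above can be reproduced in isolation. The following is a minimal sketch, not code from Elastic::Model; the sub names `with_or` and `with_pipes` are made up for illustration.

```perl
use strict;
use warnings;
no warnings 'syntax';    # newer perls warn about the deliberate mistake below

# Low-precedence "or": parsed as (return $value) or (die ...),
# so the right-hand side can never run.
sub with_or {
    my ($value) = @_;
    return $value or die "unreachable\n";
}

# High-precedence "||": parsed as return( $value || 'fallback' ).
sub with_pipes {
    my ($value) = @_;
    return $value || 'fallback';
}
```

Calling `with_or(0)` simply returns 0 without dying, while `with_pipes(0)` falls through to the right-hand side.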

Filterb with "script"

Is there a way to use filterb with script? I can use it with filter but not with filterb. The script is something like this

{
  "script": {
    "script": "doc['field'].values.size()>1"
  }
}

Separate models share data

Model metaclass attributes are being initialized from the same hashref, meaning that data (eg types in a namespace) are shared between independent models.
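The general shape of this bug can be shown in plain Perl. This is a sketch of the bug class only, not the actual metaclass code; `%SHARED_DEFAULT`, `new_shared` and `new_fresh` are hypothetical names.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an attribute default that is a single hashref.
my %SHARED_DEFAULT = ( types => {} );

# Buggy: every "model" receives the very same inner hashref.
sub new_shared {
    return { types => $SHARED_DEFAULT{types} };
}

# Fixed: each "model" builds its own fresh hashref.
sub new_fresh {
    return { types => {} };
}
```

With `new_shared`, a type registered on one model shows up in every other model; with `new_fresh`, each model stays independent.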

Make inflators and deflators inlineable

Currently the inflators and deflators use Moose methods at runtime. Instead they should be inlined to improve performance. Also, the inlined flators need to accept non-inlineable coderefs as specified by the user.

FUTURE OF ELASTIC-MODEL - PLEASE COMMENT

Hi all users of Elastic::Model

Elasticsearch 2.0.0 is out, and Elastic::Model doesn't support it. In fact, Elastic::Model doesn't support a number of things from Elasticsearch 1.x either. I apologise for neglecting this module.

My feeling is that Elastic::Model tries to do way too much. Like many frameworks, it ties you into doing things in a particular way, which may or may not make sense for your use case. Most people who use Elastic::Model seem to use a subset of the functionality, and then talk to Elasticsearch directly the rest of the time.

I don't think it makes sense to just update the code for 2.x; it needs a complete rethink.

TELL ME HOW YOU USE IT

Please could you add comments to this issue explaining what bits you find useful, what bits you never use, and what bits you find annoying. Perhaps the code can be split out into smaller more useful chunks.

thanks

Is Elastic::Model fully compatible with elasticsearch 1.x?

I notice that the Elastic::Model 'es' attribute isa Search::Elasticsearch::Compat. Are there any caveats to using these two distributions with the elasticsearch 1.x series?

Tests fail (with Search::Elasticsearch 2.00?)

See subject. Statistical analysis suggests that the failures are caused by Search::Elasticsearch 2.00; 1.99 and earlier are OK: http://analysis.cpantesters.org/reports_by_field?distv=Elastic-Model-0.51;field=mod%3ASearch%3A%3AElasticsearch

'omit_term_freq_and_positions' is not supported anymore

For some reason elasticsearch refuses to create the index if the object type is used.

ElasticsearchParseException['omit_term_freq_and_positions' is not supported anymore - use ['index_options' : 'DOCS_ONLY']  instead]; , called from sub Elasticsearch::Transport::__ANON__ at /root/perl5/perlbrew/perls/perl-5.16.2/lib/site_perl/5.16.2/Try/Tiny.pm line 76.

package MyApp::Related;

use Elastic::Doc;
use MooseX::Types::Moose qw(Str);

has 'field1' => (
    is    => 'rw',
    isa   => Str, 
);

no Moose::Util::TypeConstraints;
no Elastic::Model;

1;

package MyApp::Parent;

use Elastic::Doc;
use MooseX::Types::Moose qw(Num);    # needed for the bareword Num type below
use MyApp::Related;

has '_id' => (
    is    => 'ro',
    isa   => Num,
    index => 'not_analyzed',
);

has 'related_items' => (
    is      => 'rw',
    isa     => 'ArrayRef[MyApp::Related]',
    #default => sub {[]},
    #type    => 'object',
);

no Moose::Util::TypeConstraints;
no Elastic::Model;

1;

Handle empty queryb/filterb args more gracefully

If the args to queryb/filterb result in an empty clause, we should be able to simply ignore them rather than throwing an error.

Clarify undef behaviour for Bool type fields

The Elastic::Manual::Attributes docs say:

Note: By default, ElasticSearch treats undef as NEITHER true NOR false, but as null (ie missing). To work around this, we automatically set "null_value" to 0 to make boolean fields more Perlish. If you would like to revert to the default behaviour, set "null_value" to undef.

I don't understand the "make boolean fields more Perlish" comment, or the need to "work around this" for that matter. Perl has undef specifically to be able to distinguish null/missing from false. So does ElasticSearch. It seems you're fighting against what's natural for both.
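To make the distinction concrete, here is a minimal Perl sketch of the three states being discussed. The sub name `tri_state` is hypothetical and is not part of Elastic::Model.

```perl
use strict;
use warnings;

# Classify a value as missing (undef), true, or false.
sub tri_state {
    my ($value) = @_;
    return 'missing' unless defined $value;
    return $value ? 'true' : 'false';
}
```

Mapping undef to 0 through "null_value" collapses the 'missing' and 'false' cases into one.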

Maybe I'm missing something, in which case the docs need expanding to include a more detailed explanation of the issues.

Also, the docs for null_value don't mention the use-case for Bool. In fact they argue against the case made in the Bool docs:

This option is included for completeness, but isn't very useful. Rather just leave the value as undef and use the exists and missing filters when you need to consider undef values.

Elastic::Doc not applying roles properly to extended classes

Using classes which extend other classes as doc classes was not working correctly. The Elastic::Model::Role::Doc role was being applied to the base class, and then removed after the class was extended:

package Foo;
use Elastic::Doc;
extends 'Foo::Other';

Docs for ::View->from should clarify that 0 is the first record

Currently says:

By default, results are returned from the first result. If you would like to start at a later result (eg for paging), you can set "from".

but that doesn't make it clear if the default is 0 or 1. Worth adding that, and maybe noting that 'from' can be thought of as 'number of documents to skip'.
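Reading 'from' as 'number of documents to skip' leads to the usual paging arithmetic, assuming the first record really is 0 as the issue title says. The helper name `from_for_page` below is an assumption for illustration, not an Elastic::Model method.

```perl
use strict;
use warnings;

# Convert a 1-based page number and a page size into a 0-based "from" offset.
sub from_for_page {
    my ( $page, $size ) = @_;
    die "page must be >= 1\n" if $page < 1;
    return ( $page - 1 ) * $size;
}
```

Page 1 skips nothing, and page 3 with 10 results per page skips the first 20 documents.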

UIDs from search results missing routing

The UIDs of objects or results returned from search results are missing the routing value

add 'ignore_missing' to delete_mapping

Was doing 'delete_mapping' in a situation in which the mapping didn't already exist. Switched to using ElasticSearch, for 'ignore_missing => 1'.

has_changed() and old_values() buggy

The has_changed() and old_values() functionality relies on triggers being called whenever a setter or clearer is called on an attribute.

However, this doesn't work for any complex value (eg hashref/arrayref) as the ref itself might change, without the contents being altered.
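One possible alternative to triggers, sketched here under assumptions and not taken from Elastic::Model, is to compare a serialised snapshot of the value rather than the reference. The sub name `fingerprint` is hypothetical; it uses the core Storable module.

```perl
use strict;
use warnings;
use Storable qw(freeze);

# Serialise a value (plain scalar or nested structure) to a string that
# depends on its contents, not on the address of any reference.
sub fingerprint {
    my ($value) = @_;
    local $Storable::canonical = 1;    # sort hash keys for stable output
    return freeze( \$value );
}
```

Two different arrayrefs with the same contents produce the same fingerprint, and an in-place change such as a push produces a different one. A caveat of this sketch: Storable may serialise the number 1 and the string "1" differently, so values that merely changed representation can look changed.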

Also, these triggers aren't being applied to any attributes included from roles.

Keeping track of changing attributes adds quite a lot of overhead and impacts performance, but we currently have to do it because save() only saves if the object has changed.

Make possible to use async transport for queries

I would like to be able to pass a callback to search methods that will be called once the results object has been retrieved and built.

It would probably need some mechanism to pass the async transport class to be used.

private test modules are indexed

In the PAUSE index (02packages.details.txt):

FieldTest                             undef  D/DR/DRTECH/Elastic-Model-0.28.tar.gz
FieldTest::Binary                     undef  D/DR/DRTECH/Elastic-Model-0.28.tar.gz
FieldTest::Boolean                    undef  D/DR/DRTECH/Elastic-Model-0.28.tar.gz
FieldTest::Date                       undef  D/DR/DRTECH/Elastic-Model-0.28.tar.gz
FieldTest::GeoPoint                   undef  D/DR/DRTECH/Elastic-Model-0.28.tar.gz
FieldTest::IP4                        undef  D/DR/DRTECH/Elastic-Model-0.28.tar.gz
FieldTest::Nested                     undef  D/DR/DRTECH/Elastic-Model-0.28.tar.gz
FieldTest::Number                     undef  D/DR/DRTECH/Elastic-Model-0.28.tar.gz
FieldTest::Object                     undef  D/DR/DRTECH/Elastic-Model-0.28.tar.gz
FieldTest::String                     undef  D/DR/DRTECH/Elastic-Model-0.28.tar.gz

None of these modules should be indexed. You should either declare the full list of indexable modules in META.json using 'provides', split the package declarations of these packages up over two lines, or declare the t/ directory off-limits to indexing using 'no_index' in META.json.

Make it easier to configure attributes imported from roles

When importing attributes from roles, if you don't have access to the role, all attributes will be indexed using the default settings. Provide a way to override the settings for applied role attributes.

cpanratings.perl.org no longer exists

This distribution has a link to CPAN Ratings at cpanratings.perl.org, but this address now redirects to https://metacpan.org/

Tests fail with Can't locate object method "add_namespace" via package "Moose::Meta::Class"

Looks like you need to be more specific about Moose versions in your PREREQs.

We're pre-2.0 still (1.25).

Use objects after partial doc retrieval

It would be nice to be able to select which fields to retrieve and then use the object without needing a full 'get'. It could work in a kind of "read only" mode that retrieves the full object only when you try to write to it, or, more fine-grained, when you try to access a field that was not retrieved. In my planned usage I will never write back, as this is for search results only, so marking the object read only and raising an exception on any other use is fine for me.

Another approach would be to let the class define a constructor that is in charge of parsing the partial result, but automatic handling is much better and probably doable, as the process is always the same.

Of course, required attributes should have default values if you plan to selectively skip retrieving any of them.

One problem I see is that when selecting partial data from a hash, the returned data is in dot notation, so it would have to be converted back to a hash before inflating partial objects.
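The dot-notation conversion mentioned above could look roughly like this. It is a sketch in plain Perl (5.10 or later for `//=`), and the sub name `unflatten` is made up rather than an existing Elastic::Model function.

```perl
use strict;
use warnings;

# Turn { 'user.address.city' => 'Paris' } into
# { user => { address => { city => 'Paris' } } }.
sub unflatten {
    my ($flat) = @_;
    my %nested;
    for my $key ( keys %$flat ) {
        my @path = split /\./, $key;
        my $leaf = pop @path;
        my $node = \%nested;
        # Walk down the path, creating intermediate hashes as needed.
        $node = $node->{$_} //= {} for @path;
        $node->{$leaf} = $flat->{$key};
    }
    return \%nested;
}
```

This sketch assumes that no key is both a leaf and a prefix of another key (for example 'user' and 'user.name' together), which it does not try to handle.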

I think this feature would be very useful for online search over docs that can hold a lot of data whose only purpose is searching/filtering.

Make result sets cacheable

Elastic::Model::Results objects include coderefs and cached objects which are not cacheable.

Add a to_cache() method to return just the data required to reinflate a resultset

Performance of Elastic::Model::UID::new_from_store re Moose type constraints

On my system (perl 5.10) each call to Elastic::Model::UID::new_from_store takes 1.36ms.

That adds up fast.

Much of the cost is in Moose::Meta::TypeConstraint checking.

I see that Elastic::Model::UID doesn't use __PACKAGE__->meta->make_immutable;
That may help (here and in other packages).

Also, perhaps you'd consider dropping the Str and/or Maybe[Str] 'isa' declarations since they add relatively little value.

Fails with ElasticSearch::SearchBuilder 0.18

Sample fail report:

http://www.cpantesters.org/cpan/report/25678059

Thanks,

Bug in the indexing of UIDs

The current mapping for UIDs uses a path of just_name and an index_name like uid.index. This means that all Doc class attributes in an object are being indexed as the same fields, so it is impossible to distinguish a search for (mother => $user) from (father => $user).

Add support for analyzer aliases and _analyzer field

http://www.elasticsearch.org/guide/reference/index-modules/analysis/

Add support for bulk indexing

Add Elastic::Model::Bulk to save/overwrite documents in batches for improved performance:

$bulk = $model->bulk(
    size        => 1000,
    on_conflict => sub {...},
    on_error    => sub {...}
);

$bulk->save($doc);
$bulk->overwrite($doc);
...

$bulk->commit;
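The batching behaviour implied by the size parameter can be sketched in plain Perl. The package `My::BulkBuffer` and its `on_flush` callback are hypothetical stand-ins, not the proposed Elastic::Model::Bulk API, and nothing here talks to Elasticsearch.

```perl
use strict;
use warnings;

{
    package My::BulkBuffer;    # hypothetical, for illustration only

    sub new {
        my ( $class, %args ) = @_;
        return bless {
            size     => $args{size}     || 1000,
            on_flush => $args{on_flush} || sub { },
            queue    => [],
        }, $class;
    }

    # Queue an item and flush automatically once the batch is full.
    sub add {
        my ( $self, $item ) = @_;
        push @{ $self->{queue} }, $item;
        $self->commit if @{ $self->{queue} } >= $self->{size};
        return $self;
    }

    # Hand any queued items to the callback; returns how many were sent.
    sub commit {
        my ($self) = @_;
        return 0 unless @{ $self->{queue} };
        my @batch = splice @{ $self->{queue} };
        $self->{on_flush}->( \@batch );
        return scalar @batch;
    }
}
```

As in the proposal above, a final explicit commit is still needed to send whatever remains in a partially filled batch.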

Check for routing changes when reindexing

Reindexing docs may also involve a new routing scheme for the new index (or even changes in type or ID). This should be accounted for when updating the UID references in other indexes. Currently we only update the index name.