elastic-model's People

Contributors

clintongormley, haarg

elastic-model's Issues

Preload related objects

When retrieving objects from a search, only the source for the object itself is retrieved. Often, these objects will refer to related objects (eg the user object pointed to by a comment object). If the user knows that they will want to access these, it'd be more efficient to do a single multi-get rather than loading each related object one by one.
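A minimal sketch of the idea, assuming each search hit refers to its related object through a hashref with index, type and id keys. That shape and the helper name mget_body_for are hypothetical, not Elastic::Model's internal format. The sketch collects the distinct references and builds the body of one Elasticsearch multi-get request instead of one get per object:

```perl
use strict;
use warnings;

# Build the body of one multi-get (_mget) request from a list of references.
# The { index, type, id } hashref shape is an assumption made for this sketch.
sub mget_body_for {
    my @refs = @_;
    my ( %seen, @docs );
    for my $ref (@refs) {
        my $key = join "\0", @{$ref}{qw(index type id)};
        next if $seen{$key}++;    # fetch each related object only once
        push @docs,
            { _index => $ref->{index}, _type => $ref->{type}, _id => $ref->{id} };
    }
    return { docs => \@docs };
}

my $body = mget_body_for(
    { index => 'myapp', type => 'user', id => 1 },
    { index => 'myapp', type => 'user', id => 2 },
    { index => 'myapp', type => 'user', id => 1 },    # duplicate, skipped
);
printf "%d docs to fetch in one request\n", scalar @{ $body->{docs} };
```

Deduplicating first matters because many comments typically point at the same few users.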

Integrate caching

Should be easy to cache individual documents or Results, Collections etc

Expose id() and type() in doc classes

You can get the id etc from

$doc->uid->id

but it'd be convenient to expose id and type directly in the doc. Add them as ordinary methods to Elastic::Model::Role::Doc so that they can be renamed when importing

Doc nit-pick for Timestamp type

The Elastic::Model::Types Timestamp docs say:

A Timestamp is a Num which holds floating epoch seconds, with milliseconds as decimal places.

The term 'floating' by itself and the phrase 'with milliseconds as decimal places' seem confusing to me. Assuming I understand the intent correctly I'd suggest something like:

A Timestamp is a Num which holds floating point epoch seconds, with milliseconds resolution.
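For illustration, a value of the kind the suggested wording describes can be produced with the core Time::HiRes module. This is only a sketch of the number format, not of how Elastic::Model builds its Timestamp values:

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Time::HiRes::time() returns floating point epoch seconds.
# Rounding to three decimal places keeps millisecond resolution.
my $timestamp = sprintf '%.3f', time();

print "$timestamp\n";
```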

Add a method to retrieve the indexed terms for a field

When developing, it can sometimes be difficult to figure out why a particular query isn't working. Often the reason is that the indexed terms for a particular field are different from what you think.

Add terms_indexed_for_field() to Elastic::Model::Role::Doc to aid debugging.

wrong return precedence [cpan #87300]

https://rt.cpan.org/Public/Bug/Display.html?id=87300

return binds more tightly than or, so the expression after or is never evaluated.
See https://rt.perl.org/rt3/Public/Bug/Display.html?id=59802

diff -bu ./lib/Elastic/Model/Meta/Class/Doc.pm~ ./lib/Elastic/Model/Meta/Class/Doc.pm
--- ./lib/Elastic/Model/Meta/Class/Doc.pm~  2013-05-08 14:30:06.000000000 -0500
+++ ./lib/Elastic/Model/Meta/Class/Doc.pm   2013-07-25 08:17:09.536402358 -0500
@@ -91,7 +91,7 @@
         . $self->_inline_generate_instance( '$instance',
         '"' . $self->name . '"' )
         . 'return $instance' . '}';
-    return eval($src) or croak $@;
+    return eval($src) || croak $@;
 }
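The difference can be reproduced outside Elastic::Model with a small standalone script. The sub names and the broken source string below are made up for the demonstration. Recent perls may also emit a "Possible precedence issue" warning for the first sub at compile time:

```perl
use strict;
use warnings;

# Parsed as: (return eval($src)) or die ...
# The sub returns before 'or' is evaluated, so the die can never run.
sub compile_with_or {
    my ($src) = @_;
    return eval($src) or die "compile failed: $@";
}

# Parsed as: return( eval($src) || die ... )
# A false result from eval now reaches the die.
sub compile_with_pipes {
    my ($src) = @_;
    return eval($src) || die "compile failed: $@";
}

my $bad = 'sub { 1 +';    # deliberately broken source

my $silent = compile_with_or($bad);    # undef, no exception
my $ok     = eval { compile_with_pipes($bad); 1 };

print defined $silent ? "or: got a value\n" : "or: failure went unnoticed\n";
print $ok ? "||: no exception\n" : "||: exception raised\n";
```

Note that || also triggers on any false return value, which is acceptable here because the evaluated source is expected to yield a code reference.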

Filterb with "script"

Is there a way to use filterb with script? I can use it with filter but not with filterb. The script is something like this

{
  "script": {
    "script": "doc['field'].values.size()>1"
  }
}

Separate models share data

Model metaclass attributes are being initialized from the same hashref, meaning that data (eg types in a namespace) are shared between independent models.

Make inflators and deflators inlineable

Currently the inflators and deflators use Moose methods at runtime. Instead they should be inlined to improve performance. The inlined inflators and deflators also need to accept non-inlineable coderefs as specified by the user.

FUTURE OF ELASTIC-MODEL - PLEASE COMMENT

Hi all users of Elastic::Model

Elasticsearch 2.0.0 is out, and Elastic::Model doesn't support it. In fact, Elastic::Model doesn't support a number of things from Elasticsearch 1.x either. I apologise for neglecting this module.

My feeling is that Elastic::Model tries to do way too much. Like many frameworks, it ties you into doing things in a particular way, which may or may not make sense for your use case. Most people who use Elastic::Model seem to use a subset of the functionality, and then talk to Elasticsearch directly the rest of the time.

I don't think it makes sense to just update the code for 2.x; it needs a complete rethink.

TELL ME HOW YOU USE IT

Please could you add comments to this issue explaining what bits you find useful, what bits you never use, and what bits you find annoying. Perhaps the code can be split out into smaller more useful chunks.

thanks

'omit_term_freq_and_positions' is not supported anymore

For some reason, Elasticsearch refuses to create the index if an object type is used.

ElasticsearchParseException['omit_term_freq_and_positions' is not supported anymore - use ['index_options' : 'DOCS_ONLY']  instead]; , called from sub Elasticsearch::Transport::__ANON__ at /root/perl5/perlbrew/perls/perl-5.16.2/lib/site_perl/5.16.2/Try/Tiny.pm line 76.

package MyApp::Related;

use Elastic::Doc;
use MooseX::Types::Moose qw(Str);

has 'field1' => (
    is    => 'rw',
    isa   => Str, 
);

no Moose::Util::TypeConstraints;
no Elastic::Model;

1;

package MyApp::Parent;

use Elastic::Doc;
use MooseX::Types::Moose qw(Num);    # needed for the bareword Num below
use MyApp::Related;

has '_id' => (
    is    => 'ro',
    isa   => Num,
    index => 'not_analyzed',
);

has 'related_items' => (
    is      => 'rw',
    isa     => 'ArrayRef[MyApp::Related]',
    #default => sub {[]},
    #type    => 'object',
);

no Moose::Util::TypeConstraints;
no Elastic::Model;

1;

Clarify undef behaviour for Bool type fields

The Elastic::Manual::Attributes docs say:

Note: By default, ElasticSearch treats undef as NEITHER true NOR false, but as null (ie missing). To work around this, we automatically set "null_value" to 0 to make boolean fields more Perlish. If you would like to revert to the default behaviour, set "null_value" to undef.

I don't understand the "make boolean fields more Perlish" comment, or the need to "work around this" for that matter. Perl has undef specifically to be able to distinguish null/missing from false. So does ElasticSearch. It seems you're fighting against what's natural for both.

Maybe I'm missing something, in which case the docs need expanding to include a more detailed explanation of the issues.

Also, the docs for null_value don't mention the use-case for Bool. In fact they argue against the case made in the Bool docs:

This option is included for completeness, but isn't very useful. Rather just leave the value as undef and use the exists and missing filters when you need to consider undef values.

Elastic::Doc not applying roles properly to extended classes

Using classes which extend other classes as doc classes was not working correctly. The Elastic::Model::Role::Doc role was being applied to the base class, and then removed after the class was extended:

package Foo;
use Elastic::Doc;
extends 'Foo::Other';

Docs for ::View->from should clarify that 0 is the first record

Currently says:

By default, results are returned from the first result. If you would like to start at a later result (eg for paging), you can set "from".

but that doesn't make it clear if the default is 0 or 1. Worth adding that, and maybe noting that 'from' can be thought of as 'number of documents to skip'.
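Reading 'from' as the number of documents to skip makes the paging arithmetic straightforward. A small sketch, where from_for_page is a hypothetical helper and not part of Elastic::Model:

```perl
use strict;
use warnings;

# 'from' is the number of documents to skip, so the first result is 0.
sub from_for_page {
    my ( $page, $size ) = @_;    # $page is 1-based
    return ( $page - 1 ) * $size;
}

print from_for_page( 1, 10 ), "\n";    # 0
print from_for_page( 3, 10 ), "\n";    # 20
```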

has_changed() and old_values() buggy

The has_changed() and old_values() functionality relies on triggers being called whenever a setter or clearer is called on an attribute.

However, this doesn't work for any complex value (eg hashref/arrayref) as the ref itself might change, without the contents being altered.

Also, these triggers aren't being applied to any attributes included from roles.

Keeping track of changing attributes adds quite a lot of overhead and hurts performance, but we currently have to do it because save() only saves if the object has changed.

Make it possible to use async transport for queries

I would like to be able to pass a callback to search methods, to be called once the results object has been retrieved and built.

It would probably need some mechanism to pass the async transport class to be used.

private test modules are indexed

In the PAUSE index (02packages.details.txt):

FieldTest                             undef  D/DR/DRTECH/Elastic-Model-0.28.tar.gz
FieldTest::Binary                     undef  D/DR/DRTECH/Elastic-Model-0.28.tar.gz
FieldTest::Boolean                    undef  D/DR/DRTECH/Elastic-Model-0.28.tar.gz
FieldTest::Date                       undef  D/DR/DRTECH/Elastic-Model-0.28.tar.gz
FieldTest::GeoPoint                   undef  D/DR/DRTECH/Elastic-Model-0.28.tar.gz
FieldTest::IP4                        undef  D/DR/DRTECH/Elastic-Model-0.28.tar.gz
FieldTest::Nested                     undef  D/DR/DRTECH/Elastic-Model-0.28.tar.gz
FieldTest::Number                     undef  D/DR/DRTECH/Elastic-Model-0.28.tar.gz
FieldTest::Object                     undef  D/DR/DRTECH/Elastic-Model-0.28.tar.gz
FieldTest::String                     undef  D/DR/DRTECH/Elastic-Model-0.28.tar.gz

None of these modules should be indexed. You should either declare the full list of indexable modules in META.json using 'provides', split the package declarations of these packages up over two lines, or declare the t/ directory off-limits to indexing using 'no_index' in META.json.
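The second option, splitting the package declaration over two lines, is a common idiom for keeping a package out of the PAUSE index. A sketch using one of the package names above, with a placeholder method added only so the script has something to call:

```perl
use strict;
use warnings;

package    # hide from PAUSE
    FieldTest::Binary;

# Placeholder method, present only to show the package still works normally.
sub name { return 'binary' }

package main;

print FieldTest::Binary->name, "\n";
```

Perl treats the declaration exactly as before; only the indexer, which looks for the package name on the same line as the keyword, skips it.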

Use objects after partial doc retrieval

It would be nice to be able to select which fields to retrieve and then use the object without needing a full 'get'. It could work in a kind of "read only" mode that retrieves the full object only when you try to write to it, or, more fine-grained, when you try to access a field that was not retrieved. In my planned usage I will never write back, as this is for search results only, so making the object read-only and raising an exception on any other use is fine for me.

Another approach would be to let the class define a constructor that is in charge of parsing the partial result, but automatic handling is much better and probably doable, as it's always the same.

Of course, required attributes should have default values if you plan to selectively not retrieve one of them.

One problem I see is that when partial data is selected from a hash, the returned data is in dot notation, so it would need to be converted back to a hash before inflating partial objects.
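The conversion back from dot notation could look roughly like this. It is a sketch only: expand_dotted and the sample field names are hypothetical, and it assumes no field is returned both as a plain value and as a prefix of another key:

```perl
use strict;
use warnings;

# Expand { 'a.b' => 1 } into { a => { b => 1 } }.
# Assumes a key is never both a leaf ('a') and a prefix ('a.b').
sub expand_dotted {
    my ($flat) = @_;
    my %nested;
    for my $key ( sort keys %$flat ) {
        my @path = split /\./, $key;
        my $leaf = pop @path;
        my $node = \%nested;
        $node = $node->{$_} ||= {} for @path;    # walk or create each level
        $node->{$leaf} = $flat->{$key};
    }
    return \%nested;
}

my $doc = expand_dotted(
    {   'name'            => 'Alice',
        'address.city'    => 'London',
        'address.geo.lat' => 51.5,
    }
);
print $doc->{address}{city}, "\n";
```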

I see this feature as very useful for online search over docs that hold a lot of data whose only purpose is searching/filtering.

Make result sets cacheable

Elastic::Model::Results objects include coderefs and cached objects which are not cacheable.

Add a to_cache() method to return just the data required to reinflate a resultset

Performance of Elastic::Model::UID::new_from_store re Moose type constraints

On my system (perl 5.10) each call to Elastic::Model::UID::new_from_store takes 1.36ms.

That adds up fast.

Much of the cost is in Moose::Meta::TypeConstraint checking.

I see that Elastic::Model::UID doesn't use __PACKAGE__->meta->make_immutable;
That may help (here and in other packages).

Also, perhaps you'd consider dropping the Str and/or Maybe[Str] 'isa' declarations since they add relatively little value.

Bug in the indexing of UIDs

The current mapping for UIDs uses a path of just_name and an index_name like uid.index. This means that all Doc class attributes in an object are being indexed as the same fields, so it is impossible to distinguish a search for (mother => $user) from (father => $user).

Add support for bulk indexing

Add Elastic::Model::Bulk to save/overwrite documents in batches for improved performance:

$bulk = $model->bulk(
    size        => 1000,
    on_conflict => sub {...},
    on_error    => sub {...}
);

$bulk->save($doc);
$bulk->overwrite($doc);
...

$bulk->commit;

Check for routing changes when reindexing

Reindexing docs may also involve a new routing scheme for the new index (or even changes in type or ID). This should be accounted for when updating the UID references in other indexes. Currently we only update the index name.
