clintongormley / elastic-model
Use ElasticSearch as a NoSQL database in Perl
When retrieving objects from a search, only the source for the object itself is retrieved. Often these objects refer to related objects (eg the user object pointed to by a comment object). If the user knows that they will want to access these, it would be more efficient to do a multi-get rather than loading each object one by one.
Elastic::Model::Result should be able to return the names of the matched_filters
Hi!
I don't know if this is an expected behaviour: the "namespace" method called in a subclass of a model always returns "undef".
Here you can find a minimal script in order to reproduce:
https://gist.github.com/miquelruiz/8147660
Is there any workaround (or fix) in order to make this work?
Thanks in advance,
Miquel Ruiz
Should be easy to cache individual documents or Results, Collections etc
The default values for wrapper() and multi_wrapper() are incorrect - they should return CODE refs
You can get the id etc from $doc->uid->id but it'd be convenient to expose id and type directly in the doc. Add them as ordinary methods to Elastic::Model::Role::Doc so that they can be renamed when importing.
The Elastic::Model::Types Timestamp docs say:
A Timestamp is a Num which holds floating epoch seconds, with milliseconds as decimal places.
The term 'floating' by itself and the phrase 'with milliseconds as decimal places' seem confusing to me. Assuming I understand the intent correctly I'd suggest something like:
A Timestamp is a Num which holds floating point epoch seconds, with milliseconds resolution.
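To make the suggested wording concrete, here is a minimal sketch of a floating point epoch value with millisecond resolution. It uses the core Time::HiRes module and is only an illustration of the value format; how Elastic::Model itself produces Timestamp values is not shown here.

```perl
use strict;
use warnings;
use Time::HiRes ();

# Floating point epoch seconds, rounded to millisecond resolution.
my $timestamp = sprintf '%.3f', Time::HiRes::time();

# The integer part is whole seconds, the three decimals are milliseconds.
my ( $seconds, $millis ) = split /\./, $timestamp;
print "timestamp=$timestamp seconds=$seconds millis=$millis\n";
```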
See http://analysis.cpantesters.org/solved?distv=Elastic-Model-0.52#qr%3A%28Can%27t%20locate%20%5CS%2Bpm%29 : it seems that Any::URI::Escape and Search::Elasticsearch::Client::1_0::Direct have to be declared as prerequisites.
When developing, it can sometimes be difficult to figure out why a particular query isn't working. Often the reason is that the indexed terms for a particular field are different from what you think.
Add terms_indexed_for_field() to Elastic::Model::Role::Doc to aid debugging.
https://rt.cpan.org/Public/Bug/Display.html?id=87300
return binds more tightly than or, so the expression after or is never evaluated.
See https://rt.perl.org/rt3/Public/Bug/Display.html?id=59802
diff -bu ./lib/Elastic/Model/Meta/Class/Doc.pm~ ./lib/Elastic/Model/Meta/Class/Doc.pm
--- ./lib/Elastic/Model/Meta/Class/Doc.pm~ 2013-05-08 14:30:06.000000000 -0500
+++ ./lib/Elastic/Model/Meta/Class/Doc.pm 2013-07-25 08:17:09.536402358 -0500
@@ -91,7 +91,7 @@
. $self->_inline_generate_instance( '$instance',
'"' . $self->name . '"' )
. 'return $instance' . '}';
- return eval($src) or croak $@;
+ return eval($src) || croak $@;
}
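The precedence problem fixed by the patch above can be reproduced in isolation. This is a minimal sketch; the sub names are hypothetical and are not part of Elastic::Model.

```perl
use strict;
use warnings;

# `or` has lower precedence than `return`, so this parses as
# (return $value) or die ...; the die can never run.
# Recent perls may emit a precedence warning for this line.
sub low_precedence {
    my ($value) = @_;
    return $value or die "never reached\n";
}

# `||` binds more tightly than `return`, so the die runs
# whenever $value is false.
sub high_precedence {
    my ($value) = @_;
    return $value || die "value was false\n";
}

my $silent = low_precedence(0);                  # returns 0, no exception
my $ok     = eval { high_precedence(0); 1 };     # eval fails, $@ is set

print "low_precedence returned: $silent\n";
print "high_precedence died: $@" unless $ok;
```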
Is there a way to use filterb with script? I can use it with filter but not with filterb. The script is something like this:
{
    "script": {
        "script": "doc['field'].values.size()>1"
    }
}
Model metaclass attributes are being initialized from the same hashref, meaning that data (eg types in a namespace) are shared between independent models.
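The kind of sharing described above can be shown with plain Perl objects. This is a sketch under assumptions: Sketch::Model, new_shared and new_fresh are hypothetical names, and the real metaclass code may share its hashref in a different way.

```perl
use strict;
use warnings;

package Sketch::Model;    # hypothetical class, for illustration only

my %DEFAULTS = ( types => {} );

# Buggy: a shallow copy reuses the single inner hashref in %DEFAULTS.
sub new_shared {
    my ($class) = @_;
    my %self = %DEFAULTS;
    return bless \%self, $class;
}

# Fixed: build a fresh inner hashref for every instance.
sub new_fresh {
    my ($class) = @_;
    my %self = ( types => {} );
    return bless \%self, $class;
}

package main;

my ( $first, $second ) = map { Sketch::Model->new_shared } 1 .. 2;
$first->{types}{user} = 'MyApp::User';
my $leaked = exists $second->{types}{user};       # true: state leaked

my ( $third, $fourth ) = map { Sketch::Model->new_fresh } 1 .. 2;
$third->{types}{user} = 'MyApp::User';
my $isolated = !exists $fourth->{types}{user};    # true: no sharing

print "shared constructor leaked state\n" if $leaked;
print "fresh constructor kept state isolated\n" if $isolated;
```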
Currently the inflators and deflators use Moose methods at runtime. Instead they should be inlined to improve performance. Also, the inlined flators need to accept non-inlineable coderefs as specified by the user.
Hi all users of Elastic::Model
Elasticsearch 2.0.0 is out, and Elastic::Model doesn't support it. In fact, Elastic::Model doesn't support a number of things from Elasticsearch 1.x either. I apologise for neglecting this module.
My feeling is that Elastic::Model tries to do way too much. Like many frameworks, it ties you into doing things in a particular way, which may or may not make sense for your use case. Most people who use Elastic::Model seem to use a subset of the functionality, and then talk to Elasticsearch directly the rest of the time.
I don't think it makes sense to just update the code for 2.x; it needs a complete rethink.
Please could you add comments to this issue explaining what bits you find useful, what bits you never use, and what bits you find annoying. Perhaps the code can be split out into smaller more useful chunks.
thanks
I notice that the Elastic::Model 'es' attribute isa Search::Elasticsearch::Compat. Are there any caveats to using these two distributions with the Elasticsearch 1.x series?
See subject. Statistical analysis suggests that the failures are caused by Search::Elasticsearch 2.00; 1.99 and earlier are OK: http://analysis.cpantesters.org/reports_by_field?distv=Elastic-Model-0.51;field=mod%3ASearch%3A%3AElasticsearch
For some reason, Elasticsearch refuses to create the index if an object type is used.
ElasticsearchParseException['omit_term_freq_and_positions' is not supported anymore - use ['index_options' : 'DOCS_ONLY'] instead]; , called from sub Elasticsearch::Transport::__ANON__ at /root/perl5/perlbrew/perls/perl-5.16.2/lib/site_perl/5.16.2/Try/Tiny.pm line 76.
package MyApp::Related;
use Elastic::Doc;
use MooseX::Types::Moose qw(Str);
has 'field1' => (
is => 'rw',
isa => Str,
);
no Moose::Util::TypeConstraints;
no Elastic::Model;
1;
package MyApp::Parent;
use Elastic::Doc;
use MyApp::Related;
use MooseX::Types::Moose qw(Num);
has '_id' => (
is => 'ro',
isa => Num,
index => 'not_analyzed',
);
has 'related_items' => (
is => 'rw',
isa => 'ArrayRef[MyApp::Related]',
#default => sub {[]},
#type => 'object',
);
no Moose::Util::TypeConstraints;
no Elastic::Model;
1;
If the args to queryb/filterb result in an empty clause, it should be possible to simply ignore them rather than throwing an error.
The Elastic::Manual::Attributes docs say:
Note: By default, ElasticSearch treats undef as NEITHER true NOR false, but as null (ie missing). To work around this, we automatically set "null_value" to 0 to make boolean fields more Perlish. If you would like to revert to the default behaviour, set "null_value" to undef.
I don't understand the "make boolean fields more Perlish" comment, or the need to "work around this" for that matter. Perl has undef specifically to be able to distinguish null/missing from false. So does ElasticSearch. It seems you're fighting against what's natural for both.
Maybe I'm missing something, in which case the docs need expanding to include a more detailed explanation of the issues.
Also, the docs for null_value don't mention the use-case for Bool. In fact they argue against the case made in the Bool docs:
This option is included for completeness, but isn't very useful. Rather just leave the value as undef and use the exists and missing filters when you need to consider undef values.
Using classes which extend other classes as doc classes was not working correctly. The Elastic::Model::Role::Doc role was being applied to the base class, and then removed after the class was extended:
package Foo;
use Elastic::Doc;
extends 'Foo::Other';
Currently says:
By default, results are returned from the first result. If you would like to start at a later result (eg for paging), you can set "from".
but that doesn't make it clear if the default is 0 or 1. Worth adding that, and maybe noting that 'from' can be thought of as 'number of documents to skip'.
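Elasticsearch's own "from" parameter is a zero-based offset that defaults to 0. Assuming Elastic::Model passes "from" through unchanged (an assumption, not something the quoted docs confirm), the "number of documents to skip" reading can be sketched as follows. The helper name from_for_page is hypothetical.

```perl
use strict;
use warnings;

# Convert a 1-based page number into a zero-based "from" offset,
# ie the number of documents to skip.
sub from_for_page {
    my ( $page, $size ) = @_;
    die "page must be >= 1\n" if $page < 1;
    return ( $page - 1 ) * $size;
}

printf "page %d -> from %d\n", $_, from_for_page( $_, 10 ) for 1 .. 3;
```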
The UIDs of objects or results returned from search results are missing the routing value
Was doing 'delete_mapping' in a situation in which the mapping didn't already exist. Switched to using ElasticSearch, for 'ignore_missing => 1'.
The has_changed() and old_values() functionality relies on triggers being called whenever a setter or clearer is called on an attribute.
However, this doesn't work for any complex value (eg hashref/arrayref), as the contents can be altered without the ref itself changing, so no setter is called.
Also, these triggers aren't being applied to any attributes included from roles.
Keeping track of changing attributes adds quite a lot of overhead and impacts performance, but we currently have to do it because save() only saves if the object has changed.
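One way to detect changes inside complex values, sketched here as an illustration rather than as the module's approach, is to compare a canonical serialisation taken before and after. The snapshot helper is hypothetical, and serialising has its own cost, so this does not remove the overhead concern.

```perl
use strict;
use warnings;
use JSON::PP ();

# canonical() sorts hash keys so equal structures encode identically.
my $json = JSON::PP->new->canonical;

sub snapshot {
    my ($ref) = @_;
    return $json->encode($ref);
}

my $value  = { colours => [ 'red', 'green' ] };
my $before = snapshot($value);

# The contents change but no setter is called, so a trigger
# on the attribute holding $value would never fire.
push @{ $value->{colours} }, 'blue';

my $changed = snapshot($value) ne $before;
print "value changed in place\n" if $changed;
```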
I would like to be able to pass a callback to search methods that will be called once the results object was retrieved and built.
It would probably need some mechanism to pass the async transport class to be used.
In the PAUSE index (02packages.details.txt):
FieldTest undef D/DR/DRTECH/Elastic-Model-0.28.tar.gz
FieldTest::Binary undef D/DR/DRTECH/Elastic-Model-0.28.tar.gz
FieldTest::Boolean undef D/DR/DRTECH/Elastic-Model-0.28.tar.gz
FieldTest::Date undef D/DR/DRTECH/Elastic-Model-0.28.tar.gz
FieldTest::GeoPoint undef D/DR/DRTECH/Elastic-Model-0.28.tar.gz
FieldTest::IP4 undef D/DR/DRTECH/Elastic-Model-0.28.tar.gz
FieldTest::Nested undef D/DR/DRTECH/Elastic-Model-0.28.tar.gz
FieldTest::Number undef D/DR/DRTECH/Elastic-Model-0.28.tar.gz
FieldTest::Object undef D/DR/DRTECH/Elastic-Model-0.28.tar.gz
FieldTest::String undef D/DR/DRTECH/Elastic-Model-0.28.tar.gz
None of these modules should be indexed. You should either declare the full list of indexable modules in META.json using 'provides', split the package declarations of these packages up over two lines, or declare the t/ directory off-limits to indexing using 'no_index' in META.json.
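The second option, splitting the package declaration over two lines, looks like this. It is a sketch using one of the test package names listed above; the name method is hypothetical and only there to show that the package still works normally.

```perl
package    # hide from PAUSE
    FieldTest::Binary;

use strict;
use warnings;

# The package is declared as usual as far as perl is concerned.
sub name { return __PACKAGE__ }

package main;

print FieldTest::Binary->name, "\n";
```

The indexer looks for the package keyword and the name on the same line, so the split declaration is skipped while perl still parses it normally.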
When importing attributes from roles, if you don't have access to the role, all attributes will be indexed using the default settings. Provide a way to override the settings for applied role attributes.
This distribution has a link to CPAN Ratings at cpanratings.perl.org, but this address now redirects to https://metacpan.org/
Looks like you need to be more specific about Moose versions in your PREREQs.
We're pre-2.0 still (1.25).
It would be nice to be able to select which fields to retrieve and then use the object without needing a full 'get'. It could work in a kind of "read only" mode, retrieving the full object only when trying to write to it, or be more fine-grained, for example when trying to access a field that was not retrieved. In my planned usage I will never write back, as this is for search results only, so setting the object as read only and raising an exception on any other use is fine for me.
Another approach would be to let the class define a constructor that is in charge of parsing the partial result... but automatic behaviour is much better and probably doable, as it's always the same.
Of course, required attributes should have default values if you plan to selectively not retrieve one of them.
One problem I see is that when selecting partial data from a hash, the returned data is in dot notation, so it should be converted back to a hash before inflating partial objects.
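The dot-notation conversion mentioned above could look roughly like this. It is a sketch: the unflatten helper is hypothetical, and it assumes no key is both a leaf and a prefix of another key (eg 'user' alongside 'user.name').

```perl
use strict;
use warnings;

# Turn { 'user.name' => 'x' } into { user => { name => 'x' } }.
sub unflatten {
    my ($flat) = @_;
    my %nested;
    for my $path ( sort keys %$flat ) {
        my @parts = split /\./, $path;
        my $leaf  = pop @parts;
        my $node  = \%nested;
        for my $part (@parts) {
            $node->{$part} //= {};    # create intermediate levels on demand
            $node = $node->{$part};
        }
        $node->{$leaf} = $flat->{$path};
    }
    return \%nested;
}

my $fields = unflatten(
    {
        'title'             => 'Hello',
        'user.name'         => 'clint',
        'user.address.city' => 'Barcelona',
    }
);
print "city: $fields->{user}{address}{city}\n";
```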
I think this feature would be very useful for online search over docs that hold a lot of data used only for searching/filtering.
Elastic::Model::Results objects include coderefs and cached objects which are not cacheable.
Add a to_cache() method to return just the data required to reinflate a resultset.
On my system (perl 5.10) each call to Elastic::Model::UID::new_from_store takes 1.36ms.
That adds up fast.
Much of the cost is in Moose::Meta::TypeConstraint checking.
I see that Elastic::Model::UID doesn't use __PACKAGE__->meta->make_immutable;
That may help (here and in other packages).
Also, perhaps you'd consider dropping the Str and/or Maybe[Str] 'isa' declarations since they add relatively little value.
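For reference, making a Moose class immutable is a one-line change at the end of the class. The class below is a hypothetical stand-in, not the real Elastic::Model::UID, and it requires Moose to be installed; whether it gives a useful speed-up here would need to be measured.

```perl
package Sketch::UID;    # hypothetical stand-in for illustration

use Moose;

has index => ( is => 'ro', isa => 'Str',        required => 1 );
has id    => ( is => 'ro', isa => 'Maybe[Str]' );

no Moose;

# Inlines the constructor and accessors, trading runtime
# flexibility of the metaclass for faster object construction.
__PACKAGE__->meta->make_immutable;

package main;

my $uid = Sketch::UID->new( index => 'myapp', id => '1' );
print 'immutable: ', ( Sketch::UID->meta->is_immutable ? 'yes' : 'no' ), "\n";
```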
The current mapping for UIDs uses a path of just_name and an index_name like uid.index. This means that all Doc class attributes in an object are being indexed as the same fields, so it is impossible to distinguish a search for (mother => $user) from (father => $user).
Add Elastic::Model::Bulk to save/overwrite documents in batches for improved performance:
$bulk = $model->bulk(
size => 1000,
on_conflict => sub {...},
on_error => sub {...}
);
$bulk->save($doc);
$bulk->overwrite($doc);
...
$bulk->commit;
Reindexing docs may also involve a new routing scheme for the new index (or even changes in type or ID). This should be accounted for when updating the UID references in other indexes. Currently we only update the index name.