alleyinteractive / es-wp-query Goto Github PK
View Code? Open in Web Editor NEWA drop-in replacement for WP_Query to leverage Elasticsearch for complex queries
A drop-in replacement for WP_Query to leverage Elasticsearch for complex queries
In bd00339 a code allowing WP_Query::get_posts()
repeated call was introduced, but the WP_Query::get_posts()
is not strictly idempotent, as there is some backward compatibility related logic and what's more, some query variable evaluation and modification logic which is changing the query_vars property of the class.
Eg.: passing author_name
query variable leads to setting author
query variable which is then used for determining the get_queried_object
. See https://core.trac.wordpress.org/browser/tags/4.7/src/wp-includes/class-wp-query.php#L2121
Example failing code:
$my_query = new WP_Query( array( 'author_name' => 'existing-user-slug', 'es' => true ) );
$my_query->get_queried_object(); // Returns `false`.
vs.
$my_query = new WP_Query( array( 'author_name' => 'existing-user-slug', ) );
$my_query->get_queried_object(); // Returns author object.
Also, calling WP_Query::get_posts()
multiple times, or even creating a new WP_Query
object while passing query args to the constructor and then calling WP_Query::get_post()
leads to performance issues in certain cases caused by unnecessarily complex SQL queries - due to the backward compatibility logic.
Again, an example:
$my_query = new WP_Query(
array(
'posts_per_page' => 10
'cat' => $some_category_term_id, // This is a legacy query arg which gets extended in WP_Query::get_posts() function.
)
);
$my_query = $latest_posts_query->get_posts(); // This triggers the `WP_Query::get_posts` again, but with moar query variables than just `cat` - those were backfilled during the `$my_query`'s object construction
Once the WP_Query::get_posts()
have been called, either directly or through the constructor of the WP_Query
when query args were passed to it, the array of posts should be accessed via WP_Query
's posts
property - as it 1) saves a SQL query 2) returns the expected set of posts
@mboynes I wonder, do you have any valid use case for calling WP_Query::get_posts()
twice or is that just an edge case the plugin covers as developers do sometimes do that, not knowing all the consequences?
I'm asking as in my humble opinion a proper fix for non-working get_queried_objct
would not be to whitelisting some query variables which are dynamically set, but not resetting those at all. Thoughts?
When querying using WP_Query()
with Elasticsupport enabled, es-wp-query
can cause WP_Query
objects to drop taxonomy queries attached. This can cause get_query_var('taxonomy')
to return an empty string.
For instance, when the main loop calls WP_Query()
with a custom taxonomy, like this:
WP_Query( array(
...,
'vertical' => 'chairs',
'es' => true,
));
then subsequent calls to get_query_var( 'taxonomy' )
will return an empty string. This will not happen with no es
argument passed to WP_Query()
, as then get_query_var( 'taxonomy')
will return the correct value.
This problem likely occurs because in the function WP_Query::get_posts()
the term and other keys of WP_Query::$query_vars
are added after the pre_get_posts
action hook is called here -- this action hook is used to take over any calls to WP_Query()
or get_posts()
and offload the query to Elasticsearch. The term key (and others) are added in WP_Query::get_posts() around here.
The action hook implemented by class-es-wp-query-shoehorn.php
(es_wp_query_shoehorn
) will call WP_Query::parse_query()
here, and because it does not add back in terms and taxonomy information after re-parsing, the terms and taxonomy information will be left out -- this will ultimately impact the following:
if ( ! empty( $this->tax_query->queried_terms ) ) {
found in class-wp-query.php
(here) as with Elasticsearch enabled, it is evaluated as false
and evaluated as true
otherwise. It is after this point term and other values are added to WP_Query::$query_vars
.
The solution appears to add the taxonomy information back in after calling WP_Query::parse_query()
, which will ensure the terms being available by get_query_var()
.
if ( $hash != $this->query_vars_hash ) {
$this->query_vars_changed = true;
$this->query_vars_hash = $hash;
}
In class-es-wp-query-wrapper.php probably won't work as intended since they are set as private in the WP_Query class. This probably doesn't change anything but maybe just removing that code would be best.
I am attempting to set up a staging environment that runs ElasticSearch for a site that's hosted on WordPress VIP.
I am having trouble with queries that use custom taxonomies.
If I put in this taxonomy query...
'tax_query' => array(
array( 'taxonomy' => 'featured_content', 'field' => 'slug', 'terms' => 'ftrd', 'operator' => 'IN' )
}
The call that goes out to the API contains the term_id and returns empty.
{"terms":{"taxonomy.featured_content.term_id":[427]}}
If you test that API call from command line, it gets back 0 results.
$ curl -L https://public-api.wordpress.com/rest/v1/sites/7369149/search -H "Content-Type: application/json" --data '{"filter":{"and":[{"terms":{"taxonomy.featured_content.term_id":[427]}},{"term":{"post_type":"post"}},{"terms":{"post_status":["publish"]}}]},"sort":[{"date":"desc"}],"fields":["post_id"],"from":0,"size":5}'
However, if I change that URL to contain the query I originally entered...
{"terms":{"taxonomy.featured_content.slug":["ftrd"]}}
It returns the expected results
$ curl -L https://public-api.wordpress.com/rest/v1/sites/7369149/search -H "Content-Type: application/json" --data '{"filter":{"and":[{"terms":{"taxonomy.featured_content.slug":["ftrd"]}},{"term":{"post_type":"post"}},{"terms":{"post_status":["publish"]}}]},"sort":[{"date":"desc"}],"fields":["post_id","site_id"],"from":0,"size":5}'
Is it possible for ES WP Query to use the query as originally entered? Or is that an alternative way to get custom taxonomies working in this way?
Due to how WP_Query initializes default query parameters, and ES WP Query's interaction with WP Query, the lazy_load_term_meta
param ends up always being set to true
when the internal query to fetch posts by id runs.
This ends up triggering get_the_terms()
excessively, which leads to high numbers of cache and/or db reads.
The problem can be most readily seen by watching the $q
var here:
es-wp-query/class-es-wp-query-shoehorn.php
Lines 218 to 220 in 5b6d5d0
Before $query->parse_query();
is run, lazy_load_term_meta
is still false
...yet after it has flipped to true
.
As a workaround, it can be bypassed with a filter like this:
add_filter( 'pre_get_posts', function( &$query ) {
$query->set( 'lazy_load_term_meta', false );
return $query;
}, 9999 );
https://github.com/alleyinteractive/es-wp-query/blob/master/class-es-wp-query-wrapper.php#L530 expects === false but gets empty array value in https://github.com/alleyinteractive/es-wp-query/blob/master/class-es-wp-tax-query.php#L95
When passing a term in a term query in uppercase, this gets passed on as-is to ES which is case-sensitive.
WP_Query:
array (
'no_found_rows' => true,
'post_status' =>
array (
0 => 'publish',
),
'posts_per_page' => 20,
'tax_query' =>
array (
0 =>
array (
'taxonomy' => 'capi_content_type',
'field' => 'name',
'terms' =>
array (
0 => 'NEWS_STORY',
),
),
),
'ignore_sticky_posts' => true,
)
resulting ES query:
{
"from": 0,
"size": 20,
"fields": [
"post_id"
],
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"bool": {
"must": [
{
"terms": {
"taxonomy.capi_content_type.name.raw": [
"NEWS_STORY"
]
}
},
{
"terms": {
"post_type": [
"post",
"page",
"attachment"
]
}
},
{
"terms": {
"post_status": [
"publish"
]
}
}
]
}
}
}
},
"sort": [
{
"date": "desc"
}
],
"timeout": "2s"
}
As ES doesn't find anything matching "NEWS_STORY" the query returns no results, even though the WP_Query returns plenty.
4.1 changes how some queries are represented internally...specifically the structure of WP_Meta_Query::queries
has changed, which affects ES_WP_Meta_Query
.
https://core.trac.wordpress.org/changeset/29887
Compatibility with other query classes (WP_Date_Query
, WP_Tax_Query
, WP_Query
) also needs to be verified:
Elasticsearch has a fairly rapid release cycle, and it has no dedication to backwards compatibility. Furthermore, there's no great way to see what version of ES we're querying against (without making an additional HTTP request, and that's assuming the server supports that endpoint).
As such, we're going to split ES_WP_Query and support multiple versions at once.
Have got WP 4.8, on PHP 7.1.6 with searchpress.
On the site's static frontpage, I've set es => true on all the WP_Query there.
On categories, date and authors ..I set es via pre_get_posts
Everything looks to be working as expected but this error warning appears for every query:
Parameter 2 to ES_WP_Query_Shoehorn::filter__posts_request() expected to be a reference, value given
I'm not sure if this falls under a bug or a feature, but if you do $query->is_main_query()
inside the
do_action_ref_array( 'pre_get_posts', array( &$this ) );
hook this will return false for the "shadow" ES query of the main query. While I understand that it's technically true that this is not the real main WP_Query object I feel like it should return true.
Thoughts?
When using post_date_gmt
in WP_Query the resultant ES query is invalid and fails to return any results.
E.g. a WP_Query that does:
'date_query' =>
array (
0 =>
array (
'column' => 'post_date_gmt',
'after' => '2015-12-04 21:10:57',
),
),
results in this ES query:
"range": {
"wp_31106563_posts.post_date_gmt": {
"gt": "2015-12-04 21:10:57"
}
}
The ES query returns zero results, while the WP_Query returns the results we'd expect to see.
In ES 2.x there is now a maximum value for from+size in order to prevent deep paging which can cause performance problems.
We've also added this same limit when querying WP.com 1.x indices (after seeing some performance problems). It seems that es-wp-query should handle this case cleanly. For example, what happens when a bot tries to walk all archive pages.
es_api_search_index()
takes a second parameter for a stat name, to track Elasticsearch usage.
We've discovered an issue whereby, when es-wp-query
is active for all wp-admin queries, viewing trashed posts is not possible.
This is due to the way in which es-wp-query-shoehorn.php
overrides the post status and post type to any
when converting the Elasticsearch query into a WordPress query, and the way WordPress subsequently treats any
as only public posts.
The final query that is constructed is one that includes posts with status trash
, but excludes posts with status trash
.
I have a potential fix for this which we're currently testing and will be adding a PR shortly.
Do the same thing I did here: alleyinteractive/searchpress#145
Related to #63
When a user opens a pull request, Github Actions test for PHPCS and unit test.
If ES_WP_Query::query_es()
(in an adapter) encounters an error, it should be able to communicate that to the wrapper. As of now, doing so wouldn't cause any issues due to proper return value validation, but it would be good to explicitly catch errors in the wrapper.
WordPress core uses get_query_var( 'author' )
in handle_404()
of WP
to detect if a request is a valid author archive request.
WordPress core fetches the author
query var value during get_posts()
in WP_Query
which doesn't happen in ES_WP_Query_Wrapper
or even later during reboot_query_vars()
in ES_WP_Query_Shoehorn
.
This leads to get_query_var( 'author' )
returning null
in handle_404()
and author archives being returned as 404 if no posts are found.
Ref: VIP ZD66699
It seems that when using ES, $wp_query->query_vars['posts_per_page'](inc/post-types/class-post-type-slideshow.php line 142) isn't returning the correct posts_per_page used. It's instead using the site default of 10 (get_option( 'posts_per_page' ));
This is happening at es-wp-query/class-es-wp-query-shoehorn.php in reboot_query_vars() (line 219 currently).
Manually setting $query->query['posts_per_page'] will stop es-wp-query from overwriting the posts_per_page, and this should hopefully be a usable fix until we can look more into why the plugin isn't using the correct number.
If 'es'
is true
, that query would be upgraded to be an ES_WP_Query
. Then developers can still instantiate WP_Query
, and if they ever deactivated the plugin, everything would continue to work as expected.
Furthermore, it would open up other opportunities, like allowing one to run most of their site with ES, if they really wanted.
Sometimes Travis fails because ES times out (most of the issues, from my anecdotal data) or something else happens during indexing. We need to harden the ES processing, if only to repeat processes before failing.
When we do a term query, passing a list of term IDs, ES expects this to be an array. However, we're passing it as an object and so ES fails to understand the query and return correct results.
Example WP_Query:
'queried_terms' =>
array (
'category' =>
array (
'terms' =>
array (
0 => 1482347944,
1 => 1482349929,
2 => 1591026488,
3 => 1586722036,
4 => 1590667016,
),
'field' => 'term_id',
),
)
and the ES query looks something like:
{
"terms": {
"category.term_id": {
"0": 1482347944,
"1": 1482349929,
"2": 1591026488,
"3": 1586722036,
"4": 1590667016
}
}
}
There is an existing PR to address this but I have an upcoming PR which will address this and some other issues.
WP_Query
calls stripslashes()
on the s
query var when parsing searches: https://github.com/WordPress/wordpress-develop/blob/eb06a59f536692e9499b89abd3f156ff56d3075e/src/wp-includes/class-wp-query.php#L1338-L1341
ES_WP_Query
mimics this behavior in \ES_WP_Query_Wrapper::parse_search()
, but \ES_WP_Query_Shoehorn::reboot_query_vars()
puts the original s
back into the query, causing the slashed string to be returned in get_search_query()
and functions relying on it like wp_get_document_title()
.
ES_WP_Query
should have a check in place to see if the request can actually be executed successfully, and if not, defer to WP_Query.
For instance, if the ES index doesn't contain post_mime_type
data, and the query includes it, that query should be executed against MySQL. Another example is a post__in
query with no other arguments; ES would be of no benefit there.
WordPress core supports search queries such as ?s=searchstring&author=authorname
but with this plugin it returns an empty set of results whenever author
is added.
When both parameters are present only meta_key is used, the meta_query part is silently dropped.
The current WP_Query behaviour is to have an exist check on the meta_key that is inside the meta_key property as well as what is defined into meta_query.
Let me know if you think this should be supported and I'll write a patch.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.