Git Product home page Git Product logo

es-wp-query's People

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

es-wp-query's Issues

get_queried_object is not always working

In bd00339 a code allowing WP_Query::get_posts() repeated call was introduced, but the WP_Query::get_posts() is not strictly idempotent, as there is some backward compatibility related logic and what's more, some query variable evaluation and modification logic which is changing the query_vars property of the class.

Eg.: passing author_name query variable leads to setting author query variable which is then used for determining the get_queried_object. See https://core.trac.wordpress.org/browser/tags/4.7/src/wp-includes/class-wp-query.php#L2121

Example failing code:

$my_query = new WP_Query( array( 'author_name' => 'existing-user-slug', 'es' => true ) );
$my_query->get_queried_object(); // Returns `false`.

vs.

$my_query = new WP_Query( array( 'author_name' => 'existing-user-slug', ) );
$my_query->get_queried_object(); // Returns author object.

Also, calling WP_Query::get_posts() multiple times, or even creating a new WP_Query object while passing query args to the constructor and then calling WP_Query::get_post() leads to performance issues in certain cases caused by unnecessarily complex SQL queries - due to the backward compatibility logic.

Again, an example:

$my_query = new WP_Query(
    array(
        'posts_per_page' => 10
        'cat' => $some_category_term_id, // This is a legacy query arg which gets extended in WP_Query::get_posts() function.
    )
);
$my_query = $latest_posts_query->get_posts(); // This triggers the `WP_Query::get_posts` again, but with moar query variables than just `cat` - those were backfilled during the `$my_query`'s object construction

Once the WP_Query::get_posts() have been called, either directly or through the constructor of the WP_Query when query args were passed to it, the array of posts should be accessed via WP_Query's posts property - as it 1) saves a SQL query 2) returns the expected set of posts

@mboynes I wonder, do you have any valid use case for calling WP_Query::get_posts() twice or is that just an edge case the plugin covers as developers do sometimes do that, not knowing all the consequences?

I'm asking as in my humble opinion a proper fix for non-working get_queried_objct would not be to whitelisting some query variables which are dynamically set, but not resetting those at all. Thoughts?

WP_Query() with taxonomy query affects get_query_var() when es-wp-query is enabled

Description

When querying using WP_Query() with Elasticsupport enabled, es-wp-query can cause WP_Query objects to drop taxonomy queries attached. This can cause get_query_var('taxonomy') to return an empty string.

For instance, when the main loop calls WP_Query() with a custom taxonomy, like this:

WP_Query( array(
   ...,
   'vertical' => 'chairs',
   'es' => true,
));

then subsequent calls to get_query_var( 'taxonomy' ) will return an empty string. This will not happen with no es argument passed to WP_Query(), as then get_query_var( 'taxonomy') will return the correct value.

Cause

This problem likely occurs because in the function WP_Query::get_posts() the term and other keys of WP_Query::$query_vars are added after the pre_get_posts action hook is called here -- this action hook is used to take over any calls to WP_Query() or get_posts() and offload the query to Elasticsearch. The term key (and others) are added in WP_Query::get_posts() around here.

The action hook implemented by class-es-wp-query-shoehorn.php (es_wp_query_shoehorn) will call WP_Query::parse_query() here, and because it does not add back in terms and taxonomy information after re-parsing, the terms and taxonomy information will be left out -- this will ultimately impact the following:

if ( ! empty( $this->tax_query->queried_terms ) ) {

found in class-wp-query.php (here) as with Elasticsearch enabled, it is evaluated as false and evaluated as true otherwise. It is after this point term and other values are added to WP_Query::$query_vars.

Solution

The solution appears to add the taxonomy information back in after calling WP_Query::parse_query(), which will ensure the terms being available by get_query_var().

query_vars_changed & query_vars_hash are private

if ( $hash != $this->query_vars_hash ) {
            $this->query_vars_changed = true;
            $this->query_vars_hash = $hash;
        }

In class-es-wp-query-wrapper.php probably won't work as intended since they are set as private in the WP_Query class. This probably doesn't change anything but maybe just removing that code would be best.

Custom Taxonomy Queries

I am attempting to set up a staging environment that runs ElasticSearch for a site that's hosted on WordPress VIP.

I am having trouble with queries that use custom taxonomies.

If I put in this taxonomy query...

'tax_query' => array(
     array( 'taxonomy' => 'featured_content', 'field' => 'slug', 'terms' => 'ftrd', 'operator' => 'IN' )
}

The call that goes out to the API contains the term_id and returns empty.

{"terms":{"taxonomy.featured_content.term_id":[427]}}

If you test that API call from command line, it gets back 0 results.

$ curl -L https://public-api.wordpress.com/rest/v1/sites/7369149/search -H "Content-Type: application/json" --data '{"filter":{"and":[{"terms":{"taxonomy.featured_content.term_id":[427]}},{"term":{"post_type":"post"}},{"terms":{"post_status":["publish"]}}]},"sort":[{"date":"desc"}],"fields":["post_id"],"from":0,"size":5}'

However, if I change that URL to contain the query I originally entered...

{"terms":{"taxonomy.featured_content.slug":["ftrd"]}}

It returns the expected results

$ curl -L https://public-api.wordpress.com/rest/v1/sites/7369149/search -H "Content-Type: application/json" --data '{"filter":{"and":[{"terms":{"taxonomy.featured_content.slug":["ftrd"]}},{"term":{"post_type":"post"}},{"terms":{"post_status":["publish"]}}]},"sort":[{"date":"desc"}],"fields":["post_id","site_id"],"from":0,"size":5}'

Is it possible for ES WP Query to use the query as originally entered? Or is that an alternative way to get custom taxonomies working in this way?

lazy_load_term_meta query param is always `true`

Due to how WP_Query initializes default query parameters, and ES WP Query's interaction with WP Query, the lazy_load_term_meta param ends up always being set to true when the internal query to fetch posts by id runs.

This ends up triggering get_the_terms() excessively, which leads to high numbers of cache and/or db reads.

The problem can be most readily seen by watching the $q var here:

$q = $query->query = $this->original_query_args;
$query->parse_query();
$q = array_merge( $current_query_vars, $q );

Before $query->parse_query(); is run, lazy_load_term_meta is still false...yet after it has flipped to true.

As a workaround, it can be bypassed with a filter like this:

add_filter( 'pre_get_posts', function( &$query ) {
   $query->set( 'lazy_load_term_meta', false );

   return $query;
}, 9999 );

Uppercase term query confuses ES

When passing a term in a term query in uppercase, this gets passed on as-is to ES which is case-sensitive.

WP_Query:

array (
  'no_found_rows' => true,
  'post_status' => 
  array (
    0 => 'publish',
  ),
  'posts_per_page' => 20,
  'tax_query' => 
  array (
    0 => 
    array (
      'taxonomy' => 'capi_content_type',
      'field' => 'name',
      'terms' => 
      array (
        0 => 'NEWS_STORY',
      ),
    ),
  ),
  'ignore_sticky_posts' => true,
)

resulting ES query:

{
    "from": 0,
    "size": 20,
    "fields": [
        "post_id"
    ],
    "query": {
        "filtered": {
            "query": {
                "match_all": {}
            },
            "filter": {
                "bool": {
                    "must": [
                        {
                            "terms": {
                                "taxonomy.capi_content_type.name.raw": [
                                    "NEWS_STORY"
                                ]
                            }
                        },
                        {
                            "terms": {
                                "post_type": [
                                    "post",
                                    "page",
                                    "attachment"
                                ]
                            }
                        },
                        {
                            "terms": {
                                "post_status": [
                                    "publish"
                                ]
                            }
                        }
                    ]
                }
            }
        }
    },
    "sort": [
        {
            "date": "desc"
        }
    ],
    "timeout": "2s"
}

As ES doesn't find anything matching "NEWS_STORY" the query returns no results, even though the WP_Query returns plenty.

Release multiple versions to support the whims of ES features

Elasticsearch has a fairly rapid release cycle, and it has no dedication to backwards compatibility. Furthermore, there's no great way to see what version of ES we're querying against (without making an additional HTTP request, and that's assuming the server supports that endpoint).

As such, we're going to split ES_WP_Query and support multiple versions at once.

See #35, #36.

Error Warning: Parameter 2 to ES_WP_Query_Shoehorn

Have got WP 4.8, on PHP 7.1.6 with searchpress.
On the site's static frontpage, I've set es => true on all the WP_Query there.
On categories, date and authors ..I set es via pre_get_posts

Everything looks to be working as expected but this error warning appears for every query:
Parameter 2 to ES_WP_Query_Shoehorn::filter__posts_request() expected to be a reference, value given

is_main_query() returns false

I'm not sure if this falls under a bug or a feature, but if you do $query->is_main_query() inside the
do_action_ref_array( 'pre_get_posts', array( &$this ) ); hook this will return false for the "shadow" ES query of the main query. While I understand that it's technically true that this is not the real main WP_Query object I feel like it should return true.

Thoughts?

Date queries are incorrectly formatted

When using post_date_gmt in WP_Query the resultant ES query is invalid and fails to return any results.

E.g. a WP_Query that does:

'date_query' => 
    array (
      0 => 
      array (
        'column' => 'post_date_gmt',
        'after' => '2015-12-04 21:10:57',
      ),
    ),

results in this ES query:

"range": {
    "wp_31106563_posts.post_date_gmt": {
        "gt": "2015-12-04 21:10:57"
    }
}

The ES query returns zero results, while the WP_Query returns the results we'd expect to see.

Set max from/size for ES

In ES 2.x there is now a maximum value for from+size in order to prevent deep paging which can cause performance problems.

We've also added this same limit when querying WP.com 1.x indices (after seeing some performance problems). It seems that es-wp-query should handle this case cleanly. For example, what happens when a bot tries to walk all archive pages.

Cannot view trashed posts in wp-admin

We've discovered an issue whereby, when es-wp-query is active for all wp-admin queries, viewing trashed posts is not possible.

This is due to the way in which es-wp-query-shoehorn.php overrides the post status and post type to any when converting the Elasticsearch query into a WordPress query, and the way WordPress subsequently treats any as only public posts.

The final query that is constructed is one that includes posts with status trash, but excludes posts with status trash.

I have a potential fix for this which we're currently testing and will be adding a PR shortly.

Allow `ES_WP_Query::query_es()` to return a WP_Error

If ES_WP_Query::query_es() (in an adapter) encounters an error, it should be able to communicate that to the wrapper. As of now, doing so wouldn't cause any issues due to proper return value validation, but it would be good to explicitly catch errors in the wrapper.

Known issues

Things that don't quite work out of the box

  • Sorting by RAND()
    • You can make this work with a custom script if you so choose
  • Sorting by post__in, post_parent__in
    • You can probably make this work with a custom script if you so choose
  • Sort by meta_value
    • Need to think through this one a little more
  • Query by week, w, dayofyear, dayofweek
    • This is in the works, but depends heavily on how the data is indexed. There needs to be a way to enable/disable this.
  • Can't meta compare 'REGEXP', 'NOT REGEXP', 'RLIKE'
    • This could probably be done, but I have to think through the implications first.

Things that don't work at all

  • Meta value casting
    • There's no equivalent in ES, I just don't think this is something that will be possible in any capacity.

Noteworthy

  • Some tests from core were failing because they were written to assume that two posts with the same date, when ordered by date, would show up in the order in which they were added to the database. However, in ES, they aren't guaranteed to show in that order. tl;dr: unspecified post orders aren't the same between MySQL and ES and can lead to unexpected results.

Empty author archives return a 404 instead of author.php

WordPress core uses get_query_var( 'author' ) in handle_404() of WP to detect if a request is a valid author archive request.

WordPress core fetches the author query var value during get_posts() in WP_Query which doesn't happen in ES_WP_Query_Wrapper or even later during reboot_query_vars() in ES_WP_Query_Shoehorn.

This leads to get_query_var( 'author' ) returning null in handle_404() and author archives being returned as 404 if no posts are found.

Ref: VIP ZD66699

Posts_per_page may not be working properly

It seems that when using ES, $wp_query->query_vars['posts_per_page'](inc/post-types/class-post-type-slideshow.php line 142) isn't returning the correct posts_per_page used. It's instead using the site default of 10 (get_option( 'posts_per_page' ));

This is happening at es-wp-query/class-es-wp-query-shoehorn.php in reboot_query_vars() (line 219 currently).

Manually setting $query->query['posts_per_page'] will stop es-wp-query from overwriting the posts_per_page, and this should hopefully be a usable fix until we can look more into why the plugin isn't using the correct number.

Add 'es' query var to WP_Query

If 'es' is true, that query would be upgraded to be an ES_WP_Query. Then developers can still instantiate WP_Query, and if they ever deactivated the plugin, everything would continue to work as expected.

Furthermore, it would open up other opportunities, like allowing one to run most of their site with ES, if they really wanted.

Travis builds are fragile

Sometimes Travis fails because ES times out (most of the issues, from my anecdotal data) or something else happens during indexing. We need to harden the ES processing, if only to repeat processes before failing.

Terms are passed as JS object, not an array

When we do a term query, passing a list of term IDs, ES expects this to be an array. However, we're passing it as an object and so ES fails to understand the query and return correct results.

Example WP_Query:

'queried_terms' => 
    array (
      'category' => 
      array (
        'terms' => 
        array (
          0 => 1482347944,
          1 => 1482349929,
          2 => 1591026488,
          3 => 1586722036,
          4 => 1590667016,
        ),
        'field' => 'term_id',
      ),
    )

and the ES query looks something like:

                        {
                            "terms": {
                                "category.term_id": {
                                    "0": 1482347944,
                                    "1": 1482349929,
                                    "2": 1591026488,
                                    "3": 1586722036,
                                    "4": 1590667016
                                }
                            }
                        }

There is an existing PR to address this but I have an upcoming PR which will address this and some other issues.

Search query var gets re-slashed in rebooted query

WP_Query calls stripslashes() on the s query var when parsing searches: https://github.com/WordPress/wordpress-develop/blob/eb06a59f536692e9499b89abd3f156ff56d3075e/src/wp-includes/class-wp-query.php#L1338-L1341

ES_WP_Query mimics this behavior in \ES_WP_Query_Wrapper::parse_search(), but \ES_WP_Query_Shoehorn::reboot_query_vars() puts the original s back into the query, causing the slashed string to be returned in get_search_query() and functions relying on it like wp_get_document_title().

Add a check to see if the current query can/should be run using ES

ES_WP_Query should have a check in place to see if the request can actually be executed successfully, and if not, defer to WP_Query.

For instance, if the ES index doesn't contain post_mime_type data, and the query includes it, that query should be executed against MySQL. Another example is a post__in query with no other arguments; ES would be of no benefit there.

meta_key and meta_query

When both parameters are present only meta_key is used, the meta_query part is silently dropped.
The current WP_Query behaviour is to have an exist check on the meta_key that is inside the meta_key property as well as what is defined into meta_query.

Let me know if you think this should be supported and I'll write a patch.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.