relevanssi's People

Contributors

asakous, faeddur, figureone, georgejipa, herndlm, iwillhappy1314, jacobdb, janw-me, msaari, pandelisz

relevanssi's Issues

another index performance issue related to #25

Assume the relevanssi table has millions of records.
To retrieve the IDs of unindexed posts, the query below has to be executed on every loop iteration, and it gets slower and slower.
My solution is to create another table that records only the indexed post IDs (this needs a code change) and to make that column the primary key.
After that, change the LEFT JOIN on the relevanssi table to use the new table instead.
The theory is quite simple: instead of joining a big table, join a small one.

$q = "SELECT post.ID
FROM $wpdb->posts post
LEFT JOIN $wpdb->posts parent ON (post.post_parent=parent.ID)
LEFT JOIN $relevanssi_table r ON (post.ID=r.doc)

Duplicate Searches logged if using inline 'background-image:url();'

It seems that if you have an inline background-image rendered to your template, (with no image present), then searches get logged twice.

Steps to reproduce:

  1. Clean install of WordPress
  2. Install Relevanssi
  3. Check 'Keep a log of user queries'
  4. Use 2017 theme.
  5. In template-parts/post/content-excerpt.php, add the line <div style="background-image:url();"></div> so it will appear on the rendered page
  6. Search for 'world'
  7. The 'search count' will increase by two.

Fun Fact:

  1. Edit the line of HTML that you added so that it has an image: <div style="background-image:url(http:image.com);"></div>
  2. Search for 'world'
  3. Search count only increases by one.

This seems like it might even be a WordPress problem, but I wanted to make a mention of it here first as it is counting the logs in a Relevanssi created table.

redundant index docs

Since the primary key already covers the doc column, there is no need to create a separate index for the doc column.
For example:
explain select * from wp_relevanssi where doc=12344
Result:
1 | SIMPLE | wp_relevanssi | ref | PRIMARY,docs | PRIMARY | 4 | const | 1 |  

MySQL uses the PRIMARY key, not the docs key.

"Index unindexed posts" fails due to a 403 unauthorized response from admin-ajax.php

Hello @msaari,

I am unable to "Index Unindexed Posts" in Relevanssi version 4.0.7, WordPress version 4.9.4.

When I visit the Indexing admin tab, and click the "Index Unindexed Posts" button, with Chrome's Inspector open, I can see the initial request sent to admin-ajax as "action=relevanssi_index_posts&completed=0&total=1686&offset=0&limit=10&extend=true". The response from admin-ajax is then a "403 Unauthorized" response.

I tried doing a hard reload (multiple times), to ensure that I do not have a cached version of any admin JS scripts, and yet the issue persists.

In PhpStorm, I did a little tracing, and it appears that a check_ajax_referer check fails in lib/admin-ajax.php, in the relevanssi_index_posts_ajax_wrapper() function, around line 35: check_ajax_referer( 'relevanssi_indexing_nonce', 'security' );

I inspected the $_REQUEST object from inside relevanssi_index_posts_ajax_wrapper() and it was shaped as follows:

$_REQUEST = Array ( [action] => relevanssi_index_posts [completed] => 0 [total] => 1686 [offset] => 0 [limit] => 10 [extend] => true )

I do not see the "security" key (the second argument to check_ajax_referer) in the $_REQUEST object, which could explain why check_ajax_referer fails.

If I comment out the nonce check, then "Index Unindexed Posts" appears to behave as expected. Obviously commenting out the nonce check is not a good solution, so I dug a little deeper into where the AJAX request is made.

I ran console.log from multiple places in lib/admin_scripts.js, and I do not see the "security" key as anything but undefined.

When args are constructed for the first call to process_indexing_step at lib/admin_scripts.js, line 163, there is no indication that the nonce is being added with the "security" key to the request payload.

If I add 'security' : nonce.indexing_nonce to the args array, at around line 170, then "Index Unindexed Posts" appears to behave as expected.

I am not entirely certain if this is directly related to other recent threads that similarly describe indexing failures on the WPORG forums, but it may be possible. I am also not certain if there are other AJAX calls in the plugin that may be affected by this. It does appear that this solves the issue for me.

PHP8.1 and relevanssi_meta_query_from_query_vars()

With PHP8.1 I get an error in relevanssi_meta_query_from_query_vars()

On line 1226 in lib/search.php, $meta_query is set to boolean false instead of an empty array.
I have queries that pass through without populating $meta_query at all, which generates the following error:

[12-Sep-2023 12:58:11 UTC] PHP Fatal error: Uncaught ErrorException: Automatic conversion of false to array is deprecated in /<path occluded>/plugins/relevanssi-premium/lib/search.php:1280

https://github.com/msaari/relevanssi/blob/8edf8621ba51aa5ac807f33201050f8c5a19c3ae/lib/search.php#L1226C22-L1226C22
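
The deprecation can be reproduced outside WordPress. What follows is a minimal sketch, not the plugin's actual code: the function name build_meta_query_sketch and the shape of $query_vars are assumptions made for illustration. It shows that starting from an empty array, rather than false, lets later appends work without triggering the "Automatic conversion of false to array" deprecation in PHP 8.1.

```php
<?php
/**
 * Hypothetical helper (not part of Relevanssi): builds a meta query
 * array from a set of query vars.
 */
function build_meta_query_sketch( array $query_vars ): array {
	// Start from an empty array. Starting from false and then running
	// $meta_query[] = ... is what triggers the PHP 8.1 deprecation.
	$meta_query = array();

	if ( ! empty( $query_vars['meta_key'] ) ) {
		$meta_query[] = array(
			'key'   => $query_vars['meta_key'],
			'value' => $query_vars['meta_value'] ?? '',
		);
	}

	// Callers always get an array, even when nothing was populated.
	return $meta_query;
}
```

With this shape, a query that never populates the meta query simply yields an empty array, which callers can test with empty().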

function relevanssi_total_queries performance problem

I have two million records in the wp_relevanssi_log table, and I found that the relevanssi_total_queries function performs very badly on each query.
My solution is to add an index on the time column and a range condition on time, so that the queries can change from
[SELECT COUNT(id) FROM $log_table WHERE TIMESTAMPDIFF(DAY, time, NOW()) <= 1;]
to
[SELECT COUNT(id) FROM $log_table WHERE TIMESTAMPDIFF(DAY, time, NOW()) <= 1 AND time >= DATE_SUB(NOW(), INTERVAL 2 DAY);]

analyze table suggestion

When the relevanssi table has millions of records, ANALYZE TABLE becomes slower and slower, especially when rebuilding the index for all posts.
Maybe just move the command so that it runs right before indexing completes?

   // To prevent empty indices.
$wpdb->query( "ANALYZE TABLE $relevanssi_table" ); // phpcs:ignore WordPress.DB.PreparedSQL.NotPrepared,WordPress.DB.PreparedSQL.InterpolatedNotPrepared

$complete = false;
$size     = $indexing_query_args['size'];

if ( ( 0 === $size ) || ( count( $content ) < $size ) ) {
	$complete = true;
	$wpdb->query( "ANALYZE TABLE $relevanssi_table" ); // <-- Move the call here.
	update_option( 'relevanssi_indexed', 'done', false );

	// Update the document count variable.
	relevanssi_async_update_doc_count();
}

Relevanssi breaks the Media Library grid search

If Relevanssi is enabled in the admin searches, searching for media in the Media Library grid view is broken. While I'm figuring out what's up with this, there are two solutions to this problem:

  1. Use the list view instead of the grid view.
  2. Disable Relevanssi in admin searches.

Oxygen revisions are indexed

If custom fields are set to index "visible" or "all", this will include ct_builder_shortcodes_revisions, which will mean old content will be indexed and used in excerpts. Relevanssi needs to make sure this shortcode is always excluded.

Feature Request: Increase compatibility with Elementor

Hi there,

We are currently testing your Plugin's free version and I have to say - We are impressed!
We consider buying the premium lifetime license but there is one requirement missing in the current state of the plugin - compatibility with the Elementor Pro Posts Widget.

More specifically the custom excerpt feature is not compatible with the excerpt displayed in the Posts widget. If I understand correctly you are storing the excerpt in the main queries WP_Query object in the excerpt / post_excerpt attribute which should be output upon invoking the_excerpt. Elementor does it the recommended way though still it does not work.

Maybe I am missing something and I am unsure if this is the right place for such an issue as it may possibly be caused by Elementor (I have opened an issue on the Elementor Github as well https://github.com/orgs/elementor/discussions/27316).

If any further information is required on the topic I am happy to oblige so feel free to discuss this matter at any time.

Thanks for your time.

Premium 2.14.5: empty search queries are a DoS risk?

We had a few incidents over the last week where empty search queries started tying up the database, with the Relevanssi search query being the only one visible in the MariaDB 10.3 process list.

I am also able to reproduce a very inefficient DB query manually: a query like https://<site>/?s with no parameter value takes 5 seconds to complete.

There seems to be a "Redirect" feature in recent Relevanssi (Premium?), for "empty search terms", but based on the time it takes to redirect, it also performs this slow query first, then redirects.

PS: we only display 10 results at a time, with numbered paging. Should I configure the Relevanssi search throttle to 10? It seems to make no sense to return 500 rows and discard 490 every time 🤔

Using search result highlighting breaks <code>-blocks

Hi there - first of all, thanks for an immensely helpful plug-in. Love it.

That said, I have a small bug to report. Not sure if this really is Relevanssi's doing, but I think so.

Problem
Whenever "highlight in documents" is enabled, any code blocks on the page break (the block markup is not parsed properly).

With this configuration:
image

A page with "highlight" in URL will look like this:
image

Whenever either highlighting is disabled or the URL parameter removed, it works properly:
image

Any thoughts?

Relevanssi Premium, relevanssi_do_query returns stdClasses

Came across this bug today.

If you use a WP_Query in combination with the relevanssi_do_query function, the resulting items will be objects of type stdClass instead of type WP_Post.

This bug only happens in the premium version; the free version behaves as expected.

Premium version: 2.23.0
Free version: 4.20.0
PHP: 8.0

Example code

$args = [
    's' => $query,
    'post_type' => isset($_GET['view']) && $_GET['view'] === 'product' ? 'product' : 'any',
    'post_status' => 'publish',
    'posts_per_page' => -1,
    'orderby' => 'date',
    'order' => 'DESC',
];

$query = new \WP_Query();
$query->parse_query($args);

$items = relevanssi_do_query($query);

Expected: $items should be an array of WP_Post objects (if there are results)
Result: $items is an array of stdClass

relevanssi_render_block not called

Hello,

I'm trying to exclude some Gutenberg blocks from Relevanssi indexing by adding a filter with the following code :

add_filter('relevanssi_render_block', 'exclude_block_from_search_render');

function exclude_block_from_search_render($block) {
	error_log('TEST RENDER_BLOCK');
	// A filter callback has to return the filtered value.
	return $block;
}

The problem is that this function is never called during searches. I've also tried the "relevanssi_hits_filter" filter, which is called, but that's not the one I need.

Is this a premium feature? If not, how can I correct this error?

Thanks for your help

Bug: Highlighting causing broken HTML

First of all, thanks for the great plugin!

We found a bug that makes the HTML invalid when highlighting: data-attributes are caught incorrectly by the regular expression.

We've got these "Search hit highlighting" settings:

  • Highlight type: CSS class
  • CSS class for highlights: search-results__term-hint
  • Highlight in titles: true
  • Highlight in documents: true
  • Highlight in comments: false
  • Expand highlights: false

In our theme we tend to use data-attributes without values to create bindings for JavaScript, like so:

<a
	data-tile
	data-tile-type="file"
>
	<p data-tile-file-heading>...</p>
	<p data-tile-file-filesize-and-ext>pdf</p>
</a>

Given the settings, if you go to the page with that HTML and add the highlight URL param at the end (?highlight=string), this HTML will be converted to this form:

<a
	data-tile
	data-tile-type="file&quot;
&gt;
	&lt;p data-tile-file-heading&gt;&hellip;&lt;/p&gt;
	&lt;p data-tile-file-filesize-and-ext&gt;pdf&lt;/p&gt;
&lt;/a&gt;

Which is of course invalid.

The cause is an imperfect regular expression, found here:

if ( preg_match_all( '/data-.+?="(.*?)"/sm', $content, $matches ) ) {

This regular expression does a better job:

if ( preg_match_all( '/data-[\w-]+?="([^"]*?)"/sm', $content, $matches ) ) {}
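
To make the difference between the two patterns concrete, here is a small standalone sketch. The sample markup is a shortened version of the snippet above and the variable names are illustrative only; this is not the plugin's code path, just the two preg_match_all() calls run side by side.

```php
<?php
// Sample markup: a valueless data attribute followed by one with a value.
$content = "<a\n\tdata-tile\n\tdata-tile-type=\"file\"\n>";

// Original pattern: ".+?" may cross whitespace and newlines (s flag),
// so the match starts at the valueless attribute.
preg_match_all( '/data-.+?="(.*?)"/sm', $content, $loose );

// Proposed pattern: the attribute name is limited to word characters
// and hyphens, so valueless attributes are skipped.
preg_match_all( '/data-[\w-]+?="([^"]*?)"/sm', $content, $strict );

var_dump( $loose[0][0] );
var_dump( $strict[0][0] );
```

In this sample the original pattern's match begins at data-tile and runs through the closing quote of data-tile-type, while the proposed pattern matches only data-tile-type="file".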

Bug: Pagination prev/next mixed up

Hi,

I've noticed a minor bug when search results return more than one page. On the bottom of the page there is only a "previous" link. When clicking the link it takes me to page 2. When on page 2 I can click on "next" which takes me back to page 1.

Also btw. is there an infinite scrolling option? Could not find one in settings.

v3.6

@msaari can you tag the v3.6 release? I think it caused an issue and I would like to compare to see what happened.

Search logs adds

When a synonym is found, it is included in the search log.
Steps to reproduce:

  1. Create synonym: round = circle.
  2. Enable logging
  3. Do a search for "round"
  4. Export csv log (or check DB)
  5. See "round circle" in the log.

I have found where to fix this, but I can see several approaches and there are probably more.

a) Don't log the synonym word
b) Do log the synonym but mark it in some way: round (synonym:circle)
c) Add a new column to the log table and log it separately.
d) ...?

I'm willing to work on this, but I'd like to discuss the best way to do this before starting.

Tokenizer killed all the search terms

Hello,

I ran into the following issue: when there is no search term, the tokenizer kills the search even when a tax_query is provided.

Use case: searching for specific terms from additional search fields to filter results based on taxonomy.

It seems that line 387 of /lib/search.php should test $search_ok before killing the search. Then, on line 831, if ( $exact_match_bonus ) should change to if ( $exact_match_bonus && !empty($q) ) because $q is empty.

Thanks!

How to get in touch regarding a security concern

Hey there!

I belong to an open source security research community, and a member (@geeknik) has found an issue, but doesn’t know the best way to disclose it.

If not a hassle, might you kindly add a SECURITY.md file with an email, or another contact method? GitHub recommends this best practice to ensure security issues are responsibly disclosed, and it would serve as a simple instruction for security researchers in the future.

Thank you for your consideration, and I look forward to hearing from you!

(cc @huntr-helper)

Homepage displaying unwanted search results

If I type in the homepage address plus “/?posts_per_page=83” I am getting results and it is saving it in google search. If I turn off “Relevanssi” it does not display these results. I want it to NOT display results and also ignore any kind of indexing by google if I type in “myhomepageurl/?posts_per_page=83”. How do I disable this feature but keep the plugin at the same time?
Thanks

Undefined variable error on admin pages

Notice: Undefined variable: content in /wp-content/plugins/relevanssi/lib/privacy.php on line 46

appears on admin pages if logging is disabled. There's a variable definition missing if logging is disabled.

Issue when Relevanssi is used with Oxygen Builder AND ACF

It seems there may be a slight bug in Relevanssi since the incorporation of Oxygen Builder. When using ACF and Oxygen Builder, the custom fields for ACF are not indexed. You are also unable to change the custom fields drop-down in Relevanssi to anything other than "some" (which appears to pre-populate the field with "ct_builder_shortcodes").

When you disable Oxygen Builder, you are able to change the custom fields setting and Relevanssi will index ACF fields. Once you turn Oxygen Builder back on, the same behaviour returns (but if you don't rebuild your index, it works).

Any chance to rectify this bug so that ACF and Oxygen builder can function and be indexed when both are in use?

YITH Badge Management problem with Relevanssi active

With Relevanssi enabled, the YITH Badge Management plugin does not return badge results. The YITH support team advised me that everything works fine with Relevanssi disabled and that I should contact you to resolve the problem (I'm uncertain whether the problem is in the YITH plugin or in Relevanssi).

Improve compatibility with a strict Content Security Policy

The Relevanssi (Premium) plugin includes inline scripts/styles that can cause issues with setting a Content Security Policy (CSP).

This issue proposes:

  • Moving inline scripts/styles to a JS/CSS file if possible
  • Or use wp_print_inline_script_tag() or wp_add_inline_style()

See:
https://make.wordpress.org/core/2021/02/23/introducing-script-attributes-related-functions-in-wordpress-5-7/

Examples:
inline <script> tag
Inline style tag in /premium/templates/relevanssi-related.php

relevanssi_post_date_throttle_join replaces Join instead of adding something

I think I found a bug in search.php:1980

It seems that the relevanssi_post_date_throttle_join function should add something to the join statement, but it replaces the join statement instead.

If I replace the assignment with a concatenating assignment operator, our error is gone:

function relevanssi_post_date_throttle_join( $query_join ) {
	if ( 'post_date' === get_option( 'relevanssi_default_orderby' ) &&
		'on' === get_option( 'relevanssi_throttle', 'on' ) ) {
		global $wpdb;
		$query_join .= ', ' . $wpdb->posts . ' AS p';
	}
	return $query_join;
}

is_numeric validation not leading to expected behaviour

Recent changes in commit 76b87b7 potentially generate faulty SQL queries. Check the changes in lines 324 to 327.

I've discovered that if a previously deleted term contains a number in its slug, then the validation in line 321 still passes and the string slug is used as a numeric id, leading to a SQL statement where the term is parsed as a non-existing column.

What if is_numeric is to be replaced with ctype_digit instead? This should correctly determine if the slug only contains numeric characters.
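
As a rough illustration of why the two checks differ: is_numeric() also accepts strings such as '1e3' and '12.5', while ctype_digit() accepts only strings made up entirely of decimal digits. The helper below is a hypothetical sketch (the name looks_like_term_id_sketch is not from the plugin) and assumes the ctype extension is available.

```php
<?php
/**
 * Hypothetical helper (not part of Relevanssi): true only for values
 * that consist entirely of decimal digits.
 */
function looks_like_term_id_sketch( $value ): bool {
	if ( is_int( $value ) ) {
		return $value >= 0;
	}
	// ctype_digit() expects a string; an empty string returns false.
	return is_string( $value ) && ctype_digit( $value );
}

var_dump( is_numeric( '1e3' ), looks_like_term_id_sketch( '1e3' ) );
```

The explicit is_string() guard matters because passing non-string values to ctype_digit() is deprecated as of PHP 8.1.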

Exact match boost not working with capital accented letters

Hi,

I've recently had an issue with the exact match boost setting not working properly in some cases, and after digging in the code I believe I found the issue.

Basically, to apply the exact match boost the search query is matched against the post title using stristr() :

relevanssi/lib/search.php

Lines 1433 to 1438 in 58391b5

if ( ! is_wp_error( $post ) && stristr( $post->post_title, $clean_query ) !== false ) {
	$weight *= $exact_match_boost['title'];
}
if ( ! is_wp_error( $post ) && stristr( $post->post_content, $clean_query ) !== false ) {
	$weight *= $exact_match_boost['content'];
}

However when using capital accented letters, they are not properly converted into lowercase by that function. For example, this will return false : stristr( 'CAFÉ', 'café' ).

I believe using mb_stristr() instead of stristr() would fix this.
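
A minimal sketch of the suggested change, assuming the mbstring extension is loaded and the strings are UTF-8. The helper name contains_ci_sketch is hypothetical and only illustrates the check; it is not the plugin's code.

```php
<?php
/**
 * Hypothetical helper (not part of Relevanssi): case-insensitive,
 * multibyte-aware "haystack contains needle" check.
 */
function contains_ci_sketch( string $haystack, string $needle ): bool {
	if ( '' === $needle ) {
		return false; // Treat an empty query as no match.
	}
	// mb_stristr() folds case for multibyte characters such as "É".
	return false !== mb_stristr( $haystack, $needle, false, 'UTF-8' );
}

var_dump( contains_ci_sketch( 'CAFÉ', 'café' ) );
```

Passing the encoding explicitly avoids depending on the server's default internal encoding.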

$progress not defined when used in wp-cli

I tried to wrap the cron indexing function in a small WP-CLI command and ran into a PHP notice.

The $progress variable is not defined outside the Premium version, but it is still checked in the free version?

Request: move doc count to scheduled task

Hello!

Before I say anything else, I want to thank you for publishing such a useful and well-documented plugin. I use Relevanssi free and premium for a number of clients. Thank you.

I've read your comments in a number of places stating that at a certain point Relevanssi should not handle very large data sets. I absolutely agree, but I am always interested in finding ways to increase that limit, if possible.

One of the biggest bottlenecks I've found is the database query: SELECT COUNT(DISTINCT(doc)). (code reference)

I understand this is important to calculate weights and relevance, but it seems like this can be moderately accurate and still achieve similar results. Is that true?

If so, could this count be deferred to a scheduled task that updates the option? This would increase performance pretty significantly for all uses, but especially for sites with large indexes.

Certain complex tax_query structures don't work

This kind of tax_query does not work:

$args = array(
            'tax_query' => array(
                'relation' => 'OR',
                array(
                    'taxonomy' => 'category',
                    'field'    => 'term_id',
                    'terms'    => array( 3, 36 ),
                    'operator' => 'AND',
                ),
                array(
                    'taxonomy' => 'category',
                    'field'    => 'term_id',
                    'terms'    => array( 30, 36 ),
                    'operator' => 'AND',
                ),
            ),
            's' => 'terms',
        );

I.e. two AND queries joined together with an OR. The OR is ignored, and this becomes a query for posts that have all of the terms 3, 36, 30 and 36. The relevanssi_process_term_tax_ids() function doesn't handle this correctly, even though it gets reasonable data.

Raised at https://wordpress.org/support/topic/tax_query-relation-not-work/
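
The semantics this tax_query is expected to have can be sketched in plain PHP: a post matches if it has every term in at least one of the AND groups. The function name and the flat list of term IDs per post are assumptions for illustration; this is not how Relevanssi or WP_Tax_Query builds the SQL.

```php
<?php
/**
 * Hypothetical helper: evaluates "OR of AND groups" against the term
 * IDs assigned to one post.
 */
function matches_or_of_and_groups_sketch( array $post_term_ids, array $groups ): bool {
	foreach ( $groups as $required_terms ) {
		// An AND group matches when no required term is missing.
		if ( array() === array_diff( $required_terms, $post_term_ids ) ) {
			return true; // OR: one matching group is enough.
		}
	}
	return false;
}

// The two groups from the tax_query above.
$groups = array( array( 3, 36 ), array( 30, 36 ) );

var_dump( matches_or_of_and_groups_sketch( array( 3, 36 ), $groups ) );
var_dump( matches_or_of_and_groups_sketch( array( 3, 30 ), $groups ) );
```

Under the behaviour described in this report, a post with only terms 3 and 36 is excluded, because all four term conditions are combined with AND.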

PHP Warning: preg_match(): Null byte in regex

The use of stripslashes causes NULL bytes to appear in text. When this text is fed into preg_match it causes warnings. Example:

Mar 14 16:59:10 ip-10-36-94-105 apache2[6306]: PHP Warning:  preg_match(): Null byte in regex in /.../wp-content/plugins/relevanssi-premium/lib/excerpts-highlights.php on line 525
Mar 14 16:59:10 ip-10-36-94-105 apache2[6306]: PHP Warning:  preg_match(): Null byte in regex in /.../wp-content/plugins/relevanssi-premium/lib/excerpts-highlights.php on line 529

I wasn't able to reproduce the warnings, but I was able to show that null bytes get added (in both relevanssi and relevanssi-premium). Steps to reproduce:

  • Activate relevanssi (or relevanssi-premium)
  • Edit lib/common.php: Add echo json_encode($string); after this line: function relevanssi_tokenize( $string, $remove_stops = true, $min_word_length = -1 ) {
  • Search for hello\\0world: i.e. http://localhost/?s=hello%5C%5C0world
  • "hello\\0world""hello\u0000world" will appear on the page. \u0000 is the JSON-encoded NULL byte. In some circumstances this string is passed to preg_match, which causes warnings.

This happens because the stripslashes function not only strips slashes, it also replaces \0 with a NULL byte. This feature is undocumented, you can see it here in PHP's source: https://github.com/php/php-src/blob/php-7.2.3/ext/standard/string.c#L3616-L3651
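
The behaviour can be checked in isolation with a few lines of PHP. The strip_null_bytes_sketch helper is a hypothetical mitigation added for illustration, not the plugin's fix.

```php
<?php
// Single quotes: the input is the literal characters backslash and zero.
$input  = 'hello\0world';
$output = stripslashes( $input );

/**
 * Hypothetical mitigation (an assumption, not the plugin's fix):
 * drop NUL bytes before the string reaches preg_match().
 */
function strip_null_bytes_sketch( string $text ): string {
	return str_replace( "\0", '', $text );
}

// json_encode() makes the NUL byte visible as \u0000.
var_dump( json_encode( $output ) );
var_dump( strip_null_bytes_sketch( $output ) );
```

The input is 12 bytes long and the output 11, because the two-character sequence is replaced by a single NUL byte.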
