Git Product home page Git Product logo

Comments (8)

msaari avatar msaari commented on August 18, 2024

In general Relevanssi is not recommended for cases where the table has millions of records. That may work, if the hardware is up to it, but I've seen a case where on a shared hosting a site with four million rows in the wp_relevanssi table spent 75 seconds doing a search. Having the indexing run faster does not help there at all. (With good hardware, apparently indexing millions of posts is possible.)

My test site only has 250k rows in the wp_relevanssi table. There the lookup time increases steadily up to a point, then fluctuates without growing much. It doesn't get to problematic levels there. What kind of query times are you seeing? If you log the times in relevanssi_build_index() like this, how does it look like?

	$time_before = microtime();
	$content = $wpdb->get_results( $query ); // phpcs:ignore WordPress.DB.PreparedSQL.NotPrepared
	$time_spent = microtime() - $time_before;
	error_log("Time spent: $time_spent");

I don't mind making the indexing faster, but adding another table is a very big complication in the works, and one I'd like to avoid as far as possible. I would need to see some numbers on how bad the problem is, and how big a difference the change can make – in my case this query takes 1.29 seconds of time, when the whole indexing process takes 309 seconds. This is less than half a percent of the time spent, so it does feel like overoptimizing. But with a big database, the situation may be different. Can you provide some data on this?

from relevanssi.

asakous avatar asakous commented on August 18, 2024

same hardware 400K+ record. I just change left join table and insert new post_id to new table.
it seems double the performance at least.

the result
Time spent:00:00:05.670 indexed:20
Time spent:00:00:09.724 indexed:20
Time spent:00:00:08.804 indexed:20
Time spent:00:00:08.012 indexed:20
Time spent:00:00:06.270 indexed:20
Time spent:00:00:18.588 indexed:20
Time spent:00:00:12.463 indexed:15
Time spent:00:00:26.075 indexed:10
Time spent:00:00:06.694 indexed:5
Time spent:00:00:00.694 indexed:5
Time spent:00:00:02.435 indexed:11
Time spent:00:00:07.737 indexed:16
Time spent:00:00:08.043 indexed:16
Time spent:00:00:05.737 indexed:16
Time spent:00:00:06.892 indexed:16
Time spent:00:00:04.763 indexed:16
Time spent:00:00:05.993 indexed:16
Time spent:00:00:08.580 indexed:16
-------------------------------------------- below new method
Time spent:00:00:01.518 indexed:9
Time spent:00:00:04.584 indexed:19
Time spent:00:00:04.091 indexed:24
Time spent:00:00:05.226 indexed:29
Time spent:00:00:03.573 indexed:29
Time spent:00:00:05.087 indexed:34
Time spent:00:00:04.928 indexed:34
Time spent:00:00:04.644 indexed:34
Time spent:00:00:07.229 indexed:39
Time spent:00:00:06.947 indexed:39
Time spent:00:00:05.692 indexed:39
Time spent:00:00:04.733 indexed:39

from relevanssi.

msaari avatar msaari commented on August 18, 2024

It's faster, yes, but if you compare the total indexing time, how much faster is that? Is there a significant difference in the total indexing time? What exactly did you change here?

from relevanssi.

asakous avatar asakous commented on August 18, 2024

total indexing time.
same hardware
2116391 record in the table and I delete 150654 record.
re-index unindexed post

Time elapsed: 0:05:13 | Time remaining: we're done!-->my join table method
Time elapsed: 0:09:15 | Time remaining: we're done!

my method only affect people who have millions record in the table.
for the small table. don't bother to change it.

from relevanssi.

msaari avatar msaari commented on August 18, 2024

Well, that is a significant difference there. On the other hand, nine minutes is much faster than I expected in the first place.

What exactly did you do here? I'd like to give it a go.

from relevanssi.

asakous avatar asakous commented on August 18, 2024

1:
CREATE TABLE wp_relevanssi_id (
doc int(11) NOT NULL DEFAULT '0',
PRIMARY KEY (doc)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
2:indexing.php
if ( ! empty( $values ) ) {
$values = implode( ', ', $values );
$query = "INSERT IGNORE INTO $relevanssi_table (doc, term, term_reverse, content, title, comment, tag, link, author, category, excerpt, taxonomy, customfield, type, taxonomy_detail, customfield_detail, mysqlcolumn) VALUES $values";
$query2 = "INSERT IGNORE INTO wp_relevanssi_id (doc) VALUES (".$post->ID.") ";-->add
if ( $debug ) {
relevanssi_debug_echo( "Final indexing query:\n\t$query" );
}

	$wpdb->query( $query2 );-->add 
	$wpdb->query( $query ); // phpcs:ignore WordPress.DB.PreparedSQL.NotPrepared
	
}

3:indexing.php
$q = "SELECT post.ID
FROM $wpdb->posts post
LEFT JOIN $wpdb->posts parent ON (post.post_parent=parent.ID)
LEFT JOIN wp_relevanssi_id r ON (post.ID=r.doc)-->change table name
WHERE
$processed_post_filter
AND
(post.post_status IN ($valid_status)
OR
(post.post_status='inherit'
AND(
(parent.ID is not null AND (parent.post_status IN ($valid_status)))
OR (post.post_parent=0)
)
)
)

not implemented
rebuild whole index should truncate table wp_relevanssi_id too.

another possible optimization is that
when people try to rebuild whole index don't truncate table first.
the rebuild flow should be
1:create a identical relevanssi temporary table name relevanssi_temp
2:insert index to relevanssi_temp
3: when build index completed ,
rename relevanssi table to relevanssi_temp_b
rename relevanssi_temp to relevanssi
rename table relevanssi_temp_b to relevanssi_temp
truncate relevanssi_temp
that way to aviod search downtime

from relevanssi.

msaari avatar msaari commented on August 18, 2024

Thanks. I'll try this myself. I was thinking it might be possible to implement this as an add-on plugin that could be used when necessary. That might make sense. I'll have to think about this.

Your optimization is a nice trick, but Relevanssi is already taking up so much database space that doubling that – even momentarily – is probably too much.

from relevanssi.

asakous avatar asakous commented on August 18, 2024

thanks.

from relevanssi.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.