Comments (8)
In general Relevanssi is not recommended for cases where the table has millions of records. That may work, if the hardware is up to it, but I've seen a case where on a shared hosting a site with four million rows in the wp_relevanssi
table spent 75 seconds doing a search. Having the indexing run faster does not help there at all. (With good hardware, apparently indexing millions of posts is possible.)
My test site only has 250k rows in the wp_relevanssi
table. There the lookup time increases steadily up to a point, then fluctuates without growing much. It doesn't get to problematic levels there. What kind of query times are you seeing? If you log the times in relevanssi_build_index()
like this, how does it look like?
$time_before = microtime();
$content = $wpdb->get_results( $query ); // phpcs:ignore WordPress.DB.PreparedSQL.NotPrepared
$time_spent = microtime() - $time_before;
error_log("Time spent: $time_spent");
I don't mind making the indexing faster, but adding another table is a very big complication in the works, and one I'd like to avoid as far as possible. I would need to see some numbers on how bad the problem is, and how big a difference the change can make – in my case this query takes 1.29 seconds of time, when the whole indexing process takes 309 seconds. This is less than half a percent of the time spent, so it does feel like overoptimizing. But with a big database, the situation may be different. Can you provide some data on this?
from relevanssi.
same hardware 400K+ record. I just change left join table and insert new post_id to new table.
it seems double the performance at least.
the result
Time spent:00:00:05.670 indexed:20
Time spent:00:00:09.724 indexed:20
Time spent:00:00:08.804 indexed:20
Time spent:00:00:08.012 indexed:20
Time spent:00:00:06.270 indexed:20
Time spent:00:00:18.588 indexed:20
Time spent:00:00:12.463 indexed:15
Time spent:00:00:26.075 indexed:10
Time spent:00:00:06.694 indexed:5
Time spent:00:00:00.694 indexed:5
Time spent:00:00:02.435 indexed:11
Time spent:00:00:07.737 indexed:16
Time spent:00:00:08.043 indexed:16
Time spent:00:00:05.737 indexed:16
Time spent:00:00:06.892 indexed:16
Time spent:00:00:04.763 indexed:16
Time spent:00:00:05.993 indexed:16
Time spent:00:00:08.580 indexed:16
-------------------------------------------- below new method
Time spent:00:00:01.518 indexed:9
Time spent:00:00:04.584 indexed:19
Time spent:00:00:04.091 indexed:24
Time spent:00:00:05.226 indexed:29
Time spent:00:00:03.573 indexed:29
Time spent:00:00:05.087 indexed:34
Time spent:00:00:04.928 indexed:34
Time spent:00:00:04.644 indexed:34
Time spent:00:00:07.229 indexed:39
Time spent:00:00:06.947 indexed:39
Time spent:00:00:05.692 indexed:39
Time spent:00:00:04.733 indexed:39
from relevanssi.
It's faster, yes, but if you compare the total indexing time, how much faster is that? Is there a significant difference in the total indexing time? What exactly did you change here?
from relevanssi.
total indexing time.
same hardware
2116391 record in the table and I delete 150654 record.
re-index unindexed post
Time elapsed: 0:05:13 | Time remaining: we're done!-->my join table method
Time elapsed: 0:09:15 | Time remaining: we're done!
my method only affect people who have millions record in the table.
for the small table. don't bother to change it.
from relevanssi.
Well, that is a significant difference there. On the other hand, nine minutes is much faster than I expected in the first place.
What exactly did you do here? I'd like to give it a go.
from relevanssi.
1:
CREATE TABLE wp_relevanssi_id
(
doc
int(11) NOT NULL DEFAULT '0',
PRIMARY KEY (doc
)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
2:indexing.php
if ( ! empty( $values ) ) {
$values = implode( ', ', $values );
$query = "INSERT IGNORE INTO $relevanssi_table (doc, term, term_reverse, content, title, comment, tag, link, author, category, excerpt, taxonomy, customfield, type, taxonomy_detail, customfield_detail, mysqlcolumn) VALUES $values";
$query2 = "INSERT IGNORE INTO wp_relevanssi_id (doc) VALUES (".$post->ID.") ";-->add
if ( $debug ) {
relevanssi_debug_echo( "Final indexing query:\n\t$query" );
}
$wpdb->query( $query2 );-->add
$wpdb->query( $query ); // phpcs:ignore WordPress.DB.PreparedSQL.NotPrepared
}
3:indexing.php
$q = "SELECT post.ID
FROM $wpdb->posts post
LEFT JOIN $wpdb->posts parent ON (post.post_parent=parent.ID)
LEFT JOIN wp_relevanssi_id r ON (post.ID=r.doc)-->change table name
WHERE
$processed_post_filter
AND
(post.post_status IN ($valid_status)
OR
(post.post_status='inherit'
AND(
(parent.ID is not null AND (parent.post_status IN ($valid_status)))
OR (post.post_parent=0)
)
)
)
not implemented
rebuild whole index should truncate table wp_relevanssi_id too.
another possible optimization is that
when people try to rebuild whole index don't truncate table first.
the rebuild flow should be
1:create a identical relevanssi temporary table name relevanssi_temp
2:insert index to relevanssi_temp
3: when build index completed ,
rename relevanssi table to relevanssi_temp_b
rename relevanssi_temp to relevanssi
rename table relevanssi_temp_b to relevanssi_temp
truncate relevanssi_temp
that way to aviod search downtime
from relevanssi.
Thanks. I'll try this myself. I was thinking it might be possible to implement this as an add-on plugin that could be used when necessary. That might make sense. I'll have to think about this.
Your optimization is a nice trick, but Relevanssi is already taking up so much database space that doubling that – even momentarily – is probably too much.
from relevanssi.
thanks.
from relevanssi.
Related Issues (20)
- Relevanssi breaks the Media Library grid search
- Premium 2.14.5: empty search queries are a DoS risk? HOT 5
- $progress not defined when used in wp-cli HOT 1
- Certain complex tax_query structures don't work
- Oxygen revisions are indexed
- function relevanssi_total_queries performance problem HOT 2
- Using search result highlighting breaks <code>-blocks HOT 6
- How to get in touch regarding a security concern HOT 1
- 4.16.0 showing as available in wpackagist but no release/tag exists HOT 4
- YITH Badge Management problem with Relevanssi active HOT 4
- Bug: Highlighting causing broken HTML HOT 2
- Relevanssi Premium, relevanssi_do_query returns stdClasses HOT 3
- relevanssi_render_block not called HOT 2
- Exact match boost not working with capital accented letters HOT 2
- PHP8.1 and relevanssi_meta_query_from_query_vars() HOT 5
- relevanssi_post_date_throttle_join replaces Join instead of adding something HOT 2
- Homepage displaying unwanted search results HOT 2
- Improve compatibility with a strict Content Security Policy HOT 3
- Feature Request: Increase compatibility with Elementor HOT 6
- Bug: Pagination prev/next mixed up HOT 2
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from relevanssi.