It would be nice to be able to erase entries from suggestion tree. Is such a feature p

FEATURE REQUEST: Erasing entries from suggestions about redisearch HOT 21 CLOSED

redisearch commented on May 22, 2024

FEATURE REQUEST: Erasing entries from suggestions

from redisearch.

Comments (21)

dvirsky commented on May 22, 2024 1

Implemented!

FT.SUGDEL key str

Enjoy!

mannol commented on May 22, 2024 1

Wow, that was unexpectedly fast! Thanks!

mannol commented on May 22, 2024 1

Regarding your second comment: I would like that, yes, as I'm developing a search functionality for a rather large scale application and I'd like to keep everything in redis. Of course I don't want to pressure you to do anything as this is an open-source project, though your effort is greatly appreciated by me and I believe by many others that will come to use RediSearch. Thank you for your time. :)

P. S. How long would it take you to implement that?

dvirsky commented on May 22, 2024 1

It should take a couple of hours, it's just a matter of finding the time. I'll try to do it this week.

It will make the deletions slower, but should keep the searches just as fast.

mannol commented on May 22, 2024 1

It works! Thanks man, you just made my week!

dvirsky commented on May 22, 2024

Hey. Adding something simple such as marking an entry as deleted until the tree is rebuilt is trivial, I can do it soon. As long as you're not deleting too much it should work fine.

A complete deletion with rebalancing the tree might be a bit trickier. I'll see what I can do.
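The "mark as deleted" shortcut described above can be illustrated with a toy trie. This is only a sketch under assumptions: `LazyTrie`, `Node`, and the `visited` counter are hypothetical names, and RediSearch's real suggestion tree is not shown in this thread. It shows that a tombstoned entry disappears from the results while its nodes are still walked by a prefix search.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical toy trie for illustration; NOT RediSearch's actual data structure.
struct Node {
    std::map<char, std::unique_ptr<Node>> children;
    bool terminal = false;  // an entry ends at this node
    bool deleted = false;   // tombstone: entry is hidden, but its nodes remain
};

class LazyTrie {
public:
    void add(const std::string& s) {
        Node* n = &root_;
        for (char c : s) {
            auto& child = n->children[c];
            if (!child) child = std::make_unique<Node>();
            n = child.get();
        }
        n->terminal = true;
        n->deleted = false;  // re-adding revives a tombstoned entry
    }

    // Marks the entry as deleted; returns false if it was not a live entry.
    bool del(const std::string& s) {
        // Safe cast: `this` is non-const here, find() is const only for reuse.
        Node* n = const_cast<Node*>(find(s));
        if (!n || !n->terminal || n->deleted) return false;
        n->deleted = true;
        return true;
    }

    // Returns live entries under `prefix`. `visited` counts every node walked,
    // including nodes that only lead to tombstoned entries.
    std::vector<std::string> search(const std::string& prefix,
                                    std::size_t& visited) const {
        std::vector<std::string> out;
        visited = 0;
        if (const Node* n = find(prefix)) {
            std::string cur = prefix;
            walk(*n, cur, out, visited);
        }
        return out;
    }

private:
    const Node* find(const std::string& s) const {
        const Node* n = &root_;
        for (char c : s) {
            auto it = n->children.find(c);
            if (it == n->children.end()) return nullptr;
            n = it->second.get();
        }
        return n;
    }

    static void walk(const Node& n, std::string& cur,
                     std::vector<std::string>& out, std::size_t& visited) {
        ++visited;
        if (n.terminal && !n.deleted) out.push_back(cur);
        for (const auto& [c, child] : n.children) {
            cur.push_back(c);
            walk(*child, cur, out, visited);
            cur.pop_back();
        }
    }

    Node root_;
};
```

In this sketch the number of visited nodes is the same before and after a deletion, so the work per search does not shrink until the structure is rebuilt without the tombstoned entries.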

mannol commented on May 22, 2024

Nice m8! Thanks.

mannol commented on May 22, 2024

Just tested this and it creates a huge performance decrease once keys are removed.

dvirsky commented on May 22, 2024

What did you test? How many keys and how many did you delete?
It shouldn't slow things down at all, but it won't make things faster either.

mannol commented on May 22, 2024

Oh, should've said that sooner...
Anyways:
I tested it by first adding a million random entries, then adding a million similar entries like "asdfg(+ an integer from 1-1000000)", then deleting those entries right after adding them all. After that, searching for "asdfg" takes forever.

dvirsky commented on May 22, 2024

Is it any different from not deleting anything? It shouldn't be. Can you test that?

If you save the database and reload, you'll get only the un-deleted tree.

BTW the test case is not realistic - you're doing a prefix search that traverses the entire data set with no shortcuts, it's no different than doing KEYS * in redis.

mannol commented on May 22, 2024

I'll share some code I based this benchmark on in a minute.

dvirsky commented on May 22, 2024

I can offer a data set of the entire English wikipedia entries and popularity scores, I used that for developing the module and benchmarking.

Regarding the deletion - I can implement "real" deletion and make it way faster, but I can't whip it out in an hour like this shortcut :)

mannol commented on May 22, 2024

Ok, so I based my benchmark on the following:

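#include <cstdint>
#include <cstring>
#include <hiredis/hiredis.h>
// core::random is not defined in this thread; presumably the poster's own utility.

void fill_first()
{
    redisContext* c = redisConnect("127.0.0.1", 34567);

    const char alphanum[] = "abcdefghijklmnopqrstuvwxyz";
    const size_t alphabet_len = sizeof(alphanum) - 1; // exclude the trailing '\0'

    for (int i = 0; i < 1000000; i++)
    {
        int n = (core::random::uint8_get() % 24) + 1; // random length in [1, 24]
        char tag[24], name[24]; // sized for the maximum n (no non-standard VLAs)

        uint8_t rand_idx[48];
        core::random::get(rand_idx, n * 2); // get 2n random bytes

        for (int j = 0; j < n; j++)
            tag[j] = alphanum[rand_idx[j] % alphabet_len]; // fill with lowercase letters

        for (int j = n; j < n * 2; j++)
            name[j - n] = alphanum[rand_idx[j] % alphabet_len]; // fill with lowercase letters

        // Entry is "<tag>:<name>" with score i; %b expects a pointer and a size_t length.
        void* reply = redisCommand(c, "FT.SUGADD userslex %b:%b %d", tag, (size_t)n, name, (size_t)n, i);
        if (reply) freeReplyObject(reply); // avoid leaking one reply per command
    }

    redisFree(c);
}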

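void add_delete(const char* variant)
{
    redisContext* c = redisConnect("127.0.0.1", 34567);

    // Add 1000000 entries of the form "<variant><i>" with score i...
    for (int i = 0; i < 1000000; i++)
    {
        void* reply = redisCommand(c, "FT.SUGADD userslex %s%d %d", variant, i, i);
        if (reply) freeReplyObject(reply); // avoid leaking one reply per command
    }

    // ...then delete every one of them again.
    for (int i = 0; i < 1000000; i++)
    {
        void* reply = redisCommand(c, "FT.SUGDEL userslex %s%d", variant, i);
        if (reply) freeReplyObject(reply);
    }

    redisFree(c);
}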

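void search(const char* str)
{
    redisContext* c = redisConnect("127.0.0.1", 34567);
    // Fuzzy prefix lookup returning at most 10 suggestions.
    void* reply = redisCommand(c, "FT.SUGGET userslex %b MAX 10 FUZZY", str, strlen(str));
    if (reply) freeReplyObject(reply); // the reply is not inspected, only released
    redisFree(c);
}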

int main (int argc, char** argv)
{
    fill_first(); // fill the completer with 1000000 random alphanumeric entries

    search("asdfg"); // ended in 2 ms

    add_delete("asdfg"); // now add 1000000 entries of a variant string and remove those entries

    search("asdfg"); // ended in 66 ms, i.e. 33 times slower on the same key-set
    search("unrel"); // ended in 0 ms, i.e. the search on unrelated entries (the entries that do not contain the characters used in a variant above) is the same
}

dvirsky commented on May 22, 2024

Ok, there might be a simpler solution than to implement full deletion.

mannol commented on May 22, 2024

Thanks man, I appreciate it!

mstaack commented on May 22, 2024

awesome!

dvirsky commented on May 22, 2024

@mannol ok, looks like I fixed it, even though I doubt it was a real problem (I can go into greater detail on why this is very specific to the kind of data you generated).

After the fix, memory is freed, and after the first search iteration, searches are just as fast. This is what I'm getting from the benchmark, running the search 10 times before the add/delete and 10 times after:

Before:

done search in 1.225009ms!
done search in 0.723388ms!
done search in 0.798917ms!
done search in 0.774701ms!
done search in 0.768723ms!
done search in 0.827574ms!
done search in 0.806563ms!
done search in 0.664389ms!
done search in 0.649305ms!
done search in 0.597494ms!

--- Adding/Deleting!--- 

done POST DEL search in 11.494974ms!
done POST DEL search in 0.889031ms!
done POST DEL search in 0.825044ms!
done POST DEL search in 0.808792ms!
done POST DEL search in 0.789969ms!
done POST DEL search in 0.786663ms!
done POST DEL search in 0.791179ms!
done POST DEL search in 0.772230ms!
done POST DEL search in 0.737571ms!
done POST DEL search in 1.238047ms!

(these results are from 0.5M records, but it doesn't matter)
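For anyone reproducing numbers like these, a per-run timer is more informative than an average, because it makes a slow first iteration after the add/delete step visible. The following is a minimal sketch, not the harness used in this thread: `time_runs` and its parameters are hypothetical names, and the callable passed in is assumed to perform one search.

```cpp
#include <chrono>
#include <cstdio>
#include <functional>
#include <vector>

// Hypothetical helper: runs `fn` `runs` times and returns each run's
// wall-clock duration in milliseconds, printing one line per run.
std::vector<double> time_runs(const std::function<void()>& fn, int runs,
                              const char* label) {
    std::vector<double> ms;
    for (int i = 0; i < runs; ++i) {
        auto start = std::chrono::steady_clock::now();
        fn();  // the operation being measured, e.g. one FT.SUGGET round trip
        auto stop = std::chrono::steady_clock::now();
        ms.push_back(
            std::chrono::duration<double, std::milli>(stop - start).count());
        std::printf("done %s in %fms!\n", label, ms.back());
    }
    return ms;
}
```

Calling it once before and once after the add/delete step, for example `time_runs([] { search("asdfg"); }, 10, "search")`, would give two comparable lists of timings. Reusing one connection across runs, rather than reconnecting per search, keeps connection setup out of the measurement.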

dvirsky commented on May 22, 2024

BTW thanks for providing the benchmark, it saved me tons of work figuring this shit out.

dvirsky commented on May 22, 2024

BTW notice that if you're doing FUZZY prefix searches, you should require a prefix of at least 3 characters. IIRC the module doesn't enforce that on its own.
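Since the module reportedly does not enforce this itself, the check can live in the client. The sketch below is one possible approach, not part of RediSearch: `build_sugget_args` and `kMinFuzzyPrefix` are hypothetical names, and falling back to a non-fuzzy search for short prefixes (instead of rejecting them) is an assumed design choice. The argument order mirrors the `FT.SUGGET userslex ... MAX 10 FUZZY` call used in the benchmark.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical client-side threshold, taken from the advice in this thread.
constexpr std::size_t kMinFuzzyPrefix = 3;

// Builds the argument list for FT.SUGGET, adding FUZZY only when the prefix
// is long enough. Length is counted in bytes, not Unicode code points.
std::vector<std::string> build_sugget_args(const std::string& key,
                                           const std::string& prefix,
                                           int max_results) {
    std::vector<std::string> args = {"FT.SUGGET", key, prefix, "MAX",
                                     std::to_string(max_results)};
    if (prefix.size() >= kMinFuzzyPrefix) {
        args.push_back("FUZZY");
    }
    return args;
}
```

The resulting vector can be converted to `const char*` and length arrays and sent with hiredis's `redisCommandArgv`, which avoids format-string handling for user-supplied prefixes.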

mannol commented on May 22, 2024

Yeah, I noticed that during benchmarking. FUZZY searching with fewer than 3 characters is useless for our service anyway.
