
Comments (21)

dvirsky commented on May 22, 2024

Implemented!

FT.SUGDEL key str

Enjoy!

from redisearch.

mannol commented on May 22, 2024

Wow, that was unexpectedly fast! Thanks!

mannol commented on May 22, 2024

Regarding your second comment: I would like that, yes, as I'm developing a search functionality for a rather large scale application and I'd like to keep everything in redis. Of course I don't want to pressure you to do anything as this is an open-source project, though your effort is greatly appreciated by me and I believe by many others that will come to use RediSearch. Thank you for your time. :)

P. S. How long would it take you to implement that?

dvirsky commented on May 22, 2024

It should take a couple of hours, it's just a matter of finding the time. I'll try to do it this week.

It will make the deletions slower, but should keep the searches just as fast.
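For illustration, a complete deletion on a plain, uncompressed trie could look roughly like the sketch below. The `PNode`, `insert`, `erase`, and `count_nodes` names are hypothetical, and this is not RediSearch's actual code (its suggestion trie is more involved). The point is only that deletion does extra work walking back up and pruning empty branches, while lookups afterwards see a smaller tree.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>

// Hypothetical plain trie node, for illustration only.
struct PNode {
    std::map<char, std::unique_ptr<PNode>> children;
    bool terminal = false; // true if an entry ends at this node
};

void insert(PNode& root, const std::string& s) {
    PNode* n = &root;
    for (char ch : s) {
        auto& child = n->children[ch];
        if (!child) child = std::make_unique<PNode>();
        n = child.get();
    }
    n->terminal = true;
}

// Removes `s` and prunes branches that no longer lead to any entry.
// Returns true if `n` ended up empty, so the caller may drop it.
bool erase(PNode& n, const std::string& s, std::size_t i = 0) {
    if (i == s.size()) {
        n.terminal = false;
    } else {
        auto it = n.children.find(s[i]);
        if (it == n.children.end()) return false; // entry not present
        if (erase(*it->second, s, i + 1)) n.children.erase(it);
    }
    return !n.terminal && n.children.empty();
}

// Counts nodes, as a rough stand-in for memory use.
std::size_t count_nodes(const PNode& n) {
    std::size_t total = 1;
    for (const auto& kv : n.children) total += count_nodes(*kv.second);
    return total;
}
```

With entries "as", "asdfg1" and "asdfg2", erasing the two longer entries removes every node below "as", so a later prefix walk has nothing left to traverse there.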

mannol commented on May 22, 2024

It works! Thanks man, you just made my week!

dvirsky commented on May 22, 2024

Hey. Adding something simple such as marking an entry as deleted until the tree is rebuilt is trivial, I can do it soon. As long as you're not deleting too much it should work fine.

A complete deletion with rebalancing the tree might be a bit trickier. I'll see what I can do.
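A minimal sketch of the "mark as deleted" idea, assuming a plain uncompressed trie. The `Node` and `Trie` types and their members are hypothetical and not taken from the module. Deleting only sets a flag, so it is cheap, but the flagged nodes stay in the tree and are still visited by every prefix walk until the tree is rebuilt.

```cpp
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical plain trie node, for illustration only.
struct Node {
    std::map<char, std::unique_ptr<Node>> children;
    bool terminal = false; // an entry ends here
    bool deleted = false;  // tombstone: entry is logically removed
};

struct Trie {
    Node root;

    void add(const std::string& s) {
        Node* n = &root;
        for (char ch : s) {
            auto& child = n->children[ch];
            if (!child) child = std::make_unique<Node>();
            n = child.get();
        }
        n->terminal = true;
        n->deleted = false; // re-adding revives a tombstoned entry
    }

    // Lazy delete: flag the node and leave the structure untouched.
    bool del(const std::string& s) {
        Node* n = &root;
        for (char ch : s) {
            auto it = n->children.find(ch);
            if (it == n->children.end()) return false;
            n = it->second.get();
        }
        if (!n->terminal || n->deleted) return false;
        n->deleted = true;
        return true;
    }

    // Collects live entries under a prefix; tombstoned nodes are still walked.
    std::vector<std::string> complete(const std::string& prefix) const {
        const Node* n = &root;
        for (char ch : prefix) {
            auto it = n->children.find(ch);
            if (it == n->children.end()) return {};
            n = it->second.get();
        }
        std::vector<std::string> out;
        std::string buf = prefix;
        collect(n, buf, out);
        return out;
    }

private:
    static void collect(const Node* n, std::string& buf,
                        std::vector<std::string>& out) {
        if (n->terminal && !n->deleted) out.push_back(buf);
        for (const auto& kv : n->children) {
            buf.push_back(kv.first);
            collect(kv.second.get(), buf, out);
            buf.pop_back();
        }
    }
};
```

The trade-off is visible in `collect`: results stay correct, but the walk cost depends on how many nodes exist under the prefix, deleted or not.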

mannol commented on May 22, 2024

Nice m8! Thanks.

mannol commented on May 22, 2024

Just tested this and it causes a huge performance drop once keys are removed.

dvirsky commented on May 22, 2024

What did you test? How many keys and how many did you delete?
It shouldn't slow things down at all, but it won't make things faster either.

mannol commented on May 22, 2024

Oh, should've said that sooner...
Anyways:
I've tested it by first adding a million random entries, then adding a million similar entries like "asdfg(+ an integer from 1-1000000)", then deleting those entries right after adding them all. After that, searching for "asdfg" takes forever.

dvirsky commented on May 22, 2024

Is it any different from not deleting anything? It shouldn't be. Can you test that?

If you save the database and reload, you'll get only the un-deleted tree.

BTW the test case is not realistic - you're doing a prefix search that traverses the entire data set with no shortcuts, it's no different than doing KEYS * in redis.

mannol commented on May 22, 2024

I'll share some code I based this benchmark on in a minute.

dvirsky commented on May 22, 2024

I can offer a data set of all English Wikipedia entries with popularity scores; I used it for developing the module and benchmarking.

Regarding the deletion - I can implement "real" deletion and make it way faster, but I can't whip it out in an hour like this shortcut :)

mannol commented on May 22, 2024

Ok, so I based my benchmark on the following:

#include <cstdint>
#include <cstring>
#include <hiredis/hiredis.h>
// core::random below is the poster's own helper library; its header is not shown in the thread.

void fill_first()
{
    redisContext* c = redisConnect("127.0.0.1", 34567);

    for (int i = 0; i < 1000000; i++)
    {
        int n = (core::random::uint8_get() % 24) + 1; // random length in [1, 24]
        char tag[24], name[24]; // fixed-size buffers: n never exceeds 24

        const char alphanum[] = "abcdefghijklmnopqrstuvwxyz";
        const size_t alphanum_len = sizeof(alphanum) - 1; // exclude the trailing '\0'

        uint8_t rand_idx[24 * 2];
        core::random::get(rand_idx, n * 2); // Get random bytes

        for (int j = 0; j < n; j++)
            tag[j] = alphanum[rand_idx[j] % alphanum_len]; // fill the tag with random letters

        for (int j = n; j < n * 2; j++)
            name[j - n] = alphanum[rand_idx[j] % alphanum_len]; // fill the name with random letters

        // %b expects a size_t length; free each reply so it is not leaked
        freeReplyObject(redisCommand(c, "FT.SUGADD userslex %b:%b %d", tag, (size_t)n, name, (size_t)n, i));
    }

    redisFree(c);
}

void add_delete(const char* variant)
{
    redisContext* c = redisConnect("127.0.0.1", 34567);

    for (int i = 0; i < 1000000; i++)
        freeReplyObject(redisCommand(c, "FT.SUGADD userslex %s%d %d", variant, i, i));

    for (int i = 0; i < 1000000; i++)
        freeReplyObject(redisCommand(c, "FT.SUGDEL userslex %s%d", variant, i));

    redisFree(c);
}

void search(const char* str)
{
    redisContext* c = redisConnect("127.0.0.1", 34567);
    freeReplyObject(redisCommand(c, "FT.SUGGET userslex %b MAX 10 FUZZY", str, strlen(str)));
    redisFree(c);
}

int main (int argc, char** argv)
{
    fill_first(); // fill the completer with 1000000 random alphanumeric entries

    search("asdfg"); // ended in 2 ms

    add_delete("asdfg"); // now add 1000000 entries of a variant string and remove those entries

    search("asdfg"); // ended in 66 ms, i.e. 33 times slower on the same key-set
    search("unrel"); // ended in 0 ms, i.e. the search on unrelated entries (the entries that do not contain the characters used in a variant above) is the same
}

dvirsky commented on May 22, 2024

Ok, there might be a simpler solution than to implement full deletion.

mannol commented on May 22, 2024

Thanks man, I appreciate it!

mstaack commented on May 22, 2024

awesome!

dvirsky commented on May 22, 2024

@mannol ok, looks like I fixed it, even though I doubt it was a real problem (I can go into greater detail on why this is very specific to the kind of data you generated).

After the fix, memory is freed, and after the first search iteration, searches are just as fast. This is what I'm getting from the benchmark, running the search 10 times before the add/delete and 10 times after:

Before:

done search in 1.225009ms!
done search in 0.723388ms!
done search in 0.798917ms!
done search in 0.774701ms!
done search in 0.768723ms!
done search in 0.827574ms!
done search in 0.806563ms!
done search in 0.664389ms!
done search in 0.649305ms!
done search in 0.597494ms!

--- Adding/Deleting!--- 

done POST DEL search in 11.494974ms!
done POST DEL search in 0.889031ms!
done POST DEL search in 0.825044ms!
done POST DEL search in 0.808792ms!
done POST DEL search in 0.789969ms!
done POST DEL search in 0.786663ms!
done POST DEL search in 0.791179ms!
done POST DEL search in 0.772230ms!
done POST DEL search in 0.737571ms!
done POST DEL search in 1.238047ms!

(these results are from 0.5M records, but it doesn't matter)
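The exact harness behind these numbers isn't shown in the thread. A rough sketch of how such per-search timings could be taken, assuming a hypothetical `time_runs` helper that wraps whatever callable performs the search (for instance the `search("asdfg")` call from the benchmark above):

```cpp
#include <chrono>
#include <cstdio>
#include <functional>
#include <vector>

// Hypothetical helper: runs `search` `runs` times, prints each wall-clock
// duration in milliseconds, and returns the durations.
std::vector<double> time_runs(const std::function<void()>& search,
                              int runs, const char* label) {
    std::vector<double> ms;
    for (int i = 0; i < runs; i++) {
        auto start = std::chrono::steady_clock::now();
        search();
        auto end = std::chrono::steady_clock::now();
        double elapsed =
            std::chrono::duration<double, std::milli>(end - start).count();
        std::printf("done %ssearch in %fms!\n", label, elapsed);
        ms.push_back(elapsed);
    }
    return ms;
}
```

Calling it as `time_runs(fn, 10, "")` before the add/delete step and `time_runs(fn, 10, "POST DEL ")` after it would produce output in the same shape as the listing above. Printing every run rather than an average is what makes a one-off slow first iteration visible.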

dvirsky commented on May 22, 2024

BTW thanks for providing the benchmark, it saved me tons of work figuring this shit out.

dvirsky commented on May 22, 2024

BTW notice that if you're doing FUZZY prefix searches, you should limit the prefix to 3 characters minimum. IIRC the module doesn't do that on its own.
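One way to enforce that on the client side, sketched under the assumption that the arguments are built in application code before being sent. `sugget_args` and `kMinFuzzyPrefix` are hypothetical names; the argument order follows the FT.SUGGET call in the benchmark above.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Threshold taken from the advice above; tune it for your own data.
constexpr std::size_t kMinFuzzyPrefix = 3;

// Builds the argument list for FT.SUGGET and only appends FUZZY when the
// prefix is long enough. Note: size() counts bytes, not UTF-8 characters.
std::vector<std::string> sugget_args(const std::string& key,
                                     const std::string& prefix,
                                     int max_results) {
    std::vector<std::string> args = {"FT.SUGGET", key, prefix, "MAX",
                                     std::to_string(max_results)};
    if (prefix.size() >= kMinFuzzyPrefix) args.push_back("FUZZY");
    return args;
}
```

A vector like this can be handed to hiredis through `redisCommandArgv`, which also avoids formatting user input into a command string. Shorter prefixes fall back to an exact prefix search instead of being rejected.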

mannol commented on May 22, 2024

Yeah, I noticed that during benchmarking. FUZZY searching with fewer than 3 characters is useless for our service anyway.
