
Comments (30)

jaymecd commented on August 25, 2024

@ramsey gist updated


ramsey commented on August 25, 2024

Thanks! I've got it loaded up in MacCallGrind now. I'll dig through this and report back what we'll do to proceed.


fabre-thibaud commented on August 25, 2024

I left a comment on the gist; I may have an idea of what's going on.


fabre-thibaud commented on August 25, 2024

After running some tests, it turns out the root of the issue is that the uuid_create function returns a string UUID which then needs to be parsed into a Uuid instance, and that parsing is expensive. I'm not sure how to solve the issue without changing the fundamental premise of the lib -- i.e., having the Uuid::uuidX() methods start returning strings instead of Uuid instances.
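
To make the shape of the hot path concrete (an illustration only, assuming the PECL uuid extension; not the library's actual internals):

    <?php
    // The PECL call itself is cheap; the expensive part is turning the
    // returned string back into a Uuid object with its fields decomposed.
    $string = uuid_create(UUID_TYPE_RANDOM);          // fast: "xxxxxxxx-xxxx-..."
    $object = \Ramsey\Uuid\Uuid::fromString($string); // slow: validates and splits fields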


ramsey commented on August 25, 2024

Is it 2.5x slower than Rhumsaa\Uuid because it's using uuid_create() and parsing it into a Ramsey\Uuid object?


fabre-thibaud commented on August 25, 2024

Exactly.

My base benchmark results are:

melody run test-benchmark.php 
PECL | 0.0410 sec/10000 | 0.0000041 sec/one
RHUMSAA | 0.1500 sec/10000 | 0.0000150 sec/one
RAMSEY | 0.3280 sec/10000 | 0.0000328 sec/one

I tested the following changes:

  • Return the uuid_create result without parsing => ~ 0.06 seconds
  • Remove the PECL factory => ~0.20 seconds (vs. 0.15 seconds on 2.8.x, I assume the overhead is due to the decomposition into various classes)
  • Remove the if branch that checks whether the extension is loaded => ~0.35 seconds
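
For context, the per-call numbers come from a loop of roughly this shape (a hypothetical sketch; the actual test-benchmark.php in the gist may differ):

    <?php
    // Hypothetical benchmark loop: time 10,000 v4 generations and report the
    // total and per-call cost, matching the "sec/10000 | sec/one" columns above.
    $iterations = 10000;

    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        \Ramsey\Uuid\Uuid::uuid4();
    }
    $elapsed = microtime(true) - $start;

    printf("RAMSEY | %.4f sec/%d | %.7f sec/one\n", $elapsed, $iterations, $elapsed / $iterations);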


fabre-thibaud commented on August 25, 2024

After a bit of pondering, I came to the following conclusion: most use cases for UUIDs are about generating a UUID string (i.e., database primary keys, etc.). A potential workaround for the performance issue would be to add a new Uuid class that stores its field data as a string and only tries to parse it into fields when necessary. That would give a significant performance boost for most scenarios using the PECL lib, since we get to skip all the parsing/formatting code. Thoughts?
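
A rough sketch of the idea, with illustrative names rather than the eventual API: keep the raw string and only split it into fields when a caller actually asks for them.

    <?php
    class LazyUuid
    {
        private $uuid;   // canonical string, e.g. "6ba7b810-9dad-11d1-80b4-00c04fd430c8"
        private $fields; // parsed lazily on first access

        public function __construct($uuid)
        {
            $this->uuid = $uuid;
        }

        public function toString()
        {
            // The common case (primary keys, logging) never pays for parsing.
            return $this->uuid;
        }

        public function getFields()
        {
            if ($this->fields === null) {
                $this->fields = array_combine(
                    array('time_low', 'time_mid', 'time_hi_and_version', 'clock_seq', 'node'),
                    explode('-', $this->uuid)
                );
            }

            return $this->fields;
        }
    }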


ramsey commented on August 25, 2024

It sounds like using the PECL library isn't buying us the efficiencies I suspected it would. The primary reason to use it was that it uses /dev/urandom, if available.
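
For reference, "uses /dev/urandom" boils down to reading kernel-supplied random bytes instead of a userland PRNG, e.g.:

    <?php
    // Read 16 random bytes (enough for one UUID) straight from the kernel CSPRNG.
    $handle = fopen('/dev/urandom', 'rb');
    $bytes  = fread($handle, 16);
    fclose($handle);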


ramsey commented on August 25, 2024

@aztech-dev I'm going to think on your suggestion for a bit. It sounds like it could be a good solution to "lazy-parse" the UUIDs, but I want to give it some consideration first.


jaymecd commented on August 25, 2024

@aztech-dev makes a good point: mostly you need the UUID as a string, sometimes as bytes (e.g. a Mongo binary uuid_rfc*), and any complex parsing should be made lazy so it only happens when it's really needed.
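
To illustrate the two representations in question -- the canonical string and the 16-byte binary form (e.g. for a Mongo binary field):

    <?php
    // Converting between the string form and the raw 16-byte form is cheap and
    // needs none of the field-level parsing discussed above.
    $uuid  = '6ba7b810-9dad-11d1-80b4-00c04fd430c8';
    $bytes = hex2bin(str_replace('-', '', $uuid)); // 16 raw bytes for binary storage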


fabre-thibaud commented on August 25, 2024

Sure, let me know when you reach a conclusion if you want me to implement it.


ramsey commented on August 25, 2024

Sure thing. I like the solution, so be my guest and proceed. :-)


jaymecd commented on August 25, 2024

👍


fabre-thibaud commented on August 25, 2024

I've opened a PR for the fix; let me know if anything needs changing or improving.

Performance with the patch is ~0.1 s per 10K v4 UUIDs (vs. ~0.35 s without) on my machine.


ramsey commented on August 25, 2024

I know it's been a while since we've revisited this, but now that I've released 3.0.0-alpha1, I want to talk a bit more about the performance drop and what we should do about it.

The more I think about it, the more I'm not sure I like the LazyUuid approach, though I don't have a better approach to offer at this time. It just feels awkward to me.

I've taken the benchmark script (thanks so much for it, @jaymecd) and put it up in a separate repo here: https://github.com/ramsey/uuid-benchmark. I've added several more benchmarks to the list, so that we can see how ramsey/uuid performs without pecl-uuid, with @ircmaxell's RandomLib, and with PHP 7's random_bytes(). (Hint: random_bytes() wins in ramsey/uuid.)

Here are my results:

            PECL | 0.0280 sec/10000 | 0.0000028 sec/one
         RHUMSAA | 0.0400 sec/10000 | 0.0000040 sec/one
     RAMSEY-PECL | 0.1630 sec/10000 | 0.0000163 sec/one
   RAMSEY-NOPECL | 0.0960 sec/10000 | 0.0000096 sec/one
RAMSEY-RANDOMLIB | 100.5020 sec/10000 | 0.0100502 sec/one
     RAMSEY-PHP7 | 0.1040 sec/10000 | 0.0000104 sec/one

Let's keep talking and see if we can come up with a good solution, if there's a solution that will work for us.
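
For reference, the RAMSEY-PHP7 case boils down to something like the following simplified sketch of version-4 generation from random_bytes() (not the library's internal code):

    <?php
    // Take 16 bytes from PHP 7's CSPRNG, then stamp the version (4) and
    // RFC 4122 variant bits before formatting as 8-4-4-4-12 hex.
    $bytes    = random_bytes(16);
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x40);
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80);

    echo vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($bytes), 4)), PHP_EOL;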


jaymecd commented on August 25, 2024

Good job, @ramsey. What's next?
I see lazy parsing as the primary option, since the majority don't play with the bits of a UUID or extract the timestamp unless it's really required.


jaymecd commented on August 25, 2024

Please take a look at the ramsey/uuid-benchmark#1 PR.


ircmaxell commented on August 25, 2024

Just FYI, the performance implications of RandomLib are known. They are intentional, as it's a proof-of-work that protects against poor random sources.

Additionally, I see that you're using a low strength generator. I would suggest raising that to a medium strength as it will then be useful in security contexts.


fbidu commented on August 25, 2024

@ircmaxell I'm honestly not so sure that raising the strength of the generator would be very useful, because a UUID is, by definition, only "practically unique"; also, I guess that the core of its strength lies in the hashing algorithm...


ircmaxell commented on August 25, 2024

@fbidu it's "practically unique" provided you use a strong source of randomness. If the source of randomness is predictable, so will the generated UUIDs be. There's no magic source of entropy in there (and no, a hash function is not a source of entropy).


fbidu commented on August 25, 2024

@ircmaxell Well, you're right that we don't have a magic source of entropy, but I'm worried about the performance implications versus how much less likely a collision will be...


fbidu commented on August 25, 2024

@ircmaxell we could easily add a test for collisions @ the benchmark! Though it will probably take a lot of time to find one...


ircmaxell commented on August 25, 2024

we could easily add a test for collisions @ the benchmark!

If you can add a test for collisions, you should submit it as a PhD thesis, because it'll be accepted on the spot. Proving something is or isn't random is hard.

Snark aside, testing randomness is an extremely difficult problem. All of the "low"-strength sources have known vulnerabilities (rand(), mt_rand(), etc.) or at best offer only a few bits of entropy (microtime adds about 4 bits). RandomLib tries to mitigate these issues using a mixing function, but the mixer can't create entropy that wasn't there to begin with.

Medium strength adds /dev/urandom, which is the correct way of generating random numbers to be used in security-sensitive code. Period. The mixing algorithms and philosophy that RandomLib uses prevent a compromise of any one source from taking down the set of them. And the quality of the output is at least as good as the strongest source (this is provable).

Hence the recommendation that medium strength be used for all security-context usages. It adds little cost to generation (it's not going to double the runtime), but it greatly improves the provable security of the output.
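
In RandomLib terms, the difference being discussed looks roughly like this (a sketch of the factory usage; the library's own wiring may differ):

    <?php
    $factory = new \RandomLib\Factory();

    // Low strength: fast, but backed by sources such as mt_rand()/microtime.
    $low = $factory->getLowStrengthGenerator();

    // Medium strength: mixes in /dev/urandom; suitable for security contexts.
    $medium = $factory->getMediumStrengthGenerator();

    $bytes = $medium->generate(16); // 16 random bytes, enough for one UUID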


ramsey commented on August 25, 2024

Even though we will default to the medium-strength generator, we aren't locking you in. You may always opt to pass in the low-strength generator to the UuidFactory or FeatureSet and use it however you wish.
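
A sketch of what that opt-out might look like, assuming the 3.x factory exposes a random-generator setter and the RandomLib adapter named below (check the released API for the exact names):

    <?php
    // Illustrative wiring only: swap in a low-strength RandomLib generator if
    // you accept the trade-off for non-security UUIDs.
    $randomLib = (new \RandomLib\Factory())->getLowStrengthGenerator();

    $factory = new \Ramsey\Uuid\UuidFactory();
    $factory->setRandomGenerator(new \Ramsey\Uuid\Generator\RandomLibAdapter($randomLib));

    \Ramsey\Uuid\Uuid::setFactory($factory);
    $uuid = \Ramsey\Uuid\Uuid::uuid4();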


fbidu commented on August 25, 2024

@ircmaxell thanks for your answer! It was really thorough.
To be fair, I wasn't thinking about a formal proof of (non-)randomness of the current system. Rather, I was wondering if we could add a stochastic simulation that would allow for an evaluation of the likelihood of a collision. But as @ramsey pointed out, there will be no hard binding to a specific generator, so I'd say this is not a pressing issue anymore.
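
For a back-of-the-envelope alternative to simulation, the birthday bound gives the likelihood directly (assuming a uniform random source):

    <?php
    // A version-4 UUID carries 122 random bits, so for small probabilities the
    // chance of any collision among $n UUIDs is roughly n^2 / 2^123.
    $n = 1e9; // one billion UUIDs
    printf("%.2e\n", ($n * $n) / pow(2, 123)); // ~9.4e-20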


fabre-thibaud commented on August 25, 2024

a95140c should close this, right?


ramsey commented on August 25, 2024

I think so. I wanted to run some benchmarks again for the https://github.com/ramsey/uuid-benchmark project, and then we can close this out.


ramsey commented on August 25, 2024

I've updated the thread on the uuid-benchmark project (ramsey/uuid-benchmark#2).

At this point, I think we've done all we can to optimize the ramsey/uuid library while maintaining flexibility.

The benchmarks are still showing a performance drop between rhumsaa/uuid 2.8.2 and ramsey/uuid 3.0.0-alpha3, and without looking at the callgraphs, I think this is due in large part to the greater OOP abstraction in the new version. It's this abstraction that provides a great deal of flexibility to implement many of the features that others have asked for (e.g. COMB, other PRNGs, etc.). In order to provide this level of flexibility, I think it's acceptable to sacrifice some performance.

@jaymecd, since you opened this issue, I'd like to hear your thoughts on the changes made and the performance between versions. Is this performance drop acceptable in light of the more advanced features of the library? If not, how can we improve?


jaymecd commented on August 25, 2024

Features and OOP abstraction are great for simplifying developers' lives, but here comes the question: which operations would be used the most, say in a high-load project?
I think it should perform very well for the casual use case, and the rest of the features should load on demand.
In general, I like how it is right now. Thanks.


ramsey commented on August 25, 2024

Performance is definitely something we need to be aware of and keep an eye on when we make changes like this. Thanks for bringing it to my attention. I'll hang on to that uuid-benchmark repo for reference and will strive to improve the performance of this library over time.

Thanks!

