
Comments (7)

vstinner commented on August 26, 2024

On micro-benchmarks (values less than 100 ns), each process has different performance depending on many things: environment variables, current working directory, address space layout (which is randomized on Linux: ASLR), Python random hash seed (which indirectly changes the number of collisions in hash tables), etc.
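
To make that process-to-process variation concrete, here is a small, hypothetical illustration using only the standard library (not pyperf itself): the same micro-benchmark timed in several fresh interpreter processes can report noticeably different values, because each process gets its own address space layout, hash seed, and environment.

```python
# Hypothetical illustration (not pyperf code): time the same snippet in
# several fresh interpreter processes; each process gets its own ASLR
# layout and random hash seed, so the reported numbers can differ.
import subprocess
import sys

SNIPPET = "import timeit; print(timeit.timeit('hash((1, 2, 3))', number=10**6))"

for run in range(5):
    result = subprocess.run(
        [sys.executable, "-c", SNIPPET],
        capture_output=True, text=True, check=True,
    )
    print(f"process {run}: {float(result.stdout):.6f} s")
```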

pyperf is the result of my research on benchmarking: https://vstinner.github.io/category/benchmark.html

pyperf is not tuned for JIT compilers. I tried but failed to implement R's changepoint analysis in pyperf to decide when a benchmark looks "steady". I stopped my research at: https://vstinner.readthedocs.io/pypy_warmups.html

Sadly, it seems like nobody has tried to tune pyperformance for PyPy so far. PyPy still uses its own benchmark suite and its own benchmark runner.

If you want to change the default parameters, can you please show that it has limited or no impact on the reproducibility of results? My main concern is getting reproducible results, not really running benchmarks fast. But I'm also annoyed that a whole run of pyperformance is so slow. Reproducible means, for example, that if you run a benchmark 5 times on the same machine but reboot the machine between each run, you get almost the same values (mean +- std dev).

For that, I like to use "pyperf dump" and "pyperf stats" to look at all values, not just the mean and std dev.
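
For illustration, here is a minimal sketch of the same inspection done programmatically, assuming pyperf's Benchmark.load()/get_values() API and a result file named bench.json (the CLI equivalents are `python3 -m pyperf dump bench.json` and `python3 -m pyperf stats bench.json`).

```python
# Minimal sketch, assuming pyperf's Benchmark.load()/get_values() API and a
# result file "bench.json"; it prints every value, not just mean +- std dev.
import statistics
import pyperf

bench = pyperf.Benchmark.load("bench.json")
values = bench.get_values()   # all measured values across all worker processes
print("values:", ", ".join(f"{v:.6g}" for v in values))
print("mean  :", statistics.mean(values))
print("stdev :", statistics.stdev(values))
```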

On the other hand, I'm perfectly fine with having different parameters for JIT compilers. pyperf already has heuristics that are only enabled if a JIT compiler is detected. Currently, it's mostly about computing the number of warmups in the first (and maybe second) worker process.
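
For exposition, a heuristic of that general shape could look like the sketch below. This is an illustrative assumption, not pyperf's actual algorithm: keep taking warmup values until the last few agree within a tolerance, and use that count as the number of warmups.

```python
# Illustrative sketch only -- not pyperf's actual heuristic. Keep warming up
# until the last `window` values agree within `tolerance`, then report how
# many warmup values were needed.
import statistics
from typing import Callable

def warmups_needed(measure: Callable[[], float],
                   window: int = 5,
                   tolerance: float = 0.05,
                   max_warmups: int = 100) -> int:
    values = []
    for n in range(1, max_warmups + 1):
        values.append(measure())
        if len(values) >= window:
            recent = values[-window:]
            spread = (max(recent) - min(recent)) / statistics.mean(recent)
            if spread <= tolerance:
                return n      # benchmark looks "steady" after n warmup values
    return max_warmups        # never converged; give up at the cap
```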


vstinner commented on August 26, 2024

Ah, also, I don't want to be the gatekeeper of pyperf; I want it to be useful to most people :-) That's why I added co-maintainers to the project: @corona10 and @pablogsal, who also care about Python performance.


vstinner commented on August 26, 2024

I'd like to propose a new set of numbers, such as 3 worker processes for 4s each.

On CPython with CPU isolation, in my experience, 3 values per process (ignoring the first warmup) are almost the same. Computing more values per process wouldn't bring much benefit.

If you don't use CPU isolation, it can be different. With a JIT compiler, it's likely very different. Also, Python 3.10 optimizes LOAD_ATTR if you run a code object often enough. Python 3.11 optimizes way more opcodes with a new "adaptive" bytecode design. So in recent years, CPython performance has also started to change depending on how many times you run a benchmark. It may also need more warmups ;-)
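
For illustration, asking pyperf for more warmup values per worker process could look like the sketch below, assuming the Runner keyword arguments processes, values, and warmups; the counts are examples, not recommendations.

```python
# Illustrative sketch: give a specializing/JIT interpreter more warmup values
# per worker process. Assumes pyperf's Runner(processes=..., values=...,
# warmups=...) keyword arguments; the counts are examples, not recommendations.
import pyperf

runner = pyperf.Runner(processes=3, values=3, warmups=10)
runner.timeit("dict_lookup",
              stmt="d['value']",
              setup="d = {'value': 1}")
```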


corona10 commented on August 26, 2024

@vstinner @kmod
I have a neutral stance on this proposal. But as @vstinner commented, pyperf should not be tuned for a specific implementation.
I fully understand that the Pyston project wants to show its best performance, but it looks like the pyperf project does not take the Pyston project's situation into account.

IMO, users should know that a JIT implementation needs warmup time, and that warmup should also be measurable and visible to end users through a benchmark. So I would like to suggest the following:

  • Provide parameter options so implementations can account for their own warm-up time.
  • pyperf measures the benchmark in two sections, before-warm-up and after-warm-up, and also reports how much warmup time was needed (see the sketch after this list).
  • If the implementation does not support a JIT, both sections would be measured in the same execution engine (CPython might be this case now).
  • By doing this we can measure both before-warm-up performance and after-warm-up performance, and also how much warmup time is needed for a specific implementation.
  • I don't have ideas yet about what metric should be used for the before-warm-up period, because the period is unstable. Maybe @markshannon has ideas for measuring it.
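
For illustration, the two-section idea could be sketched as below; the split heuristic and the sample values are assumptions for exposition, not a proposed pyperf API.

```python
# Illustrative sketch of the before-warm-up / after-warm-up split. The split
# heuristic and the sample values are assumptions, not pyperf behaviour.
import statistics

def split_warmup(values, tolerance=0.05):
    """Split a value series at the point where the remaining values settle."""
    for i in range(1, len(values) - 1):
        tail = values[i:]
        spread = (max(tail) - min(tail)) / statistics.mean(tail)
        if spread <= tolerance:
            return values[:i], tail
    return values, []

values = [3.10, 1.90, 1.20, 1.01, 1.00, 0.99, 1.01, 1.00]   # seconds per run
before, after = split_warmup(values)
print("before-warm-up:", before, f"(~{sum(before):.2f} s spent warming up)")
print("after-warm-up :", after, f"(mean {statistics.mean(after):.3f} s)")
```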

WDYT?


vstinner commented on August 26, 2024

Very important paper in this field: https://arxiv.org/abs/1602.00602 "Virtual Machine Warmup Blows Hot and Cold" (2017).


markshannon commented on August 26, 2024

The number of times we run a benchmark and the duration of each run should be independent.
The reason for running a benchmark multiple times is to get stable results. How long the individual runs are shouldn't matter, as long as the results are stable enough.
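
For illustration, "stable enough" could be checked roughly like the sketch below; the 2% threshold is an assumption, not a pyperf default.

```python
# Illustrative check of "stable enough": accept repeated results when their
# relative standard deviation is below a threshold. The 2% threshold is an
# assumption, not a pyperf default.
import statistics

def is_stable(results, max_relative_stdev=0.02):
    return statistics.stdev(results) / statistics.mean(results) <= max_relative_stdev

print(is_stable([1.00, 1.01, 0.99, 1.00]))   # True: ~0.8% relative std dev
print(is_stable([1.00, 1.20, 0.90, 1.05]))   # False: runs disagree too much
```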

We do want some form of inter-process warmup (compiling .pyc files, warming OS file caches, etc.) as that reduces noise, but allowing some VMs a free "warmup" time is nonsense.

We can have benchmarks of varying lengths.
For example, three different web-server benchmarks: one that serves 10k requests, one that serves 100k requests, and one that serves 1M requests (or something like that).
Stretching out the runtime of a benchmark by looping or discounting "warmup" is deliberately misleading, IMO.

I agree with @kmod that many (all?) of the pyperformance benchmarks do not reflect user experience.
The solution is to have better benchmarks, not to fudge the results.

If a JIT compiler has a long warmup, but is fast in the long run, we should show that, not just say it is fast.


kmod commented on August 26, 2024

@markshannon So to be clear, pyperf already treats JIT implementations differently than non-JIT ones, and I am advocating for getting rid of this distinction. I think a single set of numbers should be chosen, and personally I think the JIT numbers (or higher) should be chosen, but I think choosing the non-JIT numbers for everyone would also be an improvement.

Also, I could have been clearer -- my proposal doesn't change the number of samples collected or the length of each sample, just the number of processes that those samples are spread across. Also, for what it's worth, pyperf already gives each process a short warmup period.
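
For illustration, that difference boils down to something like the sketch below: the same total number of samples, spread across fewer worker processes. The Runner keyword arguments are assumed from pyperf's API and the counts are examples only.

```python
# Illustrative sketch of the tradeoff: the same total number of samples,
# spread across fewer worker processes. Runner(processes=..., values=...)
# keyword arguments are assumed from pyperf's API; counts are examples only.
import pyperf

# Current style: many short-lived workers, a few values each (20 x 3 = 60).
# runner = pyperf.Runner(processes=20, values=3)

# Proposed style: fewer, longer-lived workers, more values each (3 x 20 = 60).
runner = pyperf.Runner(processes=3, values=20)
runner.timeit("sum_range", stmt="sum(range(10_000))")
```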

@vstinner I disagree that reproducibility is the primary concern of benchmarking, because if that were true then "return 0" would be an ideal benchmarking methodology. The current interest in benchmarking comes from wanting to explain to users how their experience might change by switching to a newer Python implementation; I don't think users really care if the number is "15% +- 1%" vs "15% +- 0.1%", but they would care if the real number is actually "25% +- 1%" because the benchmarking methodology was not representative of their workload. i.e. I think accuracy is generally more important than precision, and that's the tradeoff that I'm advocating for here. I could see the argument "Python processes run on average for 600ms, so that's why we should keep that number", but personally I believe that premise is false.

Maybe to put it another way: I think everything that's been said in this thread would also be an argument against me proposing increasing the runtime to 600ms if it were currently 300ms. So this thread seems to be implying that we should actually decrease the amount of time per subprocess? For what it's worth, I believe pyperf's per-process execution time is a few orders of magnitude smaller than what everyone else does, which is suggestive of increasing it.

