
profiling's Introduction

open-quantum-safe

profiling

Testing various functional and non-functional properties like performance and memory consumption

Purpose

This repository is intended to contain software to collect profiling information across the algorithms supported by liboqs at different levels of the software and network stack.

In particular, measurements will be collected using:

  1. `liboqs` library-level performance testing using `speed_sig` and `speed_kem` for execution performance numbers and `test_sig_mem` and `test_kem_mem` for memory consumption numbers (heap and stack) -- see the example invocations after this list
  2. `openssl` application-level performance testing using `openssl speed`
  3. `openssl` "basic network"-level raw handshake performance testing using `openssl s_time`
  4. "Simulated"/controlled network-level performance testing [not yet implemented]
  5. "Full stack" performance testing using standard client software like `curl` and standard server software like `nginx` [not yet implemented].

This repository will not contain tests replicating raw algorithm-level testing as done by SUPERCOP.

Methodology

All tests

  • are packaged into standalone Docker images, facilitating execution across different (cloud) platforms, hardware architectures & CPU optimizations
  • are designed to return JSON output representing the current profiling numbers, which can be stored arbitrarily; initial storage facilities are provided to deposit data into AWS S3
  • also allow collecting/documenting profiling numbers for classic crypto to permit comparison with PQC algorithms
  • can be visualized by suitable Chart.js code: see the visualization folder.

Wrapper scripts are created to facilitate automatically running these tests on different cloud infrastructures and storing the resulting JSON output as well as the wrapping HTML and JavaScript code.
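A minimal sketch of such a wrapper, assuming an S3 bucket named oqs-profiling and a run-tests.sh entry point (both names are illustrative; only run-tests-m1.sh appears elsewhere in this document):

docker run openquantumsafe/oqs-perf bash -c "cd /opt/test && ./run-tests.sh" > run.log 2>&1
aws s3 cp results/ "s3://oqs-profiling/$(date +%F)/" --recursive    # one S3 folder per run date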

profiling's People

Contributors

baentsch, dstebila, martyrshot, swilson4


profiling's Issues

Add CPU frequency to aarch64 test run results

Unlike on x64 VMs, CPU frequency is not output by cat /proc/cpuinfo on AWS aarch64 VMs. This issue is to find out how this information can be obtained and added to the profiling run results.
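Candidate sources for this information on aarch64 (availability varies by kernel and instance type, so these are suggestions to evaluate, not a confirmed fix):

lscpu | grep -i mhz    # frequency as reported by lscpu, where available
cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq    # in kHz, if cpufreq is exposed
sudo dmidecode -t processor | grep -i speed    # requires dmidecode and root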

Error with command "-curves"

What I tried:

docker run -v ~/resHandshake/:/opt/test/results oqs-perf python3 /opt/test/handshakes.py

What I expected:

That the command runs without errors

What I got:

Generating a oqs_sig_default private key
writing new private key to 'CA.key'

Generating a oqs_sig_default private key
writing new private key to '/opt/test/server.key'

Signature ok
subject=CN = localhost
Getting CA Private Key
Using default temp DH parameters
ACCEPT
Using default temp DH parameters
ACCEPT
Using default temp DH parameters
ACCEPT
Using default temp DH parameters
ACCEPT
Using default temp DH parameters
ACCEPT
Using default temp DH parameters
ACCEPT
Using default temp DH parameters
ACCEPT
Using default temp DH parameters
ACCEPT
Using default temp DH parameters
ACCEPT
Using default temp DH parameters
ACCEPT
Using default temp DH parameters
ACCEPT
Using default temp DH parameters
ACCEPT
Using default temp DH parameters
ACCEPT
Error with command: "-curves "
Generating a p256_oqs_sig_default private key
writing new private key to 'CA.key'

Why is it throwing this error? What is "-curves" needed for?

Investigate changes

As of April 3, the new "portability" build flags became active in the profiling docker image, and on April 8 an update to this image followed (incorporating open-quantum-safe/liboqs#957). Both events are visible in the snapshots below, but performance in general never quite returned to previous levels (all x86_64). Would it be reasonable to investigate/improve this? Are build flags set incorrectly when creating the profiling Docker image? Is something else going wrong in building things? Is there a more general issue? Were things incorrect before?

SIKE & SIDH (performance) lost drastically with the last update: [chart]

Kyber (plain, non-AES version) distributable lost somewhat with the first change, but never recovered: [chart]

Frodo distributable also lost: [chart]

HQC (performance and distributable) in turn benefited from both changes: [chart]

Dilithium AES benefited while plain Dilithium dropped (performance and distributable): [chart]

Tagging @jschanck for thoughts

Rework visualisation logic

  • Remove code that's basically copied -> more parameterization
  • Add support for selecting/differentiating between (many) different architectures
  • Merge JSON files for different test types/configurations and runs on different architectures into single JSON files

Automated deviation notification

Compare results of the most recent runs with previous runs & generate notifications (email?) in case of significant (10%?) deviations.
This shall avoid things like the following (BIKE keygen performance) going unnoticed: [chart]
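A minimal sketch of such a check in shell, assuming one JSON file per run and an illustrative schema (.results[].name / .ops_per_sec) plus jq/bc/mail availability; none of these names are taken from the actual profiling output:

THRESHOLD=10   # percent
prev=$(jq -r '.results[] | select(.name=="BIKE keygen") | .ops_per_sec' previous.json)
curr=$(jq -r '.results[] | select(.name=="BIKE keygen") | .ops_per_sec' latest.json)
dev=$(echo "scale=2; ($curr - $prev) / $prev * 100" | bc -l)
if (( $(echo "${dev#-} > $THRESHOLD" | bc) )); then    # compare absolute deviation
    # recipient address is a placeholder
    echo "BIKE keygen deviated by ${dev}%" | mail -s "profiling deviation" oqs-team@example.com
fi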

Automate M1 profiling image build & use

Because M1 VMs were unavailable in AWS at the time the tests were created, and unlike the timed AWS-based runs for x64 and aarch64, the profiling run on M1 is currently triggered by a cron job on a dedicated laptop like this:

docker run --privileged openquantumsafe/oqs-perf bash -c "cd /opt/test && ./run-tests-m1.sh" > /Users/baentsch/profiling/docker-run-m1.log 2>&1

In particular, this means the dedicated Dockerfile-m1 dockerfile is not used for M1 profiling. This may or may not be sensible -- but it at least confuses me (why do we still have this file?). If we decide to keep this file, the corresponding image should be built in CI and used (rather than the more generic aarch64 Linux image). Also, would running the code in a comparable (VM-based) manner be possible? On which infrastructure? How do we ascertain that the functionality tested on all platforms remains in sync with the main Dockerfile (used for x64)?

Handle missing data gracefully

As per the discussion to reduce the number of profiling runs while

keep running on a longer periodic basis (e.g., every few weeks) rather than stopping entirely to avoid code rot

the data collection, deviation checking and presentation logic must handle arbitrary numbers of days without profiling runs. Currently, the profiling results of the last x(=10) calendar days are collected and displayed, and the last y(=5) calendar days are used to check for performance deviations. This logic would not properly handle days/dates without any profiling runs, i.e., longer periods between runs.

The suggestion is to retain the basic logic but permit arbitrary numbers of days without profiling runs. In future, the last x complete profiling runs (on all supported platforms, but on arbitrary dates in the past) would be collected for presentation, and the last y complete profiling runs would be used to check the performance deviation of any single new run.
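A sketch of this selection, assuming one JSON file per completed run named <YYYY-MM-DD>.json in a results/ directory (the layout is illustrative):

X=10
runs=$(ls results/*.json | sort | tail -n "$X")    # last X runs, regardless of calendar gaps between them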

Missing/incorrect units in benchmark results

  1. On the result page for signature algorithm memory usage, the unit of the values in the table is not specified. It should be "bytes", right?
  2. In the three charts (keygen operations, sign operations, verify operations) following the table on this page, the y-axis labels seem to be incorrect. The labels are "keygen/s", "sign/s", and "verify/s" respectively, the same as those in SIG algorithms runtime performance. Aren't they supposed to be "bytes"?

PS: I took a look at the source code here and here and found that the labels are not consistent with those I saw on the benchmark page.

Handshake performance of "reference code" vs "performance code"

Hi,
while looking at the TLS handshake performance data at https://openquantumsafe.org/benchmarking/visualization/handshakes.html (specifically from 2023-06-24, but other days show the same potential discrepancy), I came across a strange result. The reference code on x86 (and sometimes aarch64) is (almost) always faster (more handshakes per second) than the performance version for classical ECDHE (x25519/x448) with signature algorithm Ed25519 or Dilithium2.
With Ed25519:

  • X25519 (x86_64-ref) | 1575.83
  • X448 (x86_64-ref) | 1033.33

vs

  • X25519 (x86_64-noport) | 1514.52
  • X448 (x86_64-noport) | 1054.72 (here the performance code is faster)

or (aarch64)

  • X25519 (aarch64-ref) | 1261.40
  • X448 (aarch64-ref) | 608.65

vs

  • X25519 (aarch64-noport) | 1252.21
  • X448 (aarch64-noport) | 602.88

With Dilithium2:

  • X25519 (x86_64-ref) | 1320.63
  • X448 (x86_64-ref) | 836.89

vs

  • X25519 (x86_64-noport) | 1283.85
  • X448 (x86_64-noport) | 814.40

When looking at post-quantum key exchanges it makes sense (performance code is faster than reference) but with classical key exchange it seems to be the opposite on x86.

However, when I select RSA2048 or ECDSAprime256v1 those results for ECDHE make sense (performance code is faster than reference) but then Kyber key exchange has the opposite effect (reference code Kyber + (RSA or ECDSA) is faster than performance code).

Is this a measurement error due to high variance? Is the "performance code" for classical key exchanges even different from the "reference code" since I would guess you use the OpenSSL implementation? Is there any other explanation for these strange results?

By the way, thank you for making and maintaining this project!

Agree goals of tests and visualization

Following up on discussions in open-quantum-safe/liboqs#928 :

it would be helpful for the OQS team to clarify what the purpose of the tests is:

  • Comparing algorithms vs. comparing algorithm evolution
  • Testing algorithm runtime variation for a given message vs. testing algorithm runtime over a distribution of messages

My personal initial attempt to answer:

Algorithm evolution may not be as interesting as a comparison across algorithms (and their variants)
Runtime variations for any algorithm (or algorithm variant) with any kind of dependency would be interesting to highlight (possibly as a new set of tests and visualizations).

Project renaming checklist

Checklist for renaming this project (to "profiling"?)

  • update local webpage generator code (perf/scripts/gen_website.sh)
  • update local README.md's (CCI tag, issue documentation reference)
  • after the name change, users with checked-out code should run git remote set-url origin git@github.com:openquantumsafe/profiling.git (or simply clone anew).
  • after name change, update project reference at https://openquantumsafe.org/benchmarking/

The Docker image name is independent and can remain as-is ("oqs-perf"); the AWS-deploy naming ("oqs-test") is also independent of this name; the CCI build process of this project also does not seem to be triggered externally (neither in openssl nor liboqs).

@dstebila @xvzcf Please review/amend as you see fit. My check also comprised going through all references to the word "speed" in all OQS subprojects I have checked out. In that process I found some "speed"-related references that make me wonder whether we might want to integrate them here, too (e.g., boringssl speed)?

Investigate performance drop in handshaking

With the switch to oqsprovider+OpenSSL3, handshaking performance in general dropped by 20%-30%. We need to investigate how classic algorithm performance (independent of oqsprovider) and PQ algorithm performance change. For this, #94 needs to be fixed first.

Create downloadable visualization tgz

At the end of each performance test collection run, create an externally accessible/downloadable .tgz file with all HTML+JavaScript+JSON files. The easiest option would be an S3-based web folder readable to the world: would that be OK for you, @dstebila? You could then wget this to a location of your choice. Alternatively, we could push (scp?) to a server where you want it.
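A sketch of the S3 variant, with an illustrative bucket name and the standard public-read ACL:

tar czf site.tgz *.html *.js *.json
aws s3 cp site.tgz s3://oqs-profiling-site/site.tgz --acl public-read
wget https://oqs-profiling-site.s3.amazonaws.com/site.tgz    # consumer side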

Add P-521 kex and sig

It would be useful to add the P-521 curve as a standalone choice for KEX and auth in TLS (currently, only P-256 and P-384 options are available). This is for performance testing in the NCCoE project, to compare with the L5 CNSA 2.0 suite (Kyber1024/Dilithium5).
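Once enabled, the curve could be exercised in isolation roughly like this (stock OpenSSL 1.1.1+ flags; certificate/key paths and port are placeholders):

openssl s_server -cert server.crt -key server.key -groups P-521 -www &
openssl s_client -connect localhost:4433 -groups P-521 < /dev/null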

Timeseries visualisation

Moving a discussion from a PR #20 to an issue:

@baentsch : Question to @dstebila We're collecting all data since Oct 4 into the website display (and the file site.tgz): Shall we keep doing this or become more selective? "Oct 4" is actually a parameter in gen_website.sh...

@dstebila: Unclear to me. Let's reflect on the purpose of presenting it as a time series. I understand it serves two purposes: seeing how performance changed after important commits were merged, and providing a visual average over time (effectively adding more data points).

Agreed. So what about adding a mechanism to selectively set the dates that become part of the visualization export (encoded in gen_website.sh?), which could be manually amended/checked in to GitHub as and when substantial changes to the code base occur, or when the number of test runs shown becomes too large to derive meaningful insight from: e.g., a file containing a list of dates (instead of the single start date currently passed as a parameter to the HTML-generator python script)? This would be a whitelist approach at the generating end (dropping data from the export).
An alternative would be a simple "pruning" script deleting unwanted/redundant data/date points from the current daily exports, i.e., a blacklist approach that could be done at the receiving end (still retaining all data points without visualizing all of them). My preference would be this second option.
@dstebila: Any preference? Alternative suggestions?
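A sketch of the whitelist variant, assuming a checked-in dates.txt (one YYYY-MM-DD per line) and one JSON file per date; both names are illustrative, not the actual gen_website.sh interface:

while read -r d; do
    cp "raw-results/${d}.json" export/    # only whitelisted dates reach the visualization export
done < dates.txt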

multiarch docker image

Currently, separate architectures are built/pushed to Docker Hub differently: x86_64 is done automatically via CI, arm64 is built & pushed manually from an ARM64 VM. A true multiarch docker image would be ideal.
First tests to do both from within CCI failed:

  1. An ARM64 machine is not supported out of the box (a request to CCI for enabling this beta facility is pending)
  2. A qemu-based docker buildx branch takes way too long (and then times out).

Another approach would be to cross-build everything and only then insert the resulting binaries into the respective architecture base docker images. This apparently only works for liboqs on Debian (not Alpine as currently used for profiling) and requires additional investigation as to how to cross-build S3 access (for storing test run results) and OpenSSL.
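For reference, the qemu-backed buildx route mentioned above (the variant that timed out in CCI) corresponds roughly to this standard invocation:

docker buildx create --use
docker buildx build --platform linux/amd64,linux/arm64 -t openquantumsafe/oqs-perf --push .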

Suggestions/alternative ideas on how to achieve this goal are solicited (@jschanck?)

AWS performance variances

There are some surprising performance differences when running the performance container (openquantumsafe/oqs-perf) in AWS EC2 manually and under cron control.

AWS manual run (docker run -it openquantumsafe/oqs-perf /opt/oqssa/bin/speed_sig)

[ec2-user@ip-10-0-0-13 ~]$ docker run -it openquantumsafe/oqs-perf /opt/oqssa/bin/speed_sig
Configuration info
==================
Target platform:  x86_64-Linux-5.4.0-47-generic
Compiler:         gcc (9.3.0)
Compile options:  [-Werror;-Wall;-Wextra;-Wpedantic;-Wstrict-prototypes;-Wshadow;-Wformat=2;-Wfloat-equal;-Wwrite-strings;-O3;-fomit-frame-pointer;-fdata-sections;-ffunction-sections;-Wl,--gc-sections;-Wbad-function-cast]
OQS version:      0.4.0
Git commit:       1d08c9d6ab696c9d50e36231447d56ddc05735d6
OpenSSL enabled:  Yes (OpenSSL 1.1.1g  21 Apr 2020)
AES:              OpenSSL
SHA-2:            OpenSSL
SHA-3:            OpenSSL
CPU exts active:  AES-AVX-AVX2-BMI-BMI2-POPCNT-SSE-SSE2-SSE3

Speed test
==========
Started at 2020-09-26 07:05:36
Operation                      | Iterations | Total time (s) | Time (us): mean | pop. stdev | CPU cycles: mean          | pop. stdev
------------------------------ | ----------:| --------------:| ---------------:| ----------:| -------------------------:| ----------:
DILITHIUM_2                    |            |                |                 |            |                           |           
keypair                        |      29931 |          3.000 |         100.233 |     26.477 |                    288371 |      76397
sign                           |       5779 |          3.000 |         519.196 |    384.120 |                   1503404 |    1113889
verify                         |      30073 |          3.000 |          99.759 |      2.062 |                    287081 |       5693
DILITHIUM_2                    |            |                |                 |            |                           |           
keypair                        |      31144 |          3.000 |          96.329 |      1.896 |                    277202 |       5273
sign                           |       5769 |          3.000 |         520.023 |    367.729 |                   1505764 |    1066415
verify                         |      30064 |          3.000 |          99.789 |      1.827 |                    287174 |       4989
DILITHIUM_3                    |            |                |                 |            |                           |           
keypair                        |      21025 |          3.000 |         142.690 |      2.821 |                    411638 |       7948
sign                           |       3912 |          3.000 |         766.960 |    575.565 |                   2221895 |    1669008
verify                         |      19896 |          3.000 |         150.786 |     34.302 |                    435014 |      99061
DILITHIUM_4                    |            |                |                 |            |                           |           
keypair                        |      15889 |          3.000 |         188.813 |      4.764 |                    545337 |      13586
sign                           |       4235 |          3.002 |         708.912 |    438.295 |                   2053503 |    1270955
verify                         |      15501 |          3.000 |         193.542 |      3.386 |                    559032 |       9592
Falcon-512                     |            |                |                 |            |                           |           
keypair                        |        151 |          3.018 |       19985.013 |   7141.066 |                  57953387 |   20709488
sign                           |        532 |          3.004 |        5647.523 |     33.441 |                  16375146 |      96541
verify                         |      49284 |          3.000 |          60.873 |      1.745 |                    174499 |       4840
Falcon-1024                    |            |                |                 |            |                           |           
keypair                        |         56 |          3.005 |       53657.536 |  21900.956 |                 155603090 |   63511470
sign                           |        244 |          3.005 |       12315.061 |     24.312 |                  35710758 |      69744
verify                         |      24929 |          3.000 |         120.342 |      2.393 |                    346854 |       6744

AWS CRON (from https://us-east-2.console.aws.amazon.com/cloudwatch/home?region=us-east-2#logEventViewer:group=oqs-speed;stream=oqs-speed%2Foqsperf%2F5971bab8-ab8e-457a-ac56-f5823580795a;start=2020-09-25T04:24:52Z)

Configuration info
==================
Target platform:  x86_64-Linux-5.4.0-47-generic
Compiler:         gcc (9.3.0)
Compile options:  [-Werror;-Wall;-Wextra;-Wpedantic;-Wstrict-prototypes;-Wshadow;-Wformat=2;-Wfloat-equal;-Wwrite-strings;-O3;-fomit-frame-pointer;-fdata-sections;-ffunction-sections;-Wl,--gc-sections;-Wbad-function-cast]
OQS version:      0.4.0
Git commit:       1d08c9d6ab696c9d50e36231447d56ddc05735d6
OpenSSL enabled:  Yes (OpenSSL 1.1.1g  21 Apr 2020)
AES:              OpenSSL
SHA-2:            OpenSSL
SHA-3:            OpenSSL
CPU exts active:  AES-AVX-AVX2-BMI-BMI2-POPCNT-SSE-SSE2-SSE3
Speed test
==========
Started at 2020-09-26 02:51:11
Operation                      | Iterations | Total time (s) | Time (us): mean | pop. stdev | CPU cycles: mean          | pop. stdev
------------------------------ | ----------:| --------------:| ---------------:| ----------:| -------------------------:| ----------:
DILITHIUM_2                    |            |                |                 |            |                           |           
keypair                        |       3875 |          3.000 |         774.205 |   7673.649 |                   2243411 |   22253375
sign                           |        716 |          3.000 |        4190.426 |  17596.804 |                  12150367 |   51030584
verify                         |       3578 |          3.000 |         838.465 |   7990.712 |                   2429622 |   23172908
DILITHIUM_2                    |            |                |                 |            |                           |           
keypair                        |       3893 |          3.000 |         770.637 |   7654.745 |                   2233039 |   22198558
sign                           |        712 |          3.001 |        4214.861 |  17634.441 |                  12221155 |   51139538
verify                         |       3752 |          3.000 |         799.576 |   7794.350 |                   2248827 |   22221764
DILITHIUM_3                    |            |                |                 |            |                           |           
keypair                        |       2683 |          3.009 |        1121.493 |   9204.791 |                   3155331 |   26244746
sign                           |        468 |          3.001 |        6412.256 |  21597.235 |                  18593514 |   62631559
verify                         |       2611 |          3.000 |        1149.017 |   9331.701 |                   3330303 |   27061729
DILITHIUM_4                    |            |                |                 |            |                           |           
keypair                        |       1961 |          3.000 |        1529.850 |  10743.928 |                   4434616 |   31157112
sign                           |        529 |          3.001 |        5672.085 |  20304.290 |                  16446986 |   58882089
verify                         |       1930 |          3.000 |        1554.496 |  10828.900 |                   4506234 |   31403574
Falcon-512                     |            |                |                 |            |                           |           
keypair                        |         18 |          3.202 |      177867.500 |  60619.326 |                 515813317 |  175792762
sign                           |         67 |          3.093 |       46157.448 |  43652.909 |                 133853713 |  126593069
verify                         |       6174 |          3.000 |         485.916 |   6085.696 |                   1407429 |   17648388
Falcon-1024                    |            |                |                 |            |                           |           
keypair                        |          8 |          3.406 |      425755.500 | 110722.507 |                1234691953 |  321099133
sign                           |         31 |          3.095 |       99853.806 |  22401.428 |                 289572238 |   64964560
verify                         |       3095 |          3.000 |         969.308 |   8574.783 |                   2809235 |   24866673

The execution was performed in the same AWS cluster (https://us-east-2.console.aws.amazon.com/ecs/home?region=us-east-2#/clusters/oqs-speed/scheduledTasks) with c4.large instances.

Add non-PQ as a baseline

It would be useful to have performance numbers for non-PQ algorithms as a baseline comparison. This is present in some of the profiling operations but not all (e.g., handshake performance). This would include both non-PQ key exchange and non-PQ signatures.

Add Benchmark Support for Classic McEliece Algorithms in openssl speed

While running the performance tests, I noticed that the Classic McEliece algorithms are currently not included in the OpenSSL benchmarks. Since Classic McEliece is considered a promising candidate among post-quantum cryptographic schemes, it would be valuable to include these algorithms in the OpenSSL benchmarking tools.

Add hybrid KEX to TLS benchmarking tests

It'd be good to add hybrid KEX to the TLS benchmarking tests as well. If resources are constrained, then enabling only the Kyber variants (and perhaps the fallback NTRU) would suffice. These are likely to be the first algorithms deployed in practice, so this data would be insightful. Perf tests with these algorithms will also be run in the NCCoE projects, so having a basis for comparison would help.

Add M1

Run profiling on Apple M1 machine.

Issues known:

  • Valgrind not supported on recent (M1) OSX:
checking for the kernel version... unsupported (21.3.0)
configure: error: Valgrind works on Darwin 10.x-20.x (Mac OS X 10.6-10.11 and macOS 10.12-11.0)

--> Memory tests cannot be run

--> Arguably pointless to run more than baseline (OQS_DIST_BUILD=OFF) tests

Visualization

This is to collect work-in-progress information on speed-JSON file visualization.

Current version accessible at https://test.openquantumsafe.org/performance.html

Feedback/improvement suggestions by @dstebila:

Decide on further profiling runs

Now that additional AWS credits are available, decide which additional things to profile.

Options: different CPU types (AMD, ARM (which?), ...), new OSs (OSX, Windows, RedHat, ..., some possibly on ARM64), further network simulations, ...?

Input solicited: Additional options? Priorities?

Edit: Instance types: https://aws.amazon.com/ec2/instance-types/

Decide on portability

Currently, profiling is done with OQS_PORTABLE_BUILD set, and we show all measurements of reference and optimized code with this setting.

Question is whether we want to

  • disable this configuration option to showcase the best possible performance, or
  • keep it as-is, or
  • even create an even less portable build option for liboqs, activating -march=native.

Opinions, thoughts, and further alternatives are welcome as replies to this issue. Proposals on how to visualize the alternatives are also solicited.
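A sketch of how the least portable option might be expressed at liboqs configure time; passing -march=native via CMAKE_C_FLAGS is an assumed mechanism, and the exact flag semantics depend on the liboqs version in use:

cmake -DOQS_PORTABLE_BUILD=OFF -DCMAKE_C_FLAGS="-march=native" ..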
