
profiling's Introduction

open-quantum-safe

profiling

Testing various functional and non-functional properties like performance and memory consumption

Purpose

This repository is intended to contain software to collect profiling information across the algorithms supported by liboqs at different levels of the software and network stack.

In particular, measurements will be collected using:

  1. `liboqs` library-level performance testing using `speed_sig` and `speed_kem` for execution performance numbers and `test_sig_mem` and `test_kem_mem` for memory consumption numbers (heap and stack) -- see the example invocations after this list
  2. `openssl` application-level performance testing using `openssl speed`
  3. `openssl` "basic network"-level raw handshake performance testing using `openssl s_time`
  4. "Simulated"/controlled network-level performance testing [not yet implemented]
  5. "Full stack" performance testing using standard client software like `curl` and standard server software like `nginx` [not yet implemented].

This repository will not contain tests replicating raw algorithm-level testing as done by SUPERCOP.

Methodology

All tests

  • are packaged into standalone Docker images, facilitating execution across different (cloud) platforms, hardware architectures & CPU optimizations
  • are designed to return JSON output representing the current profiling numbers, which can be stored arbitrarily; initial storage facilities are provided to deposit data into AWS S3
  • also allow collecting/documenting profiling numbers for classic crypto to permit comparison with PQC algorithms
  • can be visualized by suitable Chart.js code: see the visualization folder.

Wrapper scripts are created to facilitate automatically running these tests on different cloud infrastructures and storing the resulting JSON output as well as the wrapping HTML and JavaScript code.
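A minimal sketch of such a wrapper, assuming an S3 bucket named oqs-profiling and a run-tests.sh entry point (both names are illustrative; only run-tests-m1.sh appears elsewhere in this document):

docker run openquantumsafe/oqs-perf bash -c "cd /opt/test && ./run-tests.sh" > run.log 2>&1
aws s3 cp results/ "s3://oqs-profiling/$(date +%F)/" --recursive    # one S3 folder per run date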

profiling's People

Contributors

baentsch, dstebila, martyrshot, swilson4


profiling's Issues

Add CPU frequency to aarch64 test run results

Unlike on x64 VMs, CPU frequency is not output by cat /proc/cpuinfo on AWS aarch64 VMs. This issue is to find out how this information can be obtained and added to the profiling run results.
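Candidate sources for this information on aarch64 (availability varies by kernel and instance type, so these are suggestions to evaluate, not a confirmed fix):

lscpu | grep -i mhz    # frequency as reported by lscpu, where available
cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq    # in kHz, if cpufreq is exposed
sudo dmidecode -t processor | grep -i speed    # requires dmidecode and root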

Error with command "-curves"

What I tried:

docker run -v ~/resHandshake/:/opt/test/results oqs-perf python3 /opt/test/handshakes.py

What I expected:

That the command runs without errors

What I got:

Generating a oqs_sig_default private key
writing new private key to 'CA.key'

Generating a oqs_sig_default private key
writing new private key to '/opt/test/server.key'

Signature ok
subject=CN = localhost
Getting CA Private Key
Using default temp DH parameters
ACCEPT
Using default temp DH parameters
ACCEPT
Using default temp DH parameters
ACCEPT
Using default temp DH parameters
ACCEPT
Using default temp DH parameters
ACCEPT
Using default temp DH parameters
ACCEPT
Using default temp DH parameters
ACCEPT
Using default temp DH parameters
ACCEPT
Using default temp DH parameters
ACCEPT
Using default temp DH parameters
ACCEPT
Using default temp DH parameters
ACCEPT
Using default temp DH parameters
ACCEPT
Using default temp DH parameters
ACCEPT
Error with command: "-curves "
Generating a p256_oqs_sig_default private key
writing new private key to 'CA.key'

Why is it throwing this error? What is "-curves" needed for?

Investigate changes

As of April 3, the new "portability" build flags became active in the profiling docker image, and on April 8 an update to this image followed (incorporating open-quantum-safe/liboqs#957). Both events are visible in the snapshots below, but performance in general never quite returned to previous levels (all x86_64). Would it be reasonable to investigate/improve this? Are build flags set incorrectly when creating the profiling Docker image? Is something else going wrong in building things? Is there a more general issue? Were things incorrect before?

SIKE & SIDH (performance) lost drastically with the last update: [chart]

Kyber (plain, non-AES version) distributable lost somewhat with the first change, but never recovered: [chart]

Frodo distributable also lost: [chart]

HQC (performance and distributable) in turn benefited from both changes: [chart]

Dilithium AES benefited while plain Dilithium dropped (performance and distributable): [chart]

Tagging @jschanck for thoughts

Rework visualisation logic

  • Remove code that's basically copied -> more parameterization
  • Add support for selecting/differentiating between (many) different architectures
  • Merge JSON files for different test types/configurations and runs on different architectures into single JSON files

Automated deviation notification

Compare results of the most recent runs with previous runs & generate notifications (email?) in case of significant (10%?) deviations.
This shall avoid things like the following (BIKE keygen performance) going unnoticed: [chart]
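A minimal sketch of such a check in shell, assuming one JSON file per run and an illustrative schema (.results[].name / .ops_per_sec) plus jq/bc/mail availability; none of these names are taken from the actual profiling output:

THRESHOLD=10   # percent
prev=$(jq -r '.results[] | select(.name=="BIKE keygen") | .ops_per_sec' previous.json)
curr=$(jq -r '.results[] | select(.name=="BIKE keygen") | .ops_per_sec' latest.json)
dev=$(echo "scale=2; ($curr - $prev) / $prev * 100" | bc -l)
if (( $(echo "${dev#-} > $THRESHOLD" | bc) )); then    # compare absolute deviation
    # recipient address is a placeholder
    echo "BIKE keygen deviated by ${dev}%" | mail -s "profiling deviation" oqs-team@example.com
fi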

Automate M1 profiling image build & use

Because M1 VMs were unavailable in AWS at the time the tests were created, and unlike the timed AWS-based runs for x64 and aarch64, the profiling run on M1 is currently triggered by a cron job on a dedicated laptop like this:

docker run --privileged openquantumsafe/oqs-perf bash -c "cd /opt/test && ./run-tests-m1.sh" > /Users/baentsch/profiling/docker-run-m1.log 2>&1

In particular, this means the dedicated Dockerfile-m1 dockerfile is not used for M1 profiling. This may or may not be sensible -- but it at least confuses me (why do we still have this file?). If we decide to keep this file, the corresponding image should be built in CI and used (rather than the more generic aarch64 Linux image). Also, would running the code in a comparable (VM-based) manner be possible? On which infrastructure? How do we ascertain that the functionality tested on all platforms remains in sync with the main Dockerfile (used for x64)?

Handle missing data gracefully

As per the discussion to reduce the number of profiling runs while

keep running on a longer periodic basis (e.g., every few weeks) rather than stopping entirely to avoid code rot

the data collection, deviation checking and presentation logic must handle arbitrary numbers of days without profiling runs. Currently, the profiling results of the last x(=10) calendar days are collected and displayed, and the last y(=5) calendar days are used to check for performance deviations. This logic would not properly handle days/dates without any profiling runs, i.e., longer periods between runs.

The suggestion is to retain the basic logic but permit arbitrary numbers of days without profiling runs. In future, the last x complete profiling runs (on all supported platforms, but on arbitrary dates in the past) would be collected for presentation, and the last y complete profiling runs would be used to check the performance deviation of any single new run.
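A sketch of this selection, assuming one JSON file per completed run named <YYYY-MM-DD>.json in a results/ directory (the layout is illustrative):

X=10
runs=$(ls results/*.json | sort | tail -n "$X")    # last X runs, regardless of calendar gaps between them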

Missing/incorrect units in benchmark results

  1. On the result page for signature algorithm memory usage, the unit of the values in the table is not specified. It should be "bytes", right?
  2. In the three charts (keygen operations, sign operations, verify operations) following the table on this page, the y-axis labels seem to be incorrect. The labels are "keygen/s", "sign/s", and "verify/s" respectively, the same as those in SIG algorithms runtime performance. Aren't they supposed to be "bytes"?

PS: I took a look at the source code here and here and found that the labels are not consistent with those I saw on the benchmark page.

Handshake performance of "reference code" vs "performance code"

Hi,
while looking at the TLS handshake performance data at https://openquantumsafe.org/benchmarking/visualization/handshakes.html (specifically from 2023-06-24, but other days show the same potential discrepancy), I came across a strange result. The reference code on x86 (and sometimes aarch64) is (almost) always faster (more handshakes per second) than the performance version for classical ECDHE (x25519/x448) with signature algorithm Ed25519 or Dilithium2.
With Ed25519:

  • X25519 (x86_64-ref) | 1575.83
  • X448 (x86_64-ref) | 1033.33

vs

  • X25519 (x86_64-noport) | 1514.52
  • X448 (x86_64-noport) | 1054.72 (here the performance code is faster)

or (aarch64)

  • X25519 (aarch64-ref) | 1261.40
  • X448 (aarch64-ref) | 608.65

vs

  • X25519 (aarch64-noport) | 1252.21
  • X448 (aarch64-noport) | 602.88

With Dilithium2:

  • X25519 (x86_64-ref) | 1320.63
  • X448 (x86_64-ref) | 836.89

vs

  • X25519 (x86_64-noport) | 1283.85
  • X448 (x86_64-noport) | 814.40

When looking at post-quantum key exchanges it makes sense (performance code is faster than reference) but with classical key exchange it seems to be the opposite on x86.

However, when I select RSA2048 or ECDSAprime256v1 those results for ECDHE make sense (performance code is faster than reference) but then Kyber key exchange has the opposite effect (reference code Kyber + (RSA or ECDSA) is faster than performance code).

Is this a measurement error due to high variance? Is the "performance code" for classical key exchanges even different from the "reference code" since I would guess you use the OpenSSL implementation? Is there any other explanation for these strange results?

By the way, thank you for making and maintaining this project!

Agree goals of tests and visualization

Following up on discussions in open-quantum-safe/liboqs#928 :

it would be helpful for the OQS team to clarify what the purpose of the tests is:

  • Comparing algorithms vs. comparing algorithm evolution
  • Testing algorithm runtime variation for a given message vs. testing algorithm runtime over a distribution of messages

My personal initial attempt to answer:

Algorithm evolution may not be as interesting as a comparison across algorithms (and their variants)
Runtime variations for any algorithm (or algorithm variant) with any kind of dependency would be interesting to highlight (possibly as a new set of tests and visualizations).

Project renaming checklist

Checklist for renaming this project (to "profiling"?)

  • update local webpage generator code (perf/scripts/gen_website.sh)
  • update local README.md's (CCI tag, issue documentation reference)
  • after the name change, users with checked-out code should run git remote set-url origin git@github.com:openquantumsafe/profiling.git (or simply clone anew).
  • after name change, update project reference at https://openquantumsafe.org/benchmarking/

The Docker image name is independent and can remain as-is ("oqs-perf"); the AWS-deploy naming ("oqs-test") is also independent of this name; the CCI build process of this project also does not seem to be triggered externally (neither in openssl nor liboqs).

@dstebila @xvzcf Please review/amend as you see fit. My check also comprised going through all references to the word "speed" in all OQS subprojects I have checked out. In that process I found some "speed"-related references that make me wonder whether we might want to integrate them here, too (e.g., boringssl speed)?

Investigate performance drop in handshaking

With the switch to oqsprovider+OpenSSL3, handshaking performance in general dropped by 20%-30%. We need to investigate how classic algorithm performance (independent of oqsprovider) and PQ algorithm performance change. For this, #94 needs to be fixed first.

Create downloadable visualization tgz

At the end of each performance test collection run, create an externally accessible/downloadable .tgz file with all HTML+JavaScript+JSON files. The easiest option would be an S3-based web folder readable to the world: would that be OK for you, @dstebila? You could then wget this to a location of your choice. Alternatively, we could push (scp?) to a server where you want it.
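A sketch of the S3 variant, with an illustrative bucket name and the standard public-read ACL:

tar czf site.tgz *.html *.js *.json
aws s3 cp site.tgz s3://oqs-profiling-site/site.tgz --acl public-read
wget https://oqs-profiling-site.s3.amazonaws.com/site.tgz    # consumer side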

Add P-521 kex and sig

It would be useful to add the P-521 curve as a standalone choice for KEX and auth in TLS (currently, only P-256 and P-384 options are available). This is for performance testing in the NCCoE project, to compare with the L5 CNSA 2.0 suite (Kyber1024/Dilithium5).
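Once enabled, the curve could be exercised in isolation roughly like this (stock OpenSSL 1.1.1+ flags; certificate/key paths and port are placeholders):

openssl s_server -cert server.crt -key server.key -groups P-521 -www &
openssl s_client -connect localhost:4433 -groups P-521 < /dev/null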

Timeseries visualisation

Moving a discussion from a PR #20 to an issue:

@baentsch : Question to @dstebila We're collecting all data since Oct 4 into the website display (and the file site.tgz): Shall we keep doing this or become more selective? "Oct 4" is actually a parameter in gen_website.sh...

@dstebila: Unclear to me. Let's reflect on the purpose of presenting it as a time series. I understand it serves two purposes: seeing how performance changed after important commits were merged, and providing a visual average over time (effectively adding more data points).

Agreed. So what about adding a mechanism to selectively set the dates that become part of the visualization export (encoded in gen_website.sh?), which could be manually amended/checked in to GitHub as and when substantial changes to the code base occur, or when the number of test runs shown becomes too large to derive meaningful insight from: e.g., a file containing a list of dates (instead of the single start date currently passed as a parameter to the HTML-generator python script)? This would be a whitelist approach at the generating end (dropping data from the export).
An alternative would be a simple "pruning" script deleting unwanted/redundant data/date points from the current daily exports, i.e., a blacklist approach that could be done at the receiving end (still retaining all data points without visualizing all of them). My preference would be this second option.
@dstebila: Any preference? Alternative suggestions?
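A sketch of the whitelist variant, assuming a checked-in dates.txt (one YYYY-MM-DD per line) and one JSON file per date; both names are illustrative, not the actual gen_website.sh interface:

while read -r d; do
    cp "raw-results/${d}.json" export/    # only whitelisted dates reach the visualization export
done < dates.txt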

multiarch docker image

Currently, separate architectures are built/pushed to Docker Hub differently: x86_64 is done automatically via CI, arm64 is built & pushed manually from an ARM64 VM. A true multiarch docker image would be ideal.
First tests to do both from within CCI failed:

  1. An ARM64 machine is not supported out of the box (a request to CCI for enabling this beta facility is pending)
  2. A qemu-based docker buildx branch takes way too long (and then times out).

Another approach would be to cross-build everything and only then insert the resulting binaries into the respective architecture base docker images. This apparently only works for liboqs on Debian (not Alpine as currently used for profiling) and requires additional investigation as to how to cross-build S3 access (for storing test run results) and OpenSSL.
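For reference, the qemu-backed buildx route mentioned above (the variant that timed out in CCI) corresponds roughly to this standard invocation:

docker buildx create --use
docker buildx build --platform linux/amd64,linux/arm64 -t openquantumsafe/oqs-perf --push .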

Suggestions/alternative ideas on how to achieve this goal are solicited (@jschanck?)

AWS performance variances

There are some surprising performance differences when running the performance container (openquantumsafe/oqs-perf) in AWS EC2 manually and under cron control.

AWS manual run (docker run -it openquantumsafe/oqs-perf /opt/oqssa/bin/speed_sig)

[ec2-user@ip-10-0-0-13 ~]$ docker run -it openquantumsafe/oqs-perf /opt/oqssa/bin/speed_sig
Configuration info
==================
Target platform:  x86_64-Linux-5.4.0-47-generic
Compiler:         gcc (9.3.0)
Compile options:  [-Werror;-Wall;-Wextra;-Wpedantic;-Wstrict-prototypes;-Wshadow;-Wformat=2;-Wfloat-equal;-Wwrite-strings;-O3;-fomit-frame-pointer;-fdata-sections;-ffunction-sections;-Wl,--gc-sections;-Wbad-function-cast]
OQS version:      0.4.0
Git commit:       1d08c9d6ab696c9d50e36231447d56ddc05735d6
OpenSSL enabled:  Yes (OpenSSL 1.1.1g  21 Apr 2020)
AES:              OpenSSL
SHA-2:            OpenSSL
SHA-3:            OpenSSL
CPU exts active:  AES-AVX-AVX2-BMI-BMI2-POPCNT-SSE-SSE2-SSE3

Speed test
==========
Started at 2020-09-26 07:05:36
Operation                      | Iterations | Total time (s) | Time (us): mean | pop. stdev | CPU cycles: mean          | pop. stdev
------------------------------ | ----------:| --------------:| ---------------:| ----------:| -------------------------:| ----------:
DILITHIUM_2                    |            |                |                 |            |                           |           
keypair                        |      29931 |          3.000 |         100.233 |     26.477 |                    288371 |      76397
sign                           |       5779 |          3.000 |         519.196 |    384.120 |                   1503404 |    1113889
verify                         |      30073 |          3.000 |          99.759 |      2.062 |                    287081 |       5693
DILITHIUM_2                    |            |                |                 |            |                           |           
keypair                        |      31144 |          3.000 |          96.329 |      1.896 |                    277202 |       5273
sign                           |       5769 |          3.000 |         520.023 |    367.729 |                   1505764 |    1066415
verify                         |      30064 |          3.000 |          99.789 |      1.827 |                    287174 |       4989
DILITHIUM_3                    |            |                |                 |            |                           |           
keypair                        |      21025 |          3.000 |         142.690 |      2.821 |                    411638 |       7948
sign                           |       3912 |          3.000 |         766.960 |    575.565 |                   2221895 |    1669008
verify                         |      19896 |          3.000 |         150.786 |     34.302 |                    435014 |      99061
DILITHIUM_4                    |            |                |                 |            |                           |           
keypair                        |      15889 |          3.000 |         188.813 |      4.764 |                    545337 |      13586
sign                           |       4235 |          3.002 |         708.912 |    438.295 |                   2053503 |    1270955
verify                         |      15501 |          3.000 |         193.542 |      3.386 |                    559032 |       9592
Falcon-512                     |            |                |                 |            |                           |           
keypair                        |        151 |          3.018 |       19985.013 |   7141.066 |                  57953387 |   20709488
sign                           |        532 |          3.004 |        5647.523 |     33.441 |                  16375146 |      96541
verify                         |      49284 |          3.000 |          60.873 |      1.745 |                    174499 |       4840
Falcon-1024                    |            |                |                 |            |                           |           
keypair                        |         56 |          3.005 |       53657.536 |  21900.956 |                 155603090 |   63511470
sign                           |        244 |          3.005 |       12315.061 |     24.312 |                  35710758 |      69744
verify                         |      24929 |          3.000 |         120.342 |      2.393 |                    346854 |       6744

AWS CRON (from https://us-east-2.console.aws.amazon.com/cloudwatch/home?region=us-east-2#logEventViewer:group=oqs-speed;stream=oqs-speed%2Foqsperf%2F5971bab8-ab8e-457a-ac56-f5823580795a;start=2020-09-25T04:24:52Z)

Configuration info
==================
Target platform:  x86_64-Linux-5.4.0-47-generic
Compiler:         gcc (9.3.0)
Compile options:  [-Werror;-Wall;-Wextra;-Wpedantic;-Wstrict-prototypes;-Wshadow;-Wformat=2;-Wfloat-equal;-Wwrite-strings;-O3;-fomit-frame-pointer;-fdata-sections;-ffunction-sections;-Wl,--gc-sections;-Wbad-function-cast]
OQS version:      0.4.0
Git commit:       1d08c9d6ab696c9d50e36231447d56ddc05735d6
OpenSSL enabled:  Yes (OpenSSL 1.1.1g  21 Apr 2020)
AES:              OpenSSL
SHA-2:            OpenSSL
SHA-3:            OpenSSL
CPU exts active:  AES-AVX-AVX2-BMI-BMI2-POPCNT-SSE-SSE2-SSE3
Speed test
==========
Started at 2020-09-26 02:51:11
Operation                      | Iterations | Total time (s) | Time (us): mean | pop. stdev | CPU cycles: mean          | pop. stdev
------------------------------ | ----------:| --------------:| ---------------:| ----------:| -------------------------:| ----------:
DILITHIUM_2                    |            |                |                 |            |                           |           
keypair                        |       3875 |          3.000 |         774.205 |   7673.649 |                   2243411 |   22253375
sign                           |        716 |          3.000 |        4190.426 |  17596.804 |                  12150367 |   51030584
verify                         |       3578 |          3.000 |         838.465 |   7990.712 |                   2429622 |   23172908
DILITHIUM_2                    |            |                |                 |            |                           |           
keypair                        |       3893 |          3.000 |         770.637 |   7654.745 |                   2233039 |   22198558
sign                           |        712 |          3.001 |        4214.861 |  17634.441 |                  12221155 |   51139538
verify                         |       3752 |          3.000 |         799.576 |   7794.350 |                   2248827 |   22221764
DILITHIUM_3                    |            |                |                 |            |                           |           
keypair                        |       2683 |          3.009 |        1121.493 |   9204.791 |                   3155331 |   26244746
sign                           |        468 |          3.001 |        6412.256 |  21597.235 |                  18593514 |   62631559
verify                         |       2611 |          3.000 |        1149.017 |   9331.701 |                   3330303 |   27061729
DILITHIUM_4                    |            |                |                 |            |                           |           
keypair                        |       1961 |          3.000 |        1529.850 |  10743.928 |                   4434616 |   31157112
sign                           |        529 |          3.001 |        5672.085 |  20304.290 |                  16446986 |   58882089
verify                         |       1930 |          3.000 |        1554.496 |  10828.900 |                   4506234 |   31403574
Falcon-512                     |            |                |                 |            |                           |           
keypair                        |         18 |          3.202 |      177867.500 |  60619.326 |                 515813317 |  175792762
sign                           |         67 |          3.093 |       46157.448 |  43652.909 |                 133853713 |  126593069
verify                         |       6174 |          3.000 |         485.916 |   6085.696 |                   1407429 |   17648388
Falcon-1024                    |            |                |                 |            |                           |           
keypair                        |          8 |          3.406 |      425755.500 | 110722.507 |                1234691953 |  321099133
sign                           |         31 |          3.095 |       99853.806 |  22401.428 |                 289572238 |   64964560
verify                         |       3095 |          3.000 |         969.308 |   8574.783 |                   2809235 |   24866673

The execution was performed in the same AWS cluster (https://us-east-2.console.aws.amazon.com/ecs/home?region=us-east-2#/clusters/oqs-speed/scheduledTasks) with c4.large instances.

Add non-PQ as a baseline

It would be useful to have performance numbers for non-PQ algorithms as a baseline comparison. This is present in some of the profiling operations but not all (e.g., handshake performance). This would include both non-PQ key exchange and non-PQ signatures.

Add Benchmark Support for Classic McEliece Algorithms in openssl speed

While running the performance tests, I noticed that the Classic McEliece algorithms are currently not included in the OpenSSL benchmarks. Since Classic McEliece is considered a promising candidate among post-quantum cryptographic schemes, it would be valuable to include these algorithms in the OpenSSL benchmarking tools.

Add hybrid KEX to TLS benchmarking tests

It'd be good to add hybrid KEX to the TLS benchmarking tests as well. If resources are constrained, then enabling only the Kyber variants (and perhaps the fallback NTRU) would suffice. These are likely to be the first algorithms deployed in practice, so this data would be insightful. Perf tests with these algorithms will also be run in the NCCoE projects, so having a basis for comparison would help.

Add M1

Run profiling on Apple M1 machine.

Issues known:

  • Valgrind not supported on recent (M1) OSX:
checking for the kernel version... unsupported (21.3.0)
configure: error: Valgrind works on Darwin 10.x-20.x (Mac OS X 10.6-10.11 and macOS 10.12-11.0)

--> Memory tests cannot be run

--> Arguably pointless to run more than baseline (OQS_DIST_BUILD=OFF) tests

Visualization

This is to collect work-in-progress information on speed-JSON file visualization.

Current version accessible at https://test.openquantumsafe.org/performance.html

Feedback/improvement suggestions by @dstebila:

Decide on further profiling runs

Now that additional AWS credits are available, decide which additional things to profile.

Options: different CPU types (AMD, ARM (which?), ...), new OSs (OSX, Windows, RedHat, ..., some possibly on ARM64), further network simulations, ...?

Input solicited: Additional options? Priorities?

Edit: Instance types: https://aws.amazon.com/ec2/instance-types/

Decide on portability

Currently, profiling is done with OQS_PORTABLE_BUILD set, and we show all measurements of reference and optimized code with this setting.

Question is whether we want to

  • disable this configuration option to showcase the best possible performance, or
  • keep it as-is, or
  • even create an even less portable build option for liboqs, activating -march=native.

Opinions, thoughts, and further alternatives are welcome as replies to this issue. Proposals on how to visualize the alternatives are also solicited.
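A sketch of how the least portable option might be expressed at liboqs configure time; passing -march=native via CMAKE_C_FLAGS is an assumed mechanism, and the exact flag semantics depend on the liboqs version in use:

cmake -DOQS_PORTABLE_BUILD=OFF -DCMAKE_C_FLAGS="-march=native" ..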
