Introduction

QUARK: A Framework for Quantum Computing Application Benchmarking

The Quantum Computing Application Benchmark (QUARK) framework orchestrates benchmarks of different industry applications on quantum computers. QUARK supports various applications, such as the traveling salesperson problem (TSP), the maximum satisfiability (MaxSAT) problem, and robot path optimization in the PVC sealing use case. It also features different solvers (e.g., simulated and quantum annealing and the quantum approximate optimization algorithm (QAOA)), quantum devices (e.g., IonQ and Rigetti), and simulators. It is designed to be easily extendable in all of its components: applications, mappings, solvers, devices, and any other custom modules.

Publications

Details about the motivation for the original framework can be found in the accompanying QUARK paper by Finžgar et al. Even though the architecture changed significantly from QUARK 1.0 to 2.0, the guiding principles remain the same. The most recent publication by Kiwit et al. provides an updated overview of the functionalities and quantum machine learning features of QUARK 2.0.

Documentation

Documentation with a tutorial and developer guidelines can be found here: https://quark-framework.readthedocs.io/en/dev/.

Prerequisites

As this framework is implemented in Python 3.9, you need to install this version of Python if you do not already have it. Other versions may cause issues with the framework's dependencies. Additionally, QUARK relies on several pip dependencies, which you can install in two ways:

  1. Install pip packages manually, or
  2. Use the QUARK installer.

For this installer to work, you first need to install the following packages:

  • inquirer==3.1.2
  • pyyaml==6.0
  • packaging==23.1

To limit the number of packages you need to install, there is an option to include only a subselection of QUARK modules. You can select the modules of your choice via:

python src/main.py env --configure myenv

There is also a default option, which includes all available modules.

Depending on your configured modules, you will need to install additional Python packages; the three packages listed above are not sufficient to run a benchmark! We provide the option to generate a Conda environment file or a pip requirements file, which you can use to install the required packages. You can also configure multiple QUARK environments and switch between them via:

python src/main.py env --activate myenv2

Note: Different modules require different Python packages. Make sure your Python environment has the necessary packages installed!

To see which environments are configured, use:

python src/main.py env --list

You can also visualize the contents of your QUARK environment:

(quark) %  python src/main.py env --show myenv
[...]
Content of the environment:
>-- TSP
    >-- GreedyClassicalTSP
        >-- Local

If you want to use custom module files (for example, to use external modules from other repositories), you can still use the --modules option. You can find the documentation in the respective Read the Docs section.

Running a Benchmark

export HTTP_PROXY=http://username:[email protected]:8080 
export AWS_PROFILE=quantum_computing
export AWS_REGION=us-west-1
python src/main.py

HTTP_PROXY is only needed if you have to use a proxy to access AWS.

AWS_PROFILE is only needed if you want to use an AWS Braket device (the default is quantum_computing). If no profile is needed in your case, set export AWS_PROFILE=default.

AWS_REGION is only needed if you need a region other than us-east-1. Usually this is specific to the Braket device.

Example run (you need to check at least one option with an X for checkbox questions):

(quark) % python src/main.py 
[?] What application do you want?: TSP
   PVC
   SAT
 > TSP

2023-03-21 09:18:36,440 [INFO] Import module modules.applications.optimization.TSP.TSP
[?] (Option for TSP) How many nodes does you graph need?:
 > [X] 3
   [ ] 4
   [ ] 6
   [ ] 8
   [ ] 10
   [ ] 14
   [ ] 16

[?] What submodule do you want?:
   [ ] Ising
   [ ] Qubo
 > [X] GreedyClassicalTSP
   [ ] ReverseGreedyClassicalTSP
   [ ] RandomTSP

2023-03-21 09:18:49,563 [INFO] Skipping asking for submodule, since only 1 option (Local) is available.
2023-03-21 09:18:49,566 [INFO] Submodule configuration finished
[?] How many repetitions do you want?: 1
2023-03-21 09:18:50,577 [INFO] Import module modules.applications.optimization.TSP.TSP
2023-03-21 09:18:50,948 [INFO] Created Benchmark run directory /Users/user1/QUARK/benchmark_runs/tsp-2023-03-21-09-18-50
2023-03-21 09:18:51,025 [INFO] Codebase is based on revision 075201825fa71c24b5567e1290966081be7dbdc0 and has some uncommitted changes
2023-03-21 09:18:51,026 [INFO] Running backlog item 1/1, Iteration 1/1:
2023-03-21 09:18:51,388 [INFO] Route found:
 Node 0 ->
 Node 2 ->
 Node 1
2023-03-21 09:18:51,388 [INFO] All 3 nodes got visited
2023-03-21 09:18:51,388 [INFO] Total distance (without return): 727223.0
2023-03-21 09:18:51,388 [INFO] Total distance (including return): 1436368.0
2023-03-21 09:18:51,389 [INFO]
2023-03-21 09:18:51,389 [INFO]  ============================================================
2023-03-21 09:18:51,389 [INFO]
2023-03-21 09:18:51,389 [INFO] Saving 1 benchmark records to /Users/user1/QUARK/benchmark_runs/tsp-2023-03-21-09-18-50/results.json
2023-03-21 09:18:51,746 [INFO] Finished creating plots.

All used config files, logs, and results are stored in a folder in the benchmark_runs directory.

Interrupt/Resume

The processing of backlog items may get interrupted, in which case you will see something like:

2024-03-13 10:25:20,201 [INFO] ================================================================================
2024-03-13 10:25:20,201 [INFO] ====== Run 3 backlog items with 10 iterations - FINISHED:15 INTERRUPTED:15
2024-03-13 10:25:20,201 [INFO] ====== There are interrupted jobs. You may resume them by running QUARK with
2024-03-13 10:25:20,201 [INFO] ====== --resume-dir=benchmark_runs\tsp-2024-03-13-10-25-19
2024-03-13 10:25:20,201 [INFO] ================================================================================

This happens if you press CTRL-C or if some QUARK module does its work asynchronously, e.g., by submitting its job to a batch system. Learn more about how to write asynchronous modules in the developer guide. You can resume an interrupted QUARK run by calling:

python src/main.py --resume-dir=<result-dir>

Note that you can copy/paste the --resume-dir option from the QUARK output as shown in the above example.

Non-Interactive Mode

It is also possible to start the script with a config file instead of using the interactive mode:

python src/main.py --config config.yml

Note: This should only be used by experienced users as invalid values will cause the framework to fail!

Example for a config file:

application:
  config:
    nodes:
    - 3
  name: TSP
  submodules:
  - config: {}
    name: GreedyClassicalTSP
    submodules:
    - config: {}
      name: Local
      submodules: []
repetitions: 1
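The config file mirrors the module chain you would otherwise select interactively: each entry names a module and nests its submodules. As an illustration, here is a plain-Python sketch (not QUARK's actual loader) of reading that chain from the equivalent nested dict:

```python
# Sketch only: the dict below mirrors the YAML example above.
# QUARK's real config loader is more involved than this.
config = {
    "application": {
        "name": "TSP",
        "config": {"nodes": [3]},
        "submodules": [
            {
                "name": "GreedyClassicalTSP",
                "config": {},
                "submodules": [
                    {"name": "Local", "config": {}, "submodules": []}
                ],
            }
        ],
    },
    "repetitions": 1,
}

def module_chain(module: dict) -> list:
    """Collect module names along the nested submodule branches."""
    chain = [module["name"]]
    for sub in module["submodules"]:
        chain.extend(module_chain(sub))
    return chain

print(module_chain(config["application"]))  # ['TSP', 'GreedyClassicalTSP', 'Local']
```

Each level of submodules corresponds to one interactive question, which is why the YAML nests the same way the prompts do.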

Run as Container

We also support running the framework as a container. After making sure your Docker daemon is running, you can run the container:

docker run -it --rm ghcr.io/quark-framework/quark

You can also build the Docker image locally:

docker build -t ghcr.io/quark-framework/quark .

If you want to use a config file, you have to add it to the docker run command:

-v /Users/alice/desktop/my_config.yml:/my_config.yml

/Users/alice/desktop/my_config.yml specifies the QUARK config file on your local machine. You can then run the Docker container with the config:

docker run -it --rm  -v /Users/alice/desktop/my_config.yml:/my_config.yml ghcr.io/quark-framework/quark --config my_config.yml

In case you want to access the benchmark run folder afterwards, you can attach a volume to the run command:

-v /Users/alice/desktop/benchmark_runs:/benchmark_runs/

The results of the benchmark run are then stored in a new directory in /Users/alice/desktop/benchmark_runs.

If you have local proxy settings, you can add the following flags to the run command:

-e http_proxy=$http_proxy -e https_proxy=$https_proxy -e HTTP_PROXY=$HTTP_PROXY -e HTTPS_PROXY=$HTTPS_PROXY

AWS credentials can be mounted into the container via the run command like this:

-v $HOME/.aws/:/root/.aws:ro

Summarizing Multiple Existing Experiments

You can also summarize multiple existing experiments like this:

python src/main.py --summarize quark/benchmark_runs/2021-09-21-15-03-53 quark/benchmark_runs/2021-09-21-15-23-01

This allows you to generate plots from multiple experiments.

License

This project is licensed under Apache License 2.0.

Contributors

chsowinski, jusschwitalla, martinmeilinger, marvmann, philross


Issues

Error during benchmark run: This operation is not supported for complex128 values because it would be ambiguous.

There seems to be a bug somewhere in the code when running this configuration (full config below):

TSP -> Ising -> qiskit -> QAOA -> Powell

At the end of the run, a TypeError is thrown:

TypeError: ufunc 'signbit' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

directly causing:

TypeError: This operation is not supported for complex128 values because it would be ambiguous.

(see Traceback at the end of logger.log)
As far as I can see, some complex value(s) end up somewhere they shouldn't, but I have no idea how or why.
I clicked a bit through the lines mentioned in the traceback but couldn't find what's causing the problem.

full config:

application:
  config:
    nodes:
    - 3
  name: TSP
mapping:
  Ising:
    config:
      lagrange_factor:
      - 1.0
      mapping:
      - qiskit
    solver:
    - config:
        depth:
        - 3
        opt_method:
        - Powell
        shots:
        - 10
      device:
      - LocalSimulator
      name: QAOA
repetitions: 1
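For context, the first TypeError is NumPy's generic message when a ufunc that is only defined for real dtypes receives complex input. A minimal, QUARK-independent reproduction of that error class (assuming NumPy is installed):

```python
# Minimal reproduction of the first TypeError from the issue: NumPy's
# signbit ufunc is only defined for real dtypes, so complex128 input
# cannot be safely coerced and raises a TypeError.
import numpy as np

values = np.array([1.0 + 2.0j])  # dtype complex128
try:
    np.signbit(values)
except TypeError as err:
    print(type(err).__name__, err)
```

This suggests the real question for the QUARK bug is where a complex-valued array reaches a code path that expects real numbers, not the ufunc itself.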

Implement Application Score in QUARK 2.0

In QUARK 1, there were "solution_validity" and "solution_quality", which were mandatory metrics to assess how well the optimization problem was solved. As these metrics cannot be applied to problem classes outside of the optimization realm, we decided to remove them in the first design of QUARK 2.

However, we believe it makes sense to introduce a similar metric called "application_score". The developers can use this optional metric to define a score for their application that can be used to compare different benchmark runs against each other on the application level.

For a given set of application score types, the BenchmarkManager can then provide general functions that create automatic plots, similar to how it is currently done for the total_time metric in the Metrics class.

Improve "Prerequisites" Section in Readme and Tutorial

The content of this section is sufficient to understand how to install QUARK but could be clearer about how to use modular configurations and could follow a more stringent style of description.

Priority: Low

Last time this section was adjusted: #67

D-Wave annealing devices removed from Amazon Braket.

In late 2022, all D-Wave devices were removed from Amazon Braket. Attempting to run Annealing or QBSolv with these devices in QUARK will now fail. It has to be evaluated whether the D-Wave offerings now available in the AWS Marketplace are an equivalent alternative or whether support for D-Wave devices has to be removed from QUARK for now.

data loss on keyboard interrupt

Hi together,

If I run a QUARK configuration with many repetitions and interrupt the run with CTRL-C, the results of the iterations already completed are lost.

This is because the json.dump that comes after the repetitions loop is not executed in this case. One solution could be to also perform this json.dump in the "except KeyboardInterrupt" section.

best wishes,
Jürgen
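The suggested fix can be sketched as follows; run_iteration, the function name, and the file handling are placeholders for illustration, not QUARK's actual BenchmarkManager API:

```python
# Sketch of the proposed fix: also persist partial results when the
# repetitions loop is interrupted. Names here are hypothetical.
import json

def run_repetitions(repetitions: int, run_iteration, out_path: str) -> list:
    results = []
    try:
        for i in range(repetitions):
            results.append(run_iteration(i))
    except KeyboardInterrupt:
        # Without handling this, the results collected so far would be lost.
        pass
    finally:
        # 'finally' covers both the interrupted and the normal path,
        # so the dump happens exactly once either way.
        with open(out_path, "w", encoding="utf-8") as f:
            json.dump(results, f)
    return results
```

Using finally instead of duplicating the dump in the except block keeps a single write path for both outcomes.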

bug in summarize: solverConfigCombo not well defined.

summarize does not always recognize that two data sets contribute to the same curve, although they should.
This happens because 'solverConfigCombo' is created from a dictionary without sorting its keys.

I think

df['solverConfigCombo'] = df.apply(
    lambda row: '/\n'.join(
        ['%s: %s' % (key, value) for (key, value) in sorted(row['solver_config'].items(), key=lambda x: x[0])]) +
        "\ndevice:" + row['device'] + "\nmapping:" + '/\n'.join(
            ['%s: %s' % (key, value) for (key, value) in sorted(row['mapping_config'].items(), key=lambda x: x[0])]),
    axis=1)

should solve the issue for solverConfigCombo (note the 'sorted').

'applicationConfigCombo' probably has the same issue but I have not checked that.

With best regards, Jürgen
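The effect of the missing sort can be demonstrated in isolation: two dicts with identical items but different insertion order produce different label strings unless the items are sorted first. A minimal illustration, independent of the pandas code above:

```python
# Two equal configs whose keys were inserted in different order.
# Python dicts preserve insertion order, so unsorted .items() differ.
cfg_a = {"shots": 10, "depth": 3}
cfg_b = {"depth": 3, "shots": 10}

def label(cfg, sort):
    """Build a combo label, optionally sorting the items first."""
    items = sorted(cfg.items()) if sort else cfg.items()
    return "/\n".join("%s: %s" % (k, v) for k, v in items)

print(label(cfg_a, sort=False) == label(cfg_b, sort=False))  # False
print(label(cfg_a, sort=True) == label(cfg_b, sort=True))    # True
```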

Find problem instance corresponding to some result.

When doing some evaluation of QUARK results, I sometimes need to know the problem instance to which an entry of results.json corresponds.
My application stores the problem as "problem_<rep_count>.json" in the store directory provided by the benchmark manager. Unfortunately, I do not see an easy way to reconstruct the store directory for a given entry from results.json.

An easy way to solve this problem would be to let the benchmark manager write 'idx_backlog' in the results file.
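A minimal sketch of what writing the proposed field could look like; the function name and the way indices are assigned are hypothetical, and the real BenchmarkManager would use its actual backlog indices:

```python
# Hypothetical sketch: tag each result record with a backlog index before
# dumping, so the matching store directory (and "problem_<rep_count>.json")
# could be reconstructed from results.json alone.
import json

def write_results(records: list, out_path: str) -> None:
    for idx, record in enumerate(records):
        record["idx_backlog"] = idx  # proposed extra field
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(records, f)
```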

Separated Post-Processing

QUARK should be able to combine results from several different runs in a single plot, using the results.csv as input.

unintuitive order of the questions at the start of a benchmark run

How it is currently:

  1. Which application? and how to build a problem instance? (->SAT, TSP, PVC,...)
  2. Which mapping do you want? (-> QUBO/Ising/...)
  3. Which solving method do you want to use and how do you want to configure it? (->Annealer, QAOA, Classical, ...)

Intuitively, you would ask the questions in the order 1 -> 3 -> 2, because you want to know which method you will use to solve the problem before you think about the mapping.

regarding pylint R1728 consider-using-generator

(from #55)

src/modules/applications/optimization/PVC/PVC.py, method validate(...):
Is there a reason why you do # pylint: disable=R1728 (consider-using-generator), instead of using the generator as apparently suggested by pylint?
And what is the reason to use list(set(list([...]))) in the first place?

# pylint: disable=R1728 is used again in src/modules/solvers/GreedyClassicalPVC.py and in src/modules/solvers/ReverseGreedyClassicalPVC.py (both times in the method run(...)). The same question applies here.

Regarding # pylint: disable=R1728: Yes it would be nicer to move these lines to generators.

In the last Open Call, Marvin told me that he'd like to discuss this further to make sure we don't screw anything up.
Mostly the point is about the list(set(list([...]))), which uses a generator to create a list ([...]), only to wrap it in an explicit list(...), followed by a set(...) and yet another list(...). The full line is:

visited_seams = list(set(list([seam[0][0] for seam in solution if seam is not None])))  # pylint: disable=R1728

As far as I can see, seam contains the values ((seam, node), config, tool) (maybe we should document that better btw), meaning seam[0][0] gets the visited seam of the PVC process.
Following that line of code, only the length of visited_seams is used to determine whether all seams got visited.
So, the purpose of this line is to extract all unique visited seams to then count how many there are.

Since you can also get the length of a set, the following line is much shorter, but equivalent in result:

visited_seams = {seam[0][0] for seam in solution if seam is not None}

({...} creates a set.)

Is there anything I overlooked? If not, I'll create the PR in the next few days.
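The equivalence of the two lines is easy to verify on a toy solution; the nested tuple shape below mimics the ((seam, node), config, tool) entries described above:

```python
# Toy PVC-style solution: entries of the form ((seam, node), config, tool),
# with a None for a skipped step, mimicking the structure described above.
solution = [((0, 1), "c1", "t1"), None, ((1, 2), "c1", "t2"), ((0, 3), "c2", "t1")]

# Original form: list(set(list([...])))
visited_list = list(set(list([s[0][0] for s in solution if s is not None])))

# Proposed form: a plain set comprehension.
visited_set = {s[0][0] for s in solution if s is not None}

print(visited_set)                               # {0, 1}
print(len(visited_list) == len(visited_set))     # True
```

Since only the number of unique visited seams is used downstream, len(visited_set) carries the same information with none of the redundant conversions.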

git_uncommitted_changes returned with string type hint, used where bool is expected

As we discussed in the last Open Call, here is the issue for the people that weren't there.

In utils.get_git_revision(...), git_uncommitted_changes is assigned a bool, unless something goes wrong. In that case, it is assigned "unknown". As a result of this, the method is typed to return two strings (first git_revision_number, then git_uncommitted_changes). git_uncommitted_changes is (as far as I can tell) only used in saving a benchmark run.

However, under BenchmarkManager.run_benchmark(...), it is used as a parameter of a new BenchmarkRecord object. The definition of the initializer of BenchmarkRecord expects a bool, not a str. Therefore, PyCharm gives me a 'wrong type' warning.

Should we change the type hint in BenchmarkRecord to a string or do something else?

Since we had some uncertainties about Python's dynamic typing, here's a short demonstration of how it works. If a variable is assigned a bool, even if its type hint says str, it still is a bool and is treated as such.
The type hints are exactly what they're called: type hints.

>>> def foo() -> str:
...     return True
...
>>> test = foo()
>>> type(test)
<class 'bool'>
>>> test
True
>>> test2: str = "test2"
>>> test2
'test2'
>>> type(test2)
<class 'str'>
>>> test2 = foo()
>>> test2
True
>>> type(test2)
<class 'bool'>

device_name in QUARK2

The constructor of Device requires 'device_name' as an argument. The other components have no required constructor arguments. This is still the case in QUARK 2. It sometimes causes trouble because a Device must be handled differently from the other components.

I would like to have consistent behaviour in QUARK 2: either add a 'name' argument to Core so that every QUARK module has a name, or remove device_name from Device.
I would prefer the first option; it seems natural to me that each component has a name. In that case, the name should be the name given in the QUARK modules configuration.
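The first option could look roughly like this; the class shapes are illustrative only and are not QUARK's actual Core or Device implementations:

```python
# Illustrative sketch of option 1: every module inherits a 'name' from Core,
# so Device no longer needs special-case handling. Not QUARK's actual code.
class Core:
    def __init__(self, name: str):
        # The name given in the QUARK modules configuration.
        self.name = name

class Device(Core):
    pass

class Solver(Core):
    pass

dev = Device("LocalSimulator")
solver = Solver("QAOA")
print(dev.name, solver.name)  # LocalSimulator QAOA
```

With a uniform constructor signature, code that instantiates modules generically no longer needs to treat Device as a special case.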
