nicolas-kuechler / doe-suite

A tool for remote experiment management

Home Page: https://nicolas-kuechler.github.io/doe-suite/

License: Apache License 2.0

Jinja 0.73% Python 72.88% Makefile 3.54% HTML 22.86%

design-of-experiments experiment-management

doe-suite's Issues

Replace set_fact, loop, combine pattern with filter plugin

In many cases, we use a set_fact, loop, combine or union pattern to set variables.

The pattern is complicated to read, potentially inefficient, and can lead to errors if we extend the resulting list based on a default([]) and the same task is called twice (we would have to reset the variable first in a separate set_fact task).

See if we can achieve the same, with a general filter plugin.
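
As a rough illustration of what such a general filter plugin could look like, here is a minimal sketch. The file location and the filter names combine_all and union_all are hypothetical and not taken from the doe-suite code base; only the FilterModule class with a filters() method is the standard Ansible plugin convention.

```python
# Hypothetical filter plugin, e.g. placed in a filter_plugins/ directory (assumed location).
from typing import Any, Dict, Iterable, List


def combine_all(dicts: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Merge dicts left to right; later keys win (like chaining `combine` in a loop)."""
    merged: Dict[str, Any] = {}
    for d in dicts:
        merged.update(d)
    return merged


def union_all(lists: Iterable[Iterable[Any]]) -> List[Any]:
    """Concatenate lists, dropping duplicates while keeping first-seen order."""
    result: List[Any] = []
    for lst in lists:
        for item in lst:
            if item not in result:
                result.append(item)
    return result


class FilterModule:
    """Ansible looks for a class with this name and calls filters() to register filters."""

    def filters(self):
        return {"combine_all": combine_all, "union_all": union_all}
```

A task could then set a variable in one step, for example my_var: "{{ list_of_dicts | combine_all }}". Because each filter builds its result from scratch, running the same task twice does not accumulate stale entries.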

Add support for running the experiment.yml playbook on a control instance

At the moment, the experiment.yml playbook needs to run in order to switch to a new job once the previous job is finished.

It would be nice to run the experiment.yml playbook on a small EC2 instance and have another playbook running locally to orchestrate the control machine (create the machine, start a new experiment, restart the experiment.yml playbook, and fetch results from the control machine to the local machine).

ETL V2

There are a few things that could be improved in the ETL results processing which may require a new version due to breaking changes.

These are things that should be considered:

Extractor performance is pretty bad
At the moment, ETL pipelines build a single data frame even when there are multiple result files involved.

ETL debugging could use more native support:

etl-debug-design: install
    @cd $(does_config_dir) && \
    poetry run python -m debugpy --listen 5678 --wait-for-client $(PWD)/doespy/doespy/etl/etl.py --suite $(suite) --id $(id) --load_from_design

RepoTemplate script directory creation

The repotemplate.py script contains a bug: on the first run, it fails if the default host types are not created:

FileNotFoundError: [Errno 2] No such file or directory: '/Users/hidde/PhD/does-backends/does_config/group_vars/all'

Steps to reproduce:

Create new project
Select that you don't want to create the default client and server.
Enter a custom host type, e.g., small

Find Alternative to Template Repo

it should be possible to pull changes made to the template repo into the main repo
but the "template" repo should be protected from accidental commits

Improve options for running jobs in parallel

Currently, experiments run in parallel but jobs within an experiment run sequentially.

This can lead to a lot of code duplication in suite designs to leverage parallelization.
An experiment is copied and something that is a $FACTOR$ is treated as a constant in two separate experiments.

(note, for single instance experiments running on Euler, this limitation is less of a problem because jobs can be executed in parallel on Euler)

We could investigate whether it is possible to define on an experiment level: n_env or something similar that defines how many times the environment defined in host_types is duplicated.

Options:

do a preprocessing step on the design to split the runs into separate experiments and then do not change anything in the execution (would have to ensure that the run_id is preserved, otherwise the results will overlap)
have some functionality in the execution that checks that we only schedule runs with run_id % n_env == x or something. I think such a choice may be better suited because splitting experiments and preserving run_id could be complex.
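
The second option can be sketched in a few lines. The function name runs_for_env and the env_index parameter are hypothetical; n_env is the experiment-level setting proposed above.

```python
from typing import List


def runs_for_env(run_ids: List[int], n_env: int, env_index: int) -> List[int]:
    """Select the runs that environment copy `env_index` (0-based) should execute."""
    if n_env < 1:
        raise ValueError("n_env must be at least 1")
    if not 0 <= env_index < n_env:
        raise ValueError("env_index must be in [0, n_env)")
    # Each run_id belongs to exactly one environment, so results never overlap.
    return [run_id for run_id in run_ids if run_id % n_env == env_index]
```

With this partitioning the run_id values stay untouched, which avoids the bookkeeping that splitting the design into separate experiments would need.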

Extending the Experiment Design

add support for factors in lists lst: [{"a": 1, "b": $FACTOR$ }, {"a": 2, "b": $FACTOR$}]
(not implemented because the meaning is not clear how we would specify the factor. However, being able to reference other vars in the run might already solve the requirement.)
check whether it is possible to self-reference variables from the same design a: 12, b: {{ self.a }}
add support for using $FACTOR$ within string: a: "abc $FACTOR$ cdef"
(not implemented: can use reference of other variables in the run to solve this)
add support for loading variables from another file into base_experiment. E.g., load all variables from this file in base_experiment (also requirement from Hidde)
maybe it would also be great to be able to load factor_levels from another file (in particular when they are shared) between experiments.
(not implemented: not sure if actually a requirement)
it would be nice to use variable priority (basically, I can load a file with variables but then overwrite them in the experiment in base_experiment and, for example, make them a $FACTOR$).
we should also be able to load a long command from somewhere: maybe we could introduce a section with suite-wide variables in the suite, or follow the Ansible approach more closely and allow specifying a group_vars directory under design

Add support for monitoring tool

We could integrate a monitoring tool like glances to collect performance metrics for all hosts and potentially automatically visualize them in a dashboard.

Euler Migration from LSF to SLURM

The Euler cluster migrates from the LSF batch system to Slurm.
As a result, we need to update the tasks interacting with the batch scheduling system.

bsub
bjobs
bkill?

https://scicomp.ethz.ch/wiki/LSF_to_Slurm_quick_reference

The squeue command (replacement for bjobs) has a --json option for controlling the output format:

see https://slurm.schedmd.com/squeue.html

--json
Dump job information as JSON. All other formatting and filtering arguments will be ignored.
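
A status task could parse that JSON instead of scraping column output. The sketch below is only illustrative: the field names jobs, name, and job_state come from a hand-written sample payload and are assumptions, so the actual squeue --json output on Euler should be inspected before relying on them.

```python
import json
from typing import Any, Dict

# Hand-written sample payload (assumed shape, not captured from a real cluster).
SAMPLE = """
{"jobs": [
  {"job_id": 101, "name": "suite_exp1_run0", "job_state": "RUNNING"},
  {"job_id": 102, "name": "suite_exp1_run1", "job_state": "PENDING"}
]}
"""


def job_states(squeue_json: str) -> Dict[str, Any]:
    """Map job name to its reported state, given the text printed by `squeue --json`."""
    data = json.loads(squeue_json)
    return {job["name"]: job["job_state"] for job in data.get("jobs", [])}


states = job_states(SAMPLE)
```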

Add include variables functionality to ETL steps.

Similar to how variables can be included in the experiment design (with $INCLUDE_VARS$ ), it would be convenient if this also worked within the config of an individual ETL step.
This would be an extension / re-design of the existing $ETL_VARS$ which currently are not flexible enough because everything that needs to be varied between steps must be defined as a variable. Instead, it would be more expressive to follow the $INCLUDE_VARS$ approach, which is flexible in the location where variables are included (e.g., under this variable) and also allows to define custom additional variables.

For example, this would allow defining a color scheme in a central location or in a step with a complex configuration, e.g., ColumnCrossPlotLoader, to share a lot of the configuration with another use of the loader.

Example:

ColumnCrossPlotLoader:
        metrics:
          time: {value_cols: [base_s_mean], unit_label: sec}

        cum_subplot_config:
        - chart:
            # Maybe support including vars from another (not sure whether that's a good idea)
            $INCLUDE_VARS$: {suite: example07-etl, pipeline: coord_square, step: ColumnCrossPlotLoader, key: [cum_subplot_config, 0, chart] }
            
            tick_params: {axis: x, labelsize: 7}


          label_map:
             custom_label: custom
              # from a yaml file in another location  (here it could be useful if we could define which variables from labels.yml we want, e.g., the label_map from labels
             $INCLUDE_VARS$: labels.yml

Error handling if only some hosts fail

Currently, if there are fatal errors on some hosts but not on others, the playbook will keep running, and depending on how much output there is, the user might miss that some host had fatal errors. Proposed solution: If one host fails, all should fail by default. But it should be possible to turn off this behavior in a config file.

Create "Local" Cloud

Can we create the "local" cloud using Ansible to create Docker containers running an Ubuntu to run everything locally?

There is an ansible module to create a docker container: https://techviewleo.com/manage-docker-containers-with-ansible/

Subfolder imports in `does_config/etl`

Subfolder imports in does_config/etl are not working.

Relative imports are a challenge to fix, e.g.:

dir/
   a.py
   b.py
outer.py

And in outer.py:

import dir.a

and in a.py:

import b

Then the second import fails.
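
The failure can be reproduced in isolation. In Python 3, a plain import inside a module is absolute and is resolved against sys.path, which contains the directory of the entry script but not the subfolder. The sketch below builds a throwaway package in a temporary directory (the names demo_pkg and demo_b are made up to avoid clashing with real modules) and compares the absolute import with a package-relative one. Whether the relative form fits does_config/etl is not confirmed here.

```python
import importlib
import sys
import tempfile
from pathlib import Path

PKG = "demo_pkg"  # hypothetical stand-in for the subfolder `dir`


def try_import(a_source: str) -> str:
    """Create demo_pkg/{a,demo_b}.py in a temp dir, import demo_pkg.a, report the outcome."""
    with tempfile.TemporaryDirectory() as root:
        pkg_dir = Path(root) / PKG
        pkg_dir.mkdir()
        (pkg_dir / "__init__.py").write_text("")
        (pkg_dir / "demo_b.py").write_text("VALUE = 42\n")
        (pkg_dir / "a.py").write_text(a_source)
        sys.path.insert(0, root)  # plays the role of the directory containing outer.py
        try:
            importlib.invalidate_caches()
            module = importlib.import_module(f"{PKG}.a")
            return f"ok: {module.demo_b.VALUE}"
        except ModuleNotFoundError as exc:
            return f"failed: {exc}"
        finally:
            # Clean up so repeated calls start from a fresh state.
            sys.path.remove(root)
            for name in [n for n in sys.modules if n == PKG or n.startswith(PKG + ".")]:
                del sys.modules[name]


absolute_result = try_import("import demo_b\n")         # mirrors `import b` in a.py
relative_result = try_import("from . import demo_b\n")  # package-relative variant
```

The absolute form fails because the subfolder itself is never on sys.path, while the relative form resolves the sibling module through the package.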

Schedule multiple sequential jobs at once for single instance experiments

For single instance experiments, it might be interesting to be able to schedule a series of jobs where a new job starts after the previous job completes.

https://askubuntu.com/a/1072463

Replacement of Ansible Controller

Instead of the Ansible controller, we could have a make run option that first creates a controller on AWS, outsources the progress tracking to it, and runs the doe-suite from there.

Ultimately, we can use another make target to get updates to the local machine.

S3 Download Utility

Often we want to download some additional resources from S3.
The problem is that the required Ansible steps are not simple.

Hence, it would be good if there is a nice solution available that can be reused in projects.

Should work both on Euler (into scratch?) and AWS.

Add convenience script to follow progress of experiments

Script should ssh into a specific instance of an experiment and follow standard output/error of an experiment

Steps:

Doe should output a helper config json with the hostname of the currently running experiment.
Helper script should read in this config file, as well as the current suite and id to read the suite state.
Helper script should ssh into the relevant machine and tail the relevant file (possibly user-specified)
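
The last two steps could look roughly like the sketch below. The keys of the helper config (hostname, user, working_dir) and the default log file name are hypothetical, since the format of the config file is not defined yet. The function only builds the command and leaves execution to the caller.

```python
import json
import shlex
from typing import List


def build_tail_command(config_json: str, log_file: str = "stdout.log") -> List[str]:
    """Build an ssh command that follows a log file on the host named in the helper config."""
    cfg = json.loads(config_json)
    remote_path = f"{cfg['working_dir']}/{log_file}"
    # Quote the path so that spaces or shell metacharacters survive the remote shell.
    remote_cmd = f"tail -f {shlex.quote(remote_path)}"
    return ["ssh", f"{cfg['user']}@{cfg['hostname']}", remote_cmd]


example_config = '{"hostname": "10.0.0.5", "user": "ubuntu", "working_dir": "/home/ubuntu/results"}'
command = build_tail_command(example_config)
```

The resulting list can be passed to subprocess.run(command) to start following the output.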

Extend Usage of EC2 Instance Tags for Inventory

Each EC2 instance receives tags. Based on these tags, the dynamic inventory plugin builds fine-grained host groups.

Notes

set loading_separator: no and default prefix "" to avoid tag_Name prefix
patterns to address host groups: ansible-patterns

Ansible Master Config

Outsource Ansible master variables (EC2 instance specification, repos) to does_config, such that they can be changed in a local setup.

Enable basic support for multiple backends

It would be nice if we could run single-instance experiments without changing the design file also on Leonhard.

Idea:

we could use different host files (i.e., group_vars) for different backends. In essence, when we start the playbook we can choose a backend. Afterward, it will look at the name of the host type from the design file and check that this host type exists in this backend and then run.
for Leonhard, it's a bit special because also the way to run a command and to see that it's done is different. Need to see how this could be integrated.

Terminating hosts per experiment instead of per suite (i.e., when all experiments are done)

At the moment, EC2 instances are terminated at the end of the suite.

The problem is that when the experiments in the same suite do not all take a similar amount of time, we have machines idling until the last experiment is over.

Enable multi command experiments

For a single experiment on a single instance, it is sometimes required to run two processes, e.g., an additional one for logging.

Option 1:
We could extend the $CMD$ field to contain a list of commands per host_type.
This would require that we are somehow able to distinguish which process determines that an experiment is over.
(could use first or any)

Option 2:
An alternative solution could be to start a bash script that runs the two processes.

CI: Running Tests with Example Designs

For Pull Requests, we should be able to run example designs and verify that they work as expected.

Resources:

Add support for non-sequential system

For systems that already provide a job queue (e.g., Leonhard), the experiment suite should submit all experiment jobs at once and then repeatedly check whether a job is finished.
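
The control flow described here, submit everything and then poll, can be sketched independently of the scheduler. The submit and is_finished callables are placeholders for whatever wraps the batch system; they are not existing doe-suite functions.

```python
import time
from typing import Callable, Iterable, List


def run_all(
    commands: Iterable[str],
    submit: Callable[[str], str],
    is_finished: Callable[[str], bool],
    poll_interval_s: float = 30.0,
) -> List[str]:
    """Submit every command up front, then poll until all jobs are done.

    Returns the job ids in the order in which they were observed to finish.
    """
    pending = [submit(cmd) for cmd in commands]
    finished: List[str] = []
    while pending:
        still_pending = []
        for job_id in pending:
            if is_finished(job_id):
                finished.append(job_id)
            else:
                still_pending.append(job_id)
        pending = still_pending
        if pending:
            time.sleep(poll_interval_s)
    return finished
```

Results of a finished job can be fetched as soon as its id lands in the finished list, without waiting for the rest of the queue.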

ETL: Experiments wildcard

Add option in ETL to apply pipeline to all experiments using a wildcard.

Syntax:

$ETL$:
   pipeline:
      experiments: "*"
      ... etc

Options for re-running a (failed) job.

In case a few jobs in a large experiment fail, it would be very convenient to be able to define a list of job_ids that need to be re-run.

We could be running more repetitions of these jobs, but the results would then directly go into the result structure.

The old result could still be stored alongside the additional repetition.

Hint for user that ssh config is not correct

Currently, we don't have a check to see whether ssh configuration is done correctly (forward agent configured, can connect to a host with the private key, etc.).

Maybe beforehand, we could check whether base ssh config is set with something like this: https://stackoverflow.com/a/38305248

Later after setting up the environment, we should have some checks with a good fail message.

Should work for euler and aws.

Add support for running multiple experiments.

At the moment, it is only possible to run a single experiment.

However, it would be useful if it's possible to run a set of experiments sequentially.

Add support for ETL result processing

The idea is that whenever an experiment job is complete and we fetched the result files, then we automatically process the results in an ETL style pipeline to visualize results or generate a summary. (playbook also runs ETL pipeline)

The DOE suite provides a few default implementations for Extractors, Transformers, and Loaders. However, within doe-config a project can add its own implementation. Finally, a simple config file should control which extractors are used, then what chain of transformations is applied, and finally what we do with the results.

Extract

Extract a pandas data frame from DoE suite results folder structure.
Configured with a list of pairs that contain a regex pattern and an extractor extension to read files in a particular format (e.g., YamlExtractor).
The suite already provides a set of extractors but a project can provide its own implementations in the doe-config folder.

An extractor gets as input a path to a file that matches the provided regex and a dictionary with the current configuration and outputs a pandas data frame.
Outputs of individual extractors are concatenated (merged) into a single data frame.

def extract(path, config) -> df

Transform

In the transform phase, the data frame can be processed with a chain of transformations.
For example, calculate mean and standard deviation over the repetitions.

A transformer takes as input a pandas data frame and outputs also a data frame.
Additional options can configure how the data is transformed (e.g., a column name that contains the measurement).

def transform(df, options) -> df

Load

Finally, a loader takes the transformed data frame and produces a result.
For example, store a summary in the results folder, generate a particular plot, store the results in a database, send a notification on slack, etc.

A config file in doe-config specifies which loaders are applied to the data frame (should execute each loader).

A loader also takes as input a data frame and some options.

def load(df, options)
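
To show how the three signatures could be wired together by a config-driven pipeline, here is a dependency-free sketch. It is only an illustration: a list of dicts stands in for the pandas data frame, the toy extractor takes its measurement from the config instead of reading the file, and run_pipeline is a hypothetical helper.

```python
from statistics import mean
from typing import Any, Callable, Dict, List, Tuple

Rows = List[Dict[str, Any]]  # stand-in for a pandas data frame


def extract(path: str, config: Dict[str, Any]) -> Rows:
    """Toy extractor: one row per file, with the measurement taken from the config."""
    return [{"file": path, "run": config["run"], "time_s": config["time_s"]}]


def transform(rows: Rows, options: Dict[str, Any]) -> Rows:
    """Toy transformer: mean of one column per run, i.e. over the repetitions."""
    col = options["col"]
    runs = sorted({row["run"] for row in rows})
    return [
        {"run": run, f"{col}_mean": mean(r[col] for r in rows if r["run"] == run)}
        for run in runs
    ]


def load(rows: Rows, options: Dict[str, Any]) -> None:
    """Toy loader: append the result to a caller-provided list."""
    options["sink"].extend(rows)


Step = Tuple[Callable[..., Any], Dict[str, Any]]


def run_pipeline(files: List[Tuple[str, Dict[str, Any]]],
                 transformers: List[Step], loaders: List[Step]) -> None:
    rows: Rows = []
    for path, config in files:  # outputs of individual extractors are concatenated
        rows.extend(extract(path, config))
    for fn, options in transformers:  # chain of transformations
        rows = fn(rows, options)
    for fn, options in loaders:  # every configured loader is executed
        fn(rows, options)
```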

Open Questions

how to configure the ETL pipeline in doe-config? (should it be part of the design or separate per suite or per project level)
how to add additional extractors, transformers, and loaders within the project directory? The base classes to extend would be stored in the doe-suite repo and hence we need a way to import them.

ETL: Automatic Loader Report

For convenience, it could be interesting to combine all results from loaders into a single pdf report.

Look at all the output directories of loaders. In these folders combine all .pdf and/or .png files into a single pdf or HTML report.

Could use mdutils to programmatically create a markdown file consisting of all figures + tables + config + links to folder
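
As an alternative to mdutils, the HTML variant can be sketched with the standard library only. The function name build_html_report is hypothetical, and the sketch assumes that each loader writes its figures into its own directory.

```python
import html
from pathlib import Path
from typing import Iterable


def build_html_report(loader_dirs: Iterable[Path], title: str = "ETL report") -> str:
    """Return one HTML page that embeds all .png files and links all .pdf files."""
    parts = [
        f"<html><head><title>{html.escape(title)}</title></head><body>",
        f"<h1>{html.escape(title)}</h1>",
    ]
    for directory in loader_dirs:
        parts.append(f"<h2>{html.escape(directory.name)}</h2>")
        for file in sorted(directory.iterdir()):
            uri = file.resolve().as_uri()
            name = html.escape(file.name)
            if file.suffix == ".png":
                parts.append(f'<img src="{uri}" alt="{name}">')
            elif file.suffix == ".pdf":
                parts.append(f'<p><a href="{uri}">{name}</a></p>')
            # Other file types are ignored in this sketch.
    parts.append("</body></html>")
    return "\n".join(parts)
```

The returned string can be written to a report.html next to the results. Tables and config would need additional handling.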

Refactor loading experiment designs

One thing to consider for the future is whether it would become easier to read and less error-prone to extract "loading" design files into a custom module or filter:

read the design file
validate, potentially with a schema (uniqueness requirements, all fields defined, etc.)
fill default values
extend the base experiment with the factors
substituting $CMD$

There are quite a lot of tasks in experiment-parse-config and experiment-state that could be potentially reduced to a single task.
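
The "fill default values" and "extend the base experiment with the factors" steps could be expressed as one function in such a module. This is a simplified sketch under assumptions: it only handles factors at the top level of base_experiment, written as {"$FACTOR$": [levels]}, and it takes the cross product of all levels. The real design format supports more than that.

```python
import itertools
from typing import Any, Dict, List

FACTOR = "$FACTOR$"


def expand_runs(base_experiment: Dict[str, Any], defaults: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Fill defaults, then build one run config per combination of factor levels."""
    base = {**defaults, **base_experiment}  # explicit values override defaults
    factor_keys = [k for k, v in base.items() if isinstance(v, dict) and FACTOR in v]
    constants = {k: v for k, v in base.items() if k not in factor_keys}
    level_lists = [base[k][FACTOR] for k in factor_keys]
    runs = []
    for combo in itertools.product(*level_lists):
        runs.append({**constants, **dict(zip(factor_keys, combo))})
    return runs
```

A design without any factor yields exactly one run, because the product over zero level lists contains a single empty combination.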

Originally posted by @nicolas-kuechler in #10 (comment)

Change SLURM scheduler interface to group jobs into job arrays.

Currently, commands on Euler are scheduled individually per job, which may result in many (potentially short-running) jobs.
In this situation, SLURM advocates for using job arrays because it reduces the load on the scheduler.

We could modify the SLURM task submission component in the doe-suite to group all jobs of an experiment into a single large job array submission. This requires modifying the roles suite-scheduler-enqueue, suite-scheduler-remove, and suite-scheduler-status to change the interaction with SLURM.

Euler Docs: https://scicomp.ethz.ch/wiki/Job_arrays
Slurm Docs: https://slurm.schedmd.com/job_array.html
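
One way to group the commands of an experiment is to generate a single batch script in which each array task picks its command by index. The sketch below relies on the #SBATCH --array directive and the SLURM_ARRAY_TASK_ID environment variable from the Slurm job array documentation. The function name and the way commands are embedded are assumptions, not the current doe-suite implementation.

```python
import shlex
from typing import List


def build_array_job(commands: List[str], job_name: str) -> str:
    """Return the text of a batch script that runs commands[i] in array task i."""
    if not commands:
        raise ValueError("need at least one command")
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --array=0-{len(commands) - 1}",
        "COMMANDS=(",
    ]
    # Quote each command so it is stored as a single element of the bash array.
    lines += [f"  {shlex.quote(cmd)}" for cmd in commands]
    lines += [")", 'eval "${COMMANDS[$SLURM_ARRAY_TASK_ID]}"']
    return "\n".join(lines) + "\n"


script = build_array_job(["echo a", "echo b"], "exp1")
```

Submitting the generated script once with sbatch replaces one submission per job. The status and remove roles would then have to address individual array tasks rather than separate job ids.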

Range operator not working in suite design files

The range operator is evaluated as "range(0, 10)" instead of the actual list of values.

Steps:

Create suite design with a parameter (non-factor) that is assigned a range: "{{ range(10) }}"
Run suite
Check config.json of experiment

Expected result: The parameter has the list from 0 to 9 as values.

Actual result: The parameter equals "range(0, 10)"
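
The reported value matches how Python renders a range object as text, which suggests that the lazy object is stringified before it is turned into a list. A quick check in plain Python:

```python
lazy = range(10)

as_text = str(lazy)   # what a stringified range object looks like
as_list = list(lazy)  # the values the design author expects
```

If this is indeed the cause, materializing the range in the template, for example with Jinja2's built-in list filter as in "{{ range(10) | list }}", is a possible workaround. This has not been verified against the suite.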

Euler: Removing a Job from the Queue

With tsp, we ensure that old jobs (from previous experiments) are removed before enqueuing new jobs.

I think we should have the same behavior on Euler.
On Euler, it's a bit more challenging because multiple experiments share the batch job system.

However, I would propose that before submitting new jobs to the queue for an experiment, we remove all the jobs that are from the same suite and experiment.

The problem I can see is that you schedule a large experiment and then you realize that something is wrong in the design and you want to restart the suite with the same experiments. The jobs of the previous suite are still in the system and you have to wait until they are complete before the new jobs can run.

Same Experiment Design on different Host Types

Could an experiment define a set of environments (host types), and then a variable in the experiment controls which one is selected?

For example, for Euler, it would be good to be able to set the queue based on individual runs instead of on the design granularity.

Maybe the same functionality could also be achieved by controlling more of the Euler job submission with variables, but this raises the question of how to transfer a design to AWS.

Euler cloud does not support multiple single-instance experiments with different host types.

Two asserts prevent having different experiments with different host types.
Afterward, the tag_assignment fails because there is only a single euler_host_type

- name: Pick single host type
  set_fact:
    euler_host_type: "{{ host_types | dict2items | first | json_query('key') }}"

There seems to be an underlying implementation issue preventing support for this feature but not sure.
We should investigate this because it would be great to support this.

Control AWS Spending

The doe-suite makes it possible to quickly create many AWS resources, which can incur high costs.
Unfortunately, AWS does not allow setting an enforced cost limit.

However, there are potential options to reduce the risk:

before creating the resources, show the estimated cost of suite per hour
show a warning if more than x resources are running (automatically stop them if there is no confirmation within x time)
configure budget alarms or create alarms if resources are idling:

Support Multiple Config Repos

Add support for multiple config repos to the Ansible master. This includes commands to maintain a list of repositories and short names for them, and an argument to specify the short name whenever the benchmark is run. The Ansible master should then switch to the respective repository and execute the experiments there.

Think about how we could simplify this setup. E.g., we could bind channels to repos:

Channel A -> repo1
Channel B -> repo2

and then always execute configs from repo1 if the command was sent from Channel A (so that the user does not always need to specify the repo).

Suite Design Feature: Except

Similar to the factor-levels list, we could have a list for "except" combinations of factor-levels that should be filtered out from the run.
For example, assume we have factors a, b, c if we say:

except_levels:
   a: hello
   c: world

Then all runs where a == hello and c == world are filtered out (independent of b).
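
The filtering semantics can be sketched as follows. The function name apply_except is hypothetical, and except_levels is taken to be a list of rules, each shaped like the mapping in the example above.

```python
from typing import Any, Dict, List


def apply_except(runs: List[Dict[str, Any]], except_levels: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Drop every run that matches all key/value pairs of at least one except rule."""

    def excluded(run: Dict[str, Any]) -> bool:
        # An empty rule is ignored; otherwise it would match (and remove) every run.
        return any(
            rule and all(run.get(key) == value for key, value in rule.items())
            for rule in except_levels
        )

    return [run for run in runs if not excluded(run)]


runs = [
    {"a": "hello", "b": 1, "c": "world"},
    {"a": "hello", "b": 2, "c": "world"},
    {"a": "hello", "b": 1, "c": "moon"},
    {"a": "bye", "b": 1, "c": "world"},
]
kept = apply_except(runs, [{"a": "hello", "c": "world"}])
```

Only the last two runs remain: the first two match the rule regardless of their value for b.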

Reproducibility: Features for Artifact Evaluation

For artifact evaluations, we should provide more features for reproducibility:

run a suite based on suite_design in a results folder rather than on the design
store the commit hash or something like this next to the results (how to deal with local changes? maybe have a "reproducible" flag that enforces that you don't have changes?)
provide a way to run a meaningful subset of the results?
store info on which cloud the experiment was executed (introduce check that when calling last, that the cloud still matches)

Test suite compatibility on Mac

The sed command used in the Makefile to run the tests might expect different arguments on the default Mac implementation.

To do: Check sed command usage in Makefile, in the following targets

test-%
convert-to-expected

Ansible Master Documentation

Document the following:

General README
- Purpose/idea behind the ansible master
- How to setup (running ansible, credentials, slack tokens)
- How to add a slack bot
- Commands available on slack bot or script on ansible master

Enable multi-region experiments

define a region per host_type
create a vpc per region and use vpc peering to connect them

Add possibility to ignore experiments in suite

Low priority

When I want to re-run some (especially long-running) experiments in a suite, I now go to the suite, comment out the experiments that I want to skip and then run the suite. This adds some ambiguity as some might stay commented out.

It would help a lot if, at invocation of the suite, I could pass (in addition to suite and run) an optional parameter skip_exps (list) that essentially pretends that those experiments were commented out when creating the suite_config.yml for that run. It probably only requires a check at the creation of a new suite run.

Note: When re-running a suite (i.e., id is not new), this variable should be ignored

Relevant file: src/roles/suite-load-pre-aws/action_plugins/suite_design_validate.py

Special Python Requirements for ETL Pipeline

Currently, there is no setup to support (possibly conflicting) package requirements for the ETL pipeline of different projects (or experiment designs).

Especially for the Ansible master, it would be nice if users could specify some virtual environment or requirements.txt file with package dependencies necessary for their ETL pipeline.

Experiment Design: Remove Custom Range Syntax

As part of 6130c3a, I've introduced custom syntax for defining ranges in $FACTOR$ .

It turns out that this feature is not necessary because the existing jinja2 logic can already be used

value:
  $FACTOR$: "{{ range(10) }}"

update validation logic to allow the existing type
remove the custom range(...) logic
update example + doc (show in doc that we can also use the range syntax for non-factors)

ETL: Include pipelines

For a single suite, or even a single-experiment super etl pipeline, it would be useful to be able to "include" an etl pipeline from the etl design instead of copying pipelines around.

Often, pipelines will be quite similar, with their etl's requiring little parameter-specific configuration (made easier as well using the $FACTORS$ parameter option).

Choices to make:

How should includable pipelines be defined?
Either in separate "pipeline" files, but then we should find a place to define these. Or include them from other suite config files:

$ETL$:
  $INCLUDE_PIPELINE$: 
       suite: example02-single
       pipeline: pipeline_name

ETL: Meaningful example for Super ETL

For etl and super etl it might make sense to have some more examples that are meaningful together.

Not sure what would be a great use "case study" but maybe even having an example design dedicated to introducing etl things might be an option. A suite with multiple experiments, showing the possibilities of different loaders + transformers that are provided by default. (e.g., FactorAggTransformer).

Then we could also use this etl experiment for the super etl config and maybe combine it with some other existing suite, e.g., example02-single.

Collection of Smaller Features / Changes

Experiment Design: Define Schema

Use json-schema to define a schema for designs along with documentation for writing designs.

I think it won't be possible to represent the $FACTOR$ logic and the referencing of other variables but the simpler logic should be possible.

Could also set defaults in the schema.