nicolas-kuechler / doe-suite
A tool for remote experiment management
Home Page: https://nicolas-kuechler.github.io/doe-suite/
License: Apache License 2.0
In many cases, we use a set_fact, loop, combine or union pattern to set variables.
The pattern is complicated to read, potentially inefficient, and can lead to errors if we extend the resulting list based on a default([]) and the same task is called twice (we would have to reset the variable first in a separate set_fact task).
See if we can achieve the same with a general filter plugin.
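A minimal sketch of what such a filter plugin could look like, assuming the goal is to build the merged value in one expression instead of accumulating it in a set_fact loop. The filter names merge_dicts and union_lists and the file location are hypothetical and not part of the doe-suite; only the FilterModule class with a filters() method follows the usual Ansible plugin convention.

```python
"""Hypothetical Ansible filter plugin, e.g. placed in filter_plugins/collect.py."""


def merge_dicts(dicts):
    """Combine a list of dicts into one; later entries override earlier ones."""
    merged = {}
    for d in dicts:
        merged.update(d)
    return merged


def union_lists(lists):
    """Concatenate lists, dropping duplicates while keeping first-seen order."""
    result = []
    for lst in lists:
        for item in lst:
            if item not in result:
                result.append(item)
    return result


class FilterModule(object):
    """Ansible discovers filters through a FilterModule class with filters()."""

    def filters(self):
        return {"merge_dicts": merge_dicts, "union_lists": union_lists}
```

With such a plugin, a single set_fact could compute the value from its inputs (for example "{{ my_list_of_dicts | merge_dicts }}", where my_list_of_dicts is a placeholder). Because nothing is appended to an existing variable, calling the task twice would give the same result.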
At the moment, the experiment.yml playbook needs to run in order to switch to a new job once the previous job is finished.
It would be nice to run the experiment.yml playbook on a small EC2 instance and have another playbook running locally to orchestrate the control machine (create the machine, start a new experiment, restart the experiment.yml playbook, and fetch results from the control machine to the local machine).
There are a few things that could be improved in the ETL results processing which may require a new version due to breaking changes.
These are things that should be considered:
etl-debug-design: install
@cd $(does_config_dir) && \
poetry run python -m debugpy --listen 5678 --wait-for-client $(PWD)/doespy/doespy/etl/etl.py --suite $(suite) --id $(id) --load_from_design
The repotemplate.py script contains a bug: on the first run, it fails if host types other than the default ones are created:
FileNotFoundError: [Errno 2] No such file or directory: '/Users/hidde/PhD/does-backends/does_config/group_vars/all'
Steps to reproduce:
client
and server
.small
Currently, experiments run in parallel but jobs within an experiment run sequentially.
This can lead to a lot of code duplication in suite designs to leverage parallelization.
An experiment is copied and something that is a $FACTOR$ is treated as a constant in two separate experiments.
(Note: for single-instance experiments running on Euler, this limitation is less of a problem because jobs can be executed in parallel on Euler.)
We could investigate whether it is possible to define on an experiment level n_env or something similar that defines how many times the environment defined in host_types is duplicated.
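To make the n_env idea concrete, the following sketch distributes the runs of one experiment over n_env copies of the environment with a simple modulo rule. The function name assign_runs is hypothetical, and the sketch assumes that run ids are consecutive integers.

```python
from collections import defaultdict


def assign_runs(run_ids, n_env):
    """Group run ids by environment copy using run_id % n_env."""
    if n_env < 1:
        raise ValueError("n_env must be at least 1")
    groups = defaultdict(list)
    for run_id in run_ids:
        groups[run_id % n_env].append(run_id)
    return dict(groups)


# Seven runs spread over three copies of the environment.
print(assign_runs(range(7), n_env=3))
# {0: [0, 3, 6], 1: [1, 4], 2: [2, 5]}
```

Each environment copy would then process its own list sequentially, so the run ids of the original experiment are preserved.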
Options:
run_id % n_env == x or something. I think such a choice may be better suited because splitting experiments and preserving run_id could be complex.
add support for factors in lists lst: [{"a": 1, "b": $FACTOR$ }, {"a": 2, "b": $FACTOR$}] (not implemented because it is not clear how we would specify the factor. However, being able to reference other vars in the run might already solve the requirement.)
check whether it is possible to self-reference variables from the same design a: 12, b: {{ self.a }}
add support for using a: "abc $FACTOR$ cdef" (not implemented: can use reference of other variables in the run to solve this)
add support for loading variables from another file into base_experiment. E.g., load all variables from this file in base_experiment (also a requirement from Hidde)
maybe it would also be great to be able to load factor_levels from another file (in particular when they are shared between experiments). (not implemented: not sure if actually a requirement)
it would be nice to use variable priority (basically, I can load a file with variables but then overwrite them in the experiment in base_experiment and make them a $FACTOR$, for example).
we should also be able to load a long command from somewhere: maybe we could introduce in the suite a section with suite-wide variables, or we follow more the Ansible approach of allowing to specify a group_vars directory under design
We could integrate a monitoring tool like glances to collect performance metrics for all hosts and potentially automatically visualize them in a dashboard.
The Euler cluster migrates from the LSF batch system to Slurm.
As a result, we need to update the tasks interacting with the batch scheduling system.
https://scicomp.ethz.ch/wiki/LSF_to_Slurm_quick_reference
The squeue command (replacement for bjobs) has a --json option for controlling the output format; see https://slurm.schedmd.com/squeue.html:
--json
Dump job information as JSON. All other formatting and filtering arguments will be ignored.
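A status task could parse that JSON instead of scraping column output. The sketch below is an assumption-laden illustration: the field names jobs, job_id and job_state, and the sample document, are written by hand and should be checked against the output of the Slurm version installed on Euler.

```python
import json


def job_states(squeue_json):
    """Map job id -> state from the text printed by `squeue --json`."""
    data = json.loads(squeue_json)
    states = {}
    for job in data.get("jobs", []):
        state = job.get("job_state")
        # Assumption: depending on the Slurm version, the state may be a
        # plain string or a list of state flags; normalise to a string.
        if isinstance(state, list):
            state = ",".join(state)
        states[job.get("job_id")] = state
    return states


# Hand-written sample, not real scheduler output.
sample = (
    '{"jobs": ['
    '{"job_id": 42, "name": "exp_a_0", "job_state": "RUNNING"}, '
    '{"job_id": 43, "name": "exp_a_1", "job_state": ["PENDING"]}'
    ']}'
)
print(job_states(sample))
```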
Similar to how variables can be included in the experiment design (with $INCLUDE_VARS$), it would be convenient if this also worked within the config of an individual ETL step.
This would be an extension / re-design of the existing $ETL_VARS$, which currently are not flexible enough because everything that needs to be varied between steps must be defined as a variable. Instead, it would be more expressive to follow the $INCLUDE_VARS$ approach, which is flexible in the location where variables are included (e.g., under this variable) and also allows defining custom additional variables.
For example, this would allow defining a color scheme in a central location, or allow a step with a complex configuration, e.g., ColumnCrossPlotLoader, to share a lot of the configuration with another use of the loader.
Example:
ColumnCrossPlotLoader:
  metrics:
    time: {value_cols: [base_s_mean], unit_label: sec}
  cum_subplot_config:
  - chart:
      # Maybe support including vars from another (not sure whether that's a good idea)
      $INCLUDE_VARS$: {suite: example07-etl, pipeline: coord_square, step: ColumnCrossPlotLoader, key: [cum_subplot_config, 0, chart] }
      tick_params: {axis: x, labelsize: 7}
    label_map:
      custom_label: custom
      # from a yaml file in another location (here it could be useful if we could define which variables from labels.yml we want, e.g., the label_map from labels)
      $INCLUDE_VARS$: labels.yml
Currently, if there are fatal errors on some hosts but not on others, the playbook will keep running, and depending on how much output there is, the user might miss that some host had fatal errors. Proposed solution: If one host fails, all should fail by default. But it should be possible to turn off this behavior in a config file.
Can we create the "local" cloud using Ansible to create Docker containers running Ubuntu, so that everything runs locally?
Subfolder imports in does_config/etl are not working.
Relative imports are a challenge to fix, e.g.:
dir/
  a.py
  b.py
outer.py
And in outer.py:
import dir.a
and in a.py:
import b
Then the second import fails.
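The behaviour can be reproduced in isolation with the sketch below. It builds two throwaway packages in a temporary folder (the names pkg_abs and pkg_rel are hypothetical stand-ins for dir) and compares the plain import b with an explicit relative import. How the doe-suite actually puts does_config/etl on the import path is not shown here, so this only illustrates the general Python mechanism.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_layout(root, pkg, import_line):
    """Create <root>/<pkg>/a.py and b.py, where a.py imports b via import_line."""
    pkg_dir = Path(root) / pkg
    pkg_dir.mkdir()
    (pkg_dir / "b.py").write_text("VALUE = 1\n")
    (pkg_dir / "a.py").write_text(import_line + "\nVALUE = b.VALUE\n")


def try_import(name):
    """Return 'ok' if the module imports, else the exception class name."""
    try:
        importlib.import_module(name)
        return "ok"
    except ImportError as exc:
        return type(exc).__name__


root = tempfile.mkdtemp()
build_layout(root, "pkg_abs", "import b")         # as in the report above
build_layout(root, "pkg_rel", "from . import b")  # explicit relative import
sys.path.insert(0, root)  # plays the role of the folder containing outer.py

print(try_import("pkg_abs.a"))  # fails: b is looked up as a top-level module
print(try_import("pkg_rel.a"))  # works: b is resolved inside the package
```

The plain import b only searches the entries of sys.path, and the subfolder itself is not one of them, which would explain why the second import fails.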
For single instance experiments, it might be interesting to be able to schedule a series of jobs where a new job starts after the previous job completes.
Instead of the Ansible controller, we could have a make run option that outsources the progress tracking to a controller on AWS: it first creates this controller and then runs the doe-suite from there.
Ultimately, we can use another make target to get updates to the local machine.
Often we want to download some additional resources from S3.
The problem is that the required Ansible steps are not simple.
Hence, it would be good if there is a nice solution available that can be reused in projects.
Should work both on Euler (into scratch?) and AWS.
Script should ssh into a specific instance of an experiment and follow standard output/error of an experiment
Steps:
Each EC2 instance receives tags; based on these tags, the dynamic inventory plugin builds fine-grained host groups.
prj_id (project id)
suite (name of the suite)
exp_name (experiment name)
host_type (e.g., client, server, etc.)
controller (if the instance is a controller instance)
leading_separator: no and default prefix "" to avoid tag_Name prefix
Outsource Ansible master variables (EC2 instance specification, repos) to does_config, such that they can be changed in a local setup.
It would be nice if we could run single-instance experiments without changing the design file also on Leonhard.
Idea:
At the moment, EC2 instances are terminated at the end of the suite.
The problem is that when experiments in the same suite do not all take a similar amount of time, we have machines idling until the last experiment is over.
For a single experiment on a single instance, it is sometimes required to run two processes, e.g., an additional one for logging.
Option 1:
We could extend the $CMD$ field to contain a list of commands per host_type.
This would require that we are somehow able to distinguish which process determines that an experiment is over.
(could use first or any)
Option 2:
An alternative solution could be to start a bash script that runs the two processes.
For Pull Requests, we should be able to run example designs and verify that they work as expected.
Resources:
For systems that already provide a job queue (e.g., Leonhard), the experiment suite should submit all experiment jobs at once and then repeatedly check whether a job is finished.
Add option in ETL to apply pipeline to all experiments using a wildcard.
Syntax:
$ETL$:
  pipeline:
    experiments: *
    ... etc
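One possible way to resolve such a wildcard is shell-style pattern matching against the experiment names of the suite, sketched below. The helper expand_experiments and the experiment names are hypothetical. Note also that a bare * starts an alias in YAML, so the value would presumably have to be written as a quoted "*".

```python
from fnmatch import fnmatchcase


def expand_experiments(patterns, available):
    """Expand shell-style patterns (e.g. "*") to the matching experiment names."""
    if isinstance(patterns, str):
        patterns = [patterns]
    selected = []
    for pattern in patterns:
        matches = [name for name in available if fnmatchcase(name, pattern)]
        if not matches:
            raise ValueError(f"no experiment matches {pattern!r}")
        for name in matches:
            if name not in selected:
                selected.append(name)
    return selected


print(expand_experiments("*", ["exp_client_server", "exp_single"]))
# ['exp_client_server', 'exp_single']
```

Raising an error for a pattern without matches is a design choice in this sketch: it catches typos in experiment names early instead of silently running the pipeline on nothing.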
In case a few jobs in a large experiment fail, it would be very convenient to be able to define a list of job_ids that need to be re-run.
We could be running more repetitions of these jobs, but the results would then directly go into the result structure.
The old result could still be stored alongside the additional repetition.
Currently, we don't have a check to see whether ssh configuration is done correctly (forward agent configured, can connect to a host with the private key, etc.).
Maybe beforehand, we could check whether base ssh config is set with something like this: https://stackoverflow.com/a/38305248
Later after setting up the environment, we should have some checks with a good fail message.
Should work for euler and aws.
At the moment, it is only possible to run a single experiment.
However, it would be useful if it's possible to run a set of experiments sequentially.
The idea is that whenever an experiment job is complete and we fetched the result files, then we automatically process the results in an ETL style pipeline to visualize results or generate a summary. (playbook also runs ETL pipeline)
The DOE suite provides a few default implementations for Extractors, Transformers, and Loaders. However, within doe-config a project can add its own implementation. Finally, a simple config file should control which extractors are used, then what chain of transformations is applied, and finally what we do with the results.
Extract a pandas data frame from DoE suite results folder structure.
Configured with a list of pairs that contain a regex pattern and an extractor extension to read files in a particular format (e.g., YamlExtractor).
The suite already provides a set of extractors, but a project can provide its own implementations in the doe-config folder.
An extractor gets as input a path to a file that matches the provided regex and a dictionary with the current configuration and outputs a pandas data frame.
Outputs of individual extractors are concatenated (merged) into a single data frame.
def extract(path, config) -> df
In the transform phase, the data frame can be processed with a chain of transformations.
For example, calculate mean and standard deviation over the repetitions.
A transformer takes as input a pandas data frame and outputs also a data frame.
Additional options can configure how the data is transformed (e.g., a column name that contains the measurement).
def transform(df, options) -> df
Finally, a loader takes the transformed data frame and produces a result.
For example, store a summary in the results folder, generate a particular plot, store the results in a database, send a notification on slack, etc.
A config file in doe-config specifies which loaders are applied to the data frame (should execute each loader).
A loader also takes as input a data frame and some options.
def load(df, options)
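The three signatures above can be chained as in the following sketch. To stay free of dependencies, plain lists of dicts stand in for pandas data frames, the extractor returns hard-coded values instead of reading a file, and all names (run_pipeline, fake_times, the file path) are hypothetical.

```python
from statistics import mean


def extract(path, config):
    """Stand-in extractor: returns rows for one result file (hard-coded here)."""
    return [{"file": path, "run": 0, "rep": rep, "time_s": t}
            for rep, t in enumerate(config["fake_times"])]


def transform(rows, options):
    """Stand-in transformer: mean of one column over the repetitions of a run."""
    col = options["col"]
    by_run = {}
    for row in rows:
        by_run.setdefault(row["run"], []).append(row[col])
    return [{"run": run, col + "_mean": mean(vals)} for run, vals in by_run.items()]


def load(rows, options):
    """Stand-in loader: prints a summary instead of writing a plot or file."""
    for row in rows:
        print(options["prefix"], row)


def run_pipeline(paths, extract_cfg, transformers, loaders):
    rows = []
    for path in paths:                # outputs of extractors are concatenated
        rows.extend(extract(path, extract_cfg))
    for fn, options in transformers:  # chain of transformations
        rows = fn(rows, options)
    for fn, options in loaders:       # every configured loader is executed
        fn(rows, options)
    return rows


result = run_pipeline(
    paths=["rep_0/stdout.json"],
    extract_cfg={"fake_times": [1.0, 2.0, 3.0]},
    transformers=[(transform, {"col": "time_s"})],
    loaders=[(load, {"prefix": "summary:"})],
)
```

Passing the steps as (function, options) pairs mirrors the idea that a config file selects the steps and their options, while projects can plug in their own implementations.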
doe-config? (should it be part of the design or separate per suite or per project level)
doe-suite repo and hence we need a way to import them.
For convenience, it could be interesting to combine all results from loaders into a single pdf report.
Look at all the output directories of loaders. In these folders, combine all .pdf and/or .png files into a single PDF or HTML report.
Could use mdutils to programmatically create a markdown file consisting of all figures + tables + config + links to folder
One thing to consider for the future is whether it would become easier to read and less error-prone to extract "loading" design files into a custom module or filter:
$CMD$
There are quite a lot of tasks in experiment-parse-config and experiment-state that could potentially be reduced to a single task.
Originally posted by @nicolas-kuechler in #10 (comment)
Currently, commands on Euler are scheduled individually per job, which may result in many (potentially short-running) jobs.
In this situation, SLURM advocates for using job arrays because it reduces the load on the scheduler.
We could modify the SLURM task submission component in the doe-suite to group all jobs of an experiment into a single large job array submission. This requires modifying the roles suite-scheduler-enqueue, suite-scheduler-remove, and suite-scheduler-status to change the interaction with SLURM.
The range operator is evaluated as "range(0, 10)" instead of the actual list of values.
Steps:
"{{ range(10) }}"
Expected result: The parameter has the list from 0 to 9 as values.
Actual result: The parameter equals "range(0, 10)"
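A plausible explanation, not verified against the doe-suite code: range(10) produces a lazy range object, and turning that object into a string yields its representation rather than its elements. The plain Python sketch below shows the same effect outside of any template.

```python
rendered = str(range(10))   # what a plain string conversion produces
expanded = list(range(10))  # the list of values the design intends

print(rendered)  # range(0, 10)
print(expanded)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

If this is indeed the cause, materialising the range inside the expression, for example with Jinja2's built-in list filter as in "{{ range(10) | list }}", might be a workaround; whether the result then arrives as a real list also depends on how the template output is converted back to native types.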
When using tsp, we ensure before enqueuing new jobs that old jobs (from previous experiments) are removed.
I think we should have the same behavior on Euler.
On Euler, it's a bit more challenging because multiple experiments share the batch job system.
However, I would propose that before submitting new jobs to the queue for an experiment, we remove all the jobs that are from the same suite and experiment.
The problem I can see is that you schedule a large experiment and then you realize that something is wrong in the design and you want to restart the suite with the same experiments. The jobs of the previous suite are still in the system and you have to wait until they are complete before the new jobs can run.
Could an experiment define a set of environments (host types), and then a variable in the experiment controls which one is selected?
For example, for Euler, it would be good to be able to set the queue based on individual runs instead of on the design granularity.
Maybe the same functionality could also be achieved by controlling more of the Euler job submission with variables, but this raises the question of how to transfer a design to AWS.
Two asserts prevent having different experiments with different host types.
Afterward, the tag_assignment fails because there is only a single euler_host_type:
- name: Pick single host type
  set_fact:
    euler_host_type: "{{ host_types | dict2items | first | json_query('key') }}"
There seems to be an underlying implementation issue preventing support for this feature, but I am not sure.
We should investigate this because it would be great to support this.
The doe-suite makes it possible to quickly create many AWS resources, which can incur high costs.
Unfortunately, AWS does not allow setting an enforced cost limit.
However, there are potential options to reduce the risk:
Add support for multiple config repos to the Ansible master. This includes commands to maintain a list of repositories and short names for them, and an argument to specify the short name whenever the benchmark is run. The Ansible master should then switch to the respective repository and execute the experiments there.
Think about how we could simplify this setup. E.g., we could bind channels to repos:
Channel A -> repo1
Channel B -> repo2
and then always execute configs from repo1 if the command was sent from Channel A (so that the user does not always need to specify the repo).
Similar to the factor-levels list, we could have a list for "except" combinations of factor-levels that should be filtered out from the run.
For example, assume we have factors a, b, c. If we say:
except_levels:
  a: hello
  c: world
then all runs where a==hello and c==world are filtered out (independent of b).
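The filtering rule can be sketched as follows, assuming that runs are generated as the cross product of the factor levels and that except_levels is a list of partial assignments. The function cross_product and the concrete level values are hypothetical.

```python
from itertools import product


def cross_product(factor_levels, except_levels=()):
    """Yield all level combinations except those matching an except entry."""
    names = list(factor_levels)
    for values in product(*(factor_levels[n] for n in names)):
        run = dict(zip(names, values))
        # A run is excluded if it agrees with every key of some except entry.
        excluded = any(
            all(run.get(k) == v for k, v in exc.items()) for exc in except_levels
        )
        if not excluded:
            yield run


levels = {"a": ["hello", "bye"], "b": [1, 2], "c": ["world", "moon"]}
runs = list(cross_product(levels, except_levels=[{"a": "hello", "c": "world"}]))
print(len(runs))  # 6 of the 8 combinations remain
```

Factors that are not mentioned in an except entry (here b) do not influence the match, which corresponds to "independent of b" above.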
For artifact evaluations, we should provide more features for reproducibility:
suite_design in a results folder rather than on the design
The sed command used in the Makefile to run the tests might expect different arguments on the default Mac implementation.
To do: Check sed command usage in Makefile, in the following targets:
test-%
Document the following:
host_type
Low priority
When I want to re-run some (especially long-running) experiments in a suite, I now go to the suite, comment out the experiments that I want to skip, and then run the suite. This adds some ambiguity, as some might stay commented out.
It would help a lot if, at invocation of the suite, in addition to suite and run, I could also pass an optional parameter skip_exps (list) that essentially pretends that those experiments were commented out when creating the suite_config.yml for that run. It probably only requires a check at the creation of a new suite run.
Note: When re-running a suite (i.e., id is not new), this variable should be ignored.
Relevant file: src/roles/suite-load-pre-aws/action_plugins/suite_design_validate.py
Currently, there is no setup to support (possibly conflicting) package requirements for the ETL pipeline of different projects (or experiment designs).
Especially for the Ansible master, it would be nice if users could specify some virtual environment or requirements.txt file with package dependencies necessary for their ETL pipeline.
As part of 6130c3a, I've introduced custom syntax for defining ranges in $FACTOR$.
It turns out that this feature is not necessary because the existing jinja2 logic can already be used:
value:
  $FACTOR$: "{{ range(10) }}"
range(...) logic
For a single suite or even a single experiment super etl pipeline, the ability to "include" an etl pipeline from the etl design instead of copying pipelines around.
Often, pipelines will be quite similar, with their etl's requiring little parameter-specific configuration (made easier as well using the
Choices to make:
$ETL$:
  $INCLUDE_PIPELINE$:
    suite: example02-single
    pipeline: pipeline_name
For etl and super etl it might make sense to have some more examples that are meaningful together.
Not sure what would be a great use "case study" but maybe even having an example design dedicated to introducing etl things might be an option. A suite with multiple experiments, showing the possibilities of different loaders + transformers that are provided by default. (e.g., FactorAggTransformer).
Then we could also use this etl experiment for the super etl config and maybe combine it with some other existing suite, e.g., example02-single.
Improve the "getting started" process:
scripts/repotemplate.py, there should be a guided process for initializing a custom does_config folder + later adding new host types.
xyz/does_config/group_vars/all if we call the repotemplate script before the folder structure exists)
key name should be env variable (and not in group_vars/all/main/exp_base)
prj_id should be an env variable (and not in group_vars/all/main)
when running etl locally, should also be able to say id=last
when running etl locally, we should be able to say that etl config of current design should be used instead of the old one present at the time when the experiment was initially run.
maybe etl should provide a load cached flag which stores the output of the transformer stage and loads it from there and executes all loaders.
when running the experiment suite playbook, we should be able to set a filter for running a subset of the experiments of a suite.
Use json-schema to define a schema for designs along with documentation for writing designs.
I think it won't be possible to represent the $FACTOR$ logic and the referencing of other variables, but the simpler logic should be possible.