wfexs-backend's People

Contributors

acivico, astrojuanlu, dcl10, github-actions[bot], jmfernandez, lrodrin, paulaidt, stain

wfexs-backend's Issues

Add support to `ga4ghdos` CURIE

The Data Object Service standard allows using a common identifier to locate resources that are replicated across several cloud services, as described at https://registry.identifiers.org/registry/ga4ghdos . For instance, ga4ghdos:dg.4503/01b048d0-e128-4cb0-94e9-b2d2cab7563d can be queried as

https://dataguids.org/ga4gh/dos/v1/dataobjects/dg.4503/01b048d0-e128-4cb0-94e9-b2d2cab7563d

In the obtained JSON, the urls section contains the links to the different replicas of the dataset, which can be FTP, HTTP(S), S3 or Google Cloud URIs.
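The CURIE-to-endpoint mapping can be sketched as follows (the dataguids.org resolver host is taken from the example above; the helper name is hypothetical, not an existing WfExS function):

```python
def dos_url_for(curie: str) -> str:
    # Map a ga4ghdos CURIE to the DOS dataobjects endpoint shown above.
    prefix, object_id = curie.split(":", 1)
    if prefix != "ga4ghdos":
        raise ValueError("not a ga4ghdos CURIE: " + curie)
    return "https://dataguids.org/ga4gh/dos/v1/dataobjects/" + object_id
```

Fetching that URL and reading the 'urls' section of the JSON answer would then yield the replica links.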

Cannot download content from ftp

Dear WfExS team,
I was testing WfExS on my local WSL2/Ubuntu.
Setting up the core and further dependencies in a conda environment worked without any trouble.
However, while running the test workflow
python3 WfExS-backend.py execute -W tests/wetlab2variations_execution_nxf_secure.wfex.stage
I got the following error:


[ERROR] Cannot download content from ftp://ftp-trace.ncbi.nih.gov/giab/ftp/data/NA12878/NIST_NA12878_HG001_HiSeq_300x/140407_D00360_0017_BH947YADXX/Project_RM8398/Sample_U5c/U5c_CCGTCC_L001_R1_001.fastq.gz to 42be63ef9b0fc7d80d09513bfd3fa42b2288fd9b (while processing LicensedURI(uri='ftp://ftp-trace.ncbi.nih.gov/giab/ftp/data/NA12878/NIST_NA12878_HG001_HiSeq_300x/140407_D00360_0017_BH947YADXX/Project_RM8398/Sample_U5c/U5c_CCGTCC_L001_R1_001.fastq.gz', licences=('https://choosealicense.com/no-permission/',), attributions=[], secContext=None)) (temp file /tmp/wfexsivum2b3rtmpcache/wf-inputs/caching-5f6ef9b7-b9b8-4f40-b38e-9ac854ef5ec3): can only concatenate str (not "NoneType") to str
Traceback (most recent call last):
  File "WfExS-backend.py", line 445, in <module>
    main()
  File "WfExS-backend.py", line 429, in main
    wfInstance.stageWorkDir()
  File "/home/valentin/wfexs/WfExS-backend/wfexs_backend/workflow.py", line 1027, in stageWorkDir
    self.materializeInputs()
  File "/home/valentin/wfexs/WfExS-backend/wfexs_backend/workflow.py", line 809, in materializeInputs
    theParams, numInputs = self.fetchInputs(
  File "/home/valentin/wfexs/WfExS-backend/wfexs_backend/workflow.py", line 1008, in fetchInputs
    newInputsAndParams, lastInput = self.fetchInputs(inputs,
  File "/home/valentin/wfexs/WfExS-backend/wfexs_backend/workflow.py", line 932, in fetchInputs
    matContent = self.wfexs.downloadContent(
  File "/home/valentin/wfexs/WfExS-backend/wfexs_backend/wfexs_backend.py", line 980, in downloadContent
    inputKind, cachedFilename, metadata_array, cachedLicences = self.cacheHandler.fetch(remote_file, workflowInputs_destdir, offline, ignoreCache, registerInCache, secContext)
  File "/home/valentin/wfexs/WfExS-backend/wfexs_backend/cache_handler.py", line 549, in fetch
    raise CacheHandlerException(errmsg) from nested_exception
wfexs_backend.cache_handler.CacheHandlerException: Cannot download content from ftp://ftp-trace.ncbi.nih.gov/giab/ftp/data/NA12878/NIST_NA12878_HG001_HiSeq_300x/140407_D00360_0017_BH947YADXX/Project_RM8398/Sample_U5c/U5c_CCGTCC_L001_R1_001.fastq.gz to 42be63ef9b0fc7d80d09513bfd3fa42b2288fd9b (while processing LicensedURI(uri='ftp://ftp-trace.ncbi.nih.gov/giab/ftp/data/NA12878/NIST_NA12878_HG001_HiSeq_300x/140407_D00360_0017_BH947YADXX/Project_RM8398/Sample_U5c/U5c_CCGTCC_L001_R1_001.fastq.gz', licences=('https://choosealicense.com/no-permission/',), attributions=[], secContext=None)) (temp file /tmp/wfexsivum2b3rtmpcache/wf-inputs/caching-5f6ef9b7-b9b8-4f40-b38e-9ac854ef5ec3): can only concatenate str (not "NoneType") to str

No VPN was active, nor anything else that could have prevented the fastq from downloading.

wget ftp://ftp-trace.ncbi.nih.gov/giab/ftp/data/NA12878/NIST_NA12878_HG001_HiSeq_300x/140407_D00360_0017_BH947YADXX/Project_RM8398/Sample_U5c/U5c_CCGTCC_L001_R1_001.fastq.gz
worked, though.
Do you have any idea how to solve this?
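As a side note, the trailing `can only concatenate str (not "NoneType") to str` looks like a secondary bug in how the error message itself is built, which masks the real FTP failure. A hypothetical helper (this is not the actual WfExS code) shows the difference:

```python
def build_errmsg(uri: str, reason) -> str:
    # str.format tolerates a None reason, whereas naive '+' concatenation
    # ("... : " + reason) raises the TypeError seen at the end of the log
    return "Cannot download content from {}: {}".format(uri, reason)
```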

Warn about `scrypt` crypt4gh keys

The crypt4gh library can generate and use keys based on different algorithms. One of them is scrypt, which depends on very specific OpenSSL features available when the Python interpreter was compiled.

https://github.com/EGA-archive/crypt4gh/blob/2ba98a7cea96e8fb337b17310cc1a226ad3b3e65/crypt4gh/keys/kdf.py#L29-L43

As the availability of this algorithm depends heavily on the OpenSSL version, WfExS-backend should:

  1. Emit a warning whenever the failure conditions are met: OpenSSL < 1.1.0 and a key generated with scrypt.
  2. Always generate new keys using a different algorithm, like bcrypt, which is not so sensitive to the OpenSSL version the Python interpreter was compiled against.
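Point 1 could start from a runtime check like this sketch (hashlib.scrypt is only present when the interpreter was built against OpenSSL >= 1.1.0; the function name is hypothetical):

```python
import hashlib
import ssl

def scrypt_keys_may_fail() -> bool:
    # Warn when the interpreter cannot handle scrypt-based crypt4gh keys:
    # either hashlib lacks scrypt entirely, or OpenSSL predates 1.1.0
    return not hasattr(hashlib, "scrypt") or ssl.OPENSSL_VERSION_INFO[:2] < (1, 1)
```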

Can't execute workflows using podman

Description

Using stage, I can stage a workflow with podman. However, running the workflow with staged-workdir offline-exec I get the following error:

ERROR Workflow error:
Docker is not available for this tool, try --no-container to disable Docker, or install a user space Docker replacement like uDocker with --user-space-docker-cmd.: Docker image hutchstack/rquest-omop-worker:next not found

Fiddling with the code on a fork, I found that adding --no-container or --user-space-docker-cmd isn't compatible with --podman.

In cwl_engine.py I found that commenting out the --disable-pull line seemed to fix the problem, and the workflow runs as expected. However, I guess --disable-pull is there for a good reason. Could something be preventing WfExS from looking where the podman image is saved for the staged image?

Bug in path resolution in local config file

Description

When running the following command: WfExS-backend/WfExS-backend.py -L local-config.yml execute -W test-stage.yml I got the following error message:

schema_salad.exceptions.ValidationException: Not found: '/root//root/wfexs-backend-test_WorkDir/efb98299-cb1f-48f8-862e-7a8746bba1a4/workflow/workflows/sec-hutchx86.cwl'

The path resolution appears to have prepended an additional /root/ to the path from local-config.yml (see below). When I changed workDir to ./wfexs-backend-test_WorkDir, the execution proceeded as expected and I saw this in the logging output:

materialized workflow repository (checkout 6d500ca1396283faae2ce5eebf778500dd8be2da): /root/wfexs-backend-test_WorkDir/f51c9984-8e43-49fa-a03b-8e683e884980/workflow

The path resolves as expected if I run WfExS from /root.
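One plausible mechanism for the doubled prefix, sketched with plain Python (variable names are illustrative, not WfExS internals): naive string concatenation duplicates an absolute prefix, while os.path.join discards the first component when the second is already absolute.

```python
import os

# Illustrative values matching the error message above
workdir = "/root/wfexs-backend-test_WorkDir"

# Naive concatenation produces the doubled prefix seen in the error
bad = "/root/" + workdir

# os.path.join drops '/root' entirely because the second path is absolute
good = os.path.join("/root", workdir)
```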

Local config file

cacheDir: $HOME/wfexs-backend-test
crypt4gh:
  key: local_config.yaml.key
  passphrase: strive backyard dividing gumball
  pub: local_config.yaml.pub
tools:
  containerType: podman
  dockerCommand: docker
  podmanCommand: podman
  encrypted_fs:
    command: encfs
    type: encfs
  engineMode: local
  gitCommand: git
  javaCommand: java
  singularityCommand: singularity
  staticBashCommand: bash-linux-x86_64
workDir: $HOME/wfexs-backend-test_WorkDir

Stage file

workflow_id: https://raw.githubusercontent.com/HDRUK/hutch/main/workflows/sec-hutchx86.cwl
workflow_config:
  container: 'podman'
  secure: false
nickname: 'vas-workflow'
cacheDir: /tmp/wfexszn6siq2jtmpcache
crypt4gh:
  key: cosifer_test1_cwl.wfex.stage.key
  passphrase: mpel nite ified g
  pub: cosifer_test1_cwl.wfex.stage.pub
outputs:
  output_file:
    c-l-a-s-s: File
    glob: "output.json"
params:
  body:
    c-l-a-s-s: File
    url:
      - https://raw.githubusercontent.com/HDRUK/hutch/main/workflows/inputs/rquest-query.json
  is_availability: true
  db_host: "localhost"
  db_name: "hutch"
  db_user: "postgres"
  db_password: "example"

Record the licence of the workflow in RO-Crate

When a workflow is fetched from a git repository, or from an RO-Crate pointing to a repository, the licence file of the workflow repository should be included in the generated RO-Crates, if it exists.

WfExS-backend init issues

WfExS-backend init should create valid YAML configuration files when the --cache-dir parameter is provided. It should also validate already existing configuration files against the corresponding JSON Schema.

An example of the bad behaviour:

(.pyWEenv) jmfernandez@pavonis[14]:~/projects/WfExS-backend> python WfExS-backend.py --cache-dir /tmp/gorrito -L prueba2.yaml init
[WARNING] Configuration file prueba2.yaml does not exist
[WARNING] Cache directory not defined. Created a temporary one at /tmp/wfexsrkoltayctmpcache
2024-01-31 10:54:02,182 - [WARNING] [WARNING] Installation key file /home/jmfernandez/projects/WfExS-backend/prueba2.yaml.key does not exist
2024-01-31 10:54:02,182 - [WARNING] [WARNING] Installation pub file /home/jmfernandez/projects/WfExS-backend/prueba2.yaml.pub does not exist
* Storing updated configuration at prueba2.yaml
(.pyWEenv) jmfernandez@pavonis[15]:~/projects/WfExS-backend> cat prueba2.yaml
cache-directory: /tmp/gorrito
cacheDir: /tmp/wfexsrkoltayctmpcache
crypt4gh:
  key: prueba2.yaml.key
  passphrase: ndcart ndredth ndline elling
  pub: prueba2.yaml.pub
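A minimal sketch of the requested validation, assuming a hypothetical set of allowed top-level keys (the real JSON Schema in the repository is authoritative): flag unknown entries such as the cache-directory key that init silently wrote above.

```python
# Hypothetical allowed keys; the actual JSON Schema defines the real set
KNOWN_TOP_LEVEL_KEYS = {"cacheDir", "crypt4gh", "tools", "workDir"}

def unknown_keys(config: dict) -> set:
    # Return top-level keys that the schema would reject
    return set(config) - KNOWN_TOP_LEVEL_KEYS
```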

PermissionError: [Errno 13] Permission denied: '/home/ansible/wfexs-backend-test/wf-cache'

Description

When staging a workflow for the first time as a non-sudo/root user I get the error in the title. Oddly, deleting the cache dir and re-running the stage command seems to fix the issue, as does adding write permissions with chown. I'm not sure whether it has anything to do with the calls to os.makedirs or a umask issue.

Here's something I was reading about the problem; not sure if it will be helpful: https://stackoverflow.com/questions/5231901/permission-problems-when-creating-a-dir-with-os-makedirs-in-python/67723702#67723702
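If it is indeed a umask interaction, one mitigation is pinning the directory mode explicitly while the cache tree is created; a sketch (not the actual WfExS code, function name hypothetical):

```python
import os

def make_cache_dir(path: str) -> None:
    # os.makedirs lets the process umask mask the requested mode; clearing
    # the umask around the call guarantees readable/traversable parents,
    # avoiding the Errno 13 on a later re-run
    old_umask = os.umask(0)
    try:
        os.makedirs(path, mode=0o755, exist_ok=True)
    finally:
        os.umask(old_umask)
```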

Parsing of Nextflow DSL2 workflows

Right now, the Nextflow workflow source is parsed in order to learn the needed containers. The approach is not foolproof, as a container declaration can depend on variables, and in the case of DSL2 workflows the declarations can be spread over several files.

So, at the very least, all the (sub)workflow files involved need to be parsed.
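A naive multi-file scan can be sketched with a regex (hypothetical helper; it still misses declarations built from variables, as noted above):

```python
import re
from pathlib import Path

# Matches simple literal declarations like: container 'ubuntu:20.04'
CONTAINER_RE = re.compile(r"container\s+['\"]([^'\"]+)['\"]")

def find_containers(workflow_dir: str) -> set:
    # Scan every .nf file under the workflow directory, including
    # DSL2 (sub)workflow modules, for literal container declarations
    found = set()
    for nf_file in Path(workflow_dir).rglob("*.nf"):
        found.update(CONTAINER_RE.findall(nf_file.read_text()))
    return found
```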

Publish a major release with DOI

Now that we have a CITATION.cff (as of #13), we have to publish a major release with a DOI generated by Zenodo, in order to add that DOI to CITATION.cff.

That major release should be triggered by a major event.

Add metadata related to fetched URIs

Right now WfExS does not keep a correspondence between URLs and downloaded files, as the filenames are hashes generated from the URL. But there are several scenarios where additional upstream metadata is available, and future cases where a single URL corresponds to a collection of files. As an example of the latter, an ENCODE experiment id or an EGA dataset id corresponds to more than one file, possibly each with its own download URL.

So, there should be an intermediate metadata layer where these correspondences and the upstream metadata are kept. After this change, cached files should be named after the sha256 of their content, and URIs should translate to JSON files named after the hash of the URI, containing the correspondences to cached files and their origins.

Last, but not least, upstream metadata should be gathered and preserved in the execution provenance.
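The proposed naming scheme can be sketched as follows (a sketch of the issue's design, not existing code; helper names are hypothetical):

```python
import hashlib

def uri_metadata_name(uri: str) -> str:
    # Metadata JSON files are named after the hash of the URI they describe
    return hashlib.sha256(uri.encode("utf-8")).hexdigest() + ".json"

def content_cache_name(content: bytes) -> str:
    # Cached payloads are named after the sha256 of their own content,
    # so many URIs can point at one deduplicated file
    return "sha256-" + hashlib.sha256(content).hexdigest()
```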

Add several checks in the code to detect containers unavailable for the current hardware architecture

Thanks to the tests from @dcl10, some issues have been uncovered related to workflows which depend on container images that are not available for the current processor architecture.

A way to reproduce the chain of issues is trying to execute the cosifer workflow, which depends on a single container prepared for the x86_64 / amd64 architecture, on a different architecture like Linux arm64.

The cosifer "toy" workflow uses a single custom container which is only available for x86_64. WfExS-backend tries to materialize the container by itself, and most probably does so wrongly despite the architecture mismatch, when it should have complained before even trying to run cwltool. So, when cwltool tries to run the container, it surely fails because either the previously materialized container is for the wrong architecture, or cwltool is unable to fetch any container suitable for the task. cwltool then returns an empty description of its outputs, which is deserialized to None instead of a dictionary, and the code fails trying to access the key "class" because None is not a dictionary.

Also, the caching directory should have a container images directory per supported architecture, so it can hold cached versions for both x86_64 and arm64, in case the caching directory is used in a heterogeneous HPC environment.
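The early architecture check itself is cheap; a sketch (the alias table is illustrative and incomplete, and the function is hypothetical):

```python
import platform

# Normalize the two common naming families (uname-style vs Docker-style)
ARCH_ALIASES = {"x86_64": "amd64", "amd64": "amd64",
                "aarch64": "arm64", "arm64": "arm64"}

def container_matches_host(image_arch: str) -> bool:
    # Compare a container image's declared architecture against the host's,
    # so WfExS could complain before cwltool is even launched
    host = ARCH_ALIASES.get(platform.machine(), platform.machine())
    return ARCH_ALIASES.get(image_arch, image_arch) == host
```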

Secrets/secret inputs

Background

My team would like to use WfExS in a Trusted Research Environment (TRE) which has data sources that can't be exposed to the outside world. We anticipate that the environment will contain variables which must be kept secret (i.e. not in the output RO-Crate). In some cases, some inputs may also be sensitive, and we would like them not to be included in the output RO-Crate either.

Proposed Feature

For secret environment variables, would it be possible to add a section in the local config YAML file where we could put the variables as key-value pairs, and then have WfExS load these into the local environment at runtime? Then, during the creation of the RO-Crate, check for the secrets and exclude them from the crate and its metadata?

For secret inputs, would it be possible to add a boolean flag to the definition of an input to tell WfExS whether that input is secret or not? Then, similarly to the above, have it excluded from the crate and its metadata.
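The first half of the proposal could look like this sketch (the 'secrets' config section and the helper are hypothetical, not existing WfExS behaviour):

```python
import os

def load_secrets(local_config: dict) -> set:
    # Copy a hypothetical 'secrets' section of the local config into the
    # process environment at runtime, returning the loaded names so the
    # RO-Crate builder can later exclude exactly those variables
    secret_names = set()
    for key, value in local_config.get("secrets", {}).items():
        os.environ[key] = str(value)
        secret_names.add(key)
    return secret_names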

Use pyinvoke and Fabric

The following example shows how to issue commands which can be run either remotely or locally, https://stackoverflow.com/a/55704170 , based on both the pyinvoke and Fabric libraries.

Past the 1.0 milestone, WfExS-backend is going to gain different non-raw execution scenarios, like in-container runs, runs as different users, remote runs through ssh, remote runs through a queue system (first monolithic, later spread) and remote runs through GA4GH TES and WES.

A way to seamlessly integrate this is to first transition to using both pyinvoke and Fabric, so local and ssh executions are seamlessly integrated, and then try to extend it to the other execution environments.
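Until that transition happens, the idea can be illustrated with a stdlib stand-in: a local runner exposing the same run() surface that invoke's Context and Fabric's Connection share, so call sites stay identical whether the command runs locally or over ssh (a sketch, not the actual planned API):

```python
import subprocess

class LocalRunner:
    # Stand-in for invoke's Context; a Fabric Connection("host") would
    # expose the same run() method for the remote case
    def run(self, cmd: str) -> str:
        result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
        return result.stdout
```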

Error while running WfExS using a local workflow file/directory

Description

I am running WfExS with the config files shown below. When I run WfExS-backend.py -L local-config.yml stage -W test-stage.yml I get the following error: NotADirectoryError: [Errno 20] Not a directory: '/root/wfexs-backend-test_WorkDir/47761fdd-f06f-4260-a1f3-7351265805b3/workflow'.

Looking at the path in the error message, it seems workflow is the file named in workflow_id in the stage file, whereas WfExS expects a directory there. I also tried putting a path to a directory in the workflow_id field, but that failed saying it couldn't work out which runner to use.

Traceback

Traceback (most recent call last):
  File "/root/WfExS-backend/WfExS-backend.py", line 21, in <module>
    main()
  File "/root/WfExS-backend/wfexs_backend/__main__.py", line 1122, in main
    stagedSetup = wfInstance.stageWorkDir()
  File "/root/WfExS-backend/wfexs_backend/workflow.py", line 1985, in stageWorkDir
    self.materializeWorkflowAndContainers(offline=offline, ignoreCache=ignoreCache)
  File "/root/WfExS-backend/wfexs_backend/workflow.py", line 1233, in materializeWorkflowAndContainers
    self.setupEngine(offline=offline, ignoreCache=ignoreCache)
  File "/root/WfExS-backend/wfexs_backend/workflow.py", line 1191, in setupEngine
    self.fetchWorkflow(
  File "/root/WfExS-backend/wfexs_backend/workflow.py", line 1152, in fetchWorkflow
    engineVer, candidateLocalWorkflow = engine.identifyWorkflow(
  File "/root/WfExS-backend/wfexs_backend/cwl_engine.py", line 316, in identifyWorkflow
    newLocalWf = self._enrichWorkflowDeps(newLocalWf, engineVer)
  File "/root/WfExS-backend/wfexs_backend/cwl_engine.py", line 542, in _enrichWorkflowDeps
    with subprocess.Popen(
  File "/usr/lib/python3.10/subprocess.py", line 969, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/usr/lib/python3.10/subprocess.py", line 1845, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
NotADirectoryError: [Errno 20] Not a directory: '/root/wfexs-backend-test_WorkDir/47761fdd-f06f-4260-a1f3-7351265805b3/workflow'

Settings

Stage file

# test-stage.yml
workflow_id: file:///root/hutch/workflows/sec-hutchx86.cwl
workflow_config:
  container: 'docker'
  secure: false
nickname: 'vas-workflow'
cacheDir: /tmp/wfexszn6siq2jtmpcache
crypt4gh:
  key: cosifer_test1_cwl.wfex.stage.key
  passphrase: mpel nite ified g
  pub: cosifer_test1_cwl.wfex.stage.pub
outputs:
  output_file:
    c-l-a-s-s: File
    glob: "output.json"
params:
  body:
    c-l-a-s-s: File
    url:
      - https://raw.githubusercontent.com/HDRUK/hutch/main/workflows/inputs/rquest-query.json
  is_availability: true
  db_host: "localhost"
  db_name: "hutch"
  db_user: "postgres"
  db_password: "example"

Local config

# local-config.yml
cacheDir: ./wfexs-backend-test
crypt4gh:
  key: local_config.yaml.key
  passphrase: strive backyard dividing gumball
  pub: local_config.yaml.pub
tools:
  containerType: docker
  dockerCommand: docker
  encrypted_fs:
    command: encfs
    type: encfs
  engineMode: local
  gitCommand: git
  javaCommand: java
  singularityCommand: singularity
  staticBashCommand: bash-linux-x86_64
workDir: ./wfexs-backend-test_WorkDir

Add validation capabilities over fetched contents

Today I found a scenario where some content fetched over FTP was corrupted during the download process. There are several validation mechanisms which could be integrated into WfExS-backend:

  • When a file is a known compressed archive (tar, gz, bzip2, xz, zip), its integrity should be checked.
  • When a file is signed and a public signing key is available, check that the file was not tampered with.
  • Declaring a file to be fetched which contains MD5 or SHA1 sums, or signatures, of the fetched contents.
  • Declaring inline fields containing the MD5 or SHA1 sums of the fetched contents.
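The first bullet can be sketched for gzip with the standard library alone (helper name is illustrative; tar, bzip2, xz and zip have analogous stdlib checks):

```python
import gzip

def gzip_is_intact(path: str) -> bool:
    # Fully decompress the archive, discarding the data; a truncated or
    # corrupted download raises BadGzipFile/EOFError along the way
    try:
        with gzip.open(path, "rb") as fh:
            while fh.read(1 << 20):
                pass
        return True
    except (OSError, EOFError):
        return False
```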

Add support for swh permanent identifiers

Software Heritage swh permanent identifiers, described at https://docs.softwareheritage.org/devel/swh-model/persistent-identifiers.html#interoperability , should be supported by WfExS-backend, as they can be used in two different ways.

First, there are repos there which could contain workflows, so a method to fetch those workflows should be implemented.

Second, they provide a standardized way to compute a stable identifier for directories. Although there is an available implementation at https://pypi.org/project/swh.model/ , due to a licence collision (it is GPLv3) a reimplementation of the algorithm is needed.
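For the simplest object kind, contents, the identifier is the git blob hash of the raw bytes, so a clean-room reimplementation can start small (directory identifiers need the full tree algorithm; this sketch only covers swh:1:cnt):

```python
import hashlib

def swhid_for_content(data: bytes) -> str:
    # SWHID for a content object: sha1 over the git blob header plus payload
    digest = hashlib.sha1(b"blob %d\x00" % len(data) + data).hexdigest()
    return "swh:1:cnt:" + digest
```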

`dot` dependency should be optional

Right now, when a prospective RO-Crate is generated, dot is used to translate the workflow representation generated by the workflow engine into a PNG. When the command is not available or not properly installed, the generation of the RO-Crate fails.
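Making the dependency optional could start with a simple lookup, skipping the PNG rendering when graphviz's dot is absent (a sketch, not the actual WfExS code):

```python
import shutil

def dot_available() -> bool:
    # True only when a 'dot' executable is reachable on PATH
    return shutil.which("dot") is not None
```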

Add support for compact `drs` identifiers

As of https://ga4gh.github.io/data-repository-service-schemas/preview/release/drs-1.1.0/docs/#_appendix_compact_identifier_based_uris , drs URIs can come in compact form, which adds an additional level of indirection: where the DRS server lives must be resolved against either n2t.net or identifiers.org . The implementation added in 11d6873 does not consider this level of indirection, and is not able to tell whether it is dealing with a compact DRS URI or a hostname-based one.
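A first-pass discriminator could be a heuristic like the following (a sketch, deliberately not spec-complete: hostname-based DRS URIs carry a resolvable host, while compact ones use a prefix:accession pair that needs resolution through identifiers.org or n2t.net):

```python
from urllib.parse import urlparse

def looks_compact(drs_uri: str) -> bool:
    # Compact identifier-based URIs put 'prefix:accession' where a
    # hostname would go; a colon in the authority, or the absence of any
    # dot, suggests the compact form
    netloc = urlparse(drs_uri).netloc
    return ":" in netloc or "." not in netloc
```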

TypeError: Multiple inheritance with NamedTuple is not supported

Hi!

On Ubuntu LTS, with miniconda and Python 3.9.13, I cannot run WfExS. It fails with the following traceback:

(venv) kinow@ranma:~/Development/python/workspace/WfExS-backend$ python WfExS-backend.py --full-help
Traceback (most recent call last):
  File "/home/kinow/Development/python/workspace/WfExS-backend/WfExS-backend.py", line 39, in <module>
    from wfexs_backend.wfexs_backend import WfExSBackend
  File "/home/kinow/Development/python/workspace/WfExS-backend/wfexs_backend/wfexs_backend.py", line 57, in <module>
    from .common import AbstractWfExSException
  File "/home/kinow/Development/python/workspace/WfExS-backend/wfexs_backend/common.py", line 288, in <module>
    class GeneratedContent(AbstractGeneratedContent, NamedTuple):
  File "/home/kinow/Development/python/miniconda3/lib/python3.9/typing.py", line 1929, in _namedtuple_mro_entries
    raise TypeError("Multiple inheritance with NamedTuple is not supported")
TypeError: Multiple inheritance with NamedTuple is not supported

It looks like this could be related to the following issue:

I think it was first released with 3.9.0-alpha6. Given this is a change in Python, I guess WfExS will have to update the code eventually to support Py 3.9+. This patch fixes the initial command, but I am not sure whether it breaks something else 👍

diff --git a/wfexs_backend/common.py b/wfexs_backend/common.py
index 56878fd..a51dd7f 100644
--- a/wfexs_backend/common.py
+++ b/wfexs_backend/common.py
@@ -285,7 +285,7 @@ class ExpectedOutput(NamedTuple):
 class AbstractGeneratedContent(object):
     pass
 
-class GeneratedContent(AbstractGeneratedContent, NamedTuple):
+class GeneratedContent(AbstractGeneratedContent):
     """
     local: Local absolute path of the content which was generated. It
       is an absolute path in the outputs directory of the execution.
@@ -302,7 +302,7 @@ class GeneratedContent(AbstractGeneratedContent, NamedTuple):
     secondaryFiles: Optional[Sequence[AbstractGeneratedContent]] = None
 
 
-class GeneratedDirectoryContent(AbstractGeneratedContent, NamedTuple):
+class GeneratedDirectoryContent(AbstractGeneratedContent):
     """
     local: Local absolute path of the content which was generated. It
       is an absolute path in the outputs directory of the execution.
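An alternative to dropping NamedTuple entirely (as the patch above does, which loses the tuple behaviour) might be plain subclassing of an already-generated NamedTuple class, which Python does allow; a sketch with hypothetical fields:

```python
from typing import NamedTuple

class _GeneratedContentBase(NamedTuple):
    # Illustrative fields, not the real WfExS definition
    local: str
    signature: str = ""

class GeneratedContent(_GeneratedContentBase):
    # Extra behaviour goes in a single-inheritance subclass of the
    # generated tuple class, avoiding the 3.9 multiple-inheritance error
    pass
```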

Add support for `insdc.sra` CURIE

Many public projects, like 1000genomes, publish their genomes in the SRA repository, which is mirrored at NCBI, EBI and DDBJ. The idea is to add support for the insdc.sra compact URI scheme, providing all the download links based on the different mirrors.

Allow using as a staging source a Workflow Run RO-Crate

The target here is that WfExS-backend should be able to consume its own RO-Crates, demonstrating true reproducibility.

This feature is divided into two milestones:

  • Being able to reuse as much metadata as possible, so inputs, commits and containers are reused.
  • Being able to reuse the RO-Crate's bundled copies of the workflow, inputs and containers in the instantiation.

The last one can bring issues related to Docker containers, as it might imply reassigning local container tags.

cwl_engine management of arrays of inputs

There is an issue with workflow https://raw.githubusercontent.com/kids-first/kf-alignment-workflow/v2.7.3/workflows/kfdrc_alignment_wf.cwl , leading to the error message

inputdeclarations.yaml:2:1:  * the `input_bam_list` field is not valid because value is a CommentedMap, expected null or array of <File>

due to input_bam_list not being properly represented.

cwl_engine.CWLWorkflowEngine generates the file inputdeclarations.yaml before calling cwltool, in order to tell it the input parameters and where to find the files.

That YAML file is created by createYAMLFile:

def createYAMLFile(self, matInputs, cwlInputs, filename):
    """
    Method to create a YAML file that describes the execution inputs of the workflow
    needed for their execution. Return parsed inputs.
    """
    try:
        execInputs = self.executionInputs(matInputs, cwlInputs)
        if len(execInputs) != 0:
            with open(filename, mode="w+", encoding="utf-8") as yaml_file:
                yaml.dump(execInputs, yaml_file, allow_unicode=True, default_flow_style=False, sort_keys=False)
            return execInputs
        else:
            raise WorkflowEngineException(
                "Dict of execution inputs is empty")
    except IOError as error:
        raise WorkflowEngineException(
            "ERROR: cannot create YAML file {}, {}".format(filename, error))

which depends on the output from executionInputs

def executionInputs(self, matInputs: List[MaterializedInput], cwlInputs):
    """
    Setting execution inputs needed to execute the workflow
    """
    if len(matInputs) == 0:  # Is list of materialized inputs empty?
        raise WorkflowEngineException("FATAL ERROR: Execution with no inputs")
    if len(cwlInputs) == 0:  # Is list of declared inputs empty?
        raise WorkflowEngineException("FATAL ERROR: Workflow with no declared inputs")
    execInputs = dict()
    for matInput in matInputs:
        if isinstance(matInput, MaterializedInput):  # input is a MaterializedInput
            # numberOfInputs = len(matInput.values)  # number of inputs inside a MaterializedInput
            for input_value in matInput.values:
                name = matInput.name
                value_type = cwlInputs.get(name, {}).get('type')
                if value_type is None:
                    raise WorkflowEngineException("ERROR: input {} not available in workflow".format(name))
                value = input_value
                if isinstance(value, MaterializedContent):  # value of an input contains MaterializedContent
                    if value.kind in (ContentKind.Directory, ContentKind.File):
                        if not os.path.exists(value.local):
                            self.logger.warning("Input {} is not materialized".format(name))
                        value_local = value.local
                        if isinstance(value_type, dict):  # MaterializedContent is a List of File
                            classType = value_type['items']
                            execInputs.setdefault(name, []).append({"class": classType, "location": value_local})
                        else:  # MaterializedContent is a File
                            classType = value_type
                            execInputs[name] = {"class": classType, "location": value_local}
                    else:
                        raise WorkflowEngineException(
                            "ERROR: Input {} has values of type {} this code does not know how to handle".format(name, value.kind))
                else:
                    execInputs[name] = value
    return execInputs
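For reference, the shape cwltool expects for an array-of-File input in inputdeclarations.yaml would presumably be the following (paths are illustrative):

```yaml
# Each element of the array carries its own class/location pair,
# instead of a single mapping (the CommentedMap the error complains about)
input_bam_list:
  - class: File
    location: /path/to/first.bam
  - class: File
    location: /path/to/second.bam
```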
