runcrate's People

Contributors

alaninmcr, dependabot[bot], dgarijo, dnlbauer, lrodrin, mr-c, simleo, stain

runcrate's Issues

AttributeError: 'NoneType' object has no attribute 'rsplit'

I tried another workflow in which the ExpressionTool steps were newly converted to CommandLineTool steps.
Unfortunately the zip file is 109 MB. If you need a copy, let me know how I can share it with you.

/Volumes/Git/m-unlock/cwl/tests/PROV: sha256 manifest lists snapshot/array_to_file_tool.cwl multiple times with the same value
/Volumes/Git/m-unlock/cwl/tests/PROV: sha256 manifest lists snapshot/concatenate.cwl multiple times with the same value
/Volumes/Git/m-unlock/cwl/tests/PROV: sha1 manifest lists snapshot/array_to_file_tool.cwl multiple times with the same value
/Volumes/Git/m-unlock/cwl/tests/PROV: sha1 manifest lists snapshot/concatenate.cwl multiple times with the same value
/Volumes/Git/m-unlock/cwl/tests/PROV: sha512 manifest lists snapshot/array_to_file_tool.cwl multiple times with the same value
/Volumes/Git/m-unlock/cwl/tests/PROV: sha512 manifest lists snapshot/concatenate.cwl multiple times with the same value
Traceback (most recent call last):
  File "/Users/jasperk/mambaforge/bin/runcrate", line 8, in <module>
    sys.exit(cli())
  File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/runcrate/cli.py", line 68, in convert
    crate = builder.build()
  File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/runcrate/convert.py", line 266, in build
    self.add_workflow(crate)
  File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/runcrate/convert.py", line 322, in add_workflow
    self.add_step(crate, workflow, s)
  File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/runcrate/convert.py", line 334, in add_step
    tool = self.add_tool(crate, workflow, cwl_step.run)
  File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/runcrate/convert.py", line 380, in add_tool
    self.add_param_connections(crate, tool)
  File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/runcrate/convert.py", line 623, in add_param_connections
    from_param = get_fragment(s)
  File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/runcrate/convert.py", line 115, in get_fragment
    return uri.rsplit("#", 1)[-1]
AttributeError: 'NoneType' object has no attribute 'rsplit'
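
The last frame shows get_fragment calling rsplit on a value that is None. Below is a minimal sketch of a guard around that one-liner. The names get_fragment_checked and context are hypothetical and not part of runcrate, and the sketch only makes the failure easier to read; it does not address whatever leaves the parameter source empty.

```python
def get_fragment(uri):
    """Same one-liner as in the traceback: keep the part after the last '#'."""
    return uri.rsplit("#", 1)[-1]


def get_fragment_checked(uri, context="parameter source"):
    """Hypothetical guard: raise a descriptive error instead of AttributeError."""
    if not isinstance(uri, str):
        raise ValueError(f"{context} is {uri!r}, expected a URI string")
    return get_fragment(uri)
```

With such a guard the message would name the offending value, which may help locate the step input that has no source in the packed workflow.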

KeyError: <cwl_utils.parser.cwl_v1_2.CommandInputEnumSchema object at 0x10aa191e0> ?

After changing the expression steps to command line steps, I ran the following on the attached PROV folder:

runcrate convert -o ROC PROV

/Volumes/Git/m-unlock/cwl/tests/PROV: sha256 manifest lists snapshot/concatenate.cwl multiple times with the same value
/Volumes/Git/m-unlock/cwl/tests/PROV: sha512 manifest lists snapshot/concatenate.cwl multiple times with the same value
/Volumes/Git/m-unlock/cwl/tests/PROV: sha1 manifest lists snapshot/concatenate.cwl multiple times with the same value

Traceback (most recent call last):
  File "/Users/jasperk/mambaforge/bin/runcrate", line 8, in <module>
    sys.exit(cli())
  File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/runcrate/cli.py", line 68, in convert
    crate = builder.build()
  File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/runcrate/convert.py", line 266, in build
    self.add_workflow(crate)
  File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/runcrate/convert.py", line 322, in add_workflow
    self.add_step(crate, workflow, s)
  File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/runcrate/convert.py", line 334, in add_step
    tool = self.add_tool(crate, workflow, cwl_step.run)
  File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/runcrate/convert.py", line 373, in add_tool
    tool["input"] = self.add_params(crate, cwl_tool.inputs)
  File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/runcrate/convert.py", line 387, in add_params
    properties = properties_from_cwl_param(cwl_p)
  File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/runcrate/convert.py", line 81, in properties_from_cwl_param
    additional_type = "Collection" if cwl_p.secondaryFiles else convert_cwl_type(cwl_p.type)
  File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/runcrate/convert.py", line 65, in convert_cwl_type
    s = set(convert_cwl_type(_) for _ in cwl_type)
  File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/runcrate/convert.py", line 65, in <genexpr>
    s = set(convert_cwl_type(_) for _ in cwl_type)
  File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/runcrate/convert.py", line 73, in convert_cwl_type
    return CWL_TYPE_MAP[cwl_type.items]

PROV.zip
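
The final frame indexes CWL_TYPE_MAP with the items of an array type, and here the items value is an enum schema object rather than a type name. The following is only a sketch of a guarded lookup. TYPE_MAP is a hypothetical subset and not the real mapping, EnumSchema is a stand-in and not the cwl_utils class, and falling back to "Text" for schema objects is an assumption.

```python
# Hypothetical subset of a CWL-to-crate type map; the real CWL_TYPE_MAP may differ.
TYPE_MAP = {"string": "Text", "int": "Integer", "File": "File"}


class EnumSchema:
    """Stand-in for a parsed CWL enum schema object (not the cwl_utils class)."""

    def __init__(self, symbols):
        self.symbols = symbols


def lookup_item_type(items, default="Text"):
    """Map an array's item type, falling back when it is not a plain type name."""
    if isinstance(items, str):
        return TYPE_MAP[items]
    # Schema objects (enum, record, nested array) are not keys of the map,
    # so a direct TYPE_MAP[items] lookup would raise KeyError.
    return default
```

Whether an enum should map to text or to something more specific is a design question for the maintainers; the sketch only shows where a direct dictionary lookup breaks.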

Basedir for data files

Would it be OK to have a landing folder for the data files?
Currently they are placed in the root of the crate, which creates quite a lot of clutter.

I think only two lines need to be changed, if that is OK. They contain

dest = Path(parent.id if parent else "") / hash_

which could turn into

dest = Path(parent.id if parent else "data") / hash_

I can create a pull request, but maybe it is better to discuss this first?
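
To make the effect of the proposal concrete, here is a small sketch of the destination computation. The function data_dest and its basedir parameter are hypothetical names used for illustration; only the Path expression itself comes from the lines quoted above.

```python
from pathlib import Path


def data_dest(hash_, parent_id=None, basedir="data"):
    """Return the crate-relative destination for a data file.

    Files that have a parent directory keep it; files without one go
    under `basedir` instead of the crate root.
    """
    return Path(parent_id if parent_id else basedir) / hash_
```

Passing basedir="" reproduces the current behaviour, so a configurable default would let existing crates keep their layout.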

Error from runcrate convert

I am trying to convert CWLProv to RO-Crate using runcrate convert, but I get the following error every time I run the command:

File "/usr/lib/python3.10/site-packages/rocrate/rocrate.py", line 122, in __read
    raise ValueError(f"Not a valid RO-Crate: missing {Metadata.BASENAME}")
ValueError: Not a valid RO-Crate: missing ro-crate-metadata.json

I checked it with both a workflow and a tool written in CWL, and I get the same error in both cases. Has anyone run it with CWLProv recently?
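
The error comes from rocrate, which reports that the folder it was asked to read has no ro-crate-metadata.json. One thing worth checking is what the input folder actually contains. The sketch below is a hypothetical helper, not part of runcrate. It assumes that a CWLProv research object is a BagIt bag (so it has a bagit.txt) and that it stores the workflow at workflow/packed.cwl.

```python
from pathlib import Path


def describe_folder(root):
    """Report which marker files are present directly under `root`."""
    root = Path(root)
    return {
        "ro_crate_metadata": (root / "ro-crate-metadata.json").is_file(),
        "bagit_declaration": (root / "bagit.txt").is_file(),
        "packed_workflow": (root / "workflow" / "packed.cwl").is_file(),
    }
```

If the folder has bagit.txt but no ro-crate-metadata.json, it looks like a CWLProv bag and not an RO-Crate, which would narrow down which step is reading it as a crate.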

CWLProv conversion: include primary-job.json

Note that the file cannot be simply copied as is: file / directory paths need to be converted to the crate ones, e.g.:

../data/32/327fc7aedf4f6b69a42a7c8b808dc5a7aff61376 => 327fc7aedf4f6b69a42a7c8b808dc5a7aff61376

Then there's the question of linking the entity to the rest of the metadata. Perhaps it can be represented as a configuration file for the workflow.
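
The path conversion described above can be sketched as follows. This is an illustration only: the function name to_crate_paths is hypothetical, and the sketch rewrites any string value of the form ../data/xx/hash regardless of the key it appears under, since the exact layout of primary-job.json is not shown here.

```python
import re

# Matches "../data/<two hex chars>/<hex digest>" and captures the digest.
DATA_PATH = re.compile(r"^\.\./data/[0-9a-f]{2}/([0-9a-f]+)$")


def to_crate_paths(value):
    """Recursively replace CWLProv data paths with their crate-level names."""
    if isinstance(value, dict):
        return {k: to_crate_paths(v) for k, v in value.items()}
    if isinstance(value, list):
        return [to_crate_paths(v) for v in value]
    if isinstance(value, str):
        match = DATA_PATH.match(value)
        if match:
            return match.group(1)
    # Numbers, booleans, None and unrelated strings pass through unchanged.
    return value
```

The function returns a new structure and leaves the loaded JSON untouched, so the original job description stays available for comparison.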

Bag validation failed: data/.DS_Store exists on filesystem but is not in the manifest

Related to #33: as I was regenerating the dataset, my Mac decided it was time to create a .DS_Store file on the fly, which of course caused some conflicts. I am not sure whether this should be hard-coded, but .DS_Store files are "useless" for RO-Crates as far as I can tell.

http://download.systemsbiology.nl/unlock/cwl/issues/PROV_DS_Store.zip

data/.DS_Store exists on filesystem but is not in the manifest
Traceback (most recent call last):
  File "/Users/jasperk/mambaforge/bin/runcrate", line 8, in <module>
    sys.exit(cli())
  File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/runcrate/cli.py", line 67, in convert
    builder = ProvCrateBuilder(root, workflow_name, license, readme)
  File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/runcrate/convert.py", line 188, in __init__
    self.ro = ResearchObject(BDBag(str(root)))
  File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/cwlprov/ro.py", line 66, in __init__
    bag.validate()
  File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/bdbag/bdbagit.py", line 490, in validate
    self._validate_contents(processes=processes, fast=fast, completeness_only=completeness_only, callback=callback)
  File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/bdbag/bdbagit.py", line 519, in _validate_contents
    self._validate_completeness()
  File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/bdbag/bdbagit.py", line 549, in _validate_completeness
    raise BagValidationError(_("Bag validation failed"), errors)
bagit.BagValidationError: Bag validation failed: data/.DS_Store exists on filesystem but is not in the manifest
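
Until this is handled in the tool, a possible workaround is to delete the stray files before converting. The sketch below is a hypothetical helper, not part of runcrate, and it assumes the .DS_Store files appeared after the bag was written, so that they are not listed in any manifest.

```python
from pathlib import Path


def remove_ds_store(root):
    """Delete every .DS_Store file under `root` and return the removed paths."""
    removed = []
    # Collect the matches first so the tree is not modified while it is walked.
    for path in sorted(Path(root).rglob(".DS_Store")):
        if path.is_file():
            path.unlink()
            removed.append(path)
    return removed
```

If a .DS_Store file had been present when the bag was created, it would be in the manifest, and removing it would cause a different validation error.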

AttributeError: 'CommandLineTool' object has no attribute 'rsplit'

I was testing some provenance workflows in cwltool and encountered the following:

runcrate convert -o bla provenance

Traceback (most recent call last):
  File "/Users/jasperk/mambaforge/bin/runcrate", line 8, in <module>
    sys.exit(cli())
  File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/Users/jasperk/gitlab/runcrate/src/runcrate/cli.py", line 67, in convert
    builder = ProvCrateBuilder(root, workflow_name, license, readme)
  File "/Users/jasperk/gitlab/runcrate/src/runcrate/convert.py", line 184, in __init__
    self.step_maps = self._get_step_maps(self.cwl_defs)
  File "/Users/jasperk/gitlab/runcrate/src/runcrate/convert.py", line 208, in _get_step_maps
    rval[k][f] = {"tool": get_fragment(s.run), "pos": pos_map[f]}
  File "/Users/jasperk/gitlab/runcrate/src/runcrate/convert.py", line 112, in get_fragment
    return uri.rsplit("#", 1)[-1]
AttributeError: 'CommandLineTool' object has no attribute 'rsplit'

The zip: http://download.systemsbiology.nl/unlock/cwl/issues/cwl_test_no_listing.zip
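
In this traceback the step's run field is a CommandLineTool object, not a URI string, so rsplit is not available on it. A minimal sketch of a lookup that accepts both forms is shown below. The function tool_fragment is a hypothetical name, and it assumes the embedded tool object exposes its identifier as an id attribute.

```python
from types import SimpleNamespace


def tool_fragment(run):
    """Return the fragment of a step's tool, given a URI or an inline tool object."""
    if isinstance(run, str):
        return run.rsplit("#", 1)[-1]
    # Assumption: an inline tool object carries its URI in an `id` attribute.
    tool_id = getattr(run, "id", None)
    if isinstance(tool_id, str):
        return tool_id.rsplit("#", 1)[-1]
    raise TypeError(f"cannot determine tool fragment from {run!r}")


# Stand-in for a parsed inline tool; not the cwl_utils class.
inline_tool = SimpleNamespace(id="file:///tmp/packed.cwl#main/step/run")
print(tool_fragment("file:///tmp/packed.cwl#tool.cwl"))
print(tool_fragment(inline_tool))
```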

Bug in cwltool results in packed.cwl but no individual cwl files

Due to a bug in cwltool (or maybe it is intentional), the individual CWL files are missing from PROV/workflow/ and only PROV/workflow/packed.cwl is available. This only happens when you start the workflow with a cwl:tool: argument in the input YAML file.

In theory the packed.cwl should be sufficient, shouldn't it? I am not sure whether this is an issue in runcrate, but I thought I would let you know.

http://download.systemsbiology.nl/unlock/cwl/issues/PROV_No_CWL.zip (Removed the .DS_Store files).
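
To check whether packed.cwl alone carries every process, one can list what it contains. The sketch below is a hypothetical helper that assumes packed.cwl is written as JSON and that a multi-process document keeps its processes under a $graph key; a single-process document is treated as a one-element graph.

```python
import json


def list_packed_processes(packed_path):
    """Return (id, class) pairs for every process in a packed CWL document."""
    with open(packed_path) as f:
        doc = json.load(f)
    # A packed document with several processes stores them under "$graph".
    graph = doc.get("$graph", [doc])
    return [(p.get("id"), p.get("class")) for p in graph]
```

If every tool used by the workflow shows up in this list, the individual files add no information beyond what the packed document already holds.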

schema_salad.exceptions.ValidationException tried...

I am not exactly sure what is happening here. Sorry for the large PROV size, but otherwise it takes a long time to compute on my laptop, so I had to include the indexed lookup database.

http://download.systemsbiology.nl/unlock/cwl/issues/PROV_ngtax.zip

Traceback (most recent call last):
  File "/Users/jasperk/mambaforge/bin/runcrate", line 8, in <module>
    sys.exit(cli())
  File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/runcrate/cli.py", line 67, in convert
    builder = ProvCrateBuilder(root, workflow_name, license, readme)
  File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/runcrate/convert.py", line 187, in __init__
    self.cwl_defs = get_workflow(self.wf_path)
  File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/runcrate/convert.py", line 167, in get_workflow
    defs = load_document_by_yaml(json_wf, wf_path.absolute().as_uri())
  File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/cwl_utils/parser/__init__.py", line 128, in load_document_by_yaml
    result = cwl_v1_2.load_document_by_yaml(
  File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/cwl_utils/parser/cwl_v1_2.py", line 15471, in load_document_by_yaml
    return _document_load(
  File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/cwl_utils/parser/cwl_v1_2.py", line 578, in _document_load
    return loader.load(doc["$graph"], baseuri, loadingOptions)
  File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/cwl_utils/parser/cwl_v1_2.py", line 417, in load
    raise ValidationException("", None, errors, "-")
schema_salad.exceptions.ValidationException: - tried _RecordLoader but
  Expected a dict
- tried _RecordLoader but
  Expected a dict
- tried _RecordLoader but
  Expected a dict
- tried _RecordLoader but
  Expected a dict
- tried _ArrayLoader but
  - tried _ArrayLoader but
    Expected a list
  - tried _UnionLoader but
    - tried _RecordLoader but
      Trying 'CommandLineTool'
        the `outputs` field is not valid because:
          - tried _ArrayLoader but
            - tried _ArrayLoader but
              Expected a list
            - tried _RecordLoader but
              Trying 'CommandOutputParameter'
PROV_ngtax/workflow/packed.cwl#ngtax.cwl/reference_db_lookup:3:9:                       the `type`
                                                                                     field is not valid because:
                                                                                       - tried
                                                                                       _EnumLoader but
                                                                                         Expected
                                                                                         one of ('File', 'Directory')
                                                                                       - tried
                                                                                       _EnumLoader but
                                                                                         Expected
                                                                                         one of ('stdout',)
                                                                                       - tried
                                                                                       _EnumLoader but
                                                                                         Expected
                                                                                         one of ('stderr',)
                                                                                       - tried
                                                                                       _RecordLoader but
                                                                                         Expected a
                                                                                         dict
                                                                                       - tried
                                                                                       _RecordLoader but
                                                                                         Expected a
                                                                                         dict
                                                                                       - tried
                                                                                       _RecordLoader but
                                                                                         Expected a
                                                                                         dict
                                                                                       - tried
                                                                                       _PrimitiveLoader but
                                                                                         Expected a
                                                                                         tuple but got NoneType
                                                                                       - tried
                                                                                       _ArrayLoader but
                                                                                         Expected a
                                                                                         list
PROV_ngtax/workflow/packed.cwl#ngtax.cwl/reference_db_lookup:4:13:                     invalid field
                                                                                     `class`, expected one of:
                                                                                     `label`, `secondaryFiles`,
                                                                                     `streamable`, `doc`, `id`,
                                                                                     `format`, `type`,
                                                                                     `outputBinding`
            - tried _ArrayLoader but
              Expected a list
            - tried _RecordLoader but
              Trying 'CommandOutputParameter'
PROV_ngtax/workflow/packed.cwl#ngtax.cwl/reference_db_lookup:79:9:                     the `type`
                                                                                     field is not valid because:
                                                                                       - tried
                                                                                       _EnumLoader but
                                                                                         Expected
                                                                                         one of ('File', 'Directory')
                                                                                       - tried
                                                                                       _EnumLoader but
                                                                                         Expected
                                                                                         one of ('stdout',)
                                                                                       - tried
                                                                                       _EnumLoader but
                                                                                         Expected
                                                                                         one of ('stderr',)
                                                                                       - tried
                                                                                       _RecordLoader but
                                                                                         Expected a
                                                                                         dict
                                                                                       - tried
                                                                                       _RecordLoader but
                                                                                         Expected a
                                                                                         dict
                                                                                       - tried
                                                                                       _RecordLoader but
                                                                                         Expected a
                                                                                         dict
                                                                                       - tried
                                                                                       _PrimitiveLoader but
                                                                                         Expected a
                                                                                         tuple but got NoneType
                                                                                       - tried
                                                                                       _ArrayLoader but
                                                                                         Expected a
                                                                                         list
PROV_ngtax/workflow/packed.cwl#ngtax.cwl/reference_db_lookup:80:13:                   invalid field
                                                                                     `class`, expected one of:
                                                                                     `label`, `secondaryFiles`,
                                                                                     `streamable`, `doc`, `id`,
                                                                                     `format`, `type`,
                                                                                     `outputBinding`
            - tried _ArrayLoader but
              Expected a list
            - tried _RecordLoader but
              Trying 'CommandOutputParameter'
PROV_ngtax/workflow/packed.cwl#ngtax.cwl/reference_db_lookup:216:9:                   the `type`
                                                                                     field is not valid because:
                                                                                       - tried
                                                                                       _EnumLoader but
                                                                                         Expected
                                                                                         one of ('File', 'Directory')
                                                                                       - tried
                                                                                       _EnumLoader but
                                                                                         Expected
                                                                                         one of ('stdout',)
                                                                                       - tried
                                                                                       _EnumLoader but
                                                                                         Expected
                                                                                         one of ('stderr',)
                                                                                       - tried
                                                                                       _RecordLoader but
                                                                                         Expected a
                                                                                         dict
                                                                                       - tried
                                                                                       _RecordLoader but
                                                                                         Expected a
                                                                                         dict
                                                                                       - tried
                                                                                       _RecordLoader but
                                                                                         Expected a
                                                                                         dict
                                                                                       - tried
                                                                                       _PrimitiveLoader but
                                                                                         Expected a
                                                                                         tuple but got NoneType
                                                                                       - tried
                                                                                       _ArrayLoader but
                                                                                         Expected a
                                                                                         list
PROV_ngtax/workflow/packed.cwl#ngtax.cwl/reference_db_lookup:217:13:                 invalid field
                                                                                     `class`, expected one of:
                                                                                     `label`, `secondaryFiles`,
                                                                                     `streamable`, `doc`, `id`,
                                                                                     `format`, `type`,
                                                                                     `outputBinding`
            - tried _ArrayLoader but
              Expected a list
            - tried _RecordLoader but
              Trying 'CommandOutputParameter'
PROV_ngtax/workflow/packed.cwl#ngtax.cwl/reference_db_lookup:448:9:                   the `type`
                                                                                     field is not valid because:
                                                                                       - tried
                                                                                       _EnumLoader but
                                                                                         Expected
                                                                                         one of ('File', 'Directory')
                                                                                       - tried
                                                                                       _EnumLoader but
                                                                                         Expected
                                                                                         one of ('stdout',)
                                                                                       - tried
                                                                                       _EnumLoader but
                                                                                         Expected
                                                                                         one of ('stderr',)
                                                                                       - tried
                                                                                       _RecordLoader but
                                                                                         Expected a
                                                                                         dict
                                                                                       - tried
                                                                                       _RecordLoader but
                                                                                         Expected a
                                                                                         dict
                                                                                       - tried
                                                                                       _RecordLoader but
                                                                                         Expected a
                                                                                         dict
                                                                                       - tried
                                                                                       _PrimitiveLoader but
                                                                                         Expected a
                                                                                         tuple but got NoneType
                                                                                       - tried
                                                                                       _ArrayLoader but
                                                                                         Expected a
                                                                                         list
PROV_ngtax/workflow/packed.cwl#ngtax.cwl/reference_db_lookup:449:13:                 invalid field
                                                                                     `class`, expected one of:
                                                                                     `label`, `secondaryFiles`,
                                                                                     `streamable`, `doc`, `id`,
                                                                                     `format`, `type`,
                                                                                     `outputBinding`
            - tried _ArrayLoader but
              Expected a list
            - tried _RecordLoader but
              Trying 'CommandOutputParameter'
PROV_ngtax/workflow/packed.cwl#ngtax.cwl/reference_db_lookup:574:9:                   the `type`
                                                                                     field is not valid because:
                                                                                       - tried
                                                                                       _EnumLoader but
                                                                                         Expected
                                                                                         one of ('File', 'Directory')
                                                                                       - tried
                                                                                       _EnumLoader but
                                                                                         Expected
                                                                                         one of ('stdout',)
                                                                                       - tried
                                                                                       _EnumLoader but
                                                                                         Expected
                                                                                         one of ('stderr',)
                                                                                       - tried
                                                                                       _RecordLoader but
                                                                                         Expected a
                                                                                         dict
                                                                                       - tried
                                                                                       _RecordLoader but
                                                                                         Expected a
                                                                                         dict
                                                                                       - tried
                                                                                       _RecordLoader but
                                                                                         Expected a
                                                                                         dict
                                                                                       - tried
                                                                                       _PrimitiveLoader but
                                                                                         Expected a
                                                                                         tuple but got NoneType
                                                                                       - tried
                                                                                       _ArrayLoader but
                                                                                         Expected a
                                                                                         list
PROV_ngtax/workflow/packed.cwl#ngtax.cwl/reference_db_lookup:575:13:                 invalid field
                                                                                     `class`, expected one of:
                                                                                     `label`, `secondaryFiles`,
                                                                                     `streamable`, `doc`, `id`,
                                                                                     `format`, `type`,
                                                                                     `outputBinding`
          - tried _RecordLoader but
            Expected a dict
    - tried _RecordLoader but
      Not a ExpressionTool
    - tried _RecordLoader but
      Not a Workflow
    - tried _RecordLoader but
      Not a Operation

KeyError: 'fastqc.cwl' while cwl file is in the provenance folder

This issue concerns http://download.systemsbiology.nl/unlock/cwl/issues/PROV_ngtax.zip.

It tries to obtain cwl_tool = self.cwl_defs[tool_fragment], where tool_fragment is 'fastqc.cwl'; however, only packed.cwl is available in cwl_defs.

Screenshot 2023-05-26 at 09 56 40
find PROV_ngtax/ | grep cwl
PROV_ngtax//snapshot/workflow_ngtax.cwl
PROV_ngtax//snapshot/files_to_folder_tool.cwl
PROV_ngtax//snapshot/fastqc.cwl
PROV_ngtax//snapshot/ngtax.cwl
PROV_ngtax//snapshot/ngtax_to_tsv-fasta.cwl
PROV_ngtax//workflow/packed.cwl
PROV_ngtax//metadata/provenance/primary.cwlprov.jsonld
PROV_ngtax//metadata/provenance/primary.cwlprov.json
PROV_ngtax//metadata/provenance/primary.cwlprov.xml
PROV_ngtax//metadata/provenance/primary.cwlprov.nt
PROV_ngtax//metadata/provenance/primary.cwlprov.ttl
PROV_ngtax//metadata/provenance/primary.cwlprov.provn
Traceback (most recent call last):
  File "/Users/jasperk/mambaforge/bin/runcrate", line 8, in <module>
    sys.exit(cli())
  File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/runcrate/cli.py", line 68, in convert
    crate = builder.build()
  File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/runcrate/convert.py", line 273, in build
    self.add_workflow(crate)
  File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/runcrate/convert.py", line 329, in add_workflow
    self.add_step(crate, workflow, s)
  File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/runcrate/convert.py", line 341, in add_step
    tool = self.add_tool(crate, workflow, cwl_step.run)
  File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/runcrate/convert.py", line 352, in add_tool
    cwl_tool = self.cwl_defs[tool_fragment]
KeyError: 'fastqc.cwl'

Data locations using manifest file

I am currently working on CWLTool, modifying it so that files are placed in subfolders when provenance is used.
The application currently only looks in source = self.root / Path("data") / hash_[:2] / hash_, which might not be correct in the long run.

Perhaps an option could be to use the manifest file for this?

cat PROV_NO_INPUT_2/manifest-sha1.txt
27a9b1a98a2f8a3cfa6fc7f7390e1f33d9afc944  data/output/27/27a9b1a98a2f8a3cfa6fc7f7390e1f33d9afc944
10a02d2e45f8a8b6bd31e2455e0fc68327e86b43  data/output/10/10a02d2e45f8a8b6bd31e2455e0fc68327e86b43
f4e9c47034057ed1be718a080b8bee488586333a  data/output/f4/f4e9c47034057ed1be718a080b8bee488586333a
c0f2eb41e128804b6742359671875729a43e6d94  data/output/c0/c0f2eb41e128804b6742359671875729a43e6d94
d822fc5e5e405049db4529b8054c8042444c1576  data/output/d8/d822fc5e5

The manifest has the appropriate (sub)locations.
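
A minimal sketch of the manifest idea, assuming each manifest line is a checksum followed by whitespace and a path relative to the RO root, as in the excerpt above. The function names parse_manifest and find_data_file are hypothetical and are not part of runcrate; the fallback reproduces the data/<first two characters>/<hash> layout quoted earlier.

```python
from pathlib import Path


def parse_manifest(text):
    """Map each checksum to the relative path(s) listed for it in a manifest."""
    locations = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each line is "<checksum><whitespace><relative path>"
        checksum, rel_path = line.split(None, 1)
        locations.setdefault(checksum, []).append(Path(rel_path))
    return locations


def find_data_file(root, locations, hash_):
    """Return the path for hash_, preferring the manifest over the default layout."""
    for rel_path in locations.get(hash_, []):
        if rel_path.parts[:1] == ("data",):
            return Path(root) / rel_path
    # Fall back to the layout currently assumed: data/<first two chars>/<hash>
    return Path(root) / "data" / hash_[:2] / hash_
```

With this approach, a lookup would keep working regardless of how many subfolder levels sit between data/ and the file, as long as the manifest is up to date.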

Run: option to download remote inputs

Currently runcrate run does not try to download remote inputs. This should probably be optional, since remote resources could potentially be large in size.

Option to make data files more user friendly

For downstream processing or reusability of crates it would be great to have a human readable structure.
Currently it uses the checksums that were provided by CWLProv and the output there is not very friendly:

Screenshot 2023-05-23 at 10 47 53

This then results in a more combined structure in the RO-Crate, but if I need a specific file for further processing (in R, for example), I first have to process the JSON file before I can identify the files. It also does not allow browsing through the files and folders, which is very useful when the data objects are shared among peers: browsing gives a better feeling for the data and its structure without the need for fancy tooling.

Screenshot 2023-05-23 at 10 45 58

This, in turn, does not really reflect the output:

Screenshot 2023-05-23 at 12 29 33

Support for ExpressionTool in convert

In convert, we are currently bailing out when an ExpressionTool is encountered:

if hasattr(cwl_tool, "expression"):
    raise RuntimeError("ExpressionTool not supported yet")

Can we support the conversion of ExpressionTool?

If the above clause is removed and we let the processing continue, it crashes because the plan for the activity corresponding to the execution of the ExpressionTool is not found. More specifically, resolve_plan returns None and the program crashes when it tries to do:

plan_tag = plan.id.localpart

Error message:

AttributeError: 'NoneType' object has no attribute 'id'

Adding some prints in resolve_plan:

    def _resolve_plan(self, activity):
        print("resolving plan for", activity.id)
        job_qname = activity.plan()
        print("  job qname:", job_qname)
        plan = activity.provenance.entity(job_qname)
        print("  plan:", plan)
        if not plan:
            m = SCATTER_JOB_PATTERN.match(str(job_qname))
            if m:
                plan = activity.provenance.entity(m.groups()[0])
        return plan

We get:

resolving plan for id:a9f719bd-9bf2-42a4-aa4a-163eb95351dd
  job qname: wf:main/
Entity wf:main/ not found in Provenance<urn:uuid:4b66a4db-eb94-43fe-8475-14d38ac3a3bc from /home/simleo/work/wf_run_crate/expression_tool/cwl/ngtax-run-1/metadata/provenance/primary.cwlprov.xml>
  plan: None

So the activity.plan() (job_qname) is just wf:main/, with no tool-specific tag after the slash. Compare this to the output for a tool in the conversion of tests/data/revsort-run-1:

resolving plan for id:f81dd60b-46db-4e58-b9f9-5606de1f10de
  job qname: wf:main/rev
  plan: entity(wf:main/rev, [prov:type='prov:Plan', prov:type='wfdesc:Process'])

Looking at primary.cwlprov.json:

  "wasAssociatedWith": {
    ...
    "_:id11": {
      "prov:activity": "id:a9f719bd-9bf2-42a4-aa4a-163eb95351dd",
      "prov:agent": "id:ed5680f3-84f4-423c-be6f-d9ed9991a436",
      "prov:plan": "wf:main/"
    },
    ...
}

prov:plan is also wf:main/ for other ExpressionTools used in the workflow. So this is something that's not supported by CWLProv.

The above results have been obtained by trying to convert the RO of an execution of https://gitlab.com/m-unlock/cwl/-/raw/main/workflows/workflow_ngtax.cwl with https://gitlab.com/m-unlock/cwl/-/raw/main/tests/ngtax/ngtax.yaml. The version of cwltool used was 3.1.20240112164112.
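
As a rough illustration of how the affected activities could be spotted up front, the sketch below scans a PROV-JSON document for associations whose prov:plan ends with a slash, i.e. has no step tag. It assumes the wasAssociatedWith layout shown in the excerpt above (a dict of records with single-valued fields); the function name activities_without_step_tag is hypothetical and not part of runcrate or CWLProv.

```python
def activities_without_step_tag(prov_doc):
    """Return ids of activities whose prov:plan ends with '/' (no step tag)."""
    associations = prov_doc.get("wasAssociatedWith", {})
    return [
        assoc["prov:activity"]
        for assoc in associations.values()
        # "wf:main/rev" has a tag after the slash; "wf:main/" does not
        if str(assoc.get("prov:plan", "")).endswith("/")
    ]
```

It could be applied to the result of json.load on primary.cwlprov.json to list the activities for which plan resolution is expected to return None.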

Convert CWLProv to RO - TypeError: unhashable type: 'list'

When executing: runcrate convert -o ROC PROV (see zip attached)

Traceback (most recent call last):
  File "/Users/jasperk/mambaforge/bin/runcrate", line 8, in <module>
    sys.exit(cli())
  File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/runcrate/cli.py", line 67, in convert
    builder = ProvCrateBuilder(root, workflow_name, license, readme)
  File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/runcrate/convert.py", line 185, in __init__
    self.step_maps = self._get_step_maps(self.cwl_defs)
  File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/runcrate/convert.py", line 204, in _get_step_maps
    graph = build_step_graph(v)
  File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/runcrate/convert.py", line 140, in build_step_graph
    source_fragment = out_map.get(i.source)
TypeError: unhashable type: 'list'

PROV.zip
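
The traceback suggests that i.source is a list here. In CWL, a workflow step input's source may be either a single string or an array of strings, and a list cannot be used as a dictionary key. A minimal sketch of one way to handle both cases follows; as_list and lookup_sources are hypothetical helpers, and the assumption that out_map maps output identifiers to step fragments is inferred from the traceback rather than taken from the runcrate code.

```python
def as_list(source):
    """Normalize a step input `source` to a list of strings."""
    if source is None:
        return []
    if isinstance(source, str):
        return [source]
    return list(source)


def lookup_sources(out_map, source):
    """Look up every source in out_map, skipping those that are not step outputs."""
    return [out_map[s] for s in as_list(source) if s in out_map]
```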

Handle distinct directories with same content

When serializing executions of workflows that take directory parameters, CWLProv does not create corresponding directories in the RO bundle: rather, files are always placed in directories whose name consists of the first two characters of the file's sha1 checksum.

When converting from CWLProv we recreate the original directories, giving them a name obtained by concatenating the sorted checksums of all contained files and computing the checksum of the concatenation. This means that directories with the same contents end up being mapped to the same directory in the output RO-Crate. This is especially convenient to avoid data duplication between workflow parameters and tool parameters: for instance, when a directory is an input of the workflow and also of the first step.
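
The naming scheme described above can be sketched as follows. This is an illustration only: the use of sha1 for the final checksum and the concatenation without a separator are assumptions, and directory_name is a hypothetical function name.

```python
import hashlib


def directory_name(file_checksums):
    """Derive a directory name from the checksums of the files it contains."""
    # Sorting makes the result independent of the order in which files are listed
    joined = "".join(sorted(file_checksums))
    return hashlib.sha1(joined.encode()).hexdigest()
```

Under this scheme, two directories holding files with the same checksums always get the same name, whatever they were originally called.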

However, there are cases where we might not want to do that. For instance, suppose that a workflow takes an array of two directories as input:

cwlVersion: v1.2
class: Workflow
requirements:
  ScatterFeatureRequirement: {}

inputs:
  dir_array: Directory[]
outputs: []

steps:
  date_step:
    label: Prints date of input dirs
    scatter: dir
    in:
      dir: dir_array
    out: []
    run: dirdate.cwl

Where dirdate.cwl is:

cwlVersion: v1.2
class: CommandLineTool
baseCommand: [date, "-r"]

inputs:
  dir:
    type: Directory
    inputBinding:
      position: 1
outputs: []

Suppose the workflow is launched with the following parameters:

dir_array:
  - class: Directory
    location: foo
  - class: Directory
    location: bar

Where foo and bar have the same contents, e.g., they both contain a text file whose content is the string "dummy". What we currently get in the RO-Crate is:

{
    "@id": "packed.cwl#main/dir_array",
    "@type": "FormalParameter",
    "additionalType": "Dataset",
    "multipleValues": "True",
    "name": "dir_array"
},
...
{
    "@id": "#pv-main/dir_array",
    "@type": "PropertyValue",
    "exampleOfWork": {
        "@id": "packed.cwl#main/dir_array"
    },
    "name": "dir_array",
    "value": [
        {
            "@id": "df3cc24afc943eab58469eebaff500a2a4a823c5/"
        },
        {
            "@id": "df3cc24afc943eab58469eebaff500a2a4a823c5/"
        }
    ]
},
...
{
    "@id": "df3cc24afc943eab58469eebaff500a2a4a823c5/",
    "@type": "Dataset",
    "alternateName": "foo",
    "exampleOfWork": {
        "@id": "packed.cwl#dirdate.cwl/dir"
    },
    "hasPart": [
        {
            "@id": "df3cc24afc943eab58469eebaff500a2a4a823c5/0c8b9d6f753e8d8ec9276bfe98e993a133847642"
        }
    ]
},

Note that the duplicate id in the value of #pv-main/dir_array is a bug: the list should contain only one copy, since the duplicate makes no sense in the RO-Crate JSON-LD. Also, the Dataset has an alternateName of "foo", while "bar" does not appear in the metadata. Thus, in this case, the representation does not reflect the fact that the workflow took a list of two distinct directories as input.
