researchobject / runcrate
http://www.researchobject.org/runcrate/
License: Apache License 2.0
After changing the expression steps to command line steps I have the following:
runcrate convert -o ROC PROV
on the PROV folder attached
/Volumes/Git/m-unlock/cwl/tests/PROV: sha256 manifest lists snapshot/concatenate.cwl multiple times with the same value
/Volumes/Git/m-unlock/cwl/tests/PROV: sha512 manifest lists snapshot/concatenate.cwl multiple times with the same value
/Volumes/Git/m-unlock/cwl/tests/PROV: sha1 manifest lists snapshot/concatenate.cwl multiple times with the same value
Traceback (most recent call last):
File "/Users/jasperk/mambaforge/bin/runcrate", line 8, in <module>
sys.exit(cli())
File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/runcrate/cli.py", line 68, in convert
crate = builder.build()
File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/runcrate/convert.py", line 266, in build
self.add_workflow(crate)
File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/runcrate/convert.py", line 322, in add_workflow
self.add_step(crate, workflow, s)
File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/runcrate/convert.py", line 334, in add_step
tool = self.add_tool(crate, workflow, cwl_step.run)
File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/runcrate/convert.py", line 373, in add_tool
tool["input"] = self.add_params(crate, cwl_tool.inputs)
File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/runcrate/convert.py", line 387, in add_params
properties = properties_from_cwl_param(cwl_p)
File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/runcrate/convert.py", line 81, in properties_from_cwl_param
additional_type = "Collection" if cwl_p.secondaryFiles else convert_cwl_type(cwl_p.type)
File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/runcrate/convert.py", line 65, in convert_cwl_type
s = set(convert_cwl_type(_) for _ in cwl_type)
File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/runcrate/convert.py", line 65, in <genexpr>
s = set(convert_cwl_type(_) for _ in cwl_type)
File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/runcrate/convert.py", line 73, in convert_cwl_type
return CWL_TYPE_MAP[cwl_type.items]
I am trying to convert CWLProv to RO-Crate using runcrate convert, but I get the following error every time I run the command:
File "/usr/lib/python3.10/site-packages/rocrate/rocrate.py", line 122, in __read
raise ValueError(f"Not a valid RO-Crate: missing {Metadata.BASENAME}")
ValueError: Not a valid RO-Crate: missing ro-crate-metadata.json
I checked it with both a workflow and a tool written in CWL, and I get the same error in both cases. Has anyone run it with CWLProv recently?
When executing: runcrate convert -o ROC PROV
(see zip attached)
Traceback (most recent call last):
File "/Users/jasperk/mambaforge/bin/runcrate", line 8, in <module>
sys.exit(cli())
File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/runcrate/cli.py", line 67, in convert
builder = ProvCrateBuilder(root, workflow_name, license, readme)
File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/runcrate/convert.py", line 185, in __init__
self.step_maps = self._get_step_maps(self.cwl_defs)
File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/runcrate/convert.py", line 204, in _get_step_maps
graph = build_step_graph(v)
File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/runcrate/convert.py", line 140, in build_step_graph
source_fragment = out_map.get(i.source)
TypeError: unhashable type: 'list'
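The TypeError above is consistent with a step input whose source is a list rather than a single string (CWL allows a step input to declare several sources), since a list cannot be used as a dictionary key. The following is only a sketch of one way to handle both shapes; normalize_sources and the contents of out_map are hypothetical and are not taken from runcrate's code.

```python
def normalize_sources(source):
    """Return the sources of a CWL step input as a list of strings.

    Hypothetical helper: `source` may be None, a single string,
    or a list of strings.
    """
    if source is None:
        return []
    if isinstance(source, str):
        return [source]
    return list(source)


# Illustrative mapping from output ids to step names (made-up values).
out_map = {"packed.cwl#main/step_a/out": "step_a"}

for src in normalize_sources(["packed.cwl#main/step_a/out", "packed.cwl#main/in_1"]):
    # Each lookup now uses a hashable string key.
    print(src, "->", out_map.get(src))
```

With this kind of normalization, each source can be looked up individually instead of passing the whole list to dict.get.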
I am not exactly sure what is happening here. Sorry for the large PROV size: without the indexed lookup database the run takes a long time on my laptop, so I had to include it.
http://download.systemsbiology.nl/unlock/cwl/issues/PROV_ngtax.zip
Traceback (most recent call last):
File "/Users/jasperk/mambaforge/bin/runcrate", line 8, in <module>
sys.exit(cli())
File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/runcrate/cli.py", line 67, in convert
builder = ProvCrateBuilder(root, workflow_name, license, readme)
File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/runcrate/convert.py", line 187, in __init__
self.cwl_defs = get_workflow(self.wf_path)
File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/runcrate/convert.py", line 167, in get_workflow
defs = load_document_by_yaml(json_wf, wf_path.absolute().as_uri())
File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/cwl_utils/parser/__init__.py", line 128, in load_document_by_yaml
result = cwl_v1_2.load_document_by_yaml(
File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/cwl_utils/parser/cwl_v1_2.py", line 15471, in load_document_by_yaml
return _document_load(
File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/cwl_utils/parser/cwl_v1_2.py", line 578, in _document_load
return loader.load(doc["$graph"], baseuri, loadingOptions)
File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/cwl_utils/parser/cwl_v1_2.py", line 417, in load
raise ValidationException("", None, errors, "-")
schema_salad.exceptions.ValidationException: - tried _RecordLoader but
Expected a dict
- tried _RecordLoader but
Expected a dict
- tried _RecordLoader but
Expected a dict
- tried _RecordLoader but
Expected a dict
- tried _ArrayLoader but
- tried _ArrayLoader but
Expected a list
- tried _UnionLoader but
- tried _RecordLoader but
Trying 'CommandLineTool'
the `outputs` field is not valid because:
- tried _ArrayLoader but
- tried _ArrayLoader but
Expected a list
- tried _RecordLoader but
Trying 'CommandOutputParameter'
PROV_ngtax/workflow/packed.cwl#ngtax.cwl/reference_db_lookup:3:9: the `type`
field is not valid because:
- tried
_EnumLoader but
Expected
one of ('File', 'Directory')
- tried
_EnumLoader but
Expected
one of ('stdout',)
- tried
_EnumLoader but
Expected
one of ('stderr',)
- tried
_RecordLoader but
Expected a
dict
- tried
_RecordLoader but
Expected a
dict
- tried
_RecordLoader but
Expected a
dict
- tried
_PrimitiveLoader but
Expected a
tuple but got NoneType
- tried
_ArrayLoader but
Expected a
list
PROV_ngtax/workflow/packed.cwl#ngtax.cwl/reference_db_lookup:4:13: invalid field
`class`, expected one of:
`label`, `secondaryFiles`,
`streamable`, `doc`, `id`,
`format`, `type`,
`outputBinding`
- tried _ArrayLoader but
Expected a list
- tried _RecordLoader but
Trying 'CommandOutputParameter'
PROV_ngtax/workflow/packed.cwl#ngtax.cwl/reference_db_lookup:79:9: the `type`
field is not valid because:
- tried
_EnumLoader but
Expected
one of ('File', 'Directory')
- tried
_EnumLoader but
Expected
one of ('stdout',)
- tried
_EnumLoader but
Expected
one of ('stderr',)
- tried
_RecordLoader but
Expected a
dict
- tried
_RecordLoader but
Expected a
dict
- tried
_RecordLoader but
Expected a
dict
- tried
_PrimitiveLoader but
Expected a
tuple but got NoneType
- tried
_ArrayLoader but
Expected a
list
PROV_ngtax/workflow/packed.cwl#ngtax.cwl/reference_db_lookup:80:13: invalid field
`class`, expected one of:
`label`, `secondaryFiles`,
`streamable`, `doc`, `id`,
`format`, `type`,
`outputBinding`
- tried _ArrayLoader but
Expected a list
- tried _RecordLoader but
Trying 'CommandOutputParameter'
PROV_ngtax/workflow/packed.cwl#ngtax.cwl/reference_db_lookup:216:9: the `type`
field is not valid because:
- tried
_EnumLoader but
Expected
one of ('File', 'Directory')
- tried
_EnumLoader but
Expected
one of ('stdout',)
- tried
_EnumLoader but
Expected
one of ('stderr',)
- tried
_RecordLoader but
Expected a
dict
- tried
_RecordLoader but
Expected a
dict
- tried
_RecordLoader but
Expected a
dict
- tried
_PrimitiveLoader but
Expected a
tuple but got NoneType
- tried
_ArrayLoader but
Expected a
list
PROV_ngtax/workflow/packed.cwl#ngtax.cwl/reference_db_lookup:217:13: invalid field
`class`, expected one of:
`label`, `secondaryFiles`,
`streamable`, `doc`, `id`,
`format`, `type`,
`outputBinding`
- tried _ArrayLoader but
Expected a list
- tried _RecordLoader but
Trying 'CommandOutputParameter'
PROV_ngtax/workflow/packed.cwl#ngtax.cwl/reference_db_lookup:448:9: the `type`
field is not valid because:
- tried
_EnumLoader but
Expected
one of ('File', 'Directory')
- tried
_EnumLoader but
Expected
one of ('stdout',)
- tried
_EnumLoader but
Expected
one of ('stderr',)
- tried
_RecordLoader but
Expected a
dict
- tried
_RecordLoader but
Expected a
dict
- tried
_RecordLoader but
Expected a
dict
- tried
_PrimitiveLoader but
Expected a
tuple but got NoneType
- tried
_ArrayLoader but
Expected a
list
PROV_ngtax/workflow/packed.cwl#ngtax.cwl/reference_db_lookup:449:13: invalid field
`class`, expected one of:
`label`, `secondaryFiles`,
`streamable`, `doc`, `id`,
`format`, `type`,
`outputBinding`
- tried _ArrayLoader but
Expected a list
- tried _RecordLoader but
Trying 'CommandOutputParameter'
PROV_ngtax/workflow/packed.cwl#ngtax.cwl/reference_db_lookup:574:9: the `type`
field is not valid because:
- tried
_EnumLoader but
Expected
one of ('File', 'Directory')
- tried
_EnumLoader but
Expected
one of ('stdout',)
- tried
_EnumLoader but
Expected
one of ('stderr',)
- tried
_RecordLoader but
Expected a
dict
- tried
_RecordLoader but
Expected a
dict
- tried
_RecordLoader but
Expected a
dict
- tried
_PrimitiveLoader but
Expected a
tuple but got NoneType
- tried
_ArrayLoader but
Expected a
list
PROV_ngtax/workflow/packed.cwl#ngtax.cwl/reference_db_lookup:575:13: invalid field
`class`, expected one of:
`label`, `secondaryFiles`,
`streamable`, `doc`, `id`,
`format`, `type`,
`outputBinding`
- tried _RecordLoader but
Expected a dict
- tried _RecordLoader but
Not a ExpressionTool
- tried _RecordLoader but
Not a Workflow
- tried _RecordLoader but
Not a Operation
See RenskeW/runcrate-analysis#2
Mappings:
prov:time
field in wasGeneratedBy
entries)File
entity -- we already add it to the FormalParameter)
This issue concerns http://download.systemsbiology.nl/unlock/cwl/issues/PROV_ngtax.zip.
It tries to obtain cwl_tool = self.cwl_defs[tool_fragment], where tool_fragment is 'fastqc.cwl'; however, only packed.cwl is available in cwl_defs.
find PROV_ngtax/ | grep cwl
PROV_ngtax//snapshot/workflow_ngtax.cwl
PROV_ngtax//snapshot/files_to_folder_tool.cwl
PROV_ngtax//snapshot/fastqc.cwl
PROV_ngtax//snapshot/ngtax.cwl
PROV_ngtax//snapshot/ngtax_to_tsv-fasta.cwl
PROV_ngtax//workflow/packed.cwl
PROV_ngtax//metadata/provenance/primary.cwlprov.jsonld
PROV_ngtax//metadata/provenance/primary.cwlprov.json
PROV_ngtax//metadata/provenance/primary.cwlprov.xml
PROV_ngtax//metadata/provenance/primary.cwlprov.nt
PROV_ngtax//metadata/provenance/primary.cwlprov.ttl
PROV_ngtax//metadata/provenance/primary.cwlprov.provn
Traceback (most recent call last):
File "/Users/jasperk/mambaforge/bin/runcrate", line 8, in <module>
sys.exit(cli())
File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/runcrate/cli.py", line 68, in convert
crate = builder.build()
File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/runcrate/convert.py", line 273, in build
self.add_workflow(crate)
File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/runcrate/convert.py", line 329, in add_workflow
self.add_step(crate, workflow, s)
File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/runcrate/convert.py", line 341, in add_step
tool = self.add_tool(crate, workflow, cwl_step.run)
File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/runcrate/convert.py", line 352, in add_tool
cwl_tool = self.cwl_defs[tool_fragment]
KeyError: 'fastqc.cwl'
Note that the file cannot be simply copied as is: file / directory paths need to be converted to the crate ones, e.g.:
../data/32/327fc7aedf4f6b69a42a7c8b808dc5a7aff61376 => 327fc7aedf4f6b69a42a7c8b808dc5a7aff61376
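The path conversion in the example above can be sketched as follows. This assumes, as in that example, that the crate-side path is just the checksum-named final component of the CWLProv path; crate_path is a hypothetical helper, not part of runcrate.

```python
from pathlib import PurePosixPath


def crate_path(prov_path):
    """Map a CWLProv data path to its (assumed) crate-relative path.

    Assumption: the target is the last path component, i.e. the
    checksum-named file, as in the example above.
    """
    return PurePosixPath(prov_path).name


print(crate_path("../data/32/327fc7aedf4f6b69a42a7c8b808dc5a7aff61376"))
```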
Then there's the question of linking the entity to the rest of the metadata. Perhaps it can be represented as a configuration file for the workflow.
When serializing executions of workflows that take directory parameters, CWLProv does not create corresponding directories in the RO bundle: rather, files are always placed in directories whose name consists of the first two characters of the file's sha1 checksum.
When converting from CWLProv we recreate the original directories, giving them a name obtained by concatenating the sorted checksums of all contained files and computing the checksum of the concatenation. This means that directories with the same contents end up being mapped to the same directory in the output RO-Crate. This is especially convenient to avoid data duplication between workflow parameters and tool parameters: for instance, when a directory is an input of the workflow and also of the first step.
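The directory naming scheme described above can be illustrated with a short sketch. The use of SHA-1 and of plain concatenation without a separator are assumptions made for the example; directory_name is a hypothetical helper and may differ from what runcrate actually does.

```python
import hashlib


def directory_name(file_checksums):
    """Derive a directory name from the checksums of the files it contains.

    Assumptions: checksums are hex strings, they are joined with no
    separator after sorting, and SHA-1 is used for the final digest.
    """
    joined = "".join(sorted(file_checksums))
    return hashlib.sha1(joined.encode("ascii")).hexdigest()


# Two directories with the same file contents map to the same name,
# regardless of the order in which the files are listed.
a = directory_name(["0c8b9d6f753e8d8ec9276bfe98e993a133847642"])
b = directory_name(["0c8b9d6f753e8d8ec9276bfe98e993a133847642"])
print(a == b)
```

Because the name depends only on the contained checksums, identical contents always collapse to a single directory.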
However, there are cases where we might not want to do that. For instance, suppose that a workflow takes an array of two directories as input:
cwlVersion: v1.2
class: Workflow
requirements:
  ScatterFeatureRequirement: {}
inputs:
  dir_array: Directory[]
outputs: []
steps:
  date_step:
    label: Prints date of input dirs
    scatter: dir
    in:
      dir: dir_array
    out: []
    run: dirdate.cwl
Where dirdate.cwl is:
cwlVersion: v1.2
class: CommandLineTool
baseCommand: [date, "-r"]
inputs:
  dir:
    type: Directory
    inputBinding:
      position: 1
outputs: []
Suppose the workflow is launched with the following parameters:
dir_array:
  - class: Directory
    location: foo
  - class: Directory
    location: bar
Where foo and bar have the same contents, e.g., they both contain a text file whose content is the string "dummy". What we currently get in the RO-Crate is:
{
    "@id": "packed.cwl#main/dir_array",
    "@type": "FormalParameter",
    "additionalType": "Dataset",
    "multipleValues": "True",
    "name": "dir_array"
},
...
{
    "@id": "#pv-main/dir_array",
    "@type": "PropertyValue",
    "exampleOfWork": {
        "@id": "packed.cwl#main/dir_array"
    },
    "name": "dir_array",
    "value": [
        {
            "@id": "df3cc24afc943eab58469eebaff500a2a4a823c5/"
        },
        {
            "@id": "df3cc24afc943eab58469eebaff500a2a4a823c5/"
        }
    ]
},
...
{
    "@id": "df3cc24afc943eab58469eebaff500a2a4a823c5/",
    "@type": "Dataset",
    "alternateName": "foo",
    "exampleOfWork": {
        "@id": "packed.cwl#dirdate.cwl/dir"
    },
    "hasPart": [
        {
            "@id": "df3cc24afc943eab58469eebaff500a2a4a823c5/0c8b9d6f753e8d8ec9276bfe98e993a133847642"
        }
    ]
},
Note that the duplicate id in the value of #pv-main/dir_array is a bug: the list should contain only one copy, since the duplicate makes no sense in the RO-Crate JSON-LD. Also, the Dataset has an alternateName of "foo", while "bar" does not appear in the metadata. Thus, in this case, the representation does not reflect the fact that the workflow took a list of two distinct directories as input.
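As an illustration of the duplicate-id point only, the repeated references in a value list could be collapsed while preserving order, as sketched below. unique_refs is a hypothetical helper, not runcrate code, and it does not address the loss of the "bar" name discussed above.

```python
def unique_refs(refs):
    """Drop repeated JSON-LD references, keeping the first occurrence.

    `refs` is a list of {"@id": ...} dictionaries.
    """
    seen = set()
    result = []
    for ref in refs:
        key = ref["@id"]
        if key not in seen:
            seen.add(key)
            result.append(ref)
    return result


value = [
    {"@id": "df3cc24afc943eab58469eebaff500a2a4a823c5/"},
    {"@id": "df3cc24afc943eab58469eebaff500a2a4a823c5/"},
]
print(unique_refs(value))
```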
Related to #33: while I was regenerating the dataset, my Mac created a .DS_Store file on the fly, which caused a conflict. I am not sure whether this should be hard-coded, but as far as I can tell .DS_Store files are "useless" for RO-Crates.
http://download.systemsbiology.nl/unlock/cwl/issues/PROV_DS_Store.zip
data/.DS_Store exists on filesystem but is not in the manifest
Traceback (most recent call last):
File "/Users/jasperk/mambaforge/bin/runcrate", line 8, in <module>
sys.exit(cli())
File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/runcrate/cli.py", line 67, in convert
builder = ProvCrateBuilder(root, workflow_name, license, readme)
File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/runcrate/convert.py", line 188, in __init__
self.ro = ResearchObject(BDBag(str(root)))
File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/cwlprov/ro.py", line 66, in __init__
bag.validate()
File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/bdbag/bdbagit.py", line 490, in validate
self._validate_contents(processes=processes, fast=fast, completeness_only=completeness_only, callback=callback)
File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/bdbag/bdbagit.py", line 519, in _validate_contents
self._validate_completeness()
File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/bdbag/bdbagit.py", line 549, in _validate_completeness
raise BagValidationError(_("Bag validation failed"), errors)
bagit.BagValidationError: Bag validation failed: data/.DS_Store exists on filesystem but is not in the manifest
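Until something like this is handled by the tooling, one possible workaround is to delete stray .DS_Store files from the CWLProv folder before running the conversion. The sketch below is only that, a workaround under the assumption that the files are not listed in any manifest (as in the error above); remove_ds_store is a hypothetical helper.

```python
from pathlib import Path


def remove_ds_store(root):
    """Delete every .DS_Store file under `root` and return the removed paths.

    Assumption: these files are not listed in the bag manifests, so
    removing them does not invalidate the bag.
    """
    removed = []
    # Materialize the matches first so deletion does not interfere
    # with the directory walk.
    for path in list(Path(root).rglob(".DS_Store")):
        if path.is_file():
            path.unlink()
            removed.append(path)
    return removed
```

For example, remove_ds_store("PROV") could be called before runcrate convert -o ROC PROV.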
Would it be ok to have a landing folder for the data files?
Now it is placed in the root of the crate but this creates quite some clutter.
If that is acceptable, I think only two lines need to be changed; both contain
dest = Path(parent.id if parent else "") / hash_
which could turn into
dest = Path(parent.id if parent else "data") / hash_
I can create a pull request but maybe it is better to have a discussion first?
For downstream processing or reusability of crates it would be great to have a human readable structure.
Currently it uses the checksums that were provided by CWLProv and the output there is not very friendly:
This results in a more combined structure in the RO-Crate, but if I need a specific file for further processing (in R, for example), I first have to process the JSON file to identify it. It also prevents browsing through the files and folders, which is very useful when data objects are shared among peers: it gives a better feeling for the data and its structure without the need for fancy tooling.
Which in turn does not really reflect the output
Currently runcrate run does not try to download remote inputs. This should probably be optional, since remote resources could potentially be large in size.
Convert the cwlprov:basename field to alternateName (see ResearchObject/workflow-run-crate#41).
Note that the current version of cwltool reports the basename only for files. We need a cwltool PR to have it reported for directories as well.
As the title suggests there is no filename/folder name information in the ro-crate-metadata.json after conversion.
runcrate convert -o ROC_proteomics PROV
The CWLProv: http://download.systemsbiology.nl/unlock/cwl/issues/PROV_proteomics.zip
The ROCProv: http://download.systemsbiology.nl/unlock/cwl/issues/ROC_proteomics.zip
See RenskeW/runcrate-analysis#1 (comment)
Mappings suggested in https://doi.org/10.5281/zenodo.7014948:
In convert, we are currently bailing out when an ExpressionTool is encountered:
if hasattr(cwl_tool, "expression"):
    raise RuntimeError("ExpressionTool not supported yet")
Can we support the conversion of ExpressionTool?
If the above clause is removed and we let the processing continue, it crashes because the plan for the activity corresponding to the execution of the ExpressionTool is not found. More specifically, resolve_plan returns None and the program crashes when it tries to do:
plan_tag = plan.id.localpart
Error message:
AttributeError: 'NoneType' object has no attribute 'id'
Adding some prints in resolve_plan:
def _resolve_plan(self, activity):
    print("resolving plan for", activity.id)
    job_qname = activity.plan()
    print(" job qname:", job_qname)
    plan = activity.provenance.entity(job_qname)
    print(" plan:", plan)
    if not plan:
        m = SCATTER_JOB_PATTERN.match(str(job_qname))
        if m:
            plan = activity.provenance.entity(m.groups()[0])
    return plan
We get:
resolving plan for id:a9f719bd-9bf2-42a4-aa4a-163eb95351dd
job qname: wf:main/
Entity wf:main/ not found in Provenance<urn:uuid:4b66a4db-eb94-43fe-8475-14d38ac3a3bc from /home/simleo/work/wf_run_crate/expression_tool/cwl/ngtax-run-1/metadata/provenance/primary.cwlprov.xml>
plan: None
So the activity.plan() (job_qname) is just wf:main/, with no tool-specific tag after the slash. Compare this to the output for a tool in the conversion of tests/data/revsort-run-1:
resolving plan for id:f81dd60b-46db-4e58-b9f9-5606de1f10de
job qname: wf:main/rev
plan: entity(wf:main/rev, [prov:type='prov:Plan', prov:type='wfdesc:Process'])
Looking at primary.cwlprov.json:
"wasAssociatedWith": {
    ...
    "_:id11": {
        "prov:activity": "id:a9f719bd-9bf2-42a4-aa4a-163eb95351dd",
        "prov:agent": "id:ed5680f3-84f4-423c-be6f-d9ed9991a436",
        "prov:plan": "wf:main/"
    },
    ...
}
prov:plan is also wf:main/ for other ExpressionTools used in the workflow. So this is something that's not supported by CWLProv.
The above results have been obtained by trying to convert the RO of an execution of https://gitlab.com/m-unlock/cwl/-/raw/main/workflows/workflow_ngtax.cwl with https://gitlab.com/m-unlock/cwl/-/raw/main/tests/ngtax/ngtax.yaml. The version of cwltool used was 3.1.20240112164112.
Due to a bug in cwltool (or maybe it is intentional), the individual CWL files are missing from the PROV/workflow/ location and only PROV/workflow/packed.cwl is available. This only happens when the workflow is started with a cwl:tool: argument in the input YAML file.
In theory, packed.cwl should be sufficient, shouldn't it? I am not sure whether this is an issue in runcrate, but I thought I would let you know.
http://download.systemsbiology.nl/unlock/cwl/issues/PROV_No_CWL.zip (Removed the .DS_Store files).
Trying another workflow with newly transformed expressionTool to commandLineTool...
Unfortunately this zip file is 109MB. If you need a copy let me know how I can share this with you.
/Volumes/Git/m-unlock/cwl/tests/PROV: sha256 manifest lists snapshot/array_to_file_tool.cwl multiple times with the same value
/Volumes/Git/m-unlock/cwl/tests/PROV: sha256 manifest lists snapshot/concatenate.cwl multiple times with the same value
/Volumes/Git/m-unlock/cwl/tests/PROV: sha1 manifest lists snapshot/array_to_file_tool.cwl multiple times with the same value
/Volumes/Git/m-unlock/cwl/tests/PROV: sha1 manifest lists snapshot/concatenate.cwl multiple times with the same value
/Volumes/Git/m-unlock/cwl/tests/PROV: sha512 manifest lists snapshot/array_to_file_tool.cwl multiple times with the same value
/Volumes/Git/m-unlock/cwl/tests/PROV: sha512 manifest lists snapshot/concatenate.cwl multiple times with the same value
Traceback (most recent call last):
File "/Users/jasperk/mambaforge/bin/runcrate", line 8, in <module>
sys.exit(cli())
File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/runcrate/cli.py", line 68, in convert
crate = builder.build()
File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/runcrate/convert.py", line 266, in build
self.add_workflow(crate)
File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/runcrate/convert.py", line 322, in add_workflow
self.add_step(crate, workflow, s)
File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/runcrate/convert.py", line 334, in add_step
tool = self.add_tool(crate, workflow, cwl_step.run)
File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/runcrate/convert.py", line 380, in add_tool
self.add_param_connections(crate, tool)
File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/runcrate/convert.py", line 623, in add_param_connections
from_param = get_fragment(s)
File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/runcrate/convert.py", line 115, in get_fragment
return uri.rsplit("#", 1)[-1]
AttributeError: 'NoneType' object has no attribute 'rsplit'
I am currently digging into cwltool to place files in subfolders when provenance is enabled.
The application currently only looks in source = self.root / Path("data") / hash_[:2] / hash_
which might not be correct in the long run.
Perhaps an option could be to use the manifest file for this?
cat PROV_NO_INPUT_2/manifest-sha1.txt
27a9b1a98a2f8a3cfa6fc7f7390e1f33d9afc944 data/output/27/27a9b1a98a2f8a3cfa6fc7f7390e1f33d9afc944
10a02d2e45f8a8b6bd31e2455e0fc68327e86b43 data/output/10/10a02d2e45f8a8b6bd31e2455e0fc68327e86b43
f4e9c47034057ed1be718a080b8bee488586333a data/output/f4/f4e9c47034057ed1be718a080b8bee488586333a
c0f2eb41e128804b6742359671875729a43e6d94 data/output/c0/c0f2eb41e128804b6742359671875729a43e6d94
d822fc5e5e405049db4529b8054c8042444c1576 data/output/d8/d822fc5e5
Which has the appropriate (sub) locations?
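The manifest-based lookup suggested above could look roughly like the following. This is a sketch that assumes each manifest line has the form "checksum, whitespace, relative path", as in the listing; parse_manifest is a hypothetical helper, and if several paths share one checksum only the last one is kept.

```python
def parse_manifest(text):
    """Map each checksum in a BagIt-style manifest to its relative path.

    Assumption: every non-empty line is "<checksum> <path>", separated
    by whitespace. Later entries overwrite earlier ones with the same
    checksum.
    """
    locations = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        checksum, path = line.split(None, 1)
        locations[checksum] = path
    return locations


sample = """\
27a9b1a98a2f8a3cfa6fc7f7390e1f33d9afc944 data/output/27/27a9b1a98a2f8a3cfa6fc7f7390e1f33d9afc944
10a02d2e45f8a8b6bd31e2455e0fc68327e86b43 data/output/10/10a02d2e45f8a8b6bd31e2455e0fc68327e86b43
"""
locations = parse_manifest(sample)
print(locations["27a9b1a98a2f8a3cfa6fc7f7390e1f33d9afc944"])
```

A file's source location could then be resolved from its checksum instead of being derived from a fixed data/xx/ layout.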
Was testing some provenance workflows in CWLTool and encountered the following:
runcrate convert -o bla provenance
Traceback (most recent call last):
File "/Users/jasperk/mambaforge/bin/runcrate", line 8, in <module>
sys.exit(cli())
File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/Users/jasperk/mambaforge/lib/python3.10/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/Users/jasperk/gitlab/runcrate/src/runcrate/cli.py", line 67, in convert
builder = ProvCrateBuilder(root, workflow_name, license, readme)
File "/Users/jasperk/gitlab/runcrate/src/runcrate/convert.py", line 184, in __init__
self.step_maps = self._get_step_maps(self.cwl_defs)
File "/Users/jasperk/gitlab/runcrate/src/runcrate/convert.py", line 208, in _get_step_maps
rval[k][f] = {"tool": get_fragment(s.run), "pos": pos_map[f]}
File "/Users/jasperk/gitlab/runcrate/src/runcrate/convert.py", line 112, in get_fragment
return uri.rsplit("#", 1)[-1]
AttributeError: 'CommandLineTool' object has no attribute 'rsplit'
The zip: http://download.systemsbiology.nl/unlock/cwl/issues/cwl_test_no_listing.zip
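Both rsplit errors reported in these issues come from get_fragment receiving something other than a string: None in one case and a parsed CommandLineTool object (an inline run) in the other. The sketch below shows one defensive variant. It assumes that inline process objects expose their identifier through an id attribute; InlineTool is a stand-in class and get_fragment_safe a hypothetical name, neither taken from runcrate or cwl_utils.

```python
class InlineTool:
    """Stand-in for a parsed CWL process embedded in a step's `run` field."""

    def __init__(self, id):
        self.id = id


def get_fragment_safe(ref):
    """Return the part of a URI after the last '#', tolerating non-strings.

    Assumption: non-string references carry their URI in an `id`
    attribute. Returns None when no URI can be determined.
    """
    if ref is None:
        return None
    if not isinstance(ref, str):
        ref = getattr(ref, "id", None)
        if ref is None:
            return None
    return ref.rsplit("#", 1)[-1]


print(get_fragment_safe("file:///wf/packed.cwl#main/rev"))
print(get_fragment_safe(InlineTool("file:///wf/packed.cwl#main/step/run")))
print(get_fragment_safe(None))
```

Callers would still need to decide what to do when None comes back, for example skipping the connection or raising a clearer error.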