workflow-run-crate's Introduction

workflow-run-crate

Workflow Run RO-Crate profile

workflow-run-crate's People

Contributors

alaninmcr, dgarijo, glassofwhiskey, ilveroluca, jmfernandez, kinow, lrodrin, pauldg, rsirvent, simleo, stain

workflow-run-crate's Issues

CQ6 - Workflow running time

How long does this workflow take to run?

The actual duration of the represented workflow run can be obtained from endTime - startTime on the CreateAction. Providing an estimate of the typical running time, on the other hand, is a different thing. Can we use totalTime for that? Or a more specific custom property like estimatedRunningTime? And do workflow languages have annotation fields for this?
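As a minimal sketch of the endTime - startTime computation, assuming both properties hold ISO 8601 timestamps: the entity below is hypothetical (its @id and time values are made up for illustration) and only mimics the shape of a CreateAction entry in the crate metadata.

```python
from datetime import datetime

# Hypothetical CreateAction entity; the @id and both timestamps are
# invented for illustration and do not come from a real crate.
create_action = {
    "@id": "#run-1",
    "@type": "CreateAction",
    "startTime": "2023-01-10T09:00:00+00:00",
    "endTime": "2023-01-10T09:42:30+00:00",
}


def run_duration(action):
    """Return endTime - startTime as a timedelta, or None if either is missing."""
    start = action.get("startTime")
    end = action.get("endTime")
    if start is None or end is None:
        return None
    return datetime.fromisoformat(end) - datetime.fromisoformat(start)


duration = run_duration(create_action)
print(duration)
```

This only covers the actual duration of one recorded run; it says nothing about the estimated running time, which is the open question here.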

cwlprov_to_crate: support for nested workflows

Workflows can run other workflows as subworkflows. CWLProv outputs separate provenance documents in this case, but such runs are not yet supported in cwlprov_to_crate. Functionally, we need to add the capability to parse the provenance metadata in this scenario. Then there's the issue of adding subworkflow metadata to the RO-Crate. In the relationship graph, subworkflows need to appear in the same place as tool wrappers (what's run by a step). Their type should be the same as the main workflow, minus File since they are stored as sections in packed.cwl:

["SoftwareSourceCode", "ComputationalWorkflow", "HowTo"]

Then we'd need to recursively convert all subworkflows as we did for the main one.
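The typing rule and the recursion could be sketched roughly as follows. This is not the cwlprov_to_crate implementation: the nested dict used as input is a hypothetical stand-in for whatever the parsed workflow looks like, and the main workflow's type list is inferred from the list above by adding File back.

```python
# Main workflow types, inferred from the subworkflow list above plus "File".
MAIN_WORKFLOW_TYPES = ["File", "SoftwareSourceCode", "ComputationalWorkflow", "HowTo"]


def subworkflow_types(main_types):
    """Same types as the main workflow, minus File."""
    return [t for t in main_types if t != "File"]


def convert(workflow, is_main=True):
    """Recursively turn a (hypothetical) workflow description into entities.

    A step's "run" target is treated as a subworkflow if it has its own
    "steps"; otherwise it is treated as a tool wrapper.
    """
    types = list(MAIN_WORKFLOW_TYPES) if is_main else subworkflow_types(MAIN_WORKFLOW_TYPES)
    entities = [{"@id": workflow["id"], "@type": types}]
    for step in workflow.get("steps", []):
        run = step["run"]
        if "steps" in run:
            entities.extend(convert(run, is_main=False))
        else:
            entities.append({"@id": run["id"], "@type": "SoftwareApplication"})
    return entities


# Hypothetical input: one tool wrapper and one subworkflow with an inner tool.
example = {
    "id": "packed.cwl",
    "steps": [
        {"run": {"id": "packed.cwl#tool.cwl"}},
        {"run": {"id": "packed.cwl#sub.cwl",
                 "steps": [{"run": {"id": "packed.cwl#inner.cwl"}}]}},
    ],
}
entities = convert(example)
for entity in entities:
    print(entity["@id"], entity["@type"])
```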

One possibly weird consequence is that some of the workflow components would be SoftwareApplications (the tool wrappers) while others would be of type SoftwareSourceCode (the subworkflows). I guess the reason for the presence of both entities in Schema.org is that the former should model an executable, while the latter should represent code that needs to be compiled. With interpreted languages such as CWL (or Python, etc.), however, the source code is also runnable, so the distinction is not so meaningful.

CQ4 - environment/container file

What is the environment/container file used in a specific workflow execution step?

Similar to the configuration file (#11) problem. Need env dump support from workflow engine.

Representing environment variables

Both in the prospective provenance (what's the name of the variable that a workflow or tool needs?) and the retrospective one (what was the value).

Also, how to hide the value if it's sensitive.

Brought up by @jmfernandez

CQ9 - Software version

What is the source code version of the component executed in a workflow step? Is it a script? An executable?

We can use softwareVersion, though getting the version of the actual tool (e.g., grep) that was called by the wrapper might not be easy (related to container image).

CQ10 - Tool wrappers

What is the script used to wrap up a software component?

We're mapping tool wrappers (e.g., foo.cwl) to SoftwareApplication. Wrappers at lower levels can also be SoftwareApplication, but we need to draw the line somewhere (related to container image).

CQ1 - Container image

What container images (e.g., Docker) were used by the run?

  • Source entity: CreateAction
  • Target entity? It could be File if the image is a tarball from docker save
  • Property? Overload image?

CQ3 - Configuration files

What are the configuration files used in a workflow execution step?

ChooseAction? Though maybe the crate generator should just merge the params with the other ones if it can parse the config file. To link to the config file as a black box instead we probably need a new property.

CQ11 - Parameter connections

Knowing how workflow parameters were passed to individual tools is important to find out how they affected the outputs.

We are currently linking workflow and tool parameters with connectedTo from the source tool / workflow to the target tool / workflow. For instance, in revsort:

[Figure: graph of the revsort workflow]

we currently have:

{
    "@id": "packed.cwl#revtool.cwl",
    "@type": "SoftwareApplication",
    "input": [
        {"@id": "packed.cwl#revtool.cwl/input"}
    ],
    "output": [
        {"@id": "packed.cwl#revtool.cwl/output"}
    ]
},
{
    "@id": "packed.cwl#sorttool.cwl",
    "@type": "SoftwareApplication",
    "input": [
        {"@id": "packed.cwl#sorttool.cwl/reverse"},
        {"@id": "packed.cwl#sorttool.cwl/input"}
    ],
    "output": [
        {"@id": "packed.cwl#sorttool.cwl/output"}
    ]
},
{
    "@id": "packed.cwl#revtool.cwl/output",
    "@type": "FormalParameter",
    "connectedTo": {"@id": "packed.cwl#sorttool.cwl/input"}
}

but that's inaccurate, since such links only exist within the revsort workflow. packed.cwl#revtool.cwl and packed.cwl#sorttool.cwl represent standalone software tools that happen to be connected this way in revsort, but might be used differently in another workflow.
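To make the problem concrete, here is a small sketch that walks the entities shown above, collects the connectedTo links, and looks up which entity declares each endpoint as an input or output. The helper names are hypothetical; only the entity data comes from the example.

```python
# The three entities from the example above, as Python dicts.
graph = [
    {
        "@id": "packed.cwl#revtool.cwl",
        "@type": "SoftwareApplication",
        "input": [{"@id": "packed.cwl#revtool.cwl/input"}],
        "output": [{"@id": "packed.cwl#revtool.cwl/output"}],
    },
    {
        "@id": "packed.cwl#sorttool.cwl",
        "@type": "SoftwareApplication",
        "input": [
            {"@id": "packed.cwl#sorttool.cwl/reverse"},
            {"@id": "packed.cwl#sorttool.cwl/input"},
        ],
        "output": [{"@id": "packed.cwl#sorttool.cwl/output"}],
    },
    {
        "@id": "packed.cwl#revtool.cwl/output",
        "@type": "FormalParameter",
        "connectedTo": {"@id": "packed.cwl#sorttool.cwl/input"},
    },
]


def parameter_links(entities):
    """Return (source parameter id, target parameter id) pairs."""
    links = []
    for entity in entities:
        targets = entity.get("connectedTo", [])
        if isinstance(targets, dict):  # a single reference, not a list
            targets = [targets]
        for target in targets:
            links.append((entity["@id"], target["@id"]))
    return links


def owner_of(param_id, entities):
    """Return the @id of the entity listing param_id as input or output."""
    for entity in entities:
        refs = entity.get("input", []) + entity.get("output", [])
        if any(ref["@id"] == param_id for ref in refs):
            return entity["@id"]
    return None


links = parameter_links(graph)
for source, target in links:
    print(owner_of(source, graph), "->", owner_of(target, graph))
```

Both endpoints resolve to the standalone tool entities, and nothing in the link records that it belongs to the revsort workflow.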
