
orquesta's Introduction

Orquesta

Orquesta is a graph-based workflow engine designed specifically for StackStorm. As a building block, Orquesta does not include all of the parts, such as messaging, persistence, and locking, that are required to run as a service.

The engine consists of the workflow models that are decomposed from the language spec, the composer that composes the execution graph from the workflow models, and the conductor that directs the execution of the workflow using the graph.

A workflow definition is a structured YAML file that describes the intent of the workflow. A workflow is made up of one or more tasks. A task defines what action to execute, with what input. When a task completes, it can transition into other tasks based upon criteria. Tasks can also publish output for the next tasks. When there are no more tasks to execute, the workflow is complete.
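To make this concrete, here is a minimal sketch of such a workflow definition in Orquesta's native language (task and action names are illustrative, not taken from a real pack):

```yaml
version: 1.0

description: A minimal two-task workflow; names are illustrative only.

input:
  - name

tasks:
  task1:
    # Execute an action with input rendered from the workflow context.
    action: core.echo message=<% ctx(name) %>
    next:
      # On success, publish output for the next task and transition.
      - when: <% succeeded() %>
        publish:
          - greeting: <% result().stdout %>
        do: task2
  task2:
    action: core.echo message=<% ctx(greeting) %>

output:
  - greeting: <% ctx(greeting) %>
```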

Orquesta includes a native language spec for the workflow definition. The language spec is decomposed into various models and described with JSON schema. A workflow composer that understands the models converts the workflow definition into a directed graph. The nodes represent the tasks and the edges represent the task transitions. The criteria for each task transition are an attribute of the edge. The graph is the underpinning for conducting the workflow execution; the workflow definition is just syntactic sugar.

Orquesta allows one or more language specs to be defined. As long as the workflow definition, however structured, is composed into the expected graph, the workflow conductor can handle it.

The workflow execution graph can be a directed graph or a directed cycle graph. It can have one or more root nodes, which are the starting tasks for the workflow. The graph can have branches that run in parallel and then converge back to a single branch. A single branch in the graph can diverge into multiple branches. The graph model exposes operations to identify starting tasks, get inbound and outbound task transitions, get connected tasks, and check whether a cycle exists. The graph serves as a map for the conductor: it is stateless and does not contain any runtime data such as task status and result.
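The operations listed above can be sketched with a plain adjacency structure (illustrative only; orquesta's real graph model is built on different internals):

```python
# Sketch of the graph operations described above, using a plain list of
# (source, destination, criteria) edges instead of orquesta's classes.

def root_tasks(edges, tasks):
    """Tasks with no inbound transitions are the workflow's start tasks."""
    targets = {dst for (_, dst, _) in edges}
    return sorted(t for t in tasks if t not in targets)

def outbound(edges, task):
    """Outbound transitions, each carrying its criteria as edge data."""
    return [(dst, criteria) for (src, dst, criteria) in edges if src == task]

def has_cycle(edges, tasks):
    """Detect a cycle with an iterative depth-first search."""
    adj = {t: [dst for (src, dst, _) in edges if src == t] for t in tasks}
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {t: WHITE for t in tasks}
    for start in tasks:
        if color[start] != WHITE:
            continue
        stack = [(start, iter(adj[start]))]
        color[start] = GRAY
        while stack:
            node, it = stack[-1]
            for nxt in it:
                if color[nxt] == GRAY:      # back edge -> cycle
                    return True
                if color[nxt] == WHITE:
                    color[nxt] = GRAY
                    stack.append((nxt, iter(adj[nxt])))
                    break
            else:
                color[node] = BLACK
                stack.pop()
    return False

# A fork/join shape: task1 diverges to a and b, which converge on task2.
tasks = ["task1", "a", "b", "task2"]
edges = [
    ("task1", "a", "<% succeeded() %>"),
    ("task1", "b", "<% succeeded() %>"),
    ("a", "task2", "<% succeeded() %>"),
    ("b", "task2", "<% succeeded() %>"),
]
```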

The workflow conductor traverses the graph, directs the flow of the workflow execution, and tracks the runtime state of the execution. The conductor does not actually execute the action that is specified for the task. The action execution is performed by another provider such as StackStorm. The conductor directs the provider on what action to execute. As each action execution completes, the provider relays the status and result back to the conductor. The conductor then takes the state change, keeps track of the sequence of task executions, manages the change history of the runtime context, evaluates outbound task transitions, identifies any new tasks for execution, and determines the overall workflow state and result.
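The conductor/provider interaction described above can be sketched with a small mock (plain Python; class and method names here are hypothetical, not orquesta's actual API):

```python
# Illustrative mock of the conductor/provider protocol described above.
# Class and method names are hypothetical; this is not orquesta's API.

class MockConductor:
    """Holds the graph 'map' plus runtime state (statuses, context)."""

    def __init__(self, graph, start_tasks):
        self.graph = graph            # task -> list of (next_task, criteria)
        self.status = {}              # task -> 'succeeded' or 'failed'
        self.context = {}             # accumulated published variables
        self.pending = list(start_tasks)

    def get_next_tasks(self):
        """Hand the staged tasks to the provider for execution."""
        tasks, self.pending = self.pending, []
        return tasks

    def update_task_state(self, task, status, publish=None):
        """Record a relayed result, merge published output, and evaluate
        outbound transitions to stage any newly runnable tasks."""
        self.status[task] = status
        self.context.update(publish or {})
        for next_task, criteria in self.graph.get(task, []):
            if criteria == status:
                self.pending.append(next_task)

    def get_workflow_status(self):
        if self.pending:
            return "running"
        return "failed" if "failed" in self.status.values() else "succeeded"

# The provider (e.g. StackStorm) would actually run the actions; here we
# simulate immediate success and relay results back to the conductor.
graph = {"task1": [("task2", "succeeded")], "task2": []}
conductor = MockConductor(graph, ["task1"])
executed = []
while True:
    batch = conductor.get_next_tasks()
    if not batch:
        break
    for task in batch:
        executed.append(task)
        conductor.update_task_state(task, "succeeded", {task: "ok"})
```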

Copyright, License, and Contributors Agreement

Copyright 2019-2021 The StackStorm Authors. Copyright 2014-2018 StackStorm, Inc.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this work except in compliance with the License. You may obtain a copy of the License in the LICENSE file or at http://www.apache.org/licenses/LICENSE-2.0.

By contributing you agree that these contributions are your own (or approved by your employer) and you grant a full, complete, irrevocable copyright license to all users and developers of the project, present and future, pursuant to the license of the project.

Getting Help

If you need help or get stuck at any point during development, stop by our Slack Community and we will do our best to assist you.


orquesta's Issues

Implement rerun from specific task(s)

Implement rerun of a workflow execution from specific task(s). See the command st2 execution re-run <execution-id> --tasks and the Mistral runner for how this is implemented. Rerunning from a task should continue with the existing workflow execution record. The orquesta runner should determine whether the specified task can be rerun and then request task execution on the workflow engine. Please note that this is unlike the rerun of python/shell actions or the rerun of a workflow execution from the start, which create a new database record for the action execution.

ujson 2.0.x is not compatible with Orquesta

The ujson library that orquesta uses had a major version upgrade on Mar 8, 2020, but the new version is not compatible with the current Orquesta.
(c.f. https://pypi.org/project/ujson/#history)

The following code is an example that reproduces it.

from orquesta import conducting
from orquesta.specs import native as native_specs
from orquesta import statuses
from orquesta.utils import jsonify as json_util

wf_def = """
version: 1.0

vars:
  - xs:
    - fee

tasks:
  task1:
    with: <% ctx(xs) %>
    action: core.echo message=<% item() %>
"""

spec = native_specs.WorkflowSpec(wf_def)

conductor = conducting.WorkflowConductor(spec)
conductor.request_workflow_status(statuses.RUNNING)

actual_tasks = conductor.get_next_tasks()
json_util.deepcopy(actual_tasks)

Case of using ujson (v1.35):

(screenshot: 2020-03-24 17:45:02)

Case of using ujson (v2.0.0):

(screenshot: 2020-03-24 17:44:05)

Case of using ujson (v2.0.2, the current latest version):

(screenshot: 2020-03-24 17:44:34)

`Unknown function "st2kv"` when running examples.orchestra-st2kv

vagrant@st2test:~$ st2 run examples.orchestra-st2kv key_name="something"
.
id: 5b332e2e55fc8c57d28831cc
action.ref: examples.orchestra-st2kv
parameters:
  key_name: something
status: failed
start_timestamp: Wed, 27 Jun 2018 06:26:54 UTC
end_timestamp: Wed, 27 Jun 2018 06:26:54 UTC
result:
  errors:
  - message: Unknown function "st2kv"
    task_id: create_vm
  output: null

Add ability in task spec to wait for a lock before proceeding

There are use cases where users want to synchronize access to a resource (i.e. server) across multiple and different workflow executions. Currently, StackStorm offers concurrency policies to synchronize executions for one given workflow.

The following proposal allows specification of the lock requirement across different workflow definitions so that synchronization is possible across different workflow executions on the same resource(s). In the proposal below, a new wait attribute is introduced to the task spec. The wait attribute takes the name of the lock, which can be a unique name or the name of the resource. The task will sleep/delay until the lock is acquired. On exit from the task scope, the lock will be released.

An additional wait delay can be specified. If the lock is not acquired within the wait period, the task will abandon the wait and exit with failure. On failure, the task transition will be traversed as normal so that clean up or retry is possible.

tasks:
  task1:
    wait: <% ctx().resource_lock_name %>
    action: ...

With an explicit wait delay:

tasks:
  task1:
    wait:
      lock: <% ctx().resource_lock_name %>
      delay: 600
    action: ...

The mechanism will rely on an external platform such as Redis, which is already used for various purposes in StackStorm (required for concurrency policies, running complex branching workflows, etc.).

"ExpressionEvaluationException: Unable to resolve key" error should include list of available keys

Orquesta/Jinja/YAQL currently throws an error when you try to use a key that does not exist:

"YaqlEvaluationException: Unable to resolve key 'result' in expression '<% result().result.output %>' from context."

But this leaves you guessing as to what valid keys you can use.

It would greatly ease workflow development if we dumped a list of valid keys in the error message:

"YaqlEvaluationException: Unable to resolve key 'result' in expression '<% result().result.output %>' from context; must be one of 'option1', 'option2', or 'option3'."

But if we want to avoid l10n/i18n issues with lists (I'm also not sure we handle this at all), we can also simply dump an array of options:

"YaqlEvaluationException: Unable to resolve key 'result' in expression '<% result().result.output %>' from context; must be one of ['option1', 'option2', 'option3']."

Parent status stays in 'resuming' when the inquiry times out

SUMMARY

The Orquesta workflow stays in the resuming status if an inquiry within the workflow doesn't have both when: <% succeeded() %> and when: <% failed() %>.

To reproduce the issue: here

ISSUE TYPE
Bug Report

STACKSTORM VERSION
st2 2.10.3, on Python 2.7.5

Retries using with-items rerun even items that already succeeded

We use with-items to loop over a list and run an action for each item.
If the action fails for even one item, the retry runs over all the items in the array again, instead of just the failed ones.

For instance, if param1 was set to 1,2,3,4 in the below sample workflow:

version:                1.0
description:            Test retry workflow
input:
  - param1
  - slack_channel
tasks:
  split_list:
    action: core.noop
    next:
    - when: <% succeeded() %>
      do: print_val
      publish:
        - list_vals: <% ctx(param1).split(',') %>
  print_val:
    with: <% ctx(list_vals) %>
    action: core.local
    input :
      cmd : >
        val=<% item() %>;
        if [ $val -eq 2 ];then exit -1;fi
    retry:
      when: <% failed() %>
      count: 5
      delay: 2
    next:
      - when: <% succeeded() %>
        publish:
          - status: "SUCCESS"
      - when: <% failed() %>
        do:
          - send_failed_alert
        publish:
          - status: "FAILED"
  send_failed_alert:
    action: slack.chat.postMessage
    input:
      channel: "#<% ctx(slack_channel) %>"
      text: "ERROR: failed on param <% ctx(param1) %> failed"`

And here is the output of the run.

(screenshot of the run output)

How to use notify with orquesta workflow and jinja expression support?

Issue: unable to send a Slack notification with the workflow output/stdout. How can the workflow context and variables be used in the notify parameter, without writing notify for each task in the workflow?

Workflow-meta.yaml

---
description: A Nessus scanner reporter validator workflow
enabled: true
runner_type: orquesta
entry_point: workflows/nessus_report_validation_workflow.yaml
name: nessus_report_validation_workflow
pack: st2pack_poc
notify:
  on-complete:
    routes:
      - slack
    message: "Action has finished"
  on-success:
    routes:
      - slack
    message: "Succeeded: {{action_results.stdout}}"
  on-failure:
    routes:
      - slack
    message: |
     Yikes!! It failed. {{ stdout }} {{ result().stdout }} <% ctx() %> <% result().issue_key %>  <% stdout %>!!
parameters:
...

Slack channel output:

(screenshot of the Slack channel output)

Can't access the single result for a `with` item in the current thread

ST2 version: st2 3.2dev, on Python 2.7.12
This bug was reported by a customer.

result() returns all the results from the with items task; the single result for the current thread can't be accessed.
The workflow that reproduces the issue:

actions/orquesta-with-items-hostname.yaml
---
name: orquesta-with-items-hostname
description: A workflow demonstrating with items.
runner_type: orquesta
entry_point: workflows/tests/orquesta-with-items-hostname.yaml
pack: examples
enabled: true

workflows/tests/orquesta-with-items-hostname.yaml
version: 1.0
description: A workflow demonstrating with items and concurrent processing.

vars:
  - members: [{"hostname": "Extreme", }, {"hostname": "Google"}, {"hostname":"StackStorm"}]

tasks:
  task1:
    with: <% ctx(members) %>
    action: examples.get_url
    input:
      hostname: <% item().get(hostname) %>
    next:
      - when: <% task(task1).result.result = "https://extreme.com" %>
        do: notify
  notify:
    action: core.echo message="notify should be called"
output:
  - url: <% task(task1).result %>

actions/get_url.yaml
---
name: "get_url"
pack: "examples"
enabled: true
runner_type: "python-script"
entry_point: "get_url.py"
parameters:
  hostname:
    type: string
    required: true
get_url.py

import json
from st2common.runners.base_action import Action


class GetUrl(Action):
    def run(self, hostname=None):
        if hostname == "Extreme":
            return 'https://extreme.com'
        if hostname == "Google":
            return 'https://goodle.com'
        if hostname =="StackStorm":
            return 'https://StackStorm.com'

Execution result:

st2 run examples.orquesta-with-items-hostname
..
id: 5d66c4d20761294f53faac38
action.ref: examples.orquesta-with-items-hostname
parameters: None
status: failed
start_timestamp: Wed, 28 Aug 2019 18:15:46 UTC
end_timestamp: Wed, 28 Aug 2019 18:15:49 UTC
result:
  errors:
  - message: 'YaqlEvaluationException: Unable to resolve key ''result'' in expression ''<% task(task1).result.result = "https://extreme.com" %>'' from context.'
    route: 0
    task_id: task1
    task_transition_id: notify__t0
    type: error
  output:
    url:
      items:
      - result:
          exit_code: 0
          result: https://extreme.com
          stderr: ''
          stdout: ''
        status: succeeded
      - result:
          exit_code: 0
          result: https://goodle.com
          stderr: ''
          stdout: ''
        status: succeeded
      - result:
          exit_code: 0
          result: https://StackStorm.com
          stderr: ''
          stdout: ''
        status: succeeded
+--------------------------+------------------------+-------+------------------+--------------------+
| id                       | status                 | task  | action           | start_timestamp    |
+--------------------------+------------------------+-------+------------------+--------------------+
| 5d66c4d307612910be906f12 | succeeded (1s elapsed) | task1 | examples.get_url | Wed, 28 Aug 2019   |
|                          |                        |       |                  | 18:15:47 UTC       |
| 5d66c4d307612910be906f14 | succeeded (1s elapsed) | task1 | examples.get_url | Wed, 28 Aug 2019   |
|                          |                        |       |                  | 18:15:47 UTC       |
| 5d66c4d307612910be906f16 | succeeded (2s elapsed) | task1 | examples.get_url | Wed, 28 Aug 2019   |
|                          |                        |       |                  | 18:15:47 UTC       |
+--------------------------+------------------------+-------+------------------+--------------------+

Workflow succeeded before the join task is executed

Given the following workflow, where more than one task transitions to the complete task regardless of success or failure, the workflow succeeds after task1 and task2 complete, and the complete task is never executed. The join: all is not satisfied, so the conductor thinks there are no more tasks to execute.

version: 1.0

tasks:
  task1:
    action: core.noop
    next:
      - when: <% succeeded() %>
        do: complete
      - when: <% failed() %>
        do: complete

  task2:
    action: core.noop
    next:
      - when: <% succeeded() %>
        do: complete
      - when: <% failed() %>
        do: complete

  complete:
    join: all
    action: core.noop

Cleanup/rollback and fail workflow on task error

There is a common use case to launch cleanup/rollback and fail the workflow on task error.

Currently for the workflow below, it doesn't work. When task1 fails and the transition is evaluated, the fail command in the transition will immediately fail the workflow without running cleanup_task.

task1:
  action: mock.some_action_that_will_fail
  next:
    - when: <% succeeded() %>
      do:
        - cleanup_task
    - when: <% failed() %>
      do:
        - cleanup_task
        - fail

To work around this, users have to do the following.

task1:
  action: mock.some_action_that_will_fail
  next:
    - when: <% succeeded() %>
      do:
        - cleanup_task
    - when: <% failed() %>
      do:
        - cleanup_task_on_failure

cleanup_task:
  action: mock.some_cleanup_action

cleanup_task_on_failure:
  action: mock.some_cleanup_action
  next:
    - do: fail

The above makes for a long-winded approach. We want to launch the cleanup task first, mark the workflow as failed, and then let the cleanup task run to completion.

Jinja - 'referenced before assignment' with if/else blocks

Summary

When orquesta tries to graph a run of a workflow, it errors when a variable is referenced before it's defined (yay!). However, if the undefined variable is inside a Jinja if/else block that wouldn't be selected anyway, it still errors.

This behavior is different from Mistral. I'm not sure if this is intentional, or what the preferred behavior should be. I could see arguments either way.

Reproduction

---
version: '1.0'
input:
  - payload
tasks:
  start:
    action: core.noop
    next:
      - when: "{{ ctx(payload).hostname is defined }}"
        do:
          - task1
        publish:
          - hostname: "{{ ctx(payload).hostname }}"
          - method: host
      - when: "{{ ctx(payload).client is defined }}"
        do:
          - task1
        publish:
          - client: "{{ ctx(payload).client }}"
          - method: client
  task1:
    action: core.local
    input:
      cmd: "echo '{% if ctx(method) == 'client' %}{{ ctx(client) }}{% elif ctx(method) == 'host' %}{{ ctx(hostname) }}{% endif %}'"

Input for payload

payload: {"hostname":"myhost"}

Expected results

My expectations were that this would work fine as it does in mistral, as this was discovered while I was migrating this Workflow from Mistral to Orquesta.

Observed Results

This WF fails to run because the graph fails to be built.

{
  "output": null,
  "errors": [
    {
      "language": "jinja",
      "type": "context",
      "spec_path": "tasks.task1.input",
      "schema_path": "properties.tasks.patternProperties.^\\w+$.properties.input",
      "message": "Variable \"client\" is referenced before assignment.",
      "expression": "echo '{% if ctx(method) == 'client' %}{{ ctx(client) }}{% elif ctx(method) == 'host' %}{{ ctx(hostname) }}{% endif %}'"
    },
    {
      "language": "jinja",
      "type": "context",
      "spec_path": "tasks.task1.input",
      "schema_path": "properties.tasks.patternProperties.^\\w+$.properties.input",
      "message": "Variable \"hostname\" is referenced before assignment.",
      "expression": "echo '{% if ctx(method) == 'client' %}{{ ctx(client) }}{% elif ctx(method) == 'host' %}{{ ctx(hostname) }}{% endif %}'"
    }
  ]
}

Workaround

  • Define all possible vars that are conditionally created with null in the vars: section so that they exist, even if they are never updated (publish) or used.
---
version: '1.0'
input:
  - payload
vars:
  - hostname: null
  - client: null
tasks:
   ...

Add something to force an action to succeed even when failed

https://docs.stackstorm.com/orquesta/languages/orquesta.html#engine-commands

One idea could be to add an additional option for pass.

Use Case:

Sometimes actions complete gracefully but exit with a failure code, for example because a resource wasn't found.

For example: st2.kv.get

Sometimes you check whether a key exists. If this is the last action in a workflow, it can cause the workflow to complete as failed even though all other operations succeeded and you aren't actually concerned with this action failing. A workaround I've used is an action called its_ok_if_this_fails that runs core.noop, which allows the workflow to still complete as a success.

With an engine command of pass, I wouldn't need this extra step; I could simply have:

task1:
  action: core.noop
  next:
    - do: pass
      when: <% failed() %>

Whereas currently I need:

task1:
  action: core.noop
  next:
    - do: its_ok_if_this_fails
      when: <% failed() %>

its_ok_if_this_fails:
  action: core.noop

If something like this doesn't fit or work for some reason, I would be interested in any ideas that could help address the use case. Thank you!

Join ALL and conditional branches conflict?

I want to use conditions to determine which paths in the workflow to take, and finally use join all to summarize all the results that meet the criteria.
However, the final step does not execute.

Workflow stuck in running state

If the action is not listed with syntactically correct YAML, the workflow will stay in a running state forever.

workflow:

version: 1.0
tasks:
  add_domains_to_ticket:
    with:
      items: "domain  in {{ctx().mitigate_domains }}"
      action: "core.echo message={{item(domain)}}"



input:
  - mitigate_domains
  - ticket: null

meta:

pack: intel
enabled: true
runner_type: orquesta
name: ticket_add_domains_mitigate
parameters:
  mitigate_domains:
    type: "array"
    items:
      type: "string"
  ticket:
    type: integer
entry_point: workflows/ticket_add_domains_mitigate.yaml

the fix is:

tasks:
  add_domains_to_ticket:
    with:
      items: "domain in {{ctx().mitigate_domains }}"
    action: "core.echo message={{item(domain)}}"

combined schema (anyOf, allOf, oneOf) error message enhancement

I think this goes here, but if it goes on StackStorm/st2, I can file the issue there.
This issue is about improving the error message returned when a schema validation error occurs due to not matching a oneOf schema.

I ran an orquesta workflow and for the task transition I used the incorrect:

publish:
  message: foobar

instead of the correct:

publish:
  - message: foobar

Doing that gave me an error message like:

    {
      "spec_path": "tasks.validate_input.next[0].publish",
      "message": "{'error_message': 'wrong format for input param foobar: {{ ctx().foobar }}'} is not valid under any of the given schemas",
      "type": "syntax",
      "schema_path": "properties.tasks.patternProperties.^\\w+$.properties.next.items.properties.publish.oneOf"
    },

Then I went to look up the schema to figure out what I did wrong, mistakenly guessing that it didn't like my use of Jinja. My actual error was passing a dict instead of an array. Could the error message be improved to say what the options are when the schemaPath is oneOf? (In this case: a string or an array of unique one-item dicts.)
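As a rough sketch of the requested improvement (a hypothetical helper, not st2/orquesta code), the oneOf branches could be summarized into the message:

```python
# Hypothetical helper sketching the requested improvement: when a value
# matches none of a oneOf's branches, summarize what each branch expects
# instead of only saying "not valid under any of the given schemas".

def summarize_oneof(schemas):
    names = []
    for schema in schemas:
        kind = schema.get("type", "value")
        if kind == "array" and "items" in schema:
            kind = "array of %s" % schema["items"].get("type", "value")
        names.append(kind)
    return "expected one of: %s" % ", ".join(names)

# Simplified shape of the publish schema from the error above: either a
# string expression or an array of one-item dicts (objects).
publish_oneof = [
    {"type": "string"},
    {"type": "array", "items": {"type": "object"}},
]
```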

Inflated workflow execution graph leads to increased resource consumption and delays

Given the workflow definition at https://github.com/StackStorm/st2ci/blob/86655cb71ae0a1ff813b4bd0828399b54206e146/actions/workflows/st2_pkg_e2e_test.yaml, orquesta converts this workflow into an execution graph for runtime. Due to the number of conditional jumps, multiple references to common tasks, and common tasks having many steps, the execution graph becomes inflated. This results in very high CPU and memory usage and long delays when the conductor tries to load the workflow definition. The resource consumption and delay likely occur where the conductor tries to identify the start tasks for the graph.

Next tasks are skipped on failure of one of the parallel tasks under the current task. Why?

Actual behavior

(screenshot of actual behavior)

Expected behavior

(screenshot of expected behavior)

transition_ticket has no terminate/fail action

(screenshot)

Stackstorm workflow output:

{
  "output": {
    "stdout": "Finished!!!"
  },
  "errors": [
    {
      "message": "Execution failed. See result for details.",
      "type": "error",
      "result": {
        "stdout": "",
        "result": "JiraError HTTP Invalid transition name. None
\t",
        "stderr": "",
        "exit_code": 0
      },
      "task_id": "transition_ticket"
    }
  ]
}

Orquesta sub-workflow tasks do not quickly continue after completing

An Orquesta main workflow calls out to an Orquesta sub-workflow. The sub-workflow executes successfully but does not quickly return.

date

Fri Mar 29 22:28:52 CDT 2019

cat /etc/redhat-release

CentOS Linux release 7.6.1810 (Core)

st2 --version

st2 3.0dev (c592795), on Python 2.7.5

cat /etc/st2/st2.conf

mode = standalone
[coordination]
url = redis://:some-pass@localhost:6379

Redis enabled for workflowengine coordination.

redis-server --version

Redis server v=3.2.12 sha=00000000:0 malloc=jemalloc-3.6.0 bits=64 build=7897e7d0e13773f

The sub-workflow would normally finish execution in 15-30 seconds when called directly via st2 run.

All sub-workflow tasks exceed 100 seconds.

  • 5c9ec7b0da6c0b4fdb2f1a25 | succeeded (223s elapsed) | fetch_server_hardware_cpu | easydcim.show_device_parts | Sat, 30 Mar 2019 01:34:40 UTC
  • 5c9ec7b5da6c0b4fdb2f1a29 | succeeded (185s elapsed) | fetch_server_hardware_hdd | easydcim.show_device_parts | Sat, 30 Mar 2019 01:34:45 UTC
  • 5c9ec7bbda6c0b4fdb2f1a2f | succeeded (234s elapsed) | fetch_server_hardware_ram | easydcim.show_device_parts | Sat, 30 Mar 2019 01:34:50 UTC
  • 5c9ec7c3da6c0b4fdb2f1a35 | succeeded (265s elapsed) | fetch_server_hardware_san | easydcim.show_device_parts | Sat, 30 Mar 2019 01:34:58 UTC
  • 5c9ec7ccda6c0b4fdb2f1a3e | succeeded (242s elapsed) | fetch_server_hardware_ssd | easydcim.show_device_parts | Sat, 30 Mar 2019 01:35:07 UTC

vs

st2 run easydcim.show_device_parts id_num=13 type_id=20

...........
id: 5c9ed02bda6c0b66f7876196
action.ref: easydcim.show_device_parts
parameters:
id_num: 13
type_id: 20
status: succeeded
start_timestamp: Sat, 30 Mar 2019 02:10:51 UTC
end_timestamp: Sat, 30 Mar 2019 02:11:14 UTC
result:
output:

+--------------------------+------------------------+------------------+---------------------+-------------------------------+
| id | status | task | action | start_timestamp |
+--------------------------+------------------------+------------------+---------------------+-------------------------------+
| 5c9ed02cda6c0b66cf009dc3 | succeeded (2s elapsed) | fetch_item_list | easydcim.list_items | Sat, 30 Mar 2019 02:10:52 UTC |
| 5c9ed02fda6c0b66cf009dc6 | succeeded (3s elapsed) | fetch_hw_item | easydcim.show_item | Sat, 30 Mar 2019 02:10:55 UTC |
| 5c9ed030da6c0b66cf009dc8 | succeeded (2s elapsed) | fetch_hw_item | easydcim.show_item | Sat, 30 Mar 2019 02:10:56 UTC |
| 5c9ed031da6c0b66cf009dca | succeeded (2s elapsed) | fetch_hw_item | easydcim.show_item | Sat, 30 Mar 2019 02:10:57 UTC |
| 5c9ed031da6c0b66cf009dcd | succeeded (1s elapsed) | notify | core.echo | Sat, 30 Mar 2019 02:10:57 UTC |
| 5c9ed036da6c0b66cf009dd0 | succeeded (1s elapsed) | close_request | core.noop | Sat, 30 Mar 2019 02:11:02 UTC |
| 5c9ed038da6c0b66cf009dd3 | succeeded (1s elapsed) | notify | core.echo | Sat, 30 Mar 2019 02:11:04 UTC |
| 5c9ed039da6c0b66cf009dd6 | succeeded (1s elapsed) | print_hw_item_id | core.echo | Sat, 30 Mar 2019 02:11:05 UTC |
| 5c9ed03cda6c0b66cf009dd9 | succeeded (1s elapsed) | notify | core.echo | Sat, 30 Mar 2019 02:11:08 UTC |
+--------------------------+------------------------+------------------+---------------------+-------------------------------+

401 Client Error: Unauthorized MESSAGE: Unauthorized - One of Token or API key required.

I am getting a

requests.exceptions.HTTPError: 401 Client Error: Unauthorized
MESSAGE: Unauthorized - One of Token or API key required. for url: http://127.0.0.1:9101/v1/keys/<keyname>

exception when running an action in an orquesta workflow.

If I execute the action outside of the orquesta workflow or in a mistral-v2 workflow it works as expected.

The action is calling: self.action_service.set_key and self.action_service.get_key

It seems like this is failing because the API token is not present in the call from orquesta.

"The join task is unreachable", but it should be

Hi,

Let's consider this workflow:

version: 1.0

input:
  - max_retry_count
  - retry_delay_sec

vars:
  - SOME_MSG: some msg
  - retry_count: 1

tasks:
  task_1:
    action: core.noop
    next:
      - when: <% succeeded() %>
        do:
          - task_2

  task_2:
    action: core.local
    input:
      cmd: '>&2 echo <% ctx(SOME_MSG) %> ; false'
    next:
      - when: <% failed() and ctx(SOME_MSG) in result().stderr %>
        do:
          - task_3
      - when: <% failed() and (not (ctx(SOME_MSG) in result().stderr)) and ctx(retry_count) < ctx(max_retry_count) %>
        publish:
          - retry_count: <% ctx(retry_count) + 1 %>
        do:
          - task_3
      - when: <% failed() and (not (ctx(SOME_MSG) in result().stderr)) and ctx(retry_count) >= ctx(max_retry_count) %>
        do:
          - task_6a
          - task_6b
      - when: <% succeeded() %>
        do:
          - task_5

  task_3:
    action: core.local
    input:
      cmd: 'false'
    next:
      - when: <% succeeded() %>
        do:
          - task_5
      - when: <% failed() %>
        do:
          - task_6a
          - task_6b

  task_4:
    delay: <% ctx(retry_delay_sec) %>
    action: core.noop
    next:
      - when: <% succeeded() or failed() %>
        do:
          - task_2

  task_5:
    action: core.noop

  task_6a:
    action: core.noop
    next:
      - when: <% succeeded() %>
        do:
          - task_7

  task_6b:
    action: core.noop
    next:
      - when: <% succeeded() %>
        do:
          - task_7

  task_7:
    join: 2
    action: core.local
    input:
      cmd: 'false'

Because the values are hard-coded, the workflow should always do:

  1. task_1
  2. task_2
  3. task_3
  4. task_6a and task_6b in parallel
  5. join: task_7

However, when running this workflow, I have the following error:

{
  "output": null,
  "errors": [
    {
      "spec_path": "tasks.task_7",
      "message": "The join task \"task_7\" is unreachable.",
      "type": "semantic",
      "schema_path": "properties.tasks.patternProperties.^\\w+$"
    }
  ]
}

Note that if I remove the 3rd when of task_2 (i.e. - when: <% failed() and (not (ctx(SOME_MSG) in result().stderr)) and ctx(retry_count) >= ctx(max_retry_count) %>), the error disappears and the workflow works.

Any idea what's the problem here?

Thanks!

Incomplete next staged concurrent with-items task if the last running nested item fails

The workflow below doesn't execute Nested6 for subsequent items in listed if it fails for any item in listed. Basically, if the last running concurrent nested task fails, the staged task is marked as completed and the next task is not picked up from the staged items.

I have proposed a change that worked for me in PR #221, where I updated the TASK_STATE_MACHINE_DATA map so that the event events.ACTION_FAILED_TASK_DORMANT_ITEMS_INCOMPLETE maps the current state to statuses.RUNNING.
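The shape of such an event-driven status map can be sketched as follows (a hypothetical simplification mirroring the TASK_STATE_MACHINE_DATA change described above, not orquesta's actual table):

```python
# Hypothetical sketch of an event-driven task status map, mirroring the
# TASK_STATE_MACHINE_DATA change described above: while with-items
# results are still incomplete, a failed item keeps the task RUNNING so
# the remaining staged items are still picked up.

RUNNING, FAILED = "running", "failed"

TASK_STATE_MACHINE_DATA = {
    RUNNING: {
        # Proposed: an item failure with items still outstanding does
        # not terminate the task.
        "action_failed_task_dormant_items_incomplete": RUNNING,
        # Once all items have completed, the failure is final.
        "action_failed_task_dormant_items_complete": FAILED,
    },
}

def next_status(current, event):
    """Look up the new task status for an event; default to no change."""
    return TASK_STATE_MACHINE_DATA.get(current, {}).get(event, current)
```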

---
version: "1.0"
description: "Workflow desc"
input:
- "nested"
- "break_on"
- "l_c"
- "listed"
tasks:
  Start:
    action: "core.noop"
    next:
    - when: "<% succeeded() %>"
      do: "Nested6"

  Nested6:
    action: "core.local cmd='exit 1'"
    input:
      b: "<% item(list_i) %>"
      l_c: "<% ctx().l_c %>"
    next:
    - when: "<% succeeded() %>"
      do: "Join11"
    - when: "<% failed() %>"
      do: "Join12"
    with:
      items: list_i in <% ctx(listed) %>
      concurrency: 1

  Join11:
    action: "core.noop"
    next:
    - when: "<% failed() %>"
      do: "fail"
    join: "all"
  Join12:
    action: "core.noop"
    next:
    - when: "<% failed() %>"
      do: "fail"
    join: "all"

Exception running workflow - 'ValueError: malformed node or string: <_ast.BinOp object at 0x7f80d29946d8>'

I haven't been able to find where this error is being generated from. As far as I can tell, the action follows the format and syntax correctly.

Action from workflow that is failing

  send_maintenance_start:
    action: email.send_email
    input:
      account: smarthost
      email_from: "no-reply@XXXXXX"
      email_to: "carlos@XXXXXX"
      subject: Incident detected on <% ctx("hostname") %>
      message: Automatic reboot of <% ctx("hostname") %> to recover from host down.
    next:
      - do: disable_users

Exception from the workflow engine log:

2020-01-17 23:33:43,532 140191264317064 INFO (unknown file) [-] [5e22445438b7e59d28ade4fb] Mark task "send_maintenance_start", route "0", in conductor as running.
2020-01-17 23:33:43,580 140191264317064 INFO (unknown file) [-] [5e22445438b7e59d28ade4fb] Requesting execution for task "send_maintenance_start", route "0".
2020-01-17 23:33:43,580 140191264317064 INFO (unknown file) [-] [5e22445438b7e59d28ade4fb] Processing task execution request for task "send_maintenance_start", route "0".
2020-01-17 23:33:43,589 140191264317064 INFO (unknown file) [-] [5e22445438b7e59d28ade4fb] Task execution "5e22445738b7e59c9835e3ff" created for task "send_maintenance_start", route "0".
2020-01-17 23:33:43,599 140191264317064 WARNING (unknown file) [-] Determining if exception <class 'ValueError'> should be retried.
2020-01-17 23:33:43,599 140191264317064 WARNING (unknown file) [-] Determining if exception <class 'ValueError'> should be retried.
2020-01-17 23:33:43,599 140191264317064 ERROR (unknown file) [-] [5e22445438b7e59d28ade4fb] Failed action execution(s) for task "send_maintenance_start", route "0". malformed node or string: <_ast.BinOp object at 0x7f80d29946d8>
Traceback (most recent call last):
  File "/opt/stackstorm/st2/lib/python3.6/site-packages/st2common/util/casts.py", line 36, in _cast_object
    return json.loads(x)
  File "/usr/lib/python3.6/json/__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.6/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.6/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
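The traceback suggests the parameter is being cast by st2's _cast_object helper, which tries json.loads and appears to fall back to ast.literal_eval (an assumption based on the casts.py frame above). An email-like value such as the email_from parameter is not valid JSON, but it *is* valid Python expression syntax (@ has been the matrix-multiplication operator since Python 3.5) while not being a literal, which reproduces this exact class of ValueError:

```python
import ast
import json

# Plausible reproduction (assumption about st2 internals): an email-like
# string fails json.loads, and under ast.literal_eval it parses as a
# non-literal BinOp expression (name - name @ attribute) and is rejected
# with "malformed node or string".
value = "no-reply@example.com"

try:
    json.loads(value)
except ValueError as exc:
    json_error = str(exc)  # "Expecting value: line 1 column 1 (char 0)"

try:
    ast.literal_eval(value)
except ValueError as exc:
    eval_error = str(exc)  # contains "malformed node or string"
```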

Duplicate task name should throw error

Tested with st2 3.0dev (d86747c), on Python 2.7.12

Orquesta workflows with duplicate task names do not throw an error. They proceed, with what looks like the second task with the same name being called.

I think it should throw an error if duplicate task names are detected.

Example workflow metadata:

---
name: orquesta-dupe
pack: default
description: Duplicate tasks
runner_type: orquesta
entry_point: workflows/orquesta-dupe.yaml
enabled: true

Workflow:

version: 1.0

description: A basic workflow with duplicate task names

tasks:
  task1:
    action: core.local cmd="echo task1"
    next:
      - when: <% succeeded() %>
        do: task2
  task2:
    action: core.local cmd='echo "first task2"'
    next:
      - when: <% succeeded() %>
        do: task3
  task2:
    action: core.local cmd='echo "second task2"'

After registration, inspect the workflow to check for errors:

extreme@ubuntu:/opt/stackstorm/packs$ st2 workflow inspect --action default.orquesta-dupe
No errors found in workflow definition.
extreme@ubuntu:/opt/stackstorm/packs$

Run workflow, look at output:

extreme@ubuntu:/opt/stackstorm/packs$ st2 run default.orquesta-dupe
...
id: 5ca3ff955f627d08b84be500
action.ref: default.orquesta-dupe
parameters: None
status: succeeded
start_timestamp: Wed, 03 Apr 2019 00:34:28 UTC
end_timestamp: Wed, 03 Apr 2019 00:34:34 UTC
result:
  output: null
+--------------------------+------------------------+-------+------------+------------------------------+
| id                       | status                 | task  | action     | start_timestamp              |
+--------------------------+------------------------+-------+------------+------------------------------+
| 5ca3ff975f627d0453668034 | succeeded (1s elapsed) | task1 | core.local | Wed, 03 Apr 2019 00:34:31    |
|                          |                        |       |            | UTC                          |
| 5ca3ff995f627d0453668037 | succeeded (0s elapsed) | task2 | core.local | Wed, 03 Apr 2019 00:34:33    |
|                          |                        |       |            | UTC                          |
+--------------------------+------------------------+-------+------------+------------------------------+
extreme@ubuntu:/opt/stackstorm/packs$ st2 execution get 5ca3ff995f627d0453668037
id: 5ca3ff995f627d0453668037
status: succeeded (0s elapsed)
parameters:
  cmd: echo "second task2"
result:
  failed: false
  return_code: 0
  stderr: ''
  stdout: second task2
  succeeded: true
extreme@ubuntu:/opt/stackstorm/packs$

Note that it was the second task2 that was called.
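The silent override happens because PyYAML's default loaders keep the last occurrence of a duplicate mapping key. A sketch of how duplicate task names could be caught at parse time (illustrative only, not Orquesta's actual loader) is a loader subclass that rejects duplicates:

```python
# PyYAML's default behavior keeps the last duplicate mapping key, which
# is why the second "task2" wins. A custom loader can reject duplicates
# instead. (Illustrative sketch, not Orquesta's actual loader.)
import yaml

class UniqueKeyLoader(yaml.SafeLoader):
    """SafeLoader that raises on duplicate mapping keys."""

def _construct_mapping(loader, node, deep=False):
    mapping = {}
    for key_node, value_node in node.value:
        key = loader.construct_object(key_node, deep=deep)
        if key in mapping:
            raise yaml.YAMLError("duplicate task name: %r" % key)
        mapping[key] = loader.construct_object(value_node, deep=deep)
    return mapping

UniqueKeyLoader.add_constructor(
    yaml.resolver.BaseResolver.DEFAULT_MAPPING_TAG, _construct_mapping)
```

With such a loader, yaml.load(definition, Loader=UniqueKeyLoader) would fail fast on the workflow above instead of silently dropping the first task2.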

decrypt_kv jinja filter fails if the key isn't in the kv store

An action file that has a secret parameter with a default value as follows:

action_param:
  type: string
  description: "This will fail the action"
  default: "{{ st2kv.system.test_param | decrypt_kv }}"
  secret: true

will fail with the following message if 'test_param' is not in the kv store:

ERROR: 400 Client Error: Bad Request
MESSAGE: Failed to render parameter "test_param": Referenced datastore item "st2kv.system.test_param" doesn't exist or it contains an empty string

Alternatively, if I pass a value that doesn't exist in the kv store but I don't try to decrypt it, then a blank string gets passed into the action parameter. I would expect an encrypted value to pass a blank string as well instead of failing.

Workflow fails unexpectedly (repro included)

SUMMARY

A workflow that we expect to pause is failing.

STACKSTORM VERSION

root@b7e90889c25a:/# st2 --version
st2 3.1.0, on Python 2.7.6

OS, environment, install method

Docker

Steps to reproduce the problem

here is the workflow definition (attached as a screenshot, not reproduced here)

Install and run the workflow:

root@b7e90889c25a:/# st2 pack install https://github.com/trstruth/bug

        [ succeeded ] download pack
        [ succeeded ] make a prerun
        [ succeeded ] install pack dependencies
        [ succeeded ] register pack

+-------------+-----------------------------------------------+
| Property    | Value                                         |
+-------------+-----------------------------------------------+
| name        | Bug                                           |
| description | A pack of workflow repros for st2 bug reports |
| version     | 0.1.0                                         |
| author      | Tristan                                       |
+-------------+-----------------------------------------------+
root@b7e90889c25a:/# st2 run bug.branch-failure
..
id: 5d51c279cad05a01403cc430
action.ref: bug.branch-failure
parameters: None
status: failed
start_timestamp: Mon, 12 Aug 2019 19:48:09 UTC
end_timestamp: Mon, 12 Aug 2019 19:48:12 UTC
result:
  errors:
  - message: Execution failed. See result for details.
    result:
      failed: true
      return_code: 127
      stderr: 'bash: asdf: command not found'
      stdout: ''
      succeeded: false
    task_id: t2
    type: error
  - message: Execution failed. See result for details.
    result:
      failed: true
      return_code: 127
      stderr: 'bash: asdf: command not found'
      stdout: ''
      succeeded: false
    task_id: t3
    type: error
  output: null
+--------------------------+------------------------+------+------------+-------------------------------+
| id                       | status                 | task | action     | start_timestamp               |
+--------------------------+------------------------+------+------------+-------------------------------+
| 5d51c279cad05a0039b31e91 | succeeded (1s elapsed) | t1   | core.noop  | Mon, 12 Aug 2019 19:48:09 UTC |
| 5d51c27acad05a0039b31e94 | failed (1s elapsed)    | t2   | core.local | Mon, 12 Aug 2019 19:48:10 UTC |
| 5d51c27acad05a0039b31e97 | failed (1s elapsed)    | t3   | core.local | Mon, 12 Aug 2019 19:48:10 UTC |
+--------------------------+------------------------+------+------------+-------------------------------+

Expected Results

Since t3 failing should trigger an inquiry, the failure of the workflow is unexpected.

Actual Results

The workflow fails.

Rerunning `examples.orchestra-join` results in a failure

vagrant@st2test:~$ st2 run examples.orchestra-join
...
id: 5b332ce055fc8c57d28831af
action.ref: examples.orchestra-join
parameters: None
status: succeeded
start_timestamp: Wed, 27 Jun 2018 06:21:20 UTC
end_timestamp: Wed, 27 Jun 2018 06:21:24 UTC
result:
  output:
    messages:
    - Fee fi fo fum
    - I smell the blood of an English man
    - Be alive, or be he dead
    - I'll grind his bones to make my bread
+--------------------------+------------------------+--------+-----------+-------------------------------+
| id                       | status                 | task   | action    | start_timestamp               |
+--------------------------+------------------------+--------+-----------+-------------------------------+
| 5b332ce055fc8c58798f6a20 | succeeded (0s elapsed) | task1  | core.noop | Wed, 27 Jun 2018 06:21:20 UTC |
| 5b332ce055fc8c5821391bd2 | succeeded (1s elapsed) | task2  | core.echo | Wed, 27 Jun 2018 06:21:20 UTC |
| 5b332ce155fc8c5821391bd5 | succeeded (0s elapsed) | task4  | core.echo | Wed, 27 Jun 2018 06:21:21 UTC |
| 5b332ce155fc8c5821391bd8 | succeeded (0s elapsed) | task6  | core.noop | Wed, 27 Jun 2018 06:21:21 UTC |
| 5b332ce255fc8c5821391bdc | succeeded (1s elapsed) | task5  | core.noop | Wed, 27 Jun 2018 06:21:21 UTC |
| 5b332ce255fc8c5821391bdf | succeeded (0s elapsed) | task3  | core.echo | Wed, 27 Jun 2018 06:21:22 UTC |
| 5b332ce255fc8c5821391be2 | succeeded (0s elapsed) | task7  | core.echo | Wed, 27 Jun 2018 06:21:22 UTC |
| 5b332ce355fc8c5821391be4 | succeeded (1s elapsed) | task8  | core.noop | Wed, 27 Jun 2018 06:21:22 UTC |
| 5b332ce355fc8c5821391be7 | succeeded (1s elapsed) | task9  | core.noop | Wed, 27 Jun 2018 06:21:23 UTC |
| 5b332ce455fc8c5821391bea | succeeded (0s elapsed) | task10 | core.noop | Wed, 27 Jun 2018 06:21:24 UTC |
+--------------------------+------------------------+--------+-----------+-------------------------------+
vagrant@st2test:~$ st2 execution re-run 5b332ce055fc8c57d28831af
..
id: 5b332cf155fc8c57d28831b2
action.ref: examples.orchestra-join
parameters: None
status: failed
start_timestamp: Wed, 27 Jun 2018 06:21:37 UTC
end_timestamp: Wed, 27 Jun 2018 06:21:40 UTC
result:
  errors:
  - message: Conflict saving DB object with id "5b332cf255fc8c5821391bf4" and rev "2".
    task_id: task8
  output: null
+--------------------------+------------------------+-------+-----------+-------------------------------+
| id                       | status                 | task  | action    | start_timestamp               |
+--------------------------+------------------------+-------+-----------+-------------------------------+
| 5b332cf155fc8c58798f6a23 | succeeded (0s elapsed) | task1 | core.noop | Wed, 27 Jun 2018 06:21:37 UTC |
| 5b332cf255fc8c5821391bed | succeeded (0s elapsed) | task2 | core.echo | Wed, 27 Jun 2018 06:21:38 UTC |
| 5b332cf255fc8c5821391bf0 | succeeded (0s elapsed) | task4 | core.echo | Wed, 27 Jun 2018 06:21:38 UTC |
| 5b332cf255fc8c5821391bf3 | succeeded (0s elapsed) | task6 | core.noop | Wed, 27 Jun 2018 06:21:38 UTC |
| 5b332cf355fc8c5821391bf6 | succeeded (0s elapsed) | task8 | core.noop | Wed, 27 Jun 2018 06:21:39 UTC |
| 5b332cf355fc8c5821391bfa | succeeded (0s elapsed) | task5 | core.noop | Wed, 27 Jun 2018 06:21:39 UTC |
| 5b332cf455fc8c5821391c01 | succeeded (2s elapsed) | task3 | core.echo | Wed, 27 Jun 2018 06:21:39 UTC |
| 5b332cf355fc8c5821391bfd | succeeded (1s elapsed) | task7 | core.echo | Wed, 27 Jun 2018 06:21:39 UTC |
| 5b332cf455fc8c5821391c02 | requested              | task9 | core.noop | Wed, 27 Jun 2018 06:21:40 UTC |
+--------------------------+------------------------+-------+-----------+-------------------------------+

Join failure within nested workflows can cause Parent workflow to run indefinitely.

Summary

With nested workflows, a join that fails as "unreachable" in the child workflow can cause the parent workflow to run indefinitely, even though the parent workflow reaches an acceptable completion point.

Error Messages

I've seen two error messages when this scenario presents itself:

"message": "UnreachableJoinError: The join task|route \"aggregate|1\" is partially satisfied but unreachable."
"message": "The join task \"aggregate\" is unreachable. A join task is determined to be unreachable if there are nested forks from multi-referenced tasks that join on the said task. This is ambiguous to the workflow engine because it does not know at which level should the join occurs.",

The longest I've seen it run was until I manually canceled it the following day, at 69,552 seconds (over 19 hours).


Environment details

ST2 Version
st2 --version
st2 3.2.0, on Python 2.7.12
Distro
cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.5 LTS"
Other
  • Installed from one-liner script
  • Using EC2 server, not Docker or other virtualization
  • Running on AMI ami-0e32ec5bc225539f5 in AWS

Reproduction Workflow Examples

I have tested with these reproduction workflows that the problem presents itself.

Parent WF

parent_wf.meta.yaml:

pack: default
enabled: true
runner_type: orquesta
name: parent_wf
entry_point: workflows/parent_wf.yaml

parent_wf.yaml:
version: 1.0
tasks:
  # [483, 337]
  task1:
    action: default.child_wf
    with:
      items: <% ctx(hosts) %>
      concurrency: 3
    next:
      - do:
          - complete
  # [483, 486]
  complete:
    action: core.noop
    join: all
vars:
  - hosts: ["host1","host2","host3"]

Child WF

child_wf.meta.yaml:

pack: default
enabled: true
runner_type: orquesta
name: child_wf
entry_point: workflows/child_wf.yaml

child_wf.yaml:
version: 1.0
tasks:
  # [489, 163]
  run:
    action: core.noop
    next:
      - do:
          - succeeds
          - fails
  # [348, 313]
  succeeds:
    action: core.local
    input:
      cmd: echo 'success'
    next:
      # #1072c6
      - do:
          - aggregate

  # [666, 311]
  fails:
    action: core.local
    input:
      cmd: echo 'fail'; exit 1
    next:
      # #1072c6
      - do:
          - aggregate

  # [518, 461]
  aggregate:
    action: core.noop
    join: all
    next:
      # #629e47
      - do:
          - continue_wf

  # [518, 593]
  continue_wf:
    action: core.noop

Expected Result

  • Child workflow join fails because upstream action failure
  • Parent Workflow sees failure of child workflow
  • Parent Workflow waits for all child workflows to complete
  • Parent workflow moves onto complete action
  • Parent workflow enters Success/Failed State accordingly

Observed Result

  • Child workflow join fails because upstream action failure
  • Parent Workflow sees failure of child workflow
  • Parent Workflow waits for all child workflows to complete
  • Parent workflow moves onto complete action
  • Parent workflow continues in running State until canceled manually


Workaround

An acceptable workaround I have found is to ensure that each parallel silo (fork) of the child workflow has a core.noop task just before the join, so that a success always occurs; this allows the join to succeed and the workflow to continue gracefully.

This causes the "Expected Result" to be observed.

Nesting When statements in Orquesta

Right now, when writing conditional logic in Orquesta, I have to write a separate when block for each condition, which is not graceful and is harder to manage.

For example, instead of:

when (succeeded)
    when (x = true)
        do thing
    when (y = true)
        do thang

We have to write:

when (succeeded and x = true)
    do thing
when (succeeded and y = true)
    do thang

It's not clean and is harder to read and maintain. In some of our workflows the conditionals are large, and we end up with many statements containing repeated keywords.

Workflow execution with error sometimes

SUMMARY

I have a workflow, triggered by a rule with an interval timer, whose first task iterates over an array using with items. Sometimes I get the error "TypeError: 'NoneType' object has no attribute '__getitem__'". The strange part is that it doesn't occur on every execution.

STACKSTORM VERSION

st2 3.0.1, on Python 2.7.16

OS, environment, install method

OS: Oracle Linux 7.5
Custom HA installation with ST2 RPM without changes. All services separate as a single VM.

Steps to reproduce the problem

Workflow 1 content:

description: Simple workflow that check VM interface
input:
- lbs
- env_name
- acs_url
tasks:
    each_vip:
        with: <% ctx(lbs) %>
        action: workflows.check-interface-vm
        input:
            lb_id: <% item() %>
            env_name: <% ctx(env_name) %> 
            acs_url: <% ctx(acs_url) %>

Subworkflow

enabled: true
name: check-interface-vm
description: "Subworkflow"
pack: workflows
runner_type: orquesta
parameters:
  lb_id:
    required: true
    type: string
    description: "LB ID."
    position: 0
  env_name:
    required: true
    type: string
    description: "Env."
    position: 1
  acs_url:
    required: false
    type: string
    description: "URL ACS."
    position: 2
entry_point: workflows/check-interface-vm.yaml
data_files:
- file_path: workflows/check-interface-vm.yaml
  content: |
    version: 1.0
    description: "Subworkflow."
    
    input:
      - lb_id
      - env_name
      - acs_url

    tasks:
      get_lb_info:
        action: cloudstack.lb_list_members
        input:
          lb_id: <% ctx(lb_id) %>
          url: <% ctx(acs_url) %>
          apikey: <% st2kv('acs_apikey', decrypt => true) %>
          secretkey: <% st2kv('acs_secretkey', decrypt => true) %>
        next:
          - when: <% succeeded() %>
            publish:
              - lb_info: <% result().result.loadbalancerruleinstance %>
            do: get_vm_info
          - when: <% failed() %>
            do: fail

      get_vm_info:
        with: <% ctx(lb_info) %>
        action: cloudstack.vm_get_info
        input:
          vm_id: <% item().id %>
          url: <% ctx(acs_url) %>
          apikey: <% st2kv('acs_apikey', decrypt => true) %>
          secretkey: <% st2kv('acs_secretkey', decrypt => true) %>
        next:
          - when: <% succeeded() %>
            publish:
              - vm_info: <% result().result.virtualmachine %>
            do: ping_vm
          - when: <% failed() %>
            do: fail

      ping_vm:
        with: <% ctx(vm_info).select($.nic.ipaddress) %>
        action: networking_utils.ping
        input:
          host: "<% item().first().last() %>"
          count: 10
        next:
          - when: <% failed() %>
            publish:  
              - vm_tested: <% result().where($.failed = true).stdout.select($.split(' ')[1]) %>
            do: delete_vm_failed_interface

      delete_vm_failed_interface:
        with: <% ctx(vm_info).where(ctx(vm_tested).contains($.nic.ipaddress.first().last())) %>
        action: cloudstack.vm_force_destroy
        input:
          vm_id: <% item().first().id %>
          url: <% ctx(acs_url) %>
          apikey: <% st2kv('acs_apikey', decrypt => true) %>
          secretkey: <% st2kv('acs_secretkey', decrypt => true) %>
        next:
          - when: <% succeeded() %>
            do: post_slack_deleted
          - when: <% failed() %>
            do: fail

      post_slack_deleted:
        with: <% ctx(vm_info).where(ctx(vm_tested).contains($.nic.ipaddress.first().last())) %>
        action: slack.chat.postMessage
        input:
          text: "[CHECK INTERFACE - `<% ctx(env_name).toUpper() %>`] *<% item().first().name %>* *DOWN*."
          http_method: "POST"
          username: "bot"
          channel: "channel"

Expected Results

All workflows and actions executed with success.

Actual Results

Sometimes all actions inside the workflow execute successfully, but the workflow fails with the message: TypeError: 'NoneType' object has no attribute '__getitem__'


Looking at logs I can see the error message with some identification:

ERROR 140336294574032 workflows [-] [5d6ed69d4341ee7ea9f75a6f] Unknown error while processing task execution. Failing task execution "5d6ed69f1b80fa05b9fa7d07"

And when I try to get task execution information:

$ st2 execution get 5d6ed69f1b80fa05b9fa7d07 
ERROR: Execution with id 5d6ed69f1b80fa05b9fa7d07 not found.

Getting the workflow information, I can't see the failed task execution 5d6ed69f1b80fa05b9fa7d07:

id: 5d6ed69d4341ee7ea9f75a6f
action.ref: workflows.check-farm
parameters:
  env_name: DEV
  lbs:
  - 6c0c9e83-21d4-455d-9c01-e7ecdfc90cf4
  - 715f34af-3a96-4aa4-aac4-2c04baa314d4
status: failed (18s elapsed)
start_timestamp: Tue, 03 Sep 2019 21:09:49 UTC
end_timestamp: Tue, 03 Sep 2019 21:10:07 UTC
result:
  errors:
  - message: Execution failed. See result for details.
    task_id: each_vip
    type: error
  - message: 'TypeError: ''NoneType'' object has no attribute ''__getitem__'''
    route: 0
    task_id: each_vip
    type: error
  output:
    each_vip:
      items:
      - result:
          output: null
        status: succeeded
      - result:
          output: null
        status: succeeded
    finish:
      failed: false
      return_code: 0
      stderr: ''
      stdout: Completed
      succeeded: true
    start:
      failed: false
      return_code: 0
      stderr: ''
      stdout: Started checking [u'6c0c9e83-21d4-455d-9c01-e7ecdfc90cf4', u'715f34af-3a96-4aa4-aac4-2c04baa314d4']
      succeeded: true
+-----------------------------+-------------------------+-------------+------------------------------------+-------------------------------+
| id                          | status                  | task        | action                             | start_timestamp               |
+-----------------------------+-------------------------+-------------+------------------------------------+-------------------------------+
|   5d6ed69e1b80fa05b9fa7d06  | succeeded (1s elapsed)  | start       | core.echo                          | Tue, 03 Sep 2019 21:09:50 UTC |
| + 5d6ed69f1b80fa05b9fa7d09  | succeeded (15s elapsed) | each_vip    | workflows.check-interface-vm | Tue, 03 Sep 2019 21:09:51 UTC |
|    5d6ed6a01b80fa05b9fa7d0e | succeeded (2s elapsed)  | get_lb_info | libcloud.balancer_list_members     | Tue, 03 Sep 2019 21:09:52 UTC |
|    5d6ed6a31b80fa05b9fa7d15 | succeeded (1s elapsed)  | get_vm_info | libcloud.get_vm                    | Tue, 03 Sep 2019 21:09:55 UTC |
|    5d6ed6a41b80fa05b9fa7d19 | succeeded (1s elapsed)  | get_vm_info | libcloud.get_vm                    | Tue, 03 Sep 2019 21:09:56 UTC |
|    5d6ed6a71b80fa05b9fa7d1f | succeeded (5s elapsed)  | ping_vm     | networking_utils.ping              | Tue, 03 Sep 2019 21:09:58 UTC |
|    5d6ed6a71b80fa05b9fa7d21 | succeeded (5s elapsed)  | ping_vm     | networking_utils.ping              | Tue, 03 Sep 2019 21:09:59 UTC |
| + 5d6ed69f1b80fa05b9fa7d0b  | succeeded (13s elapsed) | each_vip    | workflows.check-interface-vm | Tue, 03 Sep 2019 21:09:51 UTC |
|    5d6ed6a11b80fa05b9fa7d11 | succeeded (1s elapsed)  | get_lb_info | libcloud.balancer_list_members     | Tue, 03 Sep 2019 21:09:53 UTC |
|    5d6ed6a31b80fa05b9fa7d17 | succeeded (2s elapsed)  | get_vm_info | libcloud.get_vm                    | Tue, 03 Sep 2019 21:09:55 UTC |
|    5d6ed6a61b80fa05b9fa7d1c | succeeded (5s elapsed)  | ping_vm     | networking_utils.ping              | Tue, 03 Sep 2019 21:09:58 UTC |
|   5d6ed6ae1b80fa05b9fa7d24  | succeeded (1s elapsed)  | finish      | core.echo                          | Tue, 03 Sep 2019 21:10:06 UTC |
+-----------------------------+-------------------------+-------------+------------------------------------+-------------------------------+

If there is any other information I can provide, please let me know.

Substring literal "in" breaks with items syntax

SUMMARY

The statement with: interface in <% list("1","2","3") %> fails because the string interface contains the substring in. The resulting error message is

TypeError: The value of "terface in <% list("1","2","3") %>" is not type of list.

I think this happens because it breaks the regex here.
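A minimal reproduction of this class of bug (the patterns below are illustrative, not Orquesta's exact regex): an unanchored match of the literal in finds the in at the start of interface and captures the mangled remainder seen in the error, while anchoring the variable name and requiring whitespace around in parses the statement correctly.

```python
import re

stmt = 'interface in <% list("1","2","3") %>'

# Buggy: an unanchored search for the literal "in" matches the "in" at
# the start of "interface" and captures the mangled remainder.
buggy = re.search(r'in\s*(.+)', stmt)
# buggy.group(1) == 'terface in <% list("1","2","3") %>'

# Fixed: anchor the variable name and require whitespace around "in".
fixed = re.match(r'^\s*(\w+)\s+in\s+(.+)$', stmt)
# fixed.groups() == ('interface', '<% list("1","2","3") %>')
```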

I've created a repro workflow here

ISSUE TYPE
  • Bug Report
STACKSTORM VERSION

st2 2.10.1, on Python 2.7.6

OS / ENVIRONMENT / INSTALL METHOD

default docker container

Task Retry in Workflow Definition

SUMMARY

When Orquesta workflow tasks fail, there should be a built-in syntax for retrying. Today our production workflows rely on a combination of decrementing and publishing a retry counter variable before routing back to the task. Retry is listed as an upcoming feature in the documentation, and I'm interested in helping with its implementation.

ISSUE TYPE
  • Feature Idea

Tasks in a fork from a cycle do not execute

Given the following workflow, the task cheer will only execute once if the task toil cycles back to the task query. The workflow engine does not recognize the fork under decide_cheer as part of the cycle and blocks the task from updating the workflow state.

version: '1.0'

description: A sample workflow with a fork that stems from a cycle.

vars:
  - cheer: false
  - work: false

tasks:
  # This init task is required to tell the workflow engine where to start.
  # Otherwise with just a cycle, the workflow engine cannot tell where to start.
  init:
    action: core.noop
    next:
      - do: query

  # This is the start of the cycle in the workflow.
  query:
    action: core.noop
    next:
      - when: <% succeeded() %>
        publish: cheer=<% result() %> work=<% result() %>
        do: decide_cheer, decide_work


  # The branch under the decide_work task will cycle back to the query task.
  decide_work:
    next:
      - when: <% ctx().work %>
        do: notify_work, toil
  notify_work:
    action: core.noop
  toil:
    action: core.echo message="This is hard work."
    next:
      - do: query


  # The branch under the decide_cheer task is a fork from the cycle.
  decide_cheer:
    next:
      - when: <% ctx().cheer %>
        do: cheer
  cheer:
    action: core.echo message="You can do it!"
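The shape of this graph can be checked with a small generic reachability sketch (not Orquesta's graph code): query, decide_work, and toil lie on the cycle, while decide_cheer and cheer hang off it as a fork, which is the case the engine appears to mishandle.

```python
# Generic sketch (not Orquesta's graph code): classify the tasks of the
# workflow above into nodes on a cycle vs. nodes forked off a cycle.
EDGES = {
    "init": ["query"],
    "query": ["decide_cheer", "decide_work"],
    "decide_work": ["notify_work", "toil"],
    "notify_work": [],
    "toil": ["query"],
    "decide_cheer": ["cheer"],
    "cheer": [],
}

def reachable(start):
    """All nodes reachable from start by following edges."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in EDGES.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

# A node is on a cycle if it can reach itself.
on_cycle = {n for n in EDGES if n in reachable(n)}
# A fork from a cycle: reachable from a cycle node but not on a cycle.
fork_from_cycle = set().union(*(reachable(n) for n in on_cycle)) - on_cycle
```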

User feedback

Below is user feedback via Slack. Posting it here so it doesn't get lost with Slack message rollover.

StackStorm's New Workflows

Unit Testing

Currently, in my opinion, the biggest downfall of Mistral is the lack of unit
testing around workflows. Without unit testing we're relying on smoke testing to
ensure these workflows behave properly (not very effective or maintainable).

Things we would like to test in a workflow include:

  • Task execution yes/no
  • Task execution order
  • Error handling
  • Publish expressions (Jinja, YAQL)
  • Transition expressions (Jinja, YAQL)
  • Workflow inputs and outputs

I think this would require some sort of ability to mock action output so that
these things can be tested.
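The kind of harness being asked for might look like the following sketch (entirely hypothetical; none of these names are an existing StackStorm or Orquesta API): a table of mocked action results is used to walk a simplified linear workflow, recording execution order and published context so both can be asserted on.

```python
# Hypothetical unit-test harness sketch: mock action outputs and assert
# on execution order and published context. None of these names are an
# existing StackStorm/Orquesta API.
def run_with_mocks(tasks, mocks):
    """tasks: ordered list of (name, action_ref, publish_fn); mocks maps
    action refs to canned results. Returns (execution order, context)."""
    order, context = [], {}
    for name, action_ref, publish_fn in tasks:
        result = mocks[action_ref]  # mocked action output, no real execution
        order.append(name)
        if publish_fn:
            context.update(publish_fn(result, context))
    return order, context

tasks = [
    ("task1", "core.local", lambda r, ctx: {"output": r["stdout"]}),
    ("task2", "core.noop", None),
]
mocks = {"core.local": {"stdout": "hello"}, "core.noop": {}}
order, context = run_with_mocks(tasks, mocks)
# order == ["task1", "task2"], context == {"output": "hello"}
```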

Publish and reuse in same task

Currently, in Mistral, all of the publish and publish-on-error statements
occur in bulk, meaning I can't reuse a variable published in the
current task until the next task is executed.

Example:

task1:
  action: core.local
  input:
    cmd: "echo hello"
  publish:
    output: "{{ task('task1').result.stdout }}"
    # this errors out because output isn't available in the context yet
    hello_world: "{{ _.output + ' world' }}"

Maybe converting the publish statement to an array would make it easier
to execute in sequence?

task1:
  action: core.local
  input:
    cmd: "echo hello"
  publish:
    - output: "{{ task('task1').result.stdout }}"
    # output should now be available in the current context, so the following 
    # should work
    - hello_world: "{{ _.output + ' world' }}"
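The ordered-list proposal amounts to evaluating publish entries top to bottom against a shared context; a minimal sketch of that evaluation order (callables stand in for the Jinja/YAQL expressions):

```python
# Sketch of in-order publish evaluation: each entry is evaluated against
# the context built up by the entries before it, so later entries can
# reuse earlier ones. Callables stand in for template expressions.
def publish_in_order(entries, context):
    for entry in entries:
        (key, expr), = entry.items()  # each entry is a single-key dict
        context[key] = expr(context)
    return context

ctx = publish_in_order(
    [
        {"output": lambda c: "hello"},
        # "output" is already in the context here, as the proposal intends.
        {"hello_world": lambda c: c["output"] + " world"},
    ],
    {},
)
# ctx == {"output": "hello", "hello_world": "hello world"}
```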

When condition on a task level

Sometimes it's necessary to skip a task given a condition. Currently we have to
work around this by adding the "skip" condition to every on-success statement
in the workflow that may call our task. This is a maintenance burden.

Ansible example:

- name: install stackstorm pack
  shell:
    cmd: "st2 pack install {{ st2_pack }}"
  when: st2_pack is defined

Mistral example (current implementation):

input:
  - servicenow_provision_id: null

task_1:
  action: std.noop
  publish:
    last_task: "task_1"
  on-success:
    - task_servicenow_update: "{{ _.servicenow_provision_id }}"
    - task_2

task_2:
  action: std.noop
  publish:
    last_task: "task_2"
  on-success:
    - task_servicenow_update: "{{ _.servicenow_provision_id }}"
    
task_servicenow_update:
  action: encore_servicenow.provision_state_update
  input:
    provision_id: "{{ _.servicenow_provision_id }}"
    current_state: "{{ _.last_task }}"

Mistral example using a when condition on task_servicenow_update task:

input:
  - servicenow_provision_id: null

task_1:
  action: std.noop
  publish:
    last_task: "task_1"
  on-success:
    - task_servicenow_update
    - task_2

task_2:
  action: std.noop
  publish:
    last_task: "task_2"
  on-success:
    - task_servicenow_update
    
task_servicenow_update:
  action: encore_servicenow.provision_state_update
  when: "{{ _.servicenow_provision_id }}"
  input:
    provision_id: "{{ _.servicenow_provision_id }}"
    current_state: "{{ _.last_task }}"

"Main" task / entry-point

There are many occasions on Slack where users of Mistral are confused by its
default execution behavior. People usually forget to tie tasks together with
explicit on-success, on-error, or on-complete statements, causing tasks
to be run in parallel resulting in odd and unpredictable behaviors.

It might be better to explicitly define a "main" or "entry-point" task in the workflow.

description: My entry-point workflow
input:
  - x
  - y 
  - z
  
# the tasks that should be executed as "roots"
# this can either be a string, or a list so that >1 can be executed in parallel?
entry-point: my_first_task

tasks:
  my_first_task:
    action: std.noop
    on-success:
      - some_other_task
    
  some_other_task:
    action: std.noop

Linear execution by default

On the same lines as the last point about "main tasks" people new to Mistral are
often confused by the non-linearity as the default operating state of the
workflow engine.

Maybe, as an alternative to the "main task" idea above, workflows could be
executed linearly by default, with parallel/DAG-optimized execution available
via an option. I'm going to suggest a workflow parameter called
execution_model. An execution_model of linear means that tasks are executed in
order, just like an action chain or Ansible. A parallel execution model would
switch over to Mistral-like parallel execution by creating a DAG and executing
as much in parallel as possible.

description: > 
  My linear workflow, all of these are executed in order
  without the need for on-success.
execution_model: linear
  
tasks:
  task_0:
    action: std.noop
    
  task_1:
    action: std.noop

  task_2:
    action: std.noop

description: >
  Parallel execution model tries to do as much in parallel as possible, requires
  on-success and on-error to construct our DAG.
execution_model: parallel
  
tasks:
  task_0:
    action: std.noop
    on-success:
      - task_1
    
  task_1:
    action: std.noop
    on-success:
      - task_2
      - task_3

  task_2:
    action: std.noop
    
  task_3:
    action: std.noop

Keep: defined inputs and output

We really like having the inputs and outputs sections explicitly defined.
Please keep these!

Keep: allowed temporaries in workflow

We currently use (and maybe abuse) the ability to have input values defined
in the workflow that are not present in the parameters of the action itself.
These are mostly used as temporary variables that are local to the workflow.

Passing true to decrypt of st2kv function does not work

Taking the examples.orquesta-st2kv workflow (https://github.com/StackStorm/st2/blob/master/contrib/examples/actions/workflows/orquesta-st2kv.yaml) as an example: if the expression <% st2kv(ctx().key_name, decrypt=ctx().decrypt) %> is changed to <% st2kv(ctx().key_name, decrypt=true) %>, the value is not decrypted. If the literal true is explicitly cast to bool, as in <% st2kv(ctx().key_name, decrypt => bool('true')) %>, then the value is decrypted.

Workflow join is not triggered on complete for failed task(s)

Workflows that have long-running tasks in sub-workflows are not waited on to complete before the top-level workflow completes. I tried join: all and join: 2 based on other issues on this repo but got the same result each time. Given the following workflow and action files:

ST2 Version:

# st2 --version
st2 3.1.0, on Python 2.7.5

Master workflow:
Action orquesta-join.yaml:

---
description: A basic workflow that demonstrates branching and join.
enabled: true
runner_type: orquesta
entry_point: workflows/orquesta-join.yaml
name: orquesta-join
pack: encore

Workflow orquesta-join.yaml:

---
version: 1.0

description: A basic workflow that demonstrates branching and join.

tasks:
  task1:
    action: core.noop
    next:
      - when: <% completed() %>
        do: task2, task3

  task2:
    action: encore.orquesta-join-4
    next:
      - when: <% completed() %>
        do: finish

  task3:
    action: encore.orquesta-join-2
    next:
      - when: <% completed() %>
        do: finish

  finish:
    # This did not work with either 2 or all
    # join: all
    join: 2
    action: core.noop
    next:
      - do: noop

Sub workflows:
Action orquesta-join-2.yaml:

---
description: A basic workflow that demonstrates branching and join.
enabled: true
runner_type: orquesta
entry_point: workflows/orquesta-join-2.yaml
name: orquesta-join-2
pack: encore

Workflow orquesta-join-2.yaml:

---
version: 1.0

description: A basic workflow that demonstrates branching and join.

tasks:
  task1:
    action: core.echo message="Fee fi"
    next:
      - when: <% succeeded() %>
        do: task2

  task2:
    action: encore.orquesta-join-3

Action orquesta-join-3.yaml:

---
description: A basic workflow that demonstrates branching and join.
enabled: true
runner_type: orquesta
entry_point: workflows/orquesta-join-3.yaml
name: orquesta-join-3
pack: encore

Workflow orquesta-join-3.yaml:

---
version: 1.0

description: A basic workflow that demonstrates branching and join.

tasks:
  task1:
    action: core.echo message="Fee fi"
    next:
      - when: <% succeeded() %>
        do: task2

  task2:
    action: core.local
    input:
      cmd: "sleep 100"

Action orquesta-join-4.yaml:

---
description: A basic workflow that demonstrates branching and join.
enabled: true
runner_type: orquesta
entry_point: workflows/orquesta-join-4.yaml
name: orquesta-join-4
pack: encore

Workflow orquesta-join-4.yaml:

---
version: 1.0

description: A basic workflow that demonstrates branching and join.

tasks:
  task1:
    delay: 20
    action: core.echo message="Fee fi"
    next:
      - when: <% succeeded() %>
        do: task2

  task2:
    action: encore.orquesta-join-3

Error (screenshot omitted; see log excerpt below):

st2workflowengine.log

2020-03-09 10:58:07,904 140625357435120 ERROR workflows [-] [5e66592949b3d2060f42bc78] Workflow execution completed with errors. (task_id=u'task2',type=u'error',result={u'output': None, u'errors': []},message=u'Execution failed. See result for details.')
2020-03-09 10:58:08,005 140625358292784 INFO workflows [-] [5e66592749b3d26f3749e3be] Identifying next set (0) of tasks after completion of task "task2", route "0".
2020-03-09 10:58:08,007 140625358292784 INFO workflows [-] [5e66592749b3d26f3749e3be] No tasks identified to execute next.
2020-03-09 10:58:08,108 140625358292784 ERROR workflows [-] [5e66592749b3d26f3749e3be] Workflow execution completed with errors. (task_id=u'task3',type=u'error',result={u'output': None, u'errors': [{u'message': u'Execution failed. See result for details.', u'type': u'error', u'result': {u'output': None, u'errors': []}, u'task_id': u'task2'}]},message=u'Execution failed. See result for details.')
2020-03-09 10:58:08,109 140625358292784 ERROR workflows [-] [5e66592749b3d26f3749e3be] Workflow execution completed with errors. (task_id=u'task2',type=u'error',result={u'output': None, u'errors': [{u'message': u'Execution failed. See result for details.', u'type': u'error', u'result': {u'output': None, u'errors': []}, u'task_id': u'task2'}]},message=u'Execution failed. See result for details.')
2020-03-09 10:58:08,249 140625358292784 ERROR workflows [-] [5e66592749b3d26f3749e3be] Workflow execution completed with errors. (task_id=u'task3',type=u'error',result={u'output': None, u'errors': [{u'message': u'Execution failed. See result for details.', u'type': u'error', u'result': {u'output': None, u'errors': []}, u'task_id': u'task2'}]},message=u'Execution failed. See result for details.')
2020-03-09 10:58:08,250 140625358292784 ERROR workflows [-] [5e66592749b3d26f3749e3be] Workflow execution completed with errors. (task_id=u'task2',type=u'error',result={u'output': None, u'errors': [{u'message': u'Execution failed. See result for details.', u'type': u'error', u'result': {u'output': None, u'errors': []}, u'task_id': u'task2'}]},message=u'Execution failed. See result for details.')

Context variables are being overwritten on join

Given the following workflow, where task1, task2, and task3 run in parallel and join on task99:

version: 1.0

description: A basic parallel workflow.

input:
  - x: false
  - y: false
  - z: false

output:
  - data:
      x: <% ctx().x %>
      y: <% ctx().y %>
      z: <% ctx().z %>

tasks:
  init:
    action: core.noop
    next:
      - do: task1, task2, task3

  task1:
    action: core.noop
    next:
      - publish: x=true
        do: task99

  task2:
    action: core.noop
    next:
      - publish: y=true
        do: task99

  task3:
    action: core.noop
    next:
      - publish: z=true
        do: task99

  task99:
    join: all
    action: core.noop

The expected output should be the following.

data:
  x: true
  y: true
  z: true

However, the actual output is this.

data:
  x: false
  y: false
  z: true

When executing task99, Orquesta takes the final contexts from task1, task2, and task3 and merges them together to form the initial context for task99. The final context for task1 has x=true but y and z as false; the final context for task2 has y=true but x and z as false; and so forth for task3. Because the contexts are merged sequentially, the final context for task3 overwrites those from task1 and task2, resulting in x and y being false.
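The overwrite can be reproduced with plain dictionaries. The following is a minimal sketch, not Orquesta's actual merge code; it contrasts a naive sequential merge with one that only takes keys a branch actually changed relative to the initial context:

```python
initial = {"x": False, "y": False, "z": False}

# Final contexts of the three parallel branches; each branch published
# only its own variable.
ctx_task1 = {"x": True, "y": False, "z": False}
ctx_task2 = {"x": False, "y": True, "z": False}
ctx_task3 = {"x": False, "y": False, "z": True}

# Naive merge: each later branch overwrites earlier ones wholesale.
naive = {}
for branch in (ctx_task1, ctx_task2, ctx_task3):
    naive.update(branch)
print(naive)  # {'x': False, 'y': False, 'z': True} -- x and y are lost

# A diff-aware merge keeps only keys a branch changed from the initial
# context, so all three published values survive.
merged = dict(initial)
for branch in (ctx_task1, ctx_task2, ctx_task3):
    for key, value in branch.items():
        if value != initial[key]:
            merged[key] = value
print(merged)  # {'x': True, 'y': True, 'z': True}
```

The naive merge reproduces exactly the faulty output reported above.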

Failure to publish True to a variable using shorthand format

Given the following workflow, where there is a shorthand publish x=True, the output of the workflow results in x=false. The capitalized True is not accepted in the assignment. If the publish uses all-lowercase true, then the output is as expected.

version: 1.0

input:
  - x: false

output:
  - x: <% ctx().x %>

tasks:
  task1:
    action: core.echo message=<% str(ctx().x) %>
    next:
      - publish: x=True

This problem does not exist if the publish is done in the normal, non-shorthand format, as in the workflow below.

version: 1.0

input:
  - x: false

output:
  - x: <% ctx().x %>

tasks:
  task1:
    action: core.echo message=<% str(ctx().x) %>
    next:
      - publish:
         - x: True
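A likely explanation (an assumption, not confirmed against Orquesta's source) is that the right-hand side of the shorthand assignment is evaluated as a YAQL expression, and YAQL, like JSON, spells its boolean literals in lowercase. A toy evaluator illustrates the case sensitivity; LITERALS and eval_literal are hypothetical names for illustration only:

```python
# Lowercase-only literal table, mirroring YAQL/JSON conventions.
LITERALS = {"true": True, "false": False, "null": None}

def eval_literal(token):
    # Return the matching literal if the token is one, otherwise fall
    # through and treat the token as a plain (non-boolean) value.
    return LITERALS.get(token, token)

print(repr(eval_literal("true")))  # True   (a real boolean)
print(repr(eval_literal("True")))  # 'True' (not recognized as a literal)
```

The full-form publish does not hit this because the YAML parser, which accepts True as a boolean, handles the value instead of the expression evaluator.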

Allow YAQL/Jinja expression in join

Currently, join can either be all or an integer that represents the number of incoming edges, e.g. join: all or join: 3. There are cases where the number of incoming edges needs to be determined dynamically, but join does not currently accept a YAQL or Jinja expression.
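A hypothetical sketch of what the requested syntax might look like (this is the proposal, not something Orquesta currently supports):

```yaml
version: 1.0

input:
  - branch_count  # number of branches, computed at run time

tasks:
  fan_in:
    # Proposed: allow an expression where today only "all" or a
    # literal integer is accepted.
    join: <% ctx().branch_count %>
    action: core.noop
```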

YAQL methods can't handle an array of dicts in the 'items' of the 'with' parameter

An execution of the following workflow fails with this error message when the user specifies an array of dicts:

YaqlEvaluationException: Unable to evaluate expression '<% ctx(array_input).distinct() %>'. TypeError: unhashable type: 'BaseDict'

Metadata file (test_input.yaml)

---
name: test_input
description: Test action to check input parameter would be handled correctly
runner_type: orquesta
entry_point: workflows/test_input.yaml
enabled: true
parameters:
  array_input:
    type: array

Orquesta file (workflows/test_input.yaml)

version: 1.0

input:
  - array_input

description: Intermediate workflow to check performance

tasks:
  task:
    action: core.echo
    with:
      items: <% ctx(array_input).distinct() %>
    input:
      message: <% item() %>

Execution result

(screenshot of the failed execution, 2019-08-25, omitted)

Environment of st2

(screenshot of the st2 environment, 2019-08-26, omitted)

Note

This problem happened when calling the YAQL method 'distinct' to evaluate the array of dicts specified in the 'items' parameter.
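The root cause can be reproduced in plain Python: dicts are unhashable, so any set-based deduplication, which distinct() presumably relies on, raises the same TypeError. A possible workaround is to deduplicate on a hashable serialized form before passing the list to 'with items':

```python
import json

items = [{"a": 1}, {"a": 1}, {"b": 2}]

# Reproduces the root cause: a set cannot hold dict values.
try:
    set(items)
except TypeError as exc:
    print(exc)  # unhashable type: 'dict'

# Workaround: deduplicate on a canonical JSON serialization, which is
# hashable, while preserving the original dict values and their order.
seen = set()
deduped = []
for item in items:
    key = json.dumps(item, sort_keys=True)
    if key not in seen:
        seen.add(key)
        deduped.append(item)
print(deduped)  # [{'a': 1}, {'b': 2}]
```

In a workflow, the equivalent deduplication could be done in a preceding task before handing the list to the with-items task.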
