
retry's Issues

Reporting a vulnerability

Hello!

I hope you are doing well!

We are a security research team. Our tool automatically detected a vulnerability in this repository. We want to disclose it responsibly. GitHub has a feature called Private vulnerability reporting, which enables security researchers to privately disclose a vulnerability. Unfortunately, it is not enabled for this repository.

Can you enable it, so that we can report it?

Thanks in advance!

PS: you can read about how to enable private vulnerability reporting here: https://docs.github.com/en/code-security/security-advisories/repository-security-advisories/configuring-private-vulnerability-reporting-for-a-repository

Warning: Unexpected input(s) 'retry_on_exit_code'

Describe the bug
When using "retry_on_exit_code", it shows this warning:

image

I have tried setting a timeout_minutes, even though I didn't want to use one, and the error didn't happen but the warning persisted:

image

Expected behavior
I did not expect that warning to show up, and I did not want to set a timeout to use retry_on_exit_code.

Respect default shell choice

Describe the bug
I use

defaults:
  run:
    shell: bash

at the top of my workflow files to ensure all steps of all jobs use bash. One such step uses this retry action, yet the shell used to run commands inside this action does not automatically use bash.

Expected behavior
The default shell is detected, if set, and is used.

Logs
Here is a failing CI pipeline caused by removing shell: bash.
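As a workaround until the default is detected, the action's own shell input (referenced in several other issues here) can be set explicitly on each retry step. This is only a sketch; the timeout, attempt count, and script path are placeholders:

      - name: Run script with retries
        uses: nick-fields/retry@v2
        with:
          timeout_minutes: 5
          max_attempts: 3
          shell: bash               # set explicitly, since defaults.run.shell is not picked up
          command: ./my-script.sh   # placeholder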

Unclear what the error message is upon failure

We are trying out the retry action and we have some output like this:

events.js:187
      throw er; // Unhandled 'error' event
      ^

Error: spawn node ENOENT
    at Process.ChildProcess._handle.onexit (internal/child_process.js:264:19)
    at onErrorNT (internal/child_process.js:456:16)
    at processTicksAndRejections (internal/process/task_queues.js:80:21)
Emitted 'error' event on ChildProcess instance at:
    at Process.ChildProcess._handle.onexit (internal/child_process.js:270:12)
    at onErrorNT (internal/child_process.js:456:16)
    at processTicksAndRejections (internal/process/task_queues.js:80:21) {
  errno: 'ENOENT',
  code: 'ENOENT',
  syscall: 'spawn node',
  path: 'node',
  spawnargs: [
    '/home/concord/actions-runner/_work/_actions/nick-invision/retry/v1.0.0/dist/exec.js',
    './mvnw test -B -Dair.check.skip-all -pl plugin/starburst-snowflake -Pjdbc-integration-tests ${SNOWFLAKE_CONFIG}'
  ]
}

Does this mean the command we are trying to run is not present?

after timeout, the whole run fails without another attempt

Describe the bug
After a timeout, the whole run fails without another attempt.

Expected behavior
As far as I can tell from the documentation, the timeout applies to each attempt, so if it times out on the first attempt, it should retry again, and so on.

Screenshots

      - name: run bats tests
        uses: nick-fields/retry@v2
        with:
          timeout_minutes: 2
          max_attempts: 3
          command: |
            docker image ls -a
            for filename in tests/*.bats; do sudo bats --tap "$filename" || exit 1; done
/home/runner/work/_actions/nick-fields/retry/v2/dist/index.js:3199
            throw err;
            ^

Error: kill EPERM
    at process.kill (node:internal/process/per_thread:220:13)
    at killPid (/home/runner/work/_actions/nick-fields/retry/v2/dist/index.js:3209:17)
    at /home/runner/work/_actions/nick-fields/retry/v2/dist/index.js:3186:21
    at Array.forEach (<anonymous>)
    at /home/runner/work/_actions/nick-fields/retry/v2/dist/index.js:3184:23
    at Array.forEach (<anonymous>)
    at killAll (/home/runner/work/_actions/nick-fields/retry/v2/dist/index.js:3183:27)
    at /home/runner/work/_actions/nick-fields/retry/v2/dist/index.js:3174:13
    at ChildProcess.onClose (/home/runner/work/_actions/nick-fields/retry/v2/dist/index.js:3230:17)
    at ChildProcess.emit (node:events:527:28) {
  errno: -1,
  code: 'EPERM',
  syscall: 'kill'
}

Logs
run

raw log with debug

allow failure option

I have a command that I know fails on a specific operating system, but I want to keep running it as a reminder to fix the issue. However, I don't want the step to fail the whole job. Is there a way to do that using retry?
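One workaround sketch relies only on GitHub's own step-level continue-on-error setting (standard workflow syntax, also used in another issue further down); the action additionally appears to accept a continue_on_error input in the log output shown elsewhere in these issues, which may be another route. The command below is a placeholder:

      - name: Run known-to-fail command as a reminder
        uses: nick-fields/retry@v2
        continue-on-error: true   # step-level: a failure here does not fail the whole job
        with:
          timeout_minutes: 5
          max_attempts: 3
          command: ./known-to-fail-on-this-os.sh   # placeholder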

Option to suppress warning output on retry

If it's "normal" for my workflow step to have to run multiple times, I don't like having the warning output on my workflow.

I would prefer if I could configure this action so I could lower that output to info level.
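The action's log output shown in other issues here lists a warning_on_retry input defaulting to true; assuming it does what its name suggests, a sketch like this might already tone the output down (unverified, command is a placeholder):

      - uses: nick-fields/retry@v2
        with:
          timeout_minutes: 10
          max_attempts: 3
          warning_on_retry: false   # assumption: demotes the per-retry warning annotation
          command: ./flaky-but-expected-to-retry.sh   # placeholder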

Feature Request: retry another github action

Is it possible to retry another github action? For example, if my workflow uses the following to install a JDK:

      - name: Retry Set up JDK if failed
        uses: actions/setup-java@v4
        with:
          distribution: 'zulu'
          java-version: 21

Is it possible to use this action to retry it?

My current workaround is this:

      - name: Set up JDK
        id: setup_jdk
        # without continue-on-error the job stops here on failure and the retry step below never runs
        continue-on-error: true
        uses: actions/setup-java@v4
        with:
          distribution: 'zulu'
          java-version: 21
      - name: Sleep a bit
        run: sleep 5
      - name: Retry Set up JDK if failed
        if: steps.setup_jdk.outcome == 'failure'
        uses: actions/setup-java@v4
        with:
          distribution: 'zulu'
          java-version: 21

which is not elegant.

use

Can you continue to retry the failed step when it fails?

Failure on error running an awscli command

This is probably related to issue #4, but I didn't want to jump on that with a "me too" since I'm not sure whether that ticket is just stale, or whether the actual issue is the specific command being run. ec2 wait image-available is a waiter command, so maybe what it does is throwing this off? This command throws the events.js:187 error immediately.

Error output

Run nick-invision/retry@v1
  with:
    timeout_minutes: 1
    max_attempts: 25
    command: aws ec2 wait image-available --filters Name=name,Values=import-ami --profile automation
    retry_wait_seconds: 10
    polling_interval_seconds: 1
  env:
    AWS_PROFILE: automation
    AWS_REGION: us-east-1

events.js:187
      throw er; // Unhandled 'error' event
      ^

Error: spawn node ENOENT
    at Process.ChildProcess._handle.onexit (internal/child_process.js:264:19)
    at onErrorNT (internal/child_process.js:456:16)
    at processTicksAndRejections (internal/process/task_queues.js:80:21)
Emitted 'error' event on ChildProcess instance at:
    at Process.ChildProcess._handle.onexit (internal/child_process.js:270:12)
    at onErrorNT (internal/child_process.js:456:16)
    at processTicksAndRejections (internal/process/task_queues.js:80:21) {
  errno: 'ENOENT',
  code: 'ENOENT',
  syscall: 'spawn node',
  path: 'node',
  spawnargs: [
    '/data/actions-runner/_work/_actions/nick-invision/retry/v1/dist/exec.js',
    'aws ec2 wait image-available --filters Name=name,Values=import-ami --profile automation'
  ]
}

The command
aws ec2 wait image-available --filters Name=name,Values=import-ami --profile automation
works just fine when run within a normal run step, but I need to retry it because AWS is completely unpredictable in how long it takes, so the waiter needs to run a few times before giving up.

Workflow call:

# Commands within the retry action cannot be wrapped!
- name: Wait for New AMI Image to become available
  uses: nick-invision/retry@v1
  with:
    timeout_minutes: 1
    max_attempts: 25
    command: aws ec2 wait image-available --filters Name=name,Values=${{ env.IMPORT_IMAGE_ID }} --profile ${{ env.AWS_PROFILE }}

Support minimum_time for pipelines that return too quickly

We have a pipeline that takes 1 hour to run. Occasionally the pipeline silently fails after a few seconds or minutes.

Unfortunately the failure doesn't raise an exception or return a non-zero exit code, so the pipeline continues on.

We would like a flag similar to timeout_minutes, but rather than setting a ceiling on the maximum time the task may run, it would set a minimum time the task should have taken.

      - name: ${{ matrix.image }}
        uses: nick-fields/retry@v2
        with:
          min_time: 60 #<----------------------------- desired feature
          timeout_minutes: 180 # 3 hours
          polling_interval_seconds: 10
          max_attempts: 2
          retry_on: error
          command: "${{ env.PKR_VAR_root_file_path }}/actions/build-images.sh"
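Until something like min_time exists, one shell-level workaround is to enforce the minimum inside the command itself: time the real work and exit non-zero if it finished suspiciously fast, so retry_on: error kicks in. This is only a sketch; the 60-second threshold and the script path are taken from the snippet above as placeholders:

      - name: ${{ matrix.image }}
        uses: nick-fields/retry@v2
        with:
          timeout_minutes: 180
          polling_interval_seconds: 10
          max_attempts: 2
          retry_on: error
          command: |
            set -e
            start=$(date +%s)
            "${{ env.PKR_VAR_root_file_path }}/actions/build-images.sh"
            elapsed=$(( $(date +%s) - start ))
            if [ "$elapsed" -lt 60 ]; then
              echo "Finished after only ${elapsed}s; treating this as a silent failure" >&2
              exit 1
            fi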

Retry the whole action or job

Hey,

I was wondering how I can retry the whole action or the job on failure.

Let's assume I have this workflow:

name: Deploy

on: workflow_dispatch

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: akhileshns/heroku-deploy
        with:
          heroku_api_key: ${{secrets.HEROKU_API_KEY}}
          heroku_app_name: ${{secrets.HEROKU_APP_NAME}}
          heroku_email: ${{secrets.HEROKU_EMAIL}}

How can I retry if akhileshns/heroku-deploy fails because of a timeout? Is it even possible?

Getting started with uses step

Describe the bug
I have this Set up R step, which fails frequently in CI:
steps:
- uses: actions/checkout@v2

- name: Set up R
  uses: r-lib/actions/setup-r

It's unclear to me how to use this action to retry this step up to 3 times on failure.

Expected behavior
Expected to see some docs on it.

Not retrying even though step fails

I set up retry because of a flaky test, which has never been a problem of timing out. It usually lasts 50-60 minutes and then succeeds or fails.

Here is my config:

      - name: Test
        uses: nick-fields/retry@v2
        with:
          max_attempts: 5
          timeout_minutes: 90
          shell: bash
          command: |
            ./gradlew -Pswift=false build --stacktrace --warning-mode all

Here is the job: https://github.com/square/wire/actions/runs/6036930069/job/16380190814
I cannot find any log saying that anything was retried, and the step failed on the first attempt.

Why isn't it retrying? Is my config wrong?

Upgrade to Node 20

I'm getting the following warning in annotations:

Node.js 16 actions are deprecated. Please update the following actions to use Node.js 20: nick-fields/retry@v2. For more information see: https://github.blog/changelog/2023-09-22-github-actions-transitioning-from-node-16-to-node-20/.

Edit: I have a PR which upgrades the action / repo to support Node 20 here #126

Make timeout optional

I have some tests which are a bit flaky but where I don't particularly care if they take longer. So I'd like to retry them on failure, but not time them out. retry_on: error doesn't do this: it will still fail if it times out, it just won't retry. It would be nice to be able to simply omit the timeout.
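A stopgap sketch until the timeout can be omitted: set timeout_minutes high enough that it is effectively never reached (GitHub's own job-level time limit still applies). Values and command are placeholders:

      - uses: nick-fields/retry@v2
        with:
          timeout_minutes: 360   # effectively "no timeout" for most jobs
          max_attempts: 3
          retry_on: error
          command: ./slow-but-flaky-tests.sh   # placeholder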

`set-output` and `save-state` are deprecated

Describe the bug
Since runner version 2.298.2 set-output and save-state are deprecated: https://github.blog/changelog/2022-10-11-github-actions-deprecating-save-state-and-set-output-commands/

The `set-output` command is deprecated and will be disabled soon. Please upgrade to using Environment Files. For more information see: https://github.blog/changelog/2022-10-11-github-actions-deprecating-save-state-and-set-output-commands/

Expected behavior
No deprecation message

Screenshots
n/a

Logs
n/a

Should commands fail-fast like default GitHub action behavior?

Describe the bug
Depending on your perspective, this may not be a bug. However, if it's not then I think some documentation to make the expected behavior clearer would help users.

Default GitHub behavior for steps with a run command is to fail-fast when supported. This behavior exists for bash/sh/powershell scripts.

I discovered the retry action while attempting to troubleshoot a step of my GitHub workflow that's susceptible to an external race condition with the Azure CLI where retrying a specific step a few times is a simple way to deal with that race condition. After reading over the documentation, it seemed like close to a drop-in replacement for my existing run step. However, the action would never retry when the script encountered the expected failure because the last line of my command was a simple command that always succeeded.

az account set --subscription <some-sub-id>
az command_that_sometimes_fails
az account set --subscription <original-sub-id>

It's not hard to work around this by adding set -e at the start of my command, but I was not expecting to have to do this, and I went through several rounds of adding debug calls and enabling debug logging until I realized it wasn't retrying because it was executing the final line even when a previous line failed.
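For reference, the set -e workaround mentioned above looks roughly like this (a sketch; the timeout and attempt counts are placeholders, and the az lines are the ones from the snippet):

      - uses: nick-fields/retry@v2
        with:
          timeout_minutes: 10
          max_attempts: 3
          command: |
            set -e   # fail fast on any failing line, as a plain run: step would
            az account set --subscription <some-sub-id>
            az command_that_sometimes_fails
            az account set --subscription <original-sub-id>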

Expected behavior
IMO, it'd be great if the retry action supported the default GitHub run behavior (even if it's optional / opt-in) where the command fails-fast. However, if you disagree, then it's probably worth documenting this behavior so that people don't assume it behaves the same as a run step.

Screenshots
n/a

Logs
n/a

Default shell, and option to select a different shell

It would be very useful, in order to easily convert a "normal" command, if a shell parameter were supported to change the shell executing the command.
See: https://docs.github.com/en/free-pro-team@latest/actions/reference/workflow-syntax-for-github-actions#using-a-specific-shell

Also, from what I understand, on Windows runners the shell used is cmd, since the only way I could find to set a comment was using rem.
The default for normal run steps is bash; I think it would be great if this action's default were the same.

Emit when a step is on its "last attempt"

We are using this action in our project, and in the command script we send a notification to Slack on failure.

We want to differentiate an "intermediate step fail" vs "the last attempt failed".

I made an attempt to solve this #8 but I realized that you can't use setOutput/exportVariable in the same step (it is only available in the next step).

So I figured out a workaround using the filesystem, with a variable to change the filename (so retry can be used in multiple steps).

I don't think that the current PR is in a mergeable state but I wanted to take this opportunity to raise this issue with you.

Have you thought of this problem? Do you think there is a better way to solve this?

Happy to write code for it if we decide on an approach.

Skip waiting for retry_wait_seconds after final attempt failed

Describe the bug
The action waits for [retry_wait_seconds] seconds after the final attempt failed.

Expected behavior
The action should fail immediately after the final attempt fails, regardless of [retry_wait_seconds], since there are no more retries.

Exponential backoff support?

I have a flaky step that can sometimes run long, so I need some mechanism to change the retry interval flexibly.

Exponential backoff is the best solution, I guess. Do you have any plan to support it?

on_retry_command not respecting shell choice

Describe the bug
The on_retry_command input, when run, does not use the default or chosen shell from the action inputs. This causes certain commands to fail:

- name: Deploy
  uses: nick-fields/retry@v2
  with:
    retry_wait_seconds: 1
    max_attempts: 3
    timeout_minutes: 7
    command: |
      exit 1
    on_retry_command: |
      if [[ "foo" == "foo" ]]; then
        echo "result is true"
      fi

Error: /bin/sh: 1: [[: not found

Expected behavior
It should print "result is true", and use bash as the shell, not sh.

Screenshots
If applicable, add screenshots to help explain your problem.

Logs
Enable debug logging then attach the raw logs (specifically the raw output of this action).

Debug logs: raw-debug-logs.txt
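Until the shell choice is respected for on_retry_command, one workaround sketch is to keep that script POSIX-sh compatible; the example below only rewrites the bash-only [[ ]] test from the snippet above:

      - name: Deploy
        uses: nick-fields/retry@v2
        with:
          retry_wait_seconds: 1
          max_attempts: 3
          timeout_minutes: 7
          command: |
            exit 1
          on_retry_command: |
            # POSIX [ ] works under /bin/sh, unlike the bash-only [[ ]]
            if [ "foo" = "foo" ]; then
              echo "result is true"
            fi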

Only retry if failed before n seconds

If possible, may we have an option that only retries if the failure occurred before a configured length of time?

i.e. if the failure occurs before 60 seconds, retry

Make timeout and command optional to allow for cases where a step that should be retryable also uses another action in the same step.

Describe the bug
A clear and concise description of what the bug is, including the snippet from your workflow yaml showing your configuration and command being executed.

The timeout and command inputs are not optional, which conflicts with workflows that want to use this action with another action to force it to rerun until it passes (on my end, dotnet build -c Release sometimes passes on its own after 3-7 attempts if it does not pass the first time, due to a bug in dotnet/arcade).

Expected behavior
A clear and concise description of what you expected to happen.

For the timeout and command inputs to be optional (no time out).

Screenshots
If applicable, add screenshots to help explain your problem.

N/A

Logs
Enable debug logging then attach the raw logs (specifically the raw output of this action).

N/A. I'm filing this bug directly from the marketplace because I did not like that these specific inputs are required (or they could default to 0, where 0 means there is no time limit).

Allow continue-on-error for use in composite actions

From @czifro on a previously closed issue:

I think there's a valid case now to support this in action. With the rollout of improved composite actions: https://github.blog/changelog/2021-08-25-github-actions-reduce-duplication-with-action-composition/ , there is no way to specify continue-on-error. Furthermore, composite actions don't support if: failure(), so it is not possible to have failure tracks in composite actions.

Originally posted by @czifro in #11 (comment)

Allow to specify a working-directory

Describe the bug
My workflow expects the script to be executed from a specific directory, so I use something like this:

- name: Do foo
  working-directory: some-dir
  run: ./foo

When porting this to retry, I cannot set the working-directory for the step, as it only works with run.

Similar issue: https://stackoverflow.com/questions/67299058/running-github-actions-for-uses-in-another-directory

Expected behavior
Retry should allow specifying a working-directory.

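In the meantime, a common shell-level workaround (sketched here with the some-dir and ./foo names from the snippet above, and placeholder timeout values) is to change directory inside the command itself:

      - name: Do foo (with retries)
        uses: nick-fields/retry@v2
        with:
          timeout_minutes: 10
          max_attempts: 3
          command: |
            cd some-dir   # stands in for working-directory, which the action does not expose
            ./foo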

GITHUB_TOKEN permissions used by this action

At https://github.com/step-security/secure-workflows we are building a knowledge-base (KB) of GITHUB_TOKEN permissions needed by different GitHub Actions. When developers try to set minimum token permissions for their workflows, they can use this knowledge-base instead of trying to research permissions needed by each GitHub Action they use.

Below you can see the KB of your GITHUB Action.

name: Retry Step # nick-invision/retry
# GITHUB_TOKEN not used

If you think this information is not accurate, or if in the future your GitHub Action starts using a different set of permissions, please create an issue at https://github.com/step-security/secure-workflows/issues to let us know.

This issue is automatically created by our analysis bot, feel free to close after reading :)

References:

GitHub asks users to define workflow permissions, see https://github.blog/changelog/2021-04-20-github-actions-control-permissions-for-github_token/ and https://docs.github.com/en/actions/security-guides/automatic-token-authentication#modifying-the-permissions-for-the-github_token for securing GitHub workflows against supply-chain attacks.

Setting minimum token permissions is also checked for by Open Source Security Foundation (OpenSSF) Scorecards. Scorecards recommend using https://github.com/step-security/secure-workflows so developers can fix this issue in an easier manner.

Why does timeout_seconds have to be as long as retry_wait_seconds ?

Firstly, thanks for the useful action. I hope it helps motivate GitHub to add this as built-in functionality soon enough.

For my use-case, I wanted to ping an http endpoint with curl, but it was failing because the app I was pinging was not up yet.

So I wanted to add this action to retry if the ping failed so that it would succeed once the app came up.

Initially I set it up to time out after 2s (since the request should be fast) and try again after 10s.
It failed because timeout_seconds (2s) is less than retry_wait_seconds (10s).

But I think this should be allowed.
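For context, the configuration that satisfies the current validation looks something like this sketch; the timeout has to be padded up to at least the wait time even though the request itself should only take a couple of seconds (attempt count and health-check URL are placeholders):

      - name: Wait for the app to come up
        uses: nick-fields/retry@v2
        with:
          timeout_seconds: 10      # must currently be >= retry_wait_seconds
          retry_wait_seconds: 10
          max_attempts: 30
          command: curl --fail --silent --show-error https://example.invalid/health   # placeholder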

Use outputs of `command` as step outputs

Describe the bug

It would be great if this worked:

- id: step_1
  uses: nick-fields/retry@v3
  with:
    command: echo "foo=bar" >> $GITHUB_OUTPUT
- run: echo ${{ steps.step_1.outputs.foo }}

Expected behavior
See above

Screenshots
N/A

Logs
N/A
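Until command output is surfaced as step outputs, one workaround sketch (relying only on standard runner behavior; the scratch file name and input values are chosen here for illustration) is to pass the value through a file and promote it to an output in a follow-up run step:

      - id: step_1
        uses: nick-fields/retry@v3
        with:
          timeout_minutes: 5
          max_attempts: 3
          command: echo "bar" > retry_result.txt   # write to a scratch file instead of $GITHUB_OUTPUT
      - id: promote
        run: echo "foo=$(cat retry_result.txt)" >> "$GITHUB_OUTPUT"
      - run: echo ${{ steps.promote.outputs.foo }}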

Make the default name "Run with retries ${command}"?

Describe the bug
The standard GitHub Actions steps[*].run step generates a default name of `Run ${command}`, which is very convenient when reading logs.

The nick-fields/retry step is by default reflected in the logs as "Run nick-fields/retry@v2", which is of course less convenient.
(The user can explicitly specify a name, but that essentially requires duplicating the command text in the step specification, once in name and once in command.)

Expected behavior
I'd suggest using `Run with retries ${command}` by default.


Shell arguments not passed through

Describe the bug

The action does not allow arguments to be passed into the shell parameter. These are apparently required to run in a conda environment (e.g. https://github.com/pykale/pykale/blob/main/.github/workflows/test.yml).

- name: Run tests
  id: run_tests
  uses: nick-invision/retry@v2
  with:
    timeout_minutes: 180 # Very long
    max_attempts: 3
    retry_wait_seconds: 10800 # Wait 180 minutes before trying again (fail likely because server down)
    command: |
      pytest --nbmake --cov=kale
    shell: bash -l {0}
##[warning]Attempt 1 failed. Reason: Shell bash -l {0} not supported.  See https://docs.github.com/en/free-pro-team@latest/actions/reference/workflow-syntax-for-github-actions#using-a-specific-shell for supported shells

Expected behavior
This may be a fairly straightforward change around this switch statement https://github.com/nick-invision/retry/blob/7f8f3d9f0f62fe5925341be21c2e8314fd4f7c7c/src/index.ts#L84 but I'm not a TypeScript coder. If Windows didn't need the appended ".exe", perhaps passing the shell input straight through to the executable would solve this? I guess it depends a bit on downstream error handling if someone passes a genuinely invalid shell.

Logs
Some logging here https://github.com/pykale/pykale/runs/5142791229?check_suite_focus=true I can provide more if needed.

Marks failing commands as having succeeded if they have too much output (SIGPIPE)

Describe the bug
If a command has too much output, then even though the command might fail, the step gets marked as passing.

I measured "too much" as ≥1026 KiB, but I wouldn't be surprised if it's timing related and varies at any value much over 1024.

I've created a reproducer over at https://github.com/LukeShu/gha-test where each step always() runs and dumps an increasing amount of output to stdout and then exits with a failure. As seen in the screenshot below, everything <=1MiB correctly fails, but then everything >=1⅛MiB starts passing.

This can be replicated locally, as well:

(All you need to know about my gha-test reproducer is that make -C ../gha-test bytes-1152 spits out 1152KiB of <=80-character lines on stdout, and then exits with code 1.)

$ INPUT_CONTINUE_ON_ERROR=false INPUT_MAX_ATTEMPTS=1 INPUT_TIMEOUT_MINUTES=5 INPUT_COMMAND='make -C ../gha-test bytes-1152' node ./dist/index.js

9:  648 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
a:  729 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
b:  810 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
c:  891 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
d:  972 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
::debug::Code: null
::debug::Signal: SIGPIPE
Command completed after 1 attempt(s).

::set-output name=exit_code::0

I originally encountered this with v2.4.0, but the local-reproduction above is with the latest version (v2.7.0).

Expected behavior
I expected that if my tests fail, then my CI will fail.


Logs
Enable debug logging then attach the raw logs (specifically the raw output of this action).

Fails by timeout instead of retries because stdin of the child shell process is not closed (in contrast to standard Github Actions steps[*].run)

Describe the bug
Unlike the standard run step of GitHub Actions, the nick-fields/retry step keeps the standard input stream of the shell started to execute the command open. As a result, if a command tries to read from stdin, it hangs.

The standard GitHub Actions runner starts the shell with stdin closed by default, so commands trying to read from stdin exit immediately.

This is important because, when a crash happens (which I want to work around with retries), my language implementation (Clozure Common Lisp) enters a "kernel debugger" which waits for console input.

So imagine the user experience: a user copies the same command from a standard GitHub Actions run step, hoping nick-fields/retry will run it the same way, but gets different behavior; instead of retrying, the program hangs for a long time and then fails by timeout.

I observed this difference in behavior, and even found the code where the standard actions runner closes stdin of the child shell process:
https://github.com/actions/runner/blob/cba19c4d7e1cf8071a4b4f7e24de98eb3d0e6d0f/src/Runner.Sdk/ProcessInvoker.cs#L317

Expected behavior
Behave the same as standard GitHub Actions: close stdin of the shell.

Screenshots

Timeout when using the nick-fields/retry step (the screenshot shows a 2 min timeout).

Immediate termination with the standard run step (the screenshot shows a 2s duration at the top right).
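A shell-level workaround sketch, relying only on standard redirection semantics: explicitly redirect stdin from /dev/null so nothing in the command can block waiting for console input (the script name and input values are placeholders):

      - uses: nick-fields/retry@v2
        with:
          timeout_minutes: 2
          max_attempts: 3
          command: ./run-lisp-tests.sh < /dev/null   # stdin closed, so a crash debugger cannot wait for input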

Times out on successful command

Describe the bug
We execute a multi-line command whose final step is a terraform apply. We have seen behavior where the terraform apply is successful but then hangs and times out. On the second attempt, no modifications need to be made by terraform and the terraform apply is instantly successful. We run with a 1 hr timeout. This leads to us seeing step times of approximately 1 hr 5 min. An example can be seen here.

Expected behavior
The first, successful attempt actually exits and the command does not run a second time.

Logs
Relevant snippet, I believe: the end of the first attempt, where the timeout happens.

[0].subsegments[0].name=S3, [0].subsegments[0].id=26e9b8ff08beec05, [0].subsegments[0].start_time=1663030769.297894, [0].subsegments[0].end_time=1663030769.433909, [0].subsegments[0].namespace=aws, [0].subsegments[0].http.request.url=https://s3.us-west-2.amazonaws.com/, [0].subsegments[0].http.request.method=GET, [0].subsegments[0].http.request.user_agent=aws-sdk-java/2.14.26 Linux/5.10.130-118.517.amzn2.x86_64 OpenJDK_64-Bit_Server_VM/11.0.13+8-LTS Java/11.0.13 vendor/Amazon.com_Inc. io/sync http/Apache, [0].subsegments[0].http.response.status=200, [0].subsegments[0].http.response.content_length=0, [0].subsegments[0].aws.account_id=***, [0].subsegments[0].aws.ec2.availability_zone=us-west-2b, [0].subsegments[0].aws.ec2.instance_id=i-052772479814f11e3, [0].subsegments[0].aws.ec2.instance_size=t3.medium, [0].subsegments[0].aws.ec2.ami_id=ami-0c2ab3b8efb09f272, [0].subsegments[0].aws.xray.auto_instrumentation=true, [0].subsegments[0].aws.xray.sdk_version=1.4.1, [0].subsegments[0].aws.xray.sdk=opentelemetry for java, [0].subsegments[0].aws.operation=ListBuckets, [0].subsegments[0].aws.request_id=QGSXWT0YZR85BQD5, [0].subsegments[0].metadata.default[\"net.transport\"]=ip_tcp, [0].subsegments[0].metadata.default[\"http.flavor\"]=1.1, [0].subsegments[0].metadata.default[\"aws.service\"]=S3, [0].subsegments[0].metadata.default[\"aws.agent\"]=java-aws-sdk, [0].subsegments[0].metadata.default[\"thread.name\"]=qtp1768416789-30, [0].subsegments[0].metadata.default[\"thread.id\"]=30, [0].http.request.url=http://18.237.1.202:8080/aws-sdk-call, [0].http.request.method=GET, [0].http.request.user_agent=okhttp/4.9.3, [0].http.request.client_ip=20.124.198.204, [0].http.request.x_forwarded_for=true, [0].http.response.status=200, [0].http.response.content_length=0, [0].aws.account_id=***, [0].aws.ec2.availability_zone=us-west-2b, [0].aws.ec2.instance_id=i-052772479814f11e3, [0].aws.ec2.instance_size=t3.medium, [0].aws.ec2.ami_id=ami-0c2ab3b8efb09f272, [0].aws.xray.auto_instrumentation=true, [0].aws.xray.sdk_version=1.4.1, [0].aws.xray.sdk=opentelemetry for java, [0].metadata.default[\"http.flavor\"]=1.1, [0].metadata.default[\"otel.resource.host.arch\"]=amd64, [0].metadata.default[\"otel.resource.host.name\"]=99a5ab526dc5, [0].metadata.default[\"thread.name\"]=qtp1768416789-30, [0].metadata.default[\"otel.resource.service.name\"]=aws-otel-integ-test, [0].metadata.default[\"otel.resource.telemetry.auto.version\"]=1.4.1-aws, [0].metadata.default[\"otel.resource.process.pid\"]=1, [0].metadata.default[\"otel.resource.os.description\"]=Linux 5.10.130-118.517.amzn2.x86_64, [0].metadata.default[\"otel.resource.os.type\"]=linux, [0].metadata.default[\"otel.resource.cloud.region\"]=us-west-2, [0].metadata.default[\"otel.resource.host.type\"]=t3.medium, [0].metadata.default[\"thread.id\"]=30, [0].metadata.default[\"otel.resource.telemetry.sdk.name\"]=opentelemetry, [0].metadata.default[\"otel.resource.cloud.availability_zone\"]=us-west-2b, [0].metadata.default[\"otel.resource.host.image.id\"]=ami-0c2ab3b8efb09f272, [0].metadata.default[\"otel.resource.process.runtime.description\"]=Amazon.com Inc. 
OpenJDK 64-Bit Server VM 11.0.13+8-LTS, [0].metadata.default[\"otel.resource.process.runtime.version\"]=11.0.13+8-LTS, [0].metadata.default[\"otel.resource.host.id\"]=i-052772479814f11e3, [0].metadata.default[\"otel.resource.process.command_line\"]=/usr/lib/jvm/java-11-amazon-corretto:bin:java -javaagent:/app/aws-opentelemetry-agent.jar -Dotel.imr.export.interval=1000, [0].metadata.default[\"otel.resource.service.namespace\"]=aws-otel, [0].metadata.default[\"otel.resource.cloud.account.id\"]=***, [0].metadata.default[\"otel.resource.process.executable.path\"]=/usr/lib/jvm/java-11-amazon-corretto:bin:java, [0].metadata.default[\"otel.resource.telemetry.sdk.version\"]=1.4.1, [0].metadata.default[\"otel.resource.process.runtime.name\"]=OpenJDK Runtime Environment, [0].metadata.default[\"otel.resource.telemetry.sdk.language\"]=java, [0].metadata.default[\"otel.resource.cloud.provider\"]=aws, [1].name=S3, [1].id=145b65ee2d26ccb3, [1].parent_id=26e9b8ff08beec05, [1].start_time=1663030769.297894, [1].origin=AWS::S3, [1].trace_id=1-631fd5f1-f6e22f0b485b4b90971c0b96, [1].end_time=1663030769.433909, [1].inferred=true, [1].http.request.url=https://s3.us-west-2.amazonaws.com/, [1].http.request.method=GET, [1].http.request.user_agent=aws-sdk-java/2.14.26 Linux/5.10.130-118.517.amzn2.x86_64 OpenJDK_64-Bit_Server_VM/11.0.13+8-LTS Java/11.0.13 vendor/Amazon.com_Inc. io/sync http/Apache, [1].http.response.status=200, [1].http.response.content_length=0, [1].aws.account_id=***, [1].aws.ec2.availability_zone=us-west-2b, [1].aws.ec2.instance_id=i-052772479814f11e3, [1].aws.ec2.instance_size=t3.medium, [1].aws.ec2.ami_id=ami-0c2ab3b8efb09f272, [1].aws.xray.auto_instrumentation=true, [1].aws.xray.sdk_version=1.4.1, [1].aws.xray.sdk=opentelemetry for java, [1].aws.operation=ListBuckets, [1].aws.request_id=QGSXWT0YZR85BQD5}
2022-09-13T00:59:49.9429542Z �[0m�[0mmodule.ec2_setup.module.validator[0].null_resource.validator (local-exec): �[36mvalidator_1  |�[0m 00:59:49.928 [main] INFO  com.amazon.aoc.validators.TraceValidator - validation is passed for path /aws-sdk-call
2022-09-13T00:59:50.3206283Z �[0m�[0mmodule.ec2_setup.module.validator[0].null_resource.validator (local-exec): �[36mvalidator_1  |�[0m 00:59:50.318 [main] INFO  com.amazon.aoc.App - Validation has completed in 32 minutes.
2022-09-13T00:59:50.4545544Z �[0m�[0mmodule.ec2_setup.module.validator[0].null_resource.validator (local-exec): �[36mcanary_validator_1 exited with code 0
2022-09-13T00:59:50.4681435Z �[0m�[0mmodule.ec2_setup.module.validator[0].null_resource.validator (local-exec): �[0mAborting on container exit...
2022-09-13T00:59:50.5217974Z �[0m�[1mmodule.ec2_setup.module.validator[0].null_resource.validator: Creation complete after 34m8s [id=298226890415546555]�[0m�[0m
2022-09-13T00:59:50.6383254Z �[33m
2022-09-13T00:59:50.6395137Z �[1m�[33mWarning: �[0m�[0m�[1mDeprecated Resource�[0m
2022-09-13T00:59:50.6395371Z 
2022-09-13T00:59:50.6395798Z �[0m  on ../basic_components/main.tf line 39, in data "aws_subnet_ids" "aoc_private_subnet_ids":
2022-09-13T00:59:50.6396281Z   39: data "aws_subnet_ids" "aoc_private_subnet_ids" �[4m{�[0m
2022-09-13T00:59:50.6396583Z �[0m
2022-09-13T00:59:50.6396883Z The aws_subnet_ids data source has been deprecated and will be removed in a
2022-09-13T00:59:50.6397260Z future version. Use the aws_subnets data source instead.
2022-09-13T00:59:50.6397455Z 
2022-09-13T00:59:50.6397582Z (and 3 more similar warnings elsewhere)
2022-09-13T00:59:50.6397858Z �[0m�[0m
2022-09-13T00:59:50.6412468Z �[33m
2022-09-13T00:59:50.6412997Z �[1m�[33mWarning: �[0m�[0m�[1mArgument is deprecated�[0m
2022-09-13T00:59:50.6413199Z 
2022-09-13T00:59:50.6413488Z �[0m  on ../ec2/sshkey.tf line 34, in data "aws_s3_bucket_object" "ssh_private_key":
2022-09-13T00:59:50.6413895Z   34:   bucket = �[4mvar.sshkey_s3_bucket�[0m
2022-09-13T00:59:50.6414176Z �[0m
2022-09-13T00:59:50.6414416Z Use the aws_s3_object data source instead
2022-09-13T00:59:50.6414691Z �[0m�[0m
2022-09-13T00:59:50.6414926Z �[0m�[1m�[32m
2022-09-13T00:59:50.6415302Z Apply complete! Resources: 15 added, 0 changed, 0 destroyed.�[0m
2022-09-13T00:59:50.6415616Z �[0m�[1m�[32m
2022-09-13T00:59:50.6416013Z Outputs:
2022-09-13T00:59:50.6416147Z 
2022-09-13T00:59:50.6416332Z testing_id = "532c503bffd9ffd4"�[0m
2022-09-13T00:59:50.6927828Z 
2022-09-13T01:07:15.5801426Z ##[warning]Attempt 1 failed. Reason: Timeout of 3600000ms hit
2022-09-13T01:07:15.5809036Z 
2022-09-13T01:07:15.6374158Z [command]/home/runner/work/_temp/6abe0f53-d0d0-4ce6-9493-2706cb8f333e/terraform-bin init
2022-09-13T01:07:15.7110649Z �[0m�[1mInitializing modules...�[0m
2022-09-13T01:07:15.7477034Z 
2022-09-13T01:07:15.7488460Z �[0m�[1mInitializing the backend...�[0m
2022-09-13T01:07:16.8428328Z 
2022-09-13T01:07:16.8429254Z �[0m�[1mInitializing provider plugins...�[0m
2022-09-13T01:07:16.8430009Z - Reusing previous version of hashicorp/local from the dependency lock file
2022-09-13T01:07:16.8973205Z - Reusing previous version of hashicorp/aws from the dependency lock file
2022-09-13T01:07:16.9106414Z - Reusing previous version of hashicorp/template from the dependency lock file
2022-09-13T01:07:16.9207502Z - Reusing previous version of hashicorp/random from the dependency lock file
2022-09-13T01:07:16.9326543Z - Reusing previous version of hashicorp/null from the dependency lock file
2022-09-13T01:07:16.9528895Z - Reusing previous version of hashicorp/tls from the dependency lock file
2022-09-13T01:07:17.7957738Z - Using previously-installed hashicorp/aws v4.30.0
2022-09-13T01:07:17.8589242Z - Using previously-installed hashicorp/template v2.2.0
2022-09-13T01:07:17.8975221Z - Using previously-installed hashicorp/random v3.4.3
2022-09-13T01:07:17.9415466Z - Using previously-installed hashicorp/null v3.1.1
2022-09-13T01:07:17.9824733Z - Using previously-installed hashicorp/tls v4.0.2
2022-09-13T01:07:18.0272891Z - Using previously-installed hashicorp/local v2.2.3
2022-09-13T01:07:18.0273152Z 
2022-09-13T01:07:18.0273421Z �[0m�[1m�[32mTerraform has been successfully initialized!�[0m�[32m�[0m
2022-09-13T01:07:18.0273767Z �[0m�[32m
2022-09-13T01:07:18.0274079Z You may now begin working with Terraform. Try running "terraform plan" to see
2022-09-13T01:07:18.0274506Z any changes that are required for your infrastructure. All Terraform commands
2022-09-13T01:07:18.0274820Z should now work.
2022-09-13T01:07:18.0274963Z 
2022-09-13T01:07:18.0275147Z If you ever set or change modules or backend configuration for Terraform,
2022-09-13T01:07:18.0275550Z rerun this command to reinitialize your working directory. If you forget, other
2022-09-13T01:07:18.0276023Z commands will detect it and remind you to do so if necessary.�[0m
2022-09-13T01:07:18.0325874Z 
2022-09-13T01:07:18.0328890Z 
2022-09-13T01:07:18.0329672Z 
2022-09-13T01:07:18.0918281Z [command]/home/runner/work/_temp/6abe0f53-d0d0-4ce6-9493-2706cb8f333e/terraform-bin apply -auto-approve -lock=false -var-file=../testcases/otlp_trace/parameters.tfvars -var=aoc_version=latest -var=testcase=../testcases/otlp_trace -var=testing_ami=canary_windows
2022-09-13T01:07:23.6788843Z �[0m�[1mmodule.ec2_setup.module.common.random_id.testing_id: Refreshing state... [id=jNBPp7FhlMk]�[0m
2022-09-13T01:07:23.6850682Z �[0m�[1mmodule.basic_components.module.common.random_id.testing_id: Refreshing state... [id=Lsc_xoz1Ijg]�[0m
2022-09-13T01:07:23.7103657Z �[0m�[1mmodule.ec2_setup.module.basic_components.module.common.random_id.testing_id: Refreshing state... [id=LxHSiGmnPsY]�[0m
2022-09-13T01:07:23.7215937Z �[0m�[1mmodule.ec2_setup.tls_private_key.ssh_key[0]: Refreshing state... [id=e8b4a430a8704ceb4689ac11f2d91534444d430b]�[0m
2022-09-13T01:07:23.7376507Z �[0m�[1mmodule.common.random_id.testing_id: Refreshing state... [id=UyxQO__Z_9Q]�[0m
2022-09-13T01:07:24.8233634Z �[0m�[1mmodule.ec2_setup.aws_key_pair.aws_ssh_key[0]: Refreshing state... [id=testing-8cd04fa7b16194c9]�[0m
2022-09-13T01:07:25.7453492Z �[0m�[1mmodule.ec2_setup.aws_instance.sidecar: Refreshing state... [id=i-052772479814f11e3]�[0m
2022-09-13T01:07:25.7481191Z �[0m�[1mmodule.ec2_setup.aws_instance.aoc: Refreshing state... [id=i-012ce63c9a70b7f67]�[0m
2022-09-13T01:07:27.4677996Z �[0m�[1mmodule.ec2_setup.null_resource.check_patch[0]: Refreshing state... [id=7558068877623799063]�[0m
2022-09-13T01:07:27.4755791Z �[0m�[1mmodule.ec2_setup.null_resource.setup_mocked_server_cert_for_windows[0]: Refreshing state... [id=4777493023243761673]�[0m
2022-09-13T01:07:27.4756822Z �[0m�[1mmodule.ec2_setup.null_resource.setup_sample_app_and_mock_server[0]: Refreshing state... [id=5338849595985022880]�[0m
2022-09-13T01:07:27.4757834Z �[0m�[1mmodule.ec2_setup.null_resource.download_collector_from_s3[0]: Refreshing state... [id=2605438380643408551]�[0m
2022-09-13T01:07:27.4824823Z �[0m�[1mmodule.ec2_setup.null_resource.start_collector[0]: Refreshing state... [id=4748019381639749946]�[0m
2022-09-13T01:07:27.5077202Z �[0m�[1mmodule.ec2_setup.module.validator[0].local_file.docker_compose_file: Refreshing state... [id=5e05df56cc9659da81aabf931a6127aae650349e]�[0m
2022-09-13T01:07:27.5149902Z �[0m�[1mmodule.ec2_setup.module.validator[0].null_resource.validator: Refreshing state... [id=298226890415546555]�[0m
2022-09-13T01:07:29.9063603Z �[33m
2022-09-13T01:07:29.9064379Z �[1m�[33mWarning: �[0m�[0m�[1mDeprecated Resource�[0m
2022-09-13T01:07:29.9064878Z 
2022-09-13T01:07:29.9066245Z �[0m  on ../basic_components/main.tf line 39, in data "aws_subnet_ids" "aoc_private_subnet_ids":
2022-09-13T01:07:29.9066882Z   39: data "aws_subnet_ids" "aoc_private_subnet_ids" �[4m{�[0m
2022-09-13T01:07:29.9184579Z �[0m
2022-09-13T01:07:29.9184958Z The aws_subnet_ids data source has been deprecated and will be removed in a
2022-09-13T01:07:29.9185421Z future version. Use the aws_subnets data source instead.
2022-09-13T01:07:29.9185617Z 
2022-09-13T01:07:29.9185745Z (and 3 more similar warnings elsewhere)
2022-09-13T01:07:29.9186033Z �[0m�[0m
2022-09-13T01:07:29.9186473Z �[33m
2022-09-13T01:07:29.9186825Z �[1m�[33mWarning: �[0m�[0m�[1mArgument is deprecated�[0m
2022-09-13T01:07:29.9187012Z 
2022-09-13T01:07:29.9187290Z �[0m  on ../ec2/sshkey.tf line 34, in data "aws_s3_bucket_object" "ssh_private_key":
2022-09-13T01:07:29.9187728Z   34:   bucket = �[4mvar.sshkey_s3_bucket�[0m
2022-09-13T01:07:29.9188001Z �[0m
2022-09-13T01:07:29.9188251Z Use the aws_s3_object data source instead
2022-09-13T01:07:29.9188529Z �[0m�[0m
2022-09-13T01:07:29.9188768Z �[0m�[1m�[32m
2022-09-13T01:07:29.9189144Z Apply complete! Resources: 0 added, 0 changed, 0 destroyed.�[0m
2022-09-13T01:07:29.9189461Z �[0m�[1m�[32m
2022-09-13T01:07:29.9189674Z Outputs:
2022-09-13T01:07:29.9189801Z 
2022-09-13T01:07:29.9189980Z testing_id = "532c503bffd9ffd4"�[0m
2022-09-13T01:07:29.9339527Z 
2022-09-13T01:07:29.9341097Z 
2022-09-13T01:07:29.9342699Z 
2022-09-13T01:07:30.5933469Z Command completed after 2 attempt(s).
2022-09-13T01:07:30.5935501Z 

Retry a step that uses the Actions built-in script method?

I have a step that runs a JS script using the Actions built-in script method (actions/github-script).

Can this sort of step work with this retry action?

e.g.

- name: Step Name
  uses: actions/github-script@v6
  with:
    script: |
      const test = require(`./test-script.js`)
      await test({ github, context, core })

Running a shell script is not compatible with the default GitHub Actions behavior

Describe the bug
Running a shell script is not compatible with the way GitHub Actions invokes shell scripts.

For example, when running a bash shell script on Linux, the script is executed as follows:
bash --noprofile --norc -eo pipefail {0}

This causes the script to fail on the first error.
With the retry action, a multi-line script might have errors, but if the last line succeeds the step won't retry and the error is ignored.

Expected behavior
Keep the behavior of the default GitHub Actions run step.

Documentation on github action behavior: https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#jobsjob_idstepsshell

Environment Variables not passed through

Describe the bug
Environment variables set on the workflow step (via the env keyword) are not passed through to the command when the workflow runs.

From my workflow:

- name: Destroy Infrastructure
  id: retry_on_error
  uses: nick-fields/retry@v2
  env:
    ENVIRONMENT: dev
    NODE_OPTIONS: '--max_old_space_size=7168'
  with:
    timeout_minutes: 60
    max_attempts: 3
    retry_wait_seconds: 60
    retry_on: error
    command: npx nx destroy ${{matrix.project}} --stage=${{ steps.git-branch.outputs.name }}

Expected behavior
Environment variables set on the workflow step are used when invoking the command, which is the standard behaviour of the default run step that this action is meant to be a drop-in replacement for.

Screenshots
N/A

Logs
I can gather logs if necessary, but I feel like this is part feature request part bug report, so I'm not sure how useful logs would be.
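Until this is resolved, one workaround sketch is to set the variables inline in the command string, which relies only on ordinary shell semantics (the values shown are the ones from the snippet above):

      - name: Destroy Infrastructure
        uses: nick-fields/retry@v2
        with:
          timeout_minutes: 60
          max_attempts: 3
          retry_wait_seconds: 60
          retry_on: error
          command: >-
            ENVIRONMENT=dev NODE_OPTIONS='--max_old_space_size=7168'
            npx nx destroy ${{ matrix.project }} --stage=${{ steps.git-branch.outputs.name }}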

Not retrying the command

Describe the bug
Hi, here is my test snippet:

- name: Build with Maven
  uses: nick-invision/retry@v2
  with:
    timeout_seconds: 60
    max_attempts: 3
    retry_wait_seconds: 20
    retry_on: timeout
    command: mvn -Dmaven.wealthcentral.full.build=true clean install site

I tried timeout_seconds: 60 and timeout_minutes: 1, still not working. I tried retry_on: error and it restarted the command, but I only want to restart the command on timeout. Am I doing something wrong, or am I misunderstanding the retry_on: timeout function?

Expected behavior
I would expect the command "mvn -Dmaven.wealthcentral.full.build=true clean install site" to be restarted, but I only get the error and the step fails.


Logs
https://pipelines.actions.githubusercontent.com/VZK7aeekU2uaNe4Ic693P6kodHwzQMznZWFFtaEw4BXgNmLWia/_apis/pipelines/1/runs/4111/signedlogcontent/3?urlExpires=2021-07-23T02%3A43%3A50.6523982Z&urlSigningMethod=HMACV1&urlSignature=q0FhuLAf1914%2FeQ%2FEOZJHeYaabiI5OlstxPwr3RiUvY%3D

On multi-line, multi-command scripts, it won't fail unless the last command fails

Describe the bug

The first one works, the second one doesn't:

            - name: test1
              uses: nick-invision/retry@v2
              with:
                  timeout_minutes: 20 
                  max_attempts: 3
                  command: |
                      echo 'works'
                      pkgx12
                      echo 'works'
        
            - name: test2
              uses: nick-invision/retry@v2
              with:
                  timeout_minutes: 20
                  max_attempts: 3
                  command: |
                      echo 'works'
                      pkgx12 && echo 'works'

Expected behavior

Both should fail, or the README should clarify this behavior.


Logs

2022-01-18T14:08:39.5054436Z ##[group]Run nick-invision/retry@v2
2022-01-18T14:08:39.5054733Z with:
2022-01-18T14:08:39.5054946Z   timeout_minutes: 20
2022-01-18T14:08:39.5055173Z   max_attempts: 3
2022-01-18T14:08:39.5055411Z   command: echo 'works'
pkgx12
echo 'works'

2022-01-18T14:08:39.5055672Z   retry_wait_seconds: 10
2022-01-18T14:08:39.5055914Z   polling_interval_seconds: 1
2022-01-18T14:08:39.5056144Z   warning_on_retry: true
2022-01-18T14:08:39.5056373Z   continue_on_error: false
2022-01-18T14:08:39.5056613Z ##[endgroup]
2022-01-18T14:08:39.6687396Z 
2022-01-18T14:08:39.6755118Z works
2022-01-18T14:08:39.6756988Z bash: line 1: pkgx12: command not found
2022-01-18T14:08:39.6758088Z works
2022-01-18T14:08:40.6753720Z Command completed after 1 attempt(s).
2022-01-18T14:08:40.6753983Z 
2022-01-18T14:08:40.6872658Z ##[group]Run nick-invision/retry@v2
2022-01-18T14:08:40.6872893Z with:
2022-01-18T14:08:40.6873096Z   timeout_minutes: 20
2022-01-18T14:08:40.6873312Z   max_attempts: 3
2022-01-18T14:08:40.6873560Z   command: echo 'works'
pkgx12 && echo 'works'

2022-01-18T14:08:40.6873799Z   retry_wait_seconds: 10
2022-01-18T14:08:40.6874032Z   polling_interval_seconds: 1
2022-01-18T14:08:40.6874272Z   warning_on_retry: true
2022-01-18T14:08:40.6874481Z   continue_on_error: false
2022-01-18T14:08:40.6874696Z ##[endgroup]
2022-01-18T14:08:40.7195864Z 
2022-01-18T14:08:40.7278695Z works
2022-01-18T14:08:40.7287107Z bash: line 1: pkgx12: command not found
2022-01-18T14:08:51.7328878Z ##[warning]Attempt 1 failed. Reason: Child_process exited with error code 127
2022-01-18T14:08:51.7337314Z 
2022-01-18T14:08:51.7339078Z works
2022-01-18T14:08:51.7344252Z bash: line 1: pkgx12: command not found
2022-01-18T14:09:02.7445366Z ##[warning]Attempt 2 failed. Reason: Child_process exited with error code 127
2022-01-18T14:09:02.7446598Z 
2022-01-18T14:09:02.7480505Z works
2022-01-18T14:09:02.7485492Z bash: line 1: pkgx12: command not found
2022-01-18T14:09:13.7544966Z ##[error]Final attempt failed. Child_process exited with error code 127
2022-01-18T14:09:13.7545802Z 
2022-01-18T14:09:13.7546558Z 
2022-01-18T14:09:13.7810761Z Evaluate and set job outputs
2022-01-18T14:09:13.7823667Z Cleaning up orphan processes

(they are simple raw logs)

Clarification on timeout_minutes & timeout_seconds

The README defines timeout_minutes and timeout_seconds as:

Minutes [or seconds] to wait before attempt times out.

Does "attempt" refer to each retry attempt (ie- a timeout per attempt), or the attempt to run this job as a whole? For example, if I want to retry something 3 times, and I set timeout_minutes to 1, does that mean:

  1. There's a 1 minute timeout for each of the 3 attempts, and this could run for 3 minutes total.
  2. There's a 1 minute timeout TOTAL, and all 3 attempts need to finish within 1 minute.

People on my team interpreted this differently. It could be helpful to clarify this in the documentation. Thanks!

Get EPERM error on timeout

Describe the bug
When the timeout is reached, the action tries to kill the child process, but this throws an EPERM error:

/home/runner/work/poco/poco/.github/actions/retry-action/dist/index.js:3353
            throw err;
            ^

Error: kill EPERM
    at process.kill (node:internal/process/per_thread:221:13)
    at killPid (/home/runner/work/poco/poco/.github/actions/retry-action/dist/index.js:3363:17)
    at /home/runner/work/poco/poco/.github/actions/retry-action/dist/index.js:3340:21
    at Array.forEach (<anonymous>)
    at /home/runner/work/poco/poco/.github/actions/retry-action/dist/index.js:3338:23
    at Array.forEach (<anonymous>)
    at killAll (/home/runner/work/poco/poco/.github/actions/retry-action/dist/index.js:3337:27)
    at /home/runner/work/poco/poco/.github/actions/retry-action/dist/index.js:3328:13
    at ChildProcess.onClose (/home/runner/work/poco/poco/.github/actions/retry-action/dist/index.js:3384:17)
    at ChildProcess.emit (node:events:513:28) {
  errno: -1,
  code: 'EPERM',
  syscall: 'kill'
}

Configuration:

      - uses: nick-fields/retry@v2
        with:
          timeout_minutes: 1
          max_attempts: 3
          retry_on: any
          command: >-
            sudo -s
            ./ci/runtests.sh TSAN

Expected behavior
Should not throw, and the command should be retried

Display logs of retry command

Describe the bug
I am using this action in my workflow for important network-related shell commands.
But when my command is wrapped in this action, I can't see any of the command's output logs, which would normally be visible if it were run directly without this action.

Expected behavior
Shell command logs to be appearing in workflow logs.

retry_wait_seconds does not work

      - name: Create AWS Secrets
        if: steps.semantic.outputs.new-release-published == 'true'
        uses: nick-invision/retry@v1
        with:
          timeout_minutes: 1
          max_attempts: 60
          retry_wait_seconds: 5
          command: aws secretsmanager create-secret --name stripeSecretKey --secret-string ${STRIPE_KEY}

Using it this way, it retries every second and does not wait.

How to retry another GitHub Action?

If I have a GitHub Action for deploying content to Azure, how can I retry this deployment step if the deployment fails?

      - uses: nick-fields/retry@v2
        id: retry
        # see https://docs.github.com/en/free-pro-team@latest/actions/reference/workflow-syntax-for-github-actions#jobsjob_idcontinue-on-error
        continue-on-error: true
        with:
          timeout_seconds: 15
          max_attempts: 2
          retry_on: error
          command: node -e 'process.exit(99);'

      - name: 'Deploy to Azure Web App'
        id: deploy-to-webapp
        uses: azure/webapps-deploy@v2
        with:
          app-name: ${{ env.APP_NAME }}
          slot-name: 'Production'
          publish-profile: ${{ secrets.AZUREAPPSERVICE_PUBLISHPROFILE }}
          package: ${{ env.APP_DIR }}

Support Timeout only mode

The current system retries when the command fails or times out.

I need it to retry only when a timeout occurs, not on error.
Could this action have an option like timeout_only?
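For what it's worth, snippets in other issues here pass a retry_on input with a timeout value; assuming it behaves as those snippets suggest, a sketch like this may already cover the timeout-only case (unverified, and the command is a placeholder):

      - uses: nick-fields/retry@v2
        with:
          timeout_minutes: 10
          max_attempts: 3
          retry_on: timeout   # assumption: retry only when an attempt times out, not on error
          command: ./sometimes-hangs.sh   # placeholder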

sleep between retries

It would be good if there were a way to have some sleep time between retries.

Currently it just retries as fast as possible, but if the step fails because some resource is not available, it is usually better to wait a bit before trying again.
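The snippets in several other issues here pass a retry_wait_seconds input; assuming it works as those examples imply, a sketch like this should add a pause between attempts (unverified; values and command are placeholders):

      - uses: nick-fields/retry@v2
        with:
          timeout_minutes: 10
          max_attempts: 5
          retry_wait_seconds: 30   # assumption: wait 30 seconds between attempts
          command: ./wait-for-resource.sh   # placeholder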
