Git Product home page Git Product logo

Comments (4)

dakotahorstman avatar dakotahorstman commented on June 24, 2024

I can confirm the server does put the flow run in a canceling state immediately upon the request from the UI on its end.

Repeatedly querying the server with the below code shows the above MRE enters the canceling state immediately upon being requested from the UI. However, this is not reflected in the run context, nor is the on_cancellation= hook called until the flow run is termed. If this is intended, then the documentation ought to be revised as it currently states: 'on_cancellation=' An optional list of callables to run when the flow enters a cancelling state.

import asyncio
import time
from pprint import pformat, pprint

from prefect import get_client
from prefect.server.schemas.filters import (
    FlowRunFilter,
    FlowRunFilterState,
    FlowRunFilterStateType,
)
from prefect.server.schemas.states import StateType


async def do():
    async with get_client() as client:
        while not time.sleep(1):
            flow_runs = await client.read_flow_runs(
                flow_run_filter=FlowRunFilter(
                    state=FlowRunFilterState(
                        type=FlowRunFilterStateType(any_=[StateType.CANCELLING]),
                    ),
                )
            )
            pprint(pformat([vars(fr) for fr in flow_runs], indent=4, width=1), indent=4)


if __name__ == "__main__":
    asyncio.run(do())

To reproduce:

  1. Start a flow with the above MRE.
  2. Let it run for a few seconds, then spin up this state-check poll. You should see an empty list printed on the console every second.
  3. Cancel the MRE flow from the UI, and its info will be immediately and repeatedly printed to the poll console.
  4. The MRE will then be termed by prefect, and the console prints will return to an empty list.
  5. During this process observe the MRE's print(context.flow_run.state) will continue to print Running() until it is termed.

from prefect.

serinamarie avatar serinamarie commented on June 24, 2024

Hi @dakotahorstman, thanks for contributing your first issue with Prefect!

Prefect did not enter a 'Cancelling' state as indicated by the docs as the below output shows the flow run state remains at Running() until it is termed. The on_cancellation= hook is also not called until the flow is termed which is too late by that point.

When the flow run is no longer in a Running state and moves to a Cancelling state, it won't continue to print stmts as that would only happen while the flow is running. The on_cancellation hook is triggered by the flow run entering a Cancelling state, which is why you see that INFO-level log prior to the flow entering its terminal state of Cancelled.


02:36:58.359 | INFO    | Flow run 'aromatic-hippo' - Running hook '<lambda>' in response to entering state 'Cancelling'
cancelling!!!
02:36:58.360 | INFO    | Flow run 'aromatic-hippo' - Hook '<lambda>' finished running successfully
02:36:58.389 | INFO    | prefect.flow_runs.runner - Cancelled flow run 'aromatic-hippo'!

I see the flow run entering a Cancelling state. Then the flow run state change hook is called and the expected print statement appears. Then we see that the hook completed. Then the flow run is moved to a Cancelled state. Can you elaborate when you say "nor is the on_cancellation= hook called until the flow run is termed"?

from prefect.

dakotahorstman avatar dakotahorstman commented on June 24, 2024

Hi Serina,

I can elaborate, sure.

The flow is an infinite loop in the MRE I've provided here. This mimics our tasks.

Through the UI and the state poller I whipped up, we can see that the server's tracking of the flow run correctly places the run into a canceling state when requested. It is immediate, as would be expected.

However, this state change is never communicated to the worker or in a way that does not require polling the server (which, in our case, would be a major hindrance to performance). I would expect get_run_context to reflect this state change ASAP so I could handle the request, but it does not. Additionally, the on_cancellation hook is called after the flow has been termed (or at least my code is termed). It is not called before. Therefore, it is not possible for us to gracefully shut down the flow before termination.

For example, we could have a sentinel flag raised in the hook caught by our flow code and shut down properly.

According to the logs, which align with my observation, the SIGTERM is sent before the on_cancellation hook is called. This is likely due to Prefect needing to transition through CANCELLING from RUNNING to reach CANCELED (I haven't checked the source to confirm, though). The ERROR print of the SIGTERM occurred 3 seconds before the hook was called--it should be printed well after the hook is run.

Here's how the state changes differ on the server and the worker according to my observations:

server
RUNNING (times N) -> cancel requested from UI -> CANCELLING (for timeout) -> CANCELED

worker
RUNNING (times N) -> cancel requested from UI (but never communicated) -> RUNNING (times N) -> proc termed -> CANCELLING (one cycle) -> CANCELED

This is what the logs should be:
(Ideally, we'd gracefully exit the flow before it is termed as soon as the run context returned a Cancelling state)

02:36:26.926 | INFO    | prefect.flow_runs.runner - Runner 'runner-5099630a-bdda-417b-b8ed-93f5f9208f48' submitting flow run '69b965e9-ee3b-4d43-b8f1-08b5a6253c70'
02:36:27.161 | INFO    | prefect.flow_runs.runner - Opening process...
02:36:27.163 | INFO    | prefect.flow_runs.runner - Completed submission of flow run '69b965e9-ee3b-4d43-b8f1-08b5a6253c70'
<frozen runpy>:128: RuntimeWarning: 'prefect.engine' found in sys.modules after import of package 'prefect', but prior to execution of 'prefect.engine'; this may result in unpredictable behaviour
02:36:31.121 | INFO    | Flow run 'aromatic-hippo' - Downloading flow code from storage at '.'
02:36:31.397 | INFO    | Flow run 'aromatic-hippo' - Running()
02:36:32.398 | INFO    | Flow run 'aromatic-hippo' - Running()
02:36:33.399 | INFO    | Flow run 'aromatic-hippo' - Running()
02:36:34.400 | INFO    | Flow run 'aromatic-hippo' - Running()
02:36:35.401 | INFO    | Flow run 'aromatic-hippo' - Running()
02:36:36.402 | INFO    | Flow run 'aromatic-hippo' - Running()
02:36:37.402 | INFO    | Flow run 'aromatic-hippo' - Running()
02:36:38.403 | INFO    | Flow run 'aromatic-hippo' - Running()
02:36:39.404 | INFO    | Flow run 'aromatic-hippo' - Running()
02:36:40.405 | INFO    | Flow run 'aromatic-hippo' - Running()
02:36:41.405 | INFO    | Flow run 'aromatic-hippo' - Running()
02:36:42.406 | INFO    | Flow run 'aromatic-hippo' - Running()
### Cancel request is sent around here ###
02:36:42.406 | INFO    | Flow run 'aromatic-hippo' - Downloading flow code from storage at '.'
02:36:42.406 | INFO    | Flow run 'aromatic-hippo' - Running hook '<lambda>' in response to entering state 'Cancelling'
cancelling!!!
02:36:42.407 | INFO    | Flow run 'aromatic-hippo' - Hook '<lambda>' finished running successfully
02:36:43.407 | INFO    | Flow run 'aromatic-hippo' - Cancelling()
02:36:44.407 | INFO    | Flow run 'aromatic-hippo' - Cancelling()
02:36:45.408 | INFO    | Flow run 'aromatic-hippo' - Cancelling()
02:36:46.409 | INFO    | Flow run 'aromatic-hippo' - Cancelling()
02:36:47.409 | INFO    | Flow run 'aromatic-hippo' - Cancelling()
02:36:48.410 | INFO    | Flow run 'aromatic-hippo' - Cancelling()
02:36:49.411 | INFO    | Flow run 'aromatic-hippo' - Cancelling()
02:36:50.411 | INFO    | Flow run 'aromatic-hippo' - Cancelling()
02:36:51.412 | INFO    | Flow run 'aromatic-hippo' - Cancelling()
02:36:52.413 | INFO    | Flow run 'aromatic-hippo' - Cancelling()
02:36:53.414 | INFO    | Flow run 'aromatic-hippo' - Cancelling()
02:36:54.415 | INFO    | Flow run 'aromatic-hippo' - Cancelling()
02:36:55.347 | INFO    | prefect.runner - Found 1 flow runs awaiting cancellation.
02:36:55.348 | ERROR   | Flow run 'aromatic-hippo' - Crash detected! Execution was aborted by a termination signal.
02:36:55.373 | INFO    | prefect.flow_runs.runner - Process for flow run 'aromatic-hippo' exited with status code: -15; This indicates that the process exited due to a SIGTERM signal. Typically, this is caused by manual cancellation.
02:36:58.389 | INFO    | prefect.flow_runs.runner - Cancelled flow run 'aromatic-hippo'!

from prefect.

dakotahorstman avatar dakotahorstman commented on June 24, 2024

Hi @serinamarie

Just checking in on this ticket. Let me know if there are any questions or concerns I can answer for you :)

from prefect.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.