Comments (4)
I can confirm that the server puts the flow run into a Cancelling state immediately upon the request from the UI.
Repeatedly querying the server with the code below shows that the above MRE enters the Cancelling state as soon as cancellation is requested from the UI. However, this is not reflected in the run context, nor is the on_cancellation= hook called until the flow run is terminated. If this is intended, then the documentation ought to be revised, as it currently states: 'on_cancellation=' An optional list of callables to run when the flow enters a cancelling state.
import asyncio
from pprint import pprint

from prefect import get_client
from prefect.server.schemas.filters import (
    FlowRunFilter,
    FlowRunFilterState,
    FlowRunFilterStateType,
)
from prefect.server.schemas.states import StateType


async def do():
    async with get_client() as client:
        # Poll once per second for flow runs currently in the CANCELLING state.
        while True:
            await asyncio.sleep(1)  # non-blocking, unlike time.sleep()
            flow_runs = await client.read_flow_runs(
                flow_run_filter=FlowRunFilter(
                    state=FlowRunFilterState(
                        type=FlowRunFilterStateType(any_=[StateType.CANCELLING]),
                    ),
                )
            )
            pprint([vars(fr) for fr in flow_runs], indent=4, width=1)


if __name__ == "__main__":
    asyncio.run(do())
To reproduce:
- Start a flow with the above MRE.
- Let it run for a few seconds, then spin up this state-check poll. You should see an empty list printed on the console every second.
- Cancel the MRE flow from the UI, and its info will be immediately and repeatedly printed to the poll console.
- The MRE will then be terminated by Prefect, and the console prints will return to an empty list.
- During this process, observe that the MRE's print(context.flow_run.state) will continue to print Running() until it is terminated.
Hi @dakotahorstman, thanks for contributing your first issue with Prefect!
Prefect did not enter a 'Cancelling' state as indicated by the docs as the below output shows the flow run state remains at Running() until it is termed. The on_cancellation= hook is also not called until the flow is termed which is too late by that point.
When the flow run is no longer in a Running state and moves to a Cancelling state, it won't continue to print statements, as that would only happen while the flow is running. The on_cancellation hook is triggered by the flow run entering a Cancelling state, which is why you see that INFO-level log prior to the flow entering its terminal state of Cancelled.
02:36:58.359 | INFO | Flow run 'aromatic-hippo' - Running hook '<lambda>' in response to entering state 'Cancelling'
cancelling!!!
02:36:58.360 | INFO | Flow run 'aromatic-hippo' - Hook '<lambda>' finished running successfully
02:36:58.389 | INFO | prefect.flow_runs.runner - Cancelled flow run 'aromatic-hippo'!
I see the flow run entering a Cancelling state. Then the flow run state change hook is called and the expected print statement appears. Then we see that the hook completed. Then the flow run is moved to a Cancelled state. Can you elaborate when you say "nor is the on_cancellation= hook called until the flow run is termed"?
Hi Serina,
I can elaborate, sure.
The flow is an infinite loop in the MRE I've provided here. This mimics our tasks.
Through the UI and the state poller I whipped up, we can see that the server's tracking of the flow run correctly places the run into a canceling state when requested. It is immediate, as would be expected.
However, this state change is never communicated to the worker, at least not in a way that avoids polling the server (which, in our case, would be a major hindrance to performance). I would expect get_run_context to reflect this state change as soon as possible so that I could handle the request, but it does not. Additionally, the on_cancellation hook is called after the flow has been terminated (or at least after my code has been terminated), not before. Therefore, it is not possible for us to shut down the flow gracefully before termination.
For example, the hook could raise a sentinel flag that our flow code checks, allowing it to shut down properly.
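The sentinel-flag idea can be sketched in plain Python. This is only an illustration of the pattern being asked for, not Prefect code: `cancel_requested`, `on_cancel_hook` and `long_running_work` are hypothetical names, and the sketch assumes the hook runs in the same process as the flow code before that process is terminated, which is exactly the behaviour that is not observed today.

```python
import threading
import time

# Hypothetical sentinel shared between the hook and the work loop.
cancel_requested = threading.Event()


def on_cancel_hook():
    """Stand-in for an on_cancellation hook: it only raises the flag."""
    cancel_requested.set()


def long_running_work(poll_interval=0.01, max_iterations=1000):
    """Loop until the flag is raised (or the iteration cap is hit), then return."""
    iterations = 0
    while not cancel_requested.is_set() and iterations < max_iterations:
        iterations += 1
        time.sleep(poll_interval)  # placeholder for one unit of real work
    # Graceful cleanup (closing connections, flushing buffers) would go here.
    if cancel_requested.is_set():
        return "shut down gracefully"
    return "ran to completion"


if __name__ == "__main__":
    # Simulate a cancel request arriving shortly after the work starts.
    threading.Timer(0.05, on_cancel_hook).start()
    print(long_running_work())
```

Checking an in-process flag is cheap compared with polling the server, which is why the timing of the hook matters so much here.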
According to the logs, which align with my observation, the SIGTERM is sent before the on_cancellation hook is called. This is likely due to Prefect needing to transition through CANCELLING from RUNNING to reach CANCELLED (I haven't checked the source to confirm, though). The ERROR log for the SIGTERM appeared 3 seconds before the hook was called; it should appear well after the hook has run.
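To make the 3-second gap concrete, the two timestamps quoted in this thread can be subtracted directly. This is a small sketch, assuming the 'Crash detected!' line (02:36:55.348) and the 'Running hook' line (02:36:58.359) come from the same run; `seconds_between` is a helper name made up for this example.

```python
from datetime import datetime


def seconds_between(earlier, later, fmt="%H:%M:%S.%f"):
    """Return the number of seconds from `earlier` to `later` (same day assumed)."""
    delta = datetime.strptime(later, fmt) - datetime.strptime(earlier, fmt)
    return delta.total_seconds()


if __name__ == "__main__":
    crash_detected = "02:36:55.348"  # ERROR line reporting the termination signal
    hook_started = "02:36:58.359"    # INFO line reporting the hook being run
    gap = seconds_between(crash_detected, hook_started)
    print(f"Hook started {gap:.3f} s after the crash was detected")
```

A positive result means the hook ran after the process had already received the termination signal.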
Here's how the state changes differ on the server and the worker according to my observations:
server:
    RUNNING (times N)
    -> cancel requested from UI
    -> CANCELLING (for timeout)
    -> CANCELLED
worker:
    RUNNING (times N)
    -> cancel requested from UI (but never communicated)
    -> RUNNING (times N)
    -> process terminated
    -> CANCELLING (one cycle)
    -> CANCELLED
This is what the logs should be:
(Ideally, we'd gracefully exit the flow before it is termed as soon as the run context returned a Cancelling state)
02:36:26.926 | INFO | prefect.flow_runs.runner - Runner 'runner-5099630a-bdda-417b-b8ed-93f5f9208f48' submitting flow run '69b965e9-ee3b-4d43-b8f1-08b5a6253c70'
02:36:27.161 | INFO | prefect.flow_runs.runner - Opening process...
02:36:27.163 | INFO | prefect.flow_runs.runner - Completed submission of flow run '69b965e9-ee3b-4d43-b8f1-08b5a6253c70'
<frozen runpy>:128: RuntimeWarning: 'prefect.engine' found in sys.modules after import of package 'prefect', but prior to execution of 'prefect.engine'; this may result in unpredictable behaviour
02:36:31.121 | INFO | Flow run 'aromatic-hippo' - Downloading flow code from storage at '.'
02:36:31.397 | INFO | Flow run 'aromatic-hippo' - Running()
02:36:32.398 | INFO | Flow run 'aromatic-hippo' - Running()
02:36:33.399 | INFO | Flow run 'aromatic-hippo' - Running()
02:36:34.400 | INFO | Flow run 'aromatic-hippo' - Running()
02:36:35.401 | INFO | Flow run 'aromatic-hippo' - Running()
02:36:36.402 | INFO | Flow run 'aromatic-hippo' - Running()
02:36:37.402 | INFO | Flow run 'aromatic-hippo' - Running()
02:36:38.403 | INFO | Flow run 'aromatic-hippo' - Running()
02:36:39.404 | INFO | Flow run 'aromatic-hippo' - Running()
02:36:40.405 | INFO | Flow run 'aromatic-hippo' - Running()
02:36:41.405 | INFO | Flow run 'aromatic-hippo' - Running()
02:36:42.406 | INFO | Flow run 'aromatic-hippo' - Running()
### Cancel request is sent around here ###
02:36:42.406 | INFO | Flow run 'aromatic-hippo' - Downloading flow code from storage at '.'
02:36:42.406 | INFO | Flow run 'aromatic-hippo' - Running hook '<lambda>' in response to entering state 'Cancelling'
cancelling!!!
02:36:42.407 | INFO | Flow run 'aromatic-hippo' - Hook '<lambda>' finished running successfully
02:36:43.407 | INFO | Flow run 'aromatic-hippo' - Cancelling()
02:36:44.407 | INFO | Flow run 'aromatic-hippo' - Cancelling()
02:36:45.408 | INFO | Flow run 'aromatic-hippo' - Cancelling()
02:36:46.409 | INFO | Flow run 'aromatic-hippo' - Cancelling()
02:36:47.409 | INFO | Flow run 'aromatic-hippo' - Cancelling()
02:36:48.410 | INFO | Flow run 'aromatic-hippo' - Cancelling()
02:36:49.411 | INFO | Flow run 'aromatic-hippo' - Cancelling()
02:36:50.411 | INFO | Flow run 'aromatic-hippo' - Cancelling()
02:36:51.412 | INFO | Flow run 'aromatic-hippo' - Cancelling()
02:36:52.413 | INFO | Flow run 'aromatic-hippo' - Cancelling()
02:36:53.414 | INFO | Flow run 'aromatic-hippo' - Cancelling()
02:36:54.415 | INFO | Flow run 'aromatic-hippo' - Cancelling()
02:36:55.347 | INFO | prefect.runner - Found 1 flow runs awaiting cancellation.
02:36:55.348 | ERROR | Flow run 'aromatic-hippo' - Crash detected! Execution was aborted by a termination signal.
02:36:55.373 | INFO | prefect.flow_runs.runner - Process for flow run 'aromatic-hippo' exited with status code: -15; This indicates that the process exited due to a SIGTERM signal. Typically, this is caused by manual cancellation.
02:36:58.389 | INFO | prefect.flow_runs.runner - Cancelled flow run 'aromatic-hippo'!
Hi @serinamarie
Just checking in on this ticket. Let me know if there are any questions or concerns I can answer for you :)