
Comments (6)

wxtim commented on August 19, 2024

Brain dump -

Put a breakpoint at cylc/flow/task_state.py::483 and looked at the dependencies created for the new task created by Cylc Trigger:

(Pdb) cpre.api_dump() 
expression: "c0"
conditions {
  task_proxy: "1590/fin"
  expr_alias: "c0"
  req_state: "succeeded"
  satisfied: false
  message: "unsatisfied"
}
cycle_points: "1590"
satisfied: false

Stack

  /net/home/h02/tpilling/metomi/cylc-flow/cylc/flow/commands.py(448)force_trigger_tasks()
-> yield schd.pool.force_trigger_tasks(tasks, flow, flow_wait, flow_descr)
  /net/home/h02/tpilling/metomi/cylc-flow/cylc/flow/task_pool.py(2208)force_trigger_tasks()
-> self.add_to_pool(itask)
  /net/home/h02/tpilling/metomi/cylc-flow/cylc/flow/task_pool.py(228)add_to_pool()
-> self.create_data_store_elements(itask)
  /net/home/h02/tpilling/metomi/cylc-flow/cylc/flow/task_pool.py(239)create_data_store_elements()
-> self.data_store_mgr.increment_graph_window(
  /net/home/h02/tpilling/metomi/cylc-flow/cylc/flow/data_store_mgr.py(955)increment_graph_window()
-> self.generate_ghost_task(
  /net/home/h02/tpilling/metomi/cylc-flow/cylc/flow/data_store_mgr.py(1188)generate_ghost_task()
-> itask = TaskProxy(
  /net/home/h02/tpilling/metomi/cylc-flow/cylc/flow/task_proxy.py(281)__init__()
-> self.state = TaskState(tdef, self.point, status, is_held)
  /net/home/h02/tpilling/metomi/cylc-flow/cylc/flow/task_state.py(238)__init__()
-> self._add_prerequisites(point, tdef)
> /net/home/h02/tpilling/metomi/cylc-flow/cylc/flow/task_state.py(484)_add_prerequisites()

  • When restarting the workflow ...
  • When triggering generates task prerequisites, it creates them unsatisfied (fair enough).
  • The unsatisfied prerequisites are never satisfied.

from cylc-flow.

oliver-sanders commented on August 19, 2024

I think this may be a duplicate of #5952

The behaviour is sort of "intended" under SoD logic, however, totally illogical from the graph perspective.


wxtim commented on August 19, 2024

I think this may be a duplicate of #5952

It is.


hjoliver commented on August 19, 2024

The behaviour is sort of "intended" under SoD logic, however, totally illogical from the graph perspective.

Meh, reading back through that issue, there is nothing illogical about it. The root of the problem (in that issue at least; I haven't read this one closely) is that you made a change on restart that affected the structure of the graph - by adding new dependencies on an old (previously succeeded) task.

What happens next depends entirely on whether we think graph execution should be purely event-driven (prerequisites get satisfied when outputs get completed) or not (prerequisites can also get satisfied by output events that happened any time in the past).

As it happens I also think we should try to implement the latter way, although it's not exactly easy (a lot more DB lookups...). But I suspect it can be argued either way, and neither way is "illogical".
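The two options can be contrasted with a toy sketch. Nothing here is Cylc's actual API: the `Prereq` class, the `history` set (standing in for a DB lookup) and the `spawn` helper are all hypothetical, and only illustrate the difference between satisfying prerequisites from live events alone and also reusing past output events.

```python
from dataclasses import dataclass


@dataclass
class Prereq:
    """A toy prerequisite: one upstream task output that must be completed."""
    task: str          # e.g. "1590/fin"
    output: str        # e.g. "succeeded"
    satisfied: bool = False


def spawn(prereq_spec, history, use_history):
    """Create the prerequisite for a newly spawned task (hypothetical helper).

    history: set of (task, output) pairs completed in the past;
             it stands in for a database lookup.
    use_history: False -> purely event-driven; True -> also reuse past events.
    """
    task, output = prereq_spec
    prereq = Prereq(task, output)
    if use_history and (task, output) in history:
        prereq.satisfied = True
    return prereq


def on_output_event(prereq, task, output):
    """Satisfy the prerequisite when a matching live event arrives."""
    if (prereq.task, prereq.output) == (task, output):
        prereq.satisfied = True


# 1590/fin succeeded before the new dependency on it existed.
history = {("1590/fin", "succeeded")}

event_driven = spawn(("1590/fin", "succeeded"), history, use_history=False)
with_memory = spawn(("1590/fin", "succeeded"), history, use_history=True)

print(event_driven.satisfied)  # False: the event was missed and will not recur
print(with_memory.satisfied)   # True: the past event is reused
```

In the purely event-driven case the prerequisite only becomes satisfied if the old event is produced again, for example by calling `on_output_event(event_driven, "1590/fin", "succeeded")`.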


oliver-sanders commented on August 19, 2024

This behaviour follows from our current implementation of the SoD model. However, it is an implementation choice, not a required (or desired) characteristic of the SoD model, so it can only be considered "logical" from the perspective of the implementation, not the model. Under the current implementation, Cylc can tell you that a task which previously succeeded has not run at all, ignoring its prior history, which is a bug.

The SoD implementation is indeed event driven; it's just that those events are not entirely held in memory. We are already looking up outputs every time a task spawns (to prevent "over-running"), but we do not currently look up prerequisites (to prevent "under-running"); this inconsistency leads to the bug.

Looking up prerequisites (in addition to outputs) need not add any extra sqlite calls, but it will increase the "volume" of the calls we are already performing. The performance impact will need to be measured; however, since the bulk of the heavy lifting is performed at the C layer, I'm cautiously optimistic. To optimise, we can shift as much logic as possible into the SQL query; one optimisation might be to only return prerequisites which are satisfied.
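As a rough illustration of pushing the filtering into SQL, the sketch below uses an in-memory SQLite database. The `completed_outputs(cycle, name, output)` table and the `satisfied_prereqs` helper are made up for this example and are not Cylc's actual schema or code. A single query returns only those prerequisites that past outputs already satisfy.

```python
import sqlite3

# Hypothetical table of outputs completed in the past (not Cylc's schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE completed_outputs (cycle TEXT, name TEXT, output TEXT)"
)
conn.executemany(
    "INSERT INTO completed_outputs VALUES (?, ?, ?)",
    [("1590", "fin", "succeeded"), ("1590", "foo", "started")],
)


def satisfied_prereqs(conn, prereqs):
    """Return the subset of prereqs already satisfied by past outputs.

    prereqs: list of (cycle, name, output) tuples.
    One query covers the whole list; the filtering happens inside SQLite.
    """
    if not prereqs:
        return set()
    # One "(cycle = ? AND name = ? AND output = ?)" clause per prerequisite.
    where = " OR ".join(
        "(cycle = ? AND name = ? AND output = ?)" for _ in prereqs
    )
    params = [value for prereq in prereqs for value in prereq]
    rows = conn.execute(
        f"SELECT cycle, name, output FROM completed_outputs WHERE {where}",
        params,
    ).fetchall()
    return set(rows)


wanted = [("1590", "fin", "succeeded"), ("1590", "bar", "succeeded")]
print(satisfied_prereqs(conn, wanted))  # {('1590', 'fin', 'succeeded')}
```

Only the satisfied prerequisite comes back, so the Python side just marks the returned rows as satisfied. A real query would also need to respect SQLite's limit on bound parameters for very long prerequisite lists.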


hjoliver commented on August 19, 2024

We are already looking up outputs every time a task spawns (to prevent "over-running"), but we do not currently look up prerequisites (to prevent "under-running"); this inconsistency leads to the bug.

The former is an absolute necessity, without which the model would spawn multiple flows at every graph join (a & b => c). The latter is at least arguable, and it only matters in relatively niche situations. If I add a new dependence on an old event, it's not entirely unreasonable to expect that I might have to fake the old event again to trigger the new dependence on it, because we missed the original event.
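A toy spawn-on-demand loop for a & b => c shows why the check at a join cannot be skipped. This is only loosely modelled on the idea: `spawn_children` and the `spawned` list are hypothetical stand-ins for the scheduler's lookup, not Cylc code. Without the check, each parent that completes spawns its own copy of c.

```python
GRAPH = {"a": ["c"], "b": ["c"]}  # a & b => c


def spawn_children(parent, spawned, check_existing):
    """Spawn the children of a completed parent (hypothetical helper).

    spawned: list of child tasks spawned so far; it stands in for the
             record of what has already been spawned or run.
    check_existing: if True, skip children that were already spawned.
    """
    for child in GRAPH.get(parent, []):
        if check_existing and child in spawned:
            continue
        spawned.append(child)
    return spawned


without_check = []
with_check = []
for parent in ("a", "b"):
    spawn_children(parent, without_check, check_existing=False)
    spawn_children(parent, with_check, check_existing=True)

print(without_check)  # ['c', 'c']: one copy of c per parent
print(with_check)     # ['c']
```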

This is not merely implementation - the conceptual model itself can be event driven in the instantaneous sense, or with the addition of remembering and re-using past events.

When implementing SoD I did think about this, and settled on what we've got because (a) it is easier not to look up old outputs if we don't have to, and unlike for the "overrun" case it is not an absolute necessity; but also (b) it can be justified if you favour an instantaneous event-driven model.

That said, I'm being a bit bloody-minded just to object to characterizing the current situation as "illogical". I do agree that we should change it to remember and re-use past output events.

