Schedule Conductor is a workflow scheduler-as-a-service that runs in the cloud with Netflix Conductor embedded in it. It also runs as an extension module of Conductor.
PUT /scheduling/metadata/scheduleWf/{name} does not shut down the scheduled job on all servers.
Currently, this API only marks the scheduling metadata for shutdown.
DefaultSchedulerManager then reads the metadata definitions marked for shutdown, cancels the applicable jobs, and cleans up the metadata.
In a cluster environment, the metadata is removed from the database by the first node that cancels its job. The other nodes then no longer find metadata applicable for shutdown, so the job keeps running on the remaining nodes.
Proposed Fix:
Do not delete the scheduled workflow metadata definition after the scheduled job is cancelled.
Add rescheduling capability in CronBasedWorkflowScheduler.
The metadata definition will persist in the DB even after cancellation of the scheduled job (on each node of the conductor cluster).
After this fix, the SHUTDOWN and DELETE statuses of a scheduled workflow metadata definition are complementary to each other.
As of now, we do not need metadata definition clean-up. We will introduce a DELETE API for metadata definitions in a future major release.
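The proposed behavior can be sketched as follows. This is a minimal, dependency-free illustration, not the actual DefaultSchedulerManager code: the class names, the Status values, and the in-memory maps standing in for the shared database are all assumptions made for the sketch. The key point is that a node cancels only its local job and leaves the metadata in place, so every other node can still observe the SHUTDOWN marker.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the proposed fix: on SHUTDOWN, each node cancels
// its own local job but does NOT delete the shared metadata definition.
class SchedulerNodeSketch {
    enum Status { RUN, SHUTDOWN }

    private final Map<String, Status> metadata;      // stand-in for the shared DB
    private final Map<String, Boolean> localJobs = new ConcurrentHashMap<>();

    SchedulerNodeSketch(Map<String, Status> sharedMetadata) {
        this.metadata = sharedMetadata;
    }

    void schedule(String name) { localJobs.put(name, true); }

    // Runs periodically on every node of the cluster.
    void pollAndShutdown() {
        metadata.forEach((name, status) -> {
            if (status == Status.SHUTDOWN && Boolean.TRUE.equals(localJobs.get(name))) {
                localJobs.put(name, false); // cancel the local job only
                // Metadata is intentionally NOT removed here, so the other
                // nodes can still see SHUTDOWN and cancel their own copies.
            }
        });
    }

    boolean isRunning(String name) { return Boolean.TRUE.equals(localJobs.get(name)); }
}
```

With the delete removed, the order in which nodes poll no longer matters: every node eventually sees the SHUTDOWN status and stops its own job.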
While working with @vivianzheng404 on PR #32, we realised that Conductor has introduced a priority parameter in the workflow start method. In PR #32 this has been hard-coded with value=1. Ideally, it should be consumed from the scheduled workflow definition. This ticket is being opened to implement this after release-3.0.0.
Refer to conversation - https://github.com/jas34/scheduledwf/pull/32/files#r1251489871
As of today, Scheduler Management offers only three APIs:
GET /scheduler/managers - returns list of scheduler managers
GET /scheduler/scheduled/workflows - returns a list of scheduled workflows based on (name and managerId and nodeAddress) or schedulerId
GET /scheduler/scheduled/workflows/executions - returns a list of scheduled job executions based on (name and managerId and nodeAddress) or schedulerId
New APIs required for easy navigation by admin:
GET /scheduler/managers/nodes - returns a list of nodes on which the scheduler manager is currently running.
GET /scheduler/scheduled/jobs - returns a list of jobs currently running on each node of the cluster.
GET /scheduler/scheduled/jobs/{nodeAddress} - returns a list of jobs currently running on a node of the cluster.
While working with @vivianzheng404 on PR #32, we realised that Conductor has changed IDGenerator from a static class to a Spring bean (Ref Netflix/conductor#2910). For release-3.0.0 we have decided to go with a static class adapted in scheduledwf as IDGenerator_.java (at scheduledwf-core/src/main/java/io/github/jas34/scheduledwf/utils/IDGenerator_.java).
This ticket will deprecate IDGenerator_, and we will move to consuming the IDGenerator bean from conductor-core. This ticket is being opened to implement this after release-3.0.0.
Refer to conversation - https://github.com/jas34/scheduledwf/pull/32/files#r1250839746
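One common way to stage such a migration is to keep the static entry point but let it delegate to a pluggable generator, so call sites keep compiling while the Spring-managed IDGenerator bean can be wired in later. The sketch below is a hypothetical illustration of that pattern, not the actual IDGenerator_ source; the class name and Supplier-based delegation are assumptions.

```java
import java.util.UUID;
import java.util.function.Supplier;

// Hypothetical migration shim: static call sites delegate to a replaceable
// generator, which can later be backed by conductor-core's IDGenerator bean.
class IdGeneratorShim {
    // Defaults to random UUIDs until a Spring-managed generator is wired in.
    private static Supplier<String> delegate = () -> UUID.randomUUID().toString();

    private IdGeneratorShim() {}

    // Called once at startup, e.g. from a Spring configuration class.
    static void setDelegate(Supplier<String> generator) { delegate = generator; }

    /** @deprecated switch to the conductor-core IDGenerator bean once adopted. */
    @Deprecated
    static String generate() { return delegate.get(); }
}
```

This keeps the deprecation a one-release transition: mark `generate()` deprecated first, move callers to injected beans, then delete the shim.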
An issue occurs when running scheduleConductor as a module with our own Conductor: I cannot add a new scheduleWorkflow definition because it fails with a validation error.
But when I run scheduleConductor as a service, I am able to insert a new scheduleWorkflow definition into the DB.
On investigating, it was found that scheduleConductor running as a service uses apache-bval-jsr as the validation implementation for javax validation,
while our Conductor uses Hibernate Validator as the validation implementation for javax validation.
Validation fails with the following exception:
"HV000030: No validator could be found for constraint 'javax.validation.constraints.NotEmpty' validating type 'io.github.jas34.scheduledwf.metadata.ScheduleWfDef$Status'"
Also, in the ScheduleWfDef DTO, @NotEmpty is added on the Status enum instance variable. Per the @NotEmpty contract, @NotEmpty applies to CharSequence, Collection, Map, and array types, but not to enums.
scheduledwf is able to bypass this check because it uses apache-bval-jsr for validation, which is a dependency present in conductor-common.
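Since @NotEmpty does not apply to enum types, the usual fix is to constrain the enum field with @NotNull instead. The sketch below shows the intended constraint as plain Java to stay dependency-free; in the real DTO this would be the javax.validation @NotNull annotation on the field. The Status values used here are assumptions for illustration.

```java
// Hypothetical stand-in for the ScheduleWfDef DTO with the corrected
// constraint. In the actual class, `status` would carry @NotNull rather
// than @NotEmpty (which Hibernate Validator rejects for enum types).
class ScheduleWfDefSketch {
    enum Status { RUN, SHUTDOWN, DELETE } // assumed values for the sketch

    private final Status status;

    ScheduleWfDefSketch(Status status) {
        // Mirrors what @NotNull enforces: a null enum value is invalid.
        if (status == null) {
            throw new IllegalArgumentException("status must not be null");
        }
        this.status = status;
    }

    Status getStatus() { return status; }
}
```

Using @NotNull makes the definition validate identically under both apache-bval-jsr and Hibernate Validator, removing the behavioral difference between the module and service deployments.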
I am exploring the possibility of integrating scheduledwf with our Conductor build as a module. We use PostgreSQL as the main persistence engine for Conductor; do you have any guides/docs that can help with the proper setup, configuration, and build?
The following exception can be observed whenever com.coreoz.wisp.Scheduler tries to schedule a job.
```
Error during job '{job-name}' execution
java.lang.RuntimeException: com.fasterxml.jackson.databind.exc.MismatchedInputException: Cannot construct instance of `io.github.jas34.scheduledwf.concurrent.Permit` (although at least one Creator exists): cannot deserialize from Object value (no delegate- or property-based Creator)
 at [Source: (String)"{"id":"b71b9f87-85f7-4d8f-a560-677405b21771","name":"{job-name}","inUseUpto":1624068000507,"used":false}"; line: 1, column: 2]
	at io.github.jas34.scheduledwf.concurrent.RedisPermitDAO.readValue(RedisPermitDAO.java:69)
	at io.github.jas34.scheduledwf.concurrent.RedisPermitDAO.fetchByName(RedisPermitDAO.java:45)
	at io.github.jas34.scheduledwf.concurrent.ExecutionPermitter.giveBack(ExecutionPermitter.java:57)
	at io.github.jas34.scheduledwf.concurrent.LockingService.releaseLock(LockingService.java:55)
	at io.github.jas34.scheduledwf.scheduler.TriggerScheduledWorkFlowTask.run(TriggerScheduledWorkFlowTask.java:67)
	at com.coreoz.wisp.Scheduler.runJob(Scheduler.java:482)
	at com.coreoz.wisp.Scheduler.lambda$launcher$0(Scheduler.java:451)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: com.fasterxml.jackson.databind.exc.MismatchedInputException: Cannot construct instance of `io.github.jas34.scheduledwf.concurrent.Permit` (although at least one Creator exists): cannot deserialize from Object value (no delegate- or property-based Creator)
 at [Source: (String)"{"id":"b71b9f87-85f7-4d8f-a560-677405b21771","name":"url-redirect-restore-cache","inUseUpto":1624068000507,"used":false}"; line: 1, column: 2]
	at com.fasterxml.jackson.databind.exc.MismatchedInputException.from(MismatchedInputException.java:63)
	at com.fasterxml.jackson.databind.DeserializationContext.reportInputMismatch(DeserializationContext.java:1429)
	at com.fasterxml.jackson.databind.DeserializationContext.handleMissingInstantiator(DeserializationContext.java:1059)
	at com.fasterxml.jackson.databind.deser.BeanDeserializerBase.deserializeFromObjectUsingNonDefault(BeanDeserializerBase.java:1297)
	at com.fasterxml.jackson.databind.deser.BeanDeserializer.deserializeFromObject(BeanDeserializer.java:326)
	at com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:159)
	at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:4202)
	at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3205)
	at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3173)
	at io.github.jas34.scheduledwf.concurrent.RedisPermitDAO.readValue(RedisPermitDAO.java:67)
	... 9 more
```
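Jackson's "no delegate- or property-based Creator" message typically means the target class has neither a no-argument constructor nor a constructor annotated as a creator. A hypothetical reconstruction of the Permit POJO (field names taken from the JSON in the trace; the real class lives in scheduledwf and may differ) shows one standard fix, a no-argument constructor plus setters; annotating the existing constructor with @JsonCreator/@JsonProperty would work equally well.

```java
// Hypothetical reconstruction of Permit with the no-arg constructor that
// Jackson's default BeanDeserializer needs. Its absence is what triggers the
// MismatchedInputException in the trace above.
class Permit {
    private String id;
    private String name;
    private long inUseUpto;
    private boolean used;

    // Required by Jackson when no @JsonCreator constructor is declared.
    Permit() {}

    Permit(String id, String name, long inUseUpto, boolean used) {
        this.id = id;
        this.name = name;
        this.inUseUpto = inUseUpto;
        this.used = used;
    }

    String getId() { return id; }
    void setId(String id) { this.id = id; }
    String getName() { return name; }
    void setName(String name) { this.name = name; }
    long getInUseUpto() { return inUseUpto; }
    void setInUseUpto(long inUseUpto) { this.inUseUpto = inUseUpto; }
    boolean isUsed() { return used; }
    void setUsed(boolean used) { this.used = used; }
}
```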
Currently, CronBasedScheduledProcess<Job> is declared with a generic parameter. We should give CronBasedScheduledProcess a fixed signature extending ScheduledProcess<Job>, where Job = com.coreoz.wisp.Job.
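The change can be sketched as follows. This is a hypothetical, dependency-free illustration: com.coreoz.wisp.Job is stubbed as JobStub here, and ScheduledProcess is reduced to a single accessor, which may not match the real class.

```java
// Stand-in for com.coreoz.wisp.Job, stubbed to keep the sketch self-contained.
class JobStub {
    final String name;
    JobStub(String name) { this.name = name; }
}

// Minimal stand-in for the ScheduledProcess<T> base class.
abstract class ScheduledProcess<T> {
    abstract T getScheduledProcess();
}

// Before: class CronBasedScheduledProcess<Job> extends ScheduledProcess<Job>
// declared a free type parameter that merely *shadowed* the wisp Job type.
// After: the signature is fixed to the concrete wisp Job, so callers cannot
// instantiate it with an unrelated type.
class CronBasedScheduledProcess extends ScheduledProcess<JobStub> {
    private final JobStub job;
    CronBasedScheduledProcess(JobStub job) { this.job = job; }
    @Override JobStub getScheduledProcess() { return job; }
}
```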
Release v1.0.0 bundles the conductor-server-all module inside the scheduledwf-core module.
There can be two types of usages of scheduled workflow module:
Users who want to use the scheduling capability from scheduledwf because they have their own fork of conductor server. For them, we need a module that comes with the bare minimum dependencies of:
conductor-core
conductor-mysql
conductor-jersey
Users who do not use any fork of conductor but run conductor directly, either as a jar build or a docker image. They can keep using the scheduledwf-server module.