uber / cadence

Cadence is a distributed, scalable, durable, and highly available orchestration engine for executing asynchronous, long-running business logic in a resilient way.

Home Page: https://cadenceworkflow.io

License: MIT License


cadence's Introduction

Cadence


This repo contains the source code of the Cadence server and other tooling, including the CLI, schema tools, bench, and canary.

You can implement your workflows with one of our client libraries. The Go and Java libraries are officially maintained by the Cadence team, while the Python and Ruby client libraries are developed by the community.

You can also use iWF as a DSL framework on top of Cadence.

See Maxim's talk at the Data@Scale Conference for an architectural overview of Cadence.

Visit cadenceworkflow.io to learn more about Cadence. Join us in the Cadence Documentation project, and feel free to raise an Issue or Pull Request there.

Community

  • Github Discussion
    • Best for Q&A, support/help, general discussion, and announcements
  • StackOverflow
    • Best for Q&A and general discussion
  • Github Issues
    • Best for reporting bugs and feature requests
  • Slack
    • Best for contributing/development discussion

Getting Started

Start the cadence-server

To run Cadence services locally, we highly recommend using the Cadence service Docker setup. You can also follow the instructions to build and run it yourself.

Please visit our documentation site for production/cluster setup.

Run the Samples

Try out the sample recipes for Go or Java to get started.
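
To give a flavor of the Go samples, below is a minimal, hypothetical hello-world workflow written against the Go client (go.uber.org/cadence). The function names, timeouts, and greeting are illustrative only, not code from the samples repo; a worker process would register these functions and poll the task list, as shown in the samples.

package helloworld

import (
    "context"
    "time"

    "go.uber.org/cadence/workflow"
)

// HelloWorkflow schedules a single activity and returns its result.
// Register the workflow and the activity with a worker polling your task list.
func HelloWorkflow(ctx workflow.Context, name string) (string, error) {
    ao := workflow.ActivityOptions{
        ScheduleToStartTimeout: time.Minute,
        StartToCloseTimeout:    time.Minute,
    }
    ctx = workflow.WithActivityOptions(ctx, ao)

    var greeting string
    err := workflow.ExecuteActivity(ctx, HelloActivity, name).Get(ctx, &greeting)
    return greeting, err
}

// HelloActivity is a plain Go function; Cadence records its result in the
// workflow history so it is not re-executed on replay.
func HelloActivity(ctx context.Context, name string) (string, error) {
    return "Hello, " + name + "!", nil
}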

Use Cadence CLI

The Cadence CLI can be used to operate workflows, tasklists, domains, and even clusters.

You can install the Cadence CLI in any of the following ways:

  • Use brew to install the CLI: brew install cadence-workflow
    • Follow the instructions if you need to install older versions of the CLI via Homebrew. Usually this is only needed when you are running a much older version of the server.
  • Use the Docker image for the CLI: docker run --rm ubercadence/cli:<releaseVersion> or docker run --rm ubercadence/cli:master. Be sure to update your image when you want to try new features: docker pull ubercadence/cli:master
  • Build the CLI binary yourself: check out the repo and run make cadence to build all tools. See CONTRIBUTING for the prerequisites of the make command.
  • Build the CLI image yourself; see the instructions.

The Cadence CLI is a powerful tool. The commands are organized into tabs, e.g. workflow->batch->start or admin->workflow->describe.

Please read the documentation and always try --help on any tab to learn and explore.

Use Cadence Web

Try out Cadence Web UI to view your workflows on Cadence. (This is already available at localhost:8088 if you run Cadence with docker compose)

Contributing

We'd love your help in making Cadence great. Please review our contribution guide.

If you'd like to propose a new feature, first join the Slack channel to start a discussion and check if there are existing design discussions. Also peruse our design docs in case a feature has been designed but not yet implemented. Once you're sure the proposal is not covered elsewhere, please follow our proposal instructions.

Other binaries in this repo

Bench/stress test workflow tools

See bench documentation.

Periodic feature health check workflow tools (aka Canary)

See canary documentation.

Schema tools for SQL and Cassandra

These tools are for manually setting up or upgrading the database schema.

The easiest way to get the schema tool is via homebrew.

brew install cadence-workflow also includes cadence-sql-tool and cadence-cassandra-tool.

  • The schema files are located at /usr/local/etc/cadence/schema/.
  • To upgrade, make sure you remove the old ElasticSearch schema first: mv /usr/local/etc/cadence/schema/elasticsearch /usr/local/etc/cadence/schema/elasticsearch.old && brew upgrade cadence-workflow. Otherwise, the ElasticSearch schemas may not be updated.
  • Follow the instructions if you need to install older versions of the schema tools via Homebrew. However, the easier way is to use new versions of the schema tools with old versions of the schemas: just check out the older version of the schemas from this repo, e.g. run git checkout v0.21.3 to get the v0.21.3 schemas in the schema folder.


License

MIT License, please see LICENSE for details.

cadence's People

Contributors

3vilhamster, abhishekj720, agautam478, andrewjdawson2016, anish531213, bowenxia, davidporter-id-au, demirkayaender, groxx, jakobht, ketsiambaku, longquanzheng, mantas-sidlauskas, meiliang86, mkolodezny, neil-xie, samarabbas, sankari165, shaddoll, shreyassrivatsan, sivakku, taylanisikdemir, timl3136, vancexu, venkat1109, vytautas-karpavicius, wxing1292, yiminc, yux0, yycptt


cadence's Issues

History cache invalidation on Cassandra timeouts

We have an issue where, if we get a timeout error while updating the workflow mutable state, we cannot guarantee that we read the correct, latest state on reload. This is because the write could still be applied after the read executes.
This could lead to corrupting the Events table if we tried to use a stale next_event_id value for subsequent writes.

DeleteWorkflowExecution is not transactional with workflow completion update

When the decider responds with a complete-workflow decision, we first update the execution with the new events and then delete the workflow execution as a separate transaction. This causes issues when the update call times out but the update is actually applied: we may never delete that workflow execution.
We need to make sure that the execution is updated and deleted in the same transaction.

Matching Engine can lose decision tasks

By design, the matching engine can lose tasks even before recording in the execution history that they started. This is OK for activity tasks, since there are always timeouts for them.
On the other hand, there is no ScheduledToStart timeout for decision tasks (to avoid unnecessary timeouts in case the decider is down or not polling for tasks). If a decision task is lost, the workflow execution will get stuck forever.

Matching Service should not serve requests before it's ready

The matching service registers its thrift handler and starts the TChannel RPC server before it is fully initialized. We have seen issues where requests that reach the matching engine before it is properly initialized cause panics.
We need to handle this the same way we handle the History Service: block incoming requests on a wait group until service initialization is complete.
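
For illustration only (not the actual service code), the pattern described here can be sketched in Go as follows: every RPC handler blocks on a startup wait group that is released once initialization finishes. All names below are hypothetical.

package main

import (
    "errors"
    "fmt"
    "sync"
    "time"
)

// handler sketches the "don't serve before ready" pattern: requests wait on
// startWG, which is only released after initialization completes.
type handler struct {
    startWG sync.WaitGroup
}

func newHandler() *handler {
    h := &handler{}
    h.startWG.Add(1) // RPC methods block until Start is called
    return h
}

// Start finishes initialization (engines, caches, membership, ...) and then
// unblocks any requests that arrived early.
func (h *handler) Start() {
    // ... initialize the engine here ...
    h.startWG.Done()
}

// AddDecisionTask stands in for any RPC handler method.
func (h *handler) AddDecisionTask() error {
    h.startWG.Wait() // never touch the engine before it exists
    return errors.New("not implemented in this sketch")
}

func main() {
    h := newHandler()
    go func() {
        time.Sleep(100 * time.Millisecond) // simulate slow startup
        h.Start()
    }()
    fmt.Println(h.AddDecisionTask()) // blocks briefly, then proceeds safely
}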

Design task to expose mutable state to client-side

Certain workflows are easier to write if mutable state, rather than history, is exposed directly to the client for making decisions. Workflows like cron prefer this model, and it is much better optimized for such scenarios. Also, using mutable state for things like activity retries is much preferable to having the client implement the retry logic.

History Service: Fix timer task creation on activity heartbeat

The history service seems to be creating a timeout task on each heartbeat. Instead, we should record the last heartbeat time in mutable state and only create a new timeout task when the current one expires, based on the last recorded heartbeat value.

Matching Engine: Rate limit creation of new tasks for any TaskList

Every TaskList is mapped to a single Cassandra partition, so if all shards are writing to a single TaskList, it becomes a scalability bottleneck for the system. If sync matching is not happening, we end up writing all the tasks to Cassandra, and under heavy load Cassandra transactions start timing out. This behavior ends up generating a very large number of duplicate tasks.
I think we need to put a rate limiter on each TaskList to prevent this situation. We should just return a throttle error back to the client and have the client back off and retry failures. This should cause the system to degrade gracefully under extreme load.
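
A hedged sketch of the proposed behavior, using golang.org/x/time/rate as a stand-in limiter: task writes over the per-TaskList budget are rejected with a throttle error so the client backs off and retries. The type and error names are made up for illustration.

package main

import (
    "errors"
    "fmt"

    "golang.org/x/time/rate"
)

// errTaskListThrottled stands in for the throttle/service-busy error that
// would be returned to the client.
var errTaskListThrottled = errors.New("task list throttled, back off and retry")

// taskListWriter guards task writes for a single TaskList partition with a
// token-bucket rate limiter.
type taskListWriter struct {
    limiter *rate.Limiter
}

func newTaskListWriter(tasksPerSecond float64, burst int) *taskListWriter {
    return &taskListWriter{limiter: rate.NewLimiter(rate.Limit(tasksPerSecond), burst)}
}

// addTask persists a task only if it fits in the rate budget; otherwise it
// returns a throttle error instead of piling more writes onto the partition.
func (w *taskListWriter) addTask(task string) error {
    if !w.limiter.Allow() {
        return errTaskListThrottled
    }
    // ... write the task to the TaskList's Cassandra partition here ...
    _ = task
    return nil
}

func main() {
    w := newTaskListWriter(2, 2)
    for i := 0; i < 5; i++ {
        fmt.Println(w.addTask(fmt.Sprintf("task-%d", i)))
    }
}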

History Engine: Timer optimizations

Currently, timers are created for each activity and decision task. We need to implement logic that creates a single timer per workflow execution and sets the next earliest timer when that one fires.

History Service: Mutable state API cleanup

Currently, mutable state is only used for a small part of the API. This work item tracks making it used by all API calls on the History service and keeping it updated with all relevant information, such as the following (a rough sketch follows the list):

  1. ActivityInfos
  2. TimerInfos
  3. OutstandingDecision
  4. NextEventID
  5. ChildWorkflows
  6. Potentially any Signal ID if it makes sense for any API
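
As a rough illustration only (these are not the server's actual types), a consolidated mutable state record could look something along these lines:

package sketch

import "time"

// mutableState is a hypothetical, simplified view of the information the
// History service would track per execution; field names mirror the list above.
type mutableState struct {
    ActivityInfos       map[int64]activityInfo      // schedule ID -> in-flight activity
    TimerInfos          map[string]timerInfo        // user timer ID -> timer
    OutstandingDecision *decisionInfo               // at most one in-flight decision task
    NextEventID         int64                       // next event ID to append to history
    ChildWorkflows      map[int64]childWorkflowInfo // initiated ID -> child workflow
    SignalRequestedIDs  map[string]struct{}         // de-dupe IDs for outstanding signals
}

type activityInfo struct {
    ScheduleID    int64
    StartedID     int64
    LastHeartbeat time.Time
}

type timerInfo struct {
    TimerID    string
    ExpiryTime time.Time
}

type decisionInfo struct {
    ScheduleID int64
    StartedID  int64
}

type childWorkflowInfo struct {
    InitiatedID int64
    WorkflowID  string
    RunID       string
}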

Create ActivityTaskScheduleFailed event in history on bad decisions

If RespondDecisionTaskCompleted sends in a bad request or corrupted data, then we just silently ignore the activity-schedule decisions. Instead, we need to add a relevant failure event, such as ActivityTaskScheduleFailed, and then also create a new DecisionTask for the decider. Here is an instance of the failure:
{"RunID":"c09c5b10-d240-4f8b-bc4c-5735c0bb3805","ScheduleID":212,"Service":"cadence-frontend","WorkflowID":"48018f57-0c39-4d4e-b055-e3df3fff7464","level":"error","msg":"RespondDecisionTaskCompleted. Error: BadRequestError({Message:Missing StartToCloseTimeoutSeconds in the activity scheduling parameters.})","time":"2017-03-07T13:56:56-08:00"}

Deletion of history events on workflow completion

We mark the workflow execution row with a TTL in the executions table on completion. This takes care of the workflow execution entry in the executions table, but we still leak space in the events table because we don't clean up the history associated with that execution.
We could use the timer queue processor for this purpose and queue up a timer task to delete the execution history after the retention period.

Basic Server Side Throttling

Cadence is a multi-tenant service, and we need to protect against a single bad user bringing the entire system down. This task is to implement basic throttling and quotas for each client.

Handling of history corruption

Hopefully, execution history should never get corrupted. If, for any reason (bugs?), we get into a state where this happens, we should not just return a retriable error to the callers.

Frontend should not serve requests before it's ready

The frontend service registers its thrift handler and starts the TChannel RPC server before it is fully initialized. We have seen issues where requests that reach the frontend before it is properly initialized cause panics.
We need to handle this the same way we handle the history and matching services: block incoming requests on a wait group until service initialization is complete.

Create Lock Manager to serialize access to executions

Right now, every request gets a WorkflowExecutionContext from the cache and then acquires a lock on that object. It is possible in edge conditions that two requests end up with two different context objects (request 1 gets the context, the context gets evicted from the cache, then request 2 creates a new object). This will break the guarantee that only one write per execution originates from the history engine at a time.
We can fix this by having a central lock manager that grants locks on executions instead of locking the context object itself.
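
For illustration, a reference-counted per-execution lock manager could look roughly like the sketch below (the names are hypothetical, not the history engine's actual types). The key point is that requests lock on an execution key rather than on whichever context object the cache happened to hand out.

package main

import (
    "fmt"
    "sync"
)

// lockManager grants a lock per execution key, independent of any cached
// context object, so two requests for the same execution always contend on
// the same mutex even if the cache entry was evicted in between.
type lockManager struct {
    mu    sync.Mutex
    locks map[string]*executionLock
}

type executionLock struct {
    sync.Mutex
    refCount int
}

func newLockManager() *lockManager {
    return &lockManager{locks: map[string]*executionLock{}}
}

// Lock acquires the lock for the given execution key, creating it on demand.
func (m *lockManager) Lock(key string) {
    m.mu.Lock()
    l, ok := m.locks[key]
    if !ok {
        l = &executionLock{}
        m.locks[key] = l
    }
    l.refCount++
    m.mu.Unlock()
    l.Lock()
}

// Unlock releases the lock and drops it from the map once no one is waiting.
func (m *lockManager) Unlock(key string) {
    m.mu.Lock()
    l := m.locks[key]
    l.refCount--
    if l.refCount == 0 {
        delete(m.locks, key)
    }
    m.mu.Unlock()
    l.Unlock()
}

func main() {
    m := newLockManager()
    m.Lock("domain/workflow/run")
    fmt.Println("holding per-execution lock")
    m.Unlock("domain/workflow/run")
}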

History client support for host redirect

We now have support for returning the correct host information when an API call to the history service fails with ShardOwnershipLostError.
The history client needs to look at the ShardOwnershipLostError and retry the request against the host information carried in the error.
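
A hedged sketch of the retry behavior the history client needs; the error type below is a simplified stand-in for the generated Thrift ShardOwnershipLostError, and the helper names are made up for illustration.

package main

import (
    "errors"
    "fmt"
)

// shardOwnershipLostError is a simplified stand-in for the Thrift error the
// history service returns, carrying the address of the new shard owner.
type shardOwnershipLostError struct {
    Owner string // host:port of the instance that now owns the shard
}

func (e *shardOwnershipLostError) Error() string {
    return "shard ownership lost, new owner: " + e.Owner
}

// callWithRedirect retries the request against the owner reported in the
// error instead of failing the call outright.
func callWithRedirect(call func(host string) error, initialHost string, maxRedirects int) error {
    host := initialHost
    for i := 0; i <= maxRedirects; i++ {
        err := call(host)
        var solErr *shardOwnershipLostError
        if errors.As(err, &solErr) {
            host = solErr.Owner // follow the redirect and retry
            continue
        }
        return err
    }
    return fmt.Errorf("gave up after %d redirects", maxRedirects)
}

func main() {
    attempts := 0
    err := callWithRedirect(func(host string) error {
        attempts++
        if host == "10.0.0.1:7934" {
            return nil // the new owner accepts the request
        }
        return &shardOwnershipLostError{Owner: "10.0.0.1:7934"}
    }, "10.0.0.2:7934", 3)
    fmt.Println(err, "attempts:", attempts)
}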

Cadence Feature: Restart failed workflows

This feature is to support restarting workflows from a given point in the workflow execution history. Basically, you want to preserve the history of an execution up to a point and restart from that location. This is very useful when a workflow fails due to a bug at a certain point and you want to restart it after fixing the bug.
