flyteorg / flyte


Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.

Home Page: https://flyte.org

License: Apache License 2.0

flyte machine-learning golang scale workflow data-science data-analysis data kubernetes-operator kubernetes orchestration-engine mlops dataops grpc python production production-grade declarative fine-tuning llm

flyte's Introduction

Flyte and LF AI & Data Logo

Flyte

πŸ—οΈ πŸš€ πŸ“ˆ

Current Release label Sandbox Status label Test Status label License label OpenSSF Best Practices label Flyte Helm Chart label Flyte Slack label

Flyte is an open-source orchestrator that facilitates building production-grade data and ML pipelines. It is built for scalability and reproducibility, leveraging Kubernetes as its underlying platform. With Flyte, user teams can construct pipelines using the Python SDK, and seamlessly deploy them on both cloud and on-premises environments, enabling distributed processing and efficient resource utilization.

Build

Write code in Python or any other language and leverage a robust type engine.

Getting started with Flyte

Deploy & Scale

Either locally or on a remote cluster, execute your models with ease.

Getting started with Flyte


Quick start

  1. Install Flyte's Python SDK:
pip install flytekit
  2. Create a workflow (see example)
  3. Run it locally with:
pyflyte run hello_world.py hello_world_wf

Ready to try a Flyte cluster?

  1. Create a new sandbox cluster, running as a Docker container:
flytectl demo start
  2. Now execute your workflows on the cluster:
pyflyte run --remote hello_world.py hello_world_wf

Getting started with Flyte, showing the welcome screen and Flyte dashboard

Do you want to see more but don't want to install anything?

Head over to https://sandbox.union.ai/. It allows you to experiment with Flyte's capabilities from a hosted Jupyter notebook.

Ready to productionize?

Go to the Deployment guide for instructions on installing Flyte in different environments.

Tutorials

Features

πŸš€ Strongly typed interfaces: Validate your data at every step of the workflow by defining data guardrails using Flyte types.
🌐 Any language: Write code in any language using raw containers, or choose Python, Java, Scala or JavaScript SDKs to develop your Flyte workflows.
πŸ”’ Immutability: Immutable executions help ensure reproducibility by preventing any changes to the state of an execution.
🧬 Data lineage: Track the movement and transformation of data throughout the lifecycle of your data and ML workflows.
πŸ“Š Map tasks: Achieve parallel code execution with minimal configuration using map tasks.
🌎 Multi-tenancy: Multiple users can share the same platform while maintaining their own distinct data and configurations.
🌟 Dynamic workflows: Build flexible and adaptable workflows that can change and evolve as needed, making it easier to respond to changing requirements.
⏯️ Wait for external inputs before proceeding with the execution.
🌳 Branching: Selectively execute branches of your workflow based on static or dynamic data produced by other tasks or input data.
πŸ“ˆ Data visualization: Visualize data, monitor models and view training history through plots.
πŸ“‚ FlyteFile & FlyteDirectory: Transfer files and directories between local and cloud storage.
πŸ—ƒοΈ Structured dataset: Convert dataframes between types and enforce column-level type checking using the abstract 2D representation provided by Structured Dataset.
πŸ›‘οΈ Recover from failures: Recover only the failed tasks.
πŸ” Rerun a single task: Rerun workflows at the most granular level without modifying the previous state of a data/ML workflow.
πŸ” Cache outputs: Cache task outputs by passing cache=True to the task decorator.
🚩 Intra-task checkpointing: Checkpoint progress within a task execution.
⏰ Timeout: Define a timeout period, after which the task is marked as failed.
🏭 Dev to prod: As simple as changing your domain from development or staging to production.
πŸ’Έ Spot or preemptible instances: Schedule your workflows on spot instances by setting interruptible to True in the task decorator.
☁️ Cloud-native deployment: Deploy Flyte on AWS, GCP, Azure and other cloud services.
πŸ“… Scheduling: Schedule your data and ML workflows to run at a specific time.
πŸ“’ Notifications: Stay informed about changes to your workflow's state by configuring notifications through Slack, PagerDuty or email.
βŒ›οΈ Timeline view: Evaluate the duration of each of your Flyte tasks and identify potential bottlenecks.
πŸ’¨ GPU acceleration: Enable and control your tasks’ GPU demands by requesting resources in the task decorator.
🐳 Dependency isolation via containers: Maintain separate sets of dependencies for your tasks so no dependency conflicts arise.
πŸ”€ Parallelism: Flyte tasks are inherently parallel to optimize resource consumption and improve performance.
πŸ’Ύ Allocate resources dynamically at the task level.

Who's using Flyte

Join the likes of LinkedIn, Spotify, Freenome, Pachama, Warner Bros. and many others in adopting Flyte for mission-critical use cases. For a full list of adopters and information on how to add your organization or project, please visit our ADOPTERS page.

How to stay involved

πŸ“† Weekly office hours: Live informal sessions with the Flyte team held every week. Book a 30-minute slot and get your questions answered.
πŸ‘₯ Monthly community sync: Happening the first Tuesday of every month, this is where the Flyte team provides updates on the project, and community members can share their progress and ask questions.
πŸ’¬ Slack: Join the Flyte community on Slack to chat with other users, ask questions, and get help.
⚠️ Newsletter: Join this group to receive the Flyte Monthly newsletter.
πŸ“Ή YouTube: Tune into panel discussions, customer success stories, community updates and feature deep dives.
πŸ“„ Blog: Here, you can find tutorials and feature deep dives to help you learn more about Flyte.
πŸ’‘ RFCs: RFCs are used for proposing new ideas and features to improve Flyte. You can refer to them to stay updated on the latest developments and contribute to the growth of the platform.

How to contribute

There are many ways to get involved in Flyte, including:

We ❀️ our contributors


License

Flyte is available under the Apache License 2.0. Use it wisely.

flyte's People

Contributors

akhurana001, anandswaminathan, andrewwdye, bnsblue, byronhsu, chanadian, cosmicbboy, davidmirror-ops, ddl-ebrown, eapolinario, enghabu, flyte-bot, future-outlier, goreleaserbot, hamersaw, honnix, jeevb, katrogan, kumare3, mayitbeegh, migueltol22, neverett, pingsutw, pmahindrakar-oss, samhita-alla, sandragh5, smritisatyanv, surindersinghp, wild-endeavor, yindia


flyte's Issues

Parallel Node (Propeller Side)

TCS is excited about the native parallelization offered in Flyte 2.0. This task is for the Propeller-side execution of parallel nodes.

Expanded error message collapses when scrolling out of view

  • Find an execution in the executions table (workflow details page) that has a long error message.
  • Click to expand the error message.
  • Scroll the row out of view
  • Scroll the row back into view

Expected: The error message should still be expanded.

Actual: The error message renders collapsed, but the row is still the size that it would be with the error message expanded. Now the content sits in the middle of a row that is too tall.

Allow download of Inputs / Outputs

It's unclear exactly what format things should be in, but for I/O types like CSV/Blob/Schema we should be able to provide a download link for the user.

Options:

  1. Convert it to a signed S3 link. This is probably not the right move because we need to verify the identity of a user before allowing them to download
  2. Convert the s3:// protocol to an actual s3 link. It would be up to the user to ensure they are assuming the correct role to be able to download the file.

Likely it will be option 2.

For things like CSV list, we have to consider how to display a list of these items.

Console sends `undefined` instead of `false` for unchecked toggle switches

For workflows which take boolean values, the Console renders a toggle switch. When the toggle remains switched to "off", the resulting computed value is undefined instead of false. This translates to passing no value for the input when making the launch request.
For required inputs with no default value, that will result in a 400.

At the very least, if a boolean value is required and has no default, we should be translating an unchecked toggle to false to make sure the launch request succeeds.

Once default values are implemented for the form, this should become less of an issue.
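The minimal fix described above is a one-line coercion at form-submission time. A language-agnostic sketch (shown in Python for illustration; the real fix would live in the Console's TypeScript):

```python
def coerce_toggle(value):
    """Map an unset toggle (None, i.e. JS `undefined`) to an explicit False.

    Booleans that were actually set pass through unchanged, so the
    launch request always carries a concrete value for required inputs.
    """
    return value if isinstance(value, bool) else False
```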

Audit of UI / UX tests

We need a story around what types of testing we are doing for the UI, and an update of the existing test coverage to move toward that goal.
Right now, we have a mixture of tests implemented with react-testing-library, Enzyme(?), and react-test-renderer (mostly snapshots which we don't really need).

The target will be:

  • Use react-testing-library for all unit/component tests.
  • Remove Enzyme / react-test-renderer
  • Make a decision on whether we need any integration / end-to-end / automated UI testing (something like Cypress / BrowserStack / etc.)
  • Choose a target for code coverage and open one or more issues to track hitting that target.

Node Validators

It should be possible to specify pre and post validators on nodes to prevent advancement of a node (or cache poisoning) if the input/output data does not match standards.
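As an illustration of the idea, pre and post validators could be plain predicates run around a node's execution. Everything below (names, signatures) is hypothetical, not Flyte's actual API:

```python
def run_node(node_fn, inputs, pre_validators=(), post_validators=()):
    """Run a node only if all pre-validators accept its inputs, and
    refuse to advance (or cache) the output unless all post-validators
    accept it. Hypothetical sketch of the validator concept."""
    for check in pre_validators:
        if not check(inputs):
            raise ValueError(f"pre-validation failed: {check.__name__}")
    output = node_fn(inputs)
    for check in post_validators:
        if not check(output):
            raise ValueError(f"post-validation failed: {check.__name__}")
    return output
```

For example, a post-validator that rejects empty results would stop a bad output from poisoning the cache.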

Handle edge cases around schedule updates

Background: We don't have any transactional guarantees for the case where a schedule rule in CloudWatch is, say, deleted but the subsequent database update fails. Although we return an error and a user can retry (and the delete call to CloudWatch is idempotent), unless the user retries we have no guarantee of being in a non-corrupt state.

We could update the scheduled workflow event dequeuing logic to trigger a call to delete a rule when no active launch plan versions exist. Unfortunately there's a possible race condition this exposes in the case of an end-user calling disable in one step, and then enable separately after that.

As a solution, [~matthewsmith] proposed adding an epoch to schedule names to distinguish them. Since we already want to make schedule names more descriptive (with some kind of truncated project & domain in the name) that work can fall under this work item.

Add Auth to Console

Admin handles most of the auth flow. Console needs to properly handle 401 responses and redirect to the auth flow to refresh cookies.

Support additional input types in the Launch UI

We don't currently support list/map or some of the less common types. This task is to at least implement list/map and explore if there is anything we can do about supporting the other types.

Default timeout policy

Right now if a container is misconfigured or something, the job sticks around forever. Propeller should garbage collect and fail it.

Plugin Default Behavior Update

{"json":{"exec_id":"","node":"","ns":"-development","routine":"worker-13","src":"handler.go:216","tasktype":"spark","wf":"***.SparkTasksWorkflow"},"level":"warning","msg":"No plugin found for Handler-type [spark], defaulting to [container]","ts":"2019-11-11T21:09:36Z"}

Defaulting Spark to a container doesn't make sense; ideally we should fail cleanly at the Propeller level and expose the error to users, instead of executing it as a container task and causing unknown/weird container failures. This also applies to other tasks like Hive/Sidecar.

HTTP 400 returned when attempting to retrieve data for NodeExecution child of a Dynamic Task

Update:

This is a UI bug. We should not attempt to retrieve inputs if no inputsUri is set, and should not attempt to retrieve outputs if closure.outputsUri is unset.


Direct child

[https://flyte.lyft.net/api/v1/data/node_executions/flytekit/production/y9n8xi9amd/task1-b0e1be7f74-h-task-sqb5710215b84d56d6770b72f5e3cd4f797910c6e6-0-0]

Grandchild (nested subtask)

[https://flyte.lyft.net/api/v1/data/node_executions/flytekit/production/y9n8xi9amd/task1-b0e1be7f74-h-task-sqb5710215b84d56d6770b72f5e3cd4f797910c6e6-0-0-78d085b30a--sub-taskb5710215b84d56d6770b72f5e3cd4f797910c6e6-0-0]

The above URLs should both return NodeExecution data for the ids provided, but instead they return an error "invalid URI".

Move flytegraph into a separate package

The graph components in the console are designed to be a reusable package, but while it's under active development I'm leaving it inside the flyteconsole repo. This ticket is for tracking the work to be done to publish it as a standalone package.

Sorting/filtering by inputs

It's useful to filter executions down by the value of certain inputs. For instance, if a workflow takes a region code as an input and is run frequently with different values for the region code, a user may want to only see executions using one given value of that code ("SEA").

This functionality will require a design spec, since workflows may have many inputs of varying types and indexing across those types and values is non-trivial.

Note: There is an internal design document that could be cleaned up and moved to public in order to provide guidance for this item.

Graph Enhancements

This is to cover any overflow / nice-to-haves on the graph implementation after the initial usable version. Some ideas:

  • Diving into layers of the graph (i.e. expanding subworkflow nodes inline)
  • Zooming/panning
  • Hover animations, including highlighting data flow in adjacent nodes
  • Animations on nodes in progress
  • Different rendering for nodes which were not executed

Execution IDs aren't copy-pastable across UI, CLI

The full execution ID has the form ex:project:domain:id.

In the UI we only show the last portion ("id"), while the CLI requires the full "ex:project:domain:id", meaning you can't easily copy-paste between the two.

Request from pricing.

Parallel/Map Node

Allow loose parallelism as a native part of the Flyte spec. In other words, allow a 'parallel node' to take a list of inputs and map the work out to replicas of the same executable: task, workflow, or launch plan.

Hotkeys

There are probably some hotkeys worth implementing. This is a placeholder to determine what those should be.

Render Logs directly in the UI

We have enough information from the activity execution entity to make calls directly to AWS to retrieve log stream events.

Accessing log streams requires specific permissions. These won't exist on the client (nor should they). But the server side could be granted that role and be a proxy for the logs.

So it might look something like this:

  • Client makes a request to UI server side to open logs for a specific execution, passing the execution ID. This opens a long-lived TCP request which will be used to stream the log back to the client
  • Server-side opens a connection to AWS to get the log stream for that execution. These have to be retrieved in chunks. Server-side begins streaming the chunks to the client
  • Server-side listens for (pings? Can AWS do push for these?) additional log stream lines and pushes them to the client as they are discovered.

Questions/Concerns:

  • This could be simpler if there was a way for the UI to retrieve a temporary token to use for AWS access. Can the server generate one of these and return it?
  • How do we know when the log stream has ended and we can close the connection to the client? Can we check for a specific string in it?
  • Each one of these will consume a connection to the server and hold it open for what could be a long time. This could cause resource constraints, but we can always scale the UI servers to accommodate.
  • Should we consider web sockets for this type of thing? We could have a mechanism where, while an active websocket connection is open watching a particular execution, the server-side will continue to poll for the latest logs and deliver them to whatever listeners are active. This has the benefit of only making the requests to AWS once if there are multiple listeners
  • If we do use Websockets, this functionality is almost complicated enough to warrant spinning up a separate service to handle it.
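The polling variant of the design above can be sketched as a server-side generator that relays log chunks to the client connection. `fetch_chunk` is a hypothetical stand-in for the AWS get-log-events call made with the server's credentials:

```python
import time

def stream_logs(fetch_chunk, poll_interval=2.0,
                done=lambda chunk: chunk is None):
    """Server-side proxy loop: repeatedly fetch log chunks and yield
    them to the client until the stream reports it is finished.
    `fetch_chunk` and the end-of-stream check are hypothetical; the
    real implementation would wrap the AWS log-stream API."""
    while True:
        chunk = fetch_chunk()
        if done(chunk):
            return
        yield chunk          # push this chunk down to the client
        time.sleep(poll_interval)  # wait before polling AWS again
```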

Filter/view executions by SHA in Flyte 2.0 UI

Already in the CLI:

flyte-cli -h flyte.lyft.net -p flytekit -d development list-executions -f "eq(workflow.version,gitsha)"

This is to track the potential for this in the UI.

Customer notes:

NOTE

The UI can already filter executions by Version, but we don't show versions in the executions table. The work here is mostly for adding that.

Will require a small amount of UX work to determine how to surface versions in the table rows.

Switch flyteidl output to be commonjs

flyteidl is currently being output as an es6 module, which makes it incompatible with NodeJS unless it is run through webpack first. There's no real reason to do it that way, and protobufjs supports commonjs output, so we should switch to that.

Replace loading indicators

We want to make some updates to the way we load items:

  • Show no loading indicator if the request returns within 1 second
  • After 1 second, show a shimmer/skeleton state

TODO: Document all the places where we currently use loading spinners.

Better document the local testing story

The local testing story is weak... we can do a better job documenting tips for how to improve it.

Our initial idea is that the pyflyte execute command can be run locally, but this has some problems: it uses an auto-deleting temp dir, it might mess up real outputs in S3, etc.

We'll play around with stuff and at least come up with some short term workarounds.

Support specifying notifications when launching workflows via the UI

The Inputs for launching a workflow accept a Notifications field, which can be used to specify notification rules for specific states. It's a little complicated (it can be email, PagerDuty, or Slack, to multiple recipients, for multiple states), so we'll tackle it as a separate task.

Rework dynamic node relationships in data model

Admin currently allows tasks to be parents of other nodes (1->many) and nodes to be parents of other tasks (1-1). This has led to some confusion/assumptions:

  • While tasks do yield nodes, the tasks finish executing well before those nodes start, so it's not entirely accurate to have this task->node parent relationship
  • Due to how they are currently presented in the data model, the nested UX looks confusing with the task row showing success and sub-rows showing running (indicating the yielded nodes are still running).

We have talked separately on different occasions about how this should ideally be represented. This task is to track the concrete steps towards a better model.

Figure out validation / default value implementation for JS

Problem:

The messages coming back from the API are decoded by protobufjs. But since all the fields in a proto message are optional by convention, we don't have any assurance that the records are valid and usable. This has caused errors before on the client side.

Solution options:

  • Manual validation of the records and type-casting (message as X) or type-guarding (: message is X) to the stricter types present on the client side. This has the advantage of being flexible in the UI requirements, and the disadvantage of being difficult to keep in sync with the protobuf source of truth.
  • Automated validation via some type of schema definition stored on the client side (JSON Schema is one such option). This has the advantage of generating consistent code on the client side which is kept up-to-date automatically as the schema is updated, as well as providing a schema document that can be used to validate the JSON output from the API. It has the same disadvantage of being a separate solution which must be updated manually any time the API contract changes.
  • Switch the console to use protoc-generated JS/TS libraries and decorate all protobuf messages with the appropriate validation. This has the advantage of the validation rules being identical on both server and client (and updating automatically) as well as providing a generic solution for validation (call validate() on the message class coming back from the server). It has the disadvantage of requiring a non-trivial amount of work: Switching from protobufjs to protoc, enabling the TS output from protoc, updating console code to work with the new typings and decoding strategy.

Option 3 is ideal, but the amount of work necessary to do so is concerning (especially considering it may not work correctly and we might have to back it out).
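Option 1 amounts to hand-written guards at the decode boundary. A language-agnostic sketch (shown in Python; the real code would be TypeScript type guards over the protobufjs output, and the field names here are illustrative, not the real proto schema):

```python
def is_valid_execution(msg: dict) -> bool:
    """Hand-written guard: every field is optional on the wire, so
    explicitly check the fields the client actually relies on before
    trusting the record. Field names are hypothetical examples."""
    return (
        isinstance(msg.get("id"), str)
        and msg.get("id") != ""
        and isinstance(msg.get("closure"), dict)
    )
```

The stated disadvantage applies: each guard must be kept in sync with the protobuf source of truth by hand.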

Breadcrumbs for the UI

We need to determine what info should be available in the breadcrumbs.

  • Show project, domain, entity type, (sub-entity type), version. In this case, sub-entity is something like an execution or launch plan belonging to a particular workflow.
  • Show a static project/domain combo just to set context, but don't make them links, then show the same as in #1
  • Leave out project/domain entirely

Update visuals used for errors

This is a task to audit our usage of error messages.

  • Ensure that all places where we use error messages are using an appropriately sized component
  • Evaluate messaging used
  • Discover any views/components which currently do not use error messages in their failure states and update them

Implement Launch Plan details

This will probably be similar to Workflow Version details, in that it will show information from the closure. But it may not show the graph, or it may optionally allow a user to show a graph view of the workflow at that version.

TODO: Determine which details of a LP are useful to show.
