conductor-sdk / conductor-python

Conductor OSS SDK for Python programming language

License: Apache License 2.0

Python 99.91% C++ 0.01% Dockerfile 0.09%

python conductor workflow data-pipelines durable-computing durable-execution etl-pipeline

conductor-python's Issues

Add CI/CD step to publish new version at pypi when merged at main branch

Invalid value for `name`

Failed to poll task for: xxx,
conductor/client/http/models/task_def.py", line 265, in name
raise ValueError("Invalid value for name, must not be None") # noqa: E501
ValueError: Invalid value for name, must not be None

Publish Python SDK benchmark results as a blog post

Add unit tests implementation for general task/worker use case

Set up conductor-python releases to be published with Orkes Account

Resources:

Remove Python2 support and related dependencies

Start by removing six package dependency and its usage.

Research an elegant way to split logging level between `urllib3.connectionpool` and the rest

Setting the logging level to logging.DEBUG logs every raw HTTP request. That is useful but quickly becomes noisy. It would be nice to have an intermediate option that shows debug output from the package code only.
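
One possible approach, sketched with the standard library only: keep the root logger at DEBUG and raise the level of the `urllib3.connectionpool` logger separately. The function name `configure_logging` and its defaults are assumptions for illustration, not part of the SDK.

```python
import logging


def configure_logging(package_level=logging.DEBUG, http_level=logging.WARNING):
    """Use a verbose level everywhere except the urllib3 connection pool."""
    # The root logger sets the level inherited by loggers without their own level.
    logging.getLogger().setLevel(package_level)
    # Give the noisy HTTP logger its own, quieter level.
    logging.getLogger("urllib3.connectionpool").setLevel(http_level)


configure_logging()
```

Because logger levels are inherited along the dotted name hierarchy, any logger that does not set its own level still follows the root level.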

Update http client to be consistent with Orkes Playground

Use Orkes Playground Swagger docs as reference: https://play-app.orkes.io/api-docs

Steps:

- Generate code using the latest version of Swagger Code Generator
- Replace the current http client with the new code and fix diffs

Undefined reference for WORKFLOW_START_ERROR at metric documentation

conductor-python/src/conductor/client/telemetry/metrics_collector.py

Line 147 in b9542e6

documentation=MetricDocumentation.WORKFLOW_START_ERROR,

Add `gzip` encoding for HTTP requests

Accept-Encoding: gzip
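
A minimal sketch of what this could look like with the standard library, assuming the server replies with a `Content-Encoding: gzip` header when it compresses the body. The helper names `build_gzip_request` and `decode_body` are hypothetical; the SDK's own HTTP client may need a different hook.

```python
import gzip
import urllib.request
from typing import Optional


def build_gzip_request(url: str) -> urllib.request.Request:
    """Create a request that advertises gzip support to the server."""
    return urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})


def decode_body(body: bytes, content_encoding: Optional[str]) -> bytes:
    """Decompress the body only when the server says it is gzip-encoded."""
    if content_encoding and content_encoding.lower() == "gzip":
        return gzip.decompress(body)
    return body
```

The response body should be passed through `decode_body` together with the value of the response's `Content-Encoding` header.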

Setup GitHub Action to publish pypi releases

1. Set up a GitHub Action that listens to release events (e.g. https://github.com/Netflix/conductor/blob/main/.github/workflows/publish.yml).
2. Update setup.cfg to use an env variable for the version.
3. Extract the tag that is checked out. The tag is created as part of the GitHub release process.
4. Use the tag value (e.g. v1.0.1) to extract the version (e.g. 1.0.1) and set it as an env variable.
5. Use env variables to pass the user/pass for PyPI.
6. Configure the GitHub project with the PyPI user/pass as secrets that can be used in step 5.
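
The tag-to-version step could be sketched as below. It assumes the workflow runs on a tag ref, where GitHub Actions sets `GITHUB_REF` to a value such as `refs/tags/v1.0.1`. The function name `version_from_ref` is hypothetical.

```python
import os


def version_from_ref(ref: str) -> str:
    """Turn 'refs/tags/v1.0.1' (or a bare 'v1.0.1') into '1.0.1'."""
    tag = ref.rsplit("/", 1)[-1]  # keep only the part after the last slash
    return tag[1:] if tag.startswith("v") else tag


# Inside a workflow step, the ref would come from the environment.
version = version_from_ref(os.environ.get("GITHUB_REF", ""))
```

The resulting value could then be exported as the env variable that setup.cfg reads.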

Improve `ParallelTaskHandler` to share `TaskClient` between `TaskRunner`s

Currently, TaskHandler spawns a set of processes capable of running tasks in parallel.

Each of these TaskRunners instantiates an ApiClient on every request, due to pickle issues.

Research required to understand alternatives.
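
One alternative worth evaluating, shown here only as a sketch: pass a small picklable holder to each process and let it create the client lazily, once per process, instead of once per request. `LazyClientHolder` is a hypothetical name, and the dict containing a `threading.Lock` is a stand-in for a real client that cannot be pickled.

```python
import threading


class LazyClientHolder:
    """Hypothetical wrapper that carries only picklable configuration."""

    def __init__(self, base_url: str):
        self.base_url = base_url
        self._client = None  # created on first use, once per process

    @property
    def client(self):
        if self._client is None:
            # Stand-in for an expensive client holding unpicklable state.
            self._client = {"base_url": self.base_url, "lock": threading.Lock()}
        return self._client

    def __getstate__(self):
        # Drop the live client so the holder can cross process boundaries.
        state = self.__dict__.copy()
        state["_client"] = None
        return state
```

After unpickling in a child process, the first access to `client` rebuilds it there, and later requests in that process reuse the same instance.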

Implement CLI shortcuts using Python SDK

Requires usage suggestion

Implement multi workers in parallel

Create workflow examples

Examples should be added here: https://github.com/conductor-sdk/conductor-examples

Add license header to the file

Evaluate client performance

Create a benchmark that measures performance as the number of workers changes. Increasing the worker count should improve performance up to an inflection point, after which the graph should at least plateau.
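
A rough sketch of such a benchmark harness follows. It uses threads and a `time.sleep` call as a stand-in for real task execution, so the numbers only illustrate the shape of the measurement. All names and default values are assumptions.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def simulated_task(duration_s: float = 0.01) -> None:
    """Stand-in for polling, executing and updating one task."""
    time.sleep(duration_s)


def measure_throughput(worker_count: int, task_count: int = 40) -> float:
    """Return completed tasks per second for the given number of workers."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=worker_count) as pool:
        list(pool.map(lambda _: simulated_task(), range(task_count)))
    elapsed = time.perf_counter() - start
    return task_count / elapsed


def run_benchmark(worker_counts=(1, 2, 4, 8)):
    """Map each worker count to its measured throughput."""
    return {count: measure_throughput(count) for count in worker_counts}
```

Plotting the returned mapping (worker count on the x axis, throughput on the y axis) would show where adding workers stops helping.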

Create integration tests for general use case

Aim to test all possible ways of application startup, with and without some parameters.

Tool used to test by hand:

from conductor.client.automator.task_handler import TaskHandler
from conductor.client.configuration.configuration import Configuration
from conductor.client.configuration.settings.authentication_settings import AuthenticationSettings
from conductor.client.configuration.settings.metrics_settings import MetricsSettings
from conductor.client.http.api_client import ApiClient
from conductor.client.http.api.metadata_resource_api import MetadataResourceApi
from conductor.client.http.api.task_resource_api import TaskResourceApi
from conductor.client.http.api.workflow_resource_api import WorkflowResourceApi
from conductor.client.http.models import Task, TaskResult
from conductor.client.http.models.task_result_status import TaskResultStatus
from conductor.client.worker.worker_interface import WorkerInterface
from typing import List
import logging

logger = logging.getLogger(
    Configuration.get_logging_formatted_name(
        __name__
    )
)


class SimplePythonWorker(WorkerInterface):
    def execute(self, task: Task) -> TaskResult:
        task_result = self.get_task_result_from_task(task)
        task_result.add_output_data('key1', 'value')
        task_result.add_output_data('key2', 42)
        task_result.add_output_data('key3', False)
        task_result.status = TaskResultStatus.COMPLETED
        return task_result


def get_python_task_definition_example() -> List[dict]:
    return [
        {
            "createTime": 1650595379661,
            "createdBy": "",
            "name": "python_task_example_from_code",
            "description": "Python task example from code",
            "retryCount": 3,
            "timeoutSeconds": 300,
            "inputKeys": [],
            "outputKeys": [],
            "timeoutPolicy": "TIME_OUT_WF",
            "retryLogic": "FIXED",
            "retryDelaySeconds": 10,
            "responseTimeoutSeconds": 180,
            "inputTemplate": {},
            "rateLimitPerFrequency": 0,
            "rateLimitFrequencyInSeconds": 1,
            "ownerEmail": "[email protected]",
            "backoffScaleFactor": 1
        },
    ]


def get_python_workflow_definition_example() -> dict:
    return {
        "updateTime": 1650595431465,
        "name": "workflow_with_python_task_example_from_code",
        "description": "Workflow with python task example from code",
        "version": 1,
        "tasks": [
            {
                "name": "python_task_example_from_code",
                "taskReferenceName": "python_task_example_from_code_ref_0",
                "inputParameters": {

                },
                "type": "SIMPLE",
                "decisionCases": {

                },
                "defaultCase": [

                ],
                "forkTasks":[

                ],
                "startDelay":0,
                "joinOn":[

                ],
                "optional":False,
                "defaultExclusiveJoinTask":[

                ],
                "asyncComplete":False,
                "loopOver":[

                ]
            }
        ],
        "inputParameters": [

        ],
        "outputParameters": {
            "workerOutput": "${python_task_example_from_code_ref_0.output}"
        },
        "schemaVersion": 2,
        "restartable": True,
        "workflowStatusListenerEnabled": False,
        "ownerEmail": "[email protected]",
        "timeoutPolicy": "ALERT_ONLY",
        "timeoutSeconds": 0,
        "variables": {

        },
        "inputTemplate": {

        }
    }


def define_task_and_workflow(api_client: ApiClient) -> None:
    metadata_client = MetadataResourceApi(api_client)
    try:
        metadata_client.register_task_def1(
            body=get_python_task_definition_example()
        )
        metadata_client.create(
            body=get_python_workflow_definition_example()
        )
    except Exception as e:
        logger.debug(f'Failed to define task/workflow, reason: {e}')


def start_workflow(api_client: ApiClient, workflow_name: str) -> str:
    workflow_client = WorkflowResourceApi(api_client)
    workflowId = workflow_client.start_workflow(
        body={},
        name=workflow_name
    )
    return workflowId


def start_workflows(api_client: ApiClient, workflow_name: str, qty: int) -> List[str]:
    workflowIdList = []
    for _ in range(qty):
        try:
            workflowId = start_workflow(api_client, workflow_name)
            workflowIdList.append(workflowId)
            logger.debug(
                f'Started workflow: {workflow_name}, with id: {workflowId}'
            )
        except Exception as e:
            logger.debug(
                f'Failed to start workflow: {workflow_name}, reason: {e}'
            )
    return workflowIdList


def main():
    configuration = Configuration(
        base_url='https://play.orkes.io',
        debug=True,
        authentication_settings=AuthenticationSettings(
            key_id='',
            key_secret=''
        )
    )
    configuration.apply_logging_config()

    api_client = ApiClient(configuration)

    workflow_id = start_workflow(
        api_client,
        'workflow_with_python_task_example_from_code'
    )
    logger.debug(f'workflow_id: {workflow_id}')

    task_api = TaskResourceApi(api_client)
    response = task_api.update_task_by_ref_name(
        output={'hello': 'world'},
        workflow_id=workflow_id,
        task_ref_name='python_task_example_from_code_ref_0',
        status=TaskResultStatus.COMPLETED.value,
    )
    logger.debug(f'task update response: {response}')

    workers = [
        SimplePythonWorker('python_task_example_from_code'),
    ]
    workflow_ids = start_workflows(
        api_client,
        'workflow_with_python_task_example_from_code',
        10
    )
    metrics_settings = MetricsSettings()
    with TaskHandler(workers, configuration, metrics_settings) as task_handler:
        task_handler.start_processes()
        task_handler.join_processes()


if __name__ == '__main__':
    main()

Add different logger per worker, while maintaining same configuration

Each worker will log as ${hostname}-${pid}
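
A minimal sketch of the naming scheme, using only the standard library. The function name `get_worker_logger` is hypothetical. Loggers created this way have no level or handlers of their own, so they inherit the root logger's configuration.

```python
import logging
import os
import socket


def get_worker_logger() -> logging.Logger:
    """Return a logger named '<hostname>-<pid>' for the current process."""
    return logging.getLogger(f"{socket.gethostname()}-{os.getpid()}")
```

Calling this inside each worker process yields a distinct logger name per process while the shared configuration stays in one place.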

Add validation step for GitHub Action after publish release event

Post a community blog presenting Conductor and new Python SDK

Create a technical article for Python SDK. Topics to cover:

How to use it?
Why use it?
New features forecast

Add support for workflow creation as a code, instead of json from API

Add authentication token expiration/refresh

Evaluate behavior on invalid credentials:

Should workers keep polling with invalid token?
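
One way the expiration/refresh logic could be structured is sketched below. `TokenCache` is a hypothetical class. The `fetch_token` callable stands in for whatever call obtains a token from the server, and the time-to-live is an assumed parameter rather than a value taken from the API.

```python
import time
from typing import Callable, Optional


class TokenCache:
    """Hypothetical cache that refetches the token once it has expired."""

    def __init__(self, fetch_token: Callable[[], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch_token = fetch_token
        self._ttl = ttl_seconds
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        """Return a cached token, refreshing it when missing or expired."""
        now = self._clock()
        if self._token is None or now >= self._expires_at:
            self._token = self._fetch_token()
            self._expires_at = now + self._ttl
        return self._token

    def invalidate(self) -> None:
        """Force a refresh on the next call, e.g. after an auth error."""
        self._token = None
```

If `fetch_token` raises for invalid credentials, the caller can decide whether to stop polling or to retry later, which is the open question in this issue.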

Add support for External Payload Upload

Similar to code implemented for Conductor Java SDK

Refactor documentation

Create new GitHub Action event on pull request

Steps to add:
- Check code against linter
- Run unit tests

I'm trying to pip install conductor-python and then import conductor, but I'm getting a "No module named 'conductor'" error. Reverting to version 1.0.29 fixes it.

Improve documentation

Host settings
Change the instructions on how to run Conductor to the steps here:
- https://orkes.io/content/docs/getting-started/install/running-locally#download-and-run
Change the task and workflow creation steps to point to the UI instead of the curl command.
Starting a workflow: same as above.
Provide the JSON for a simple workflow with a Python worker, and show how to implement that worker.
In the worker example, include the code showing how to point it to a remote server other than localhost.

Implement metrics collector

Reference

Similar to this package in the Java client: https://github.com/gardusig/conductor/tree/main/client/src/main/java/com/netflix/conductor/client/telemetry

Updates should probably lie within api_client.py and rest.py at http folder: src/conductor/client/http

Refactor Swagger annotations

example does not work responding {"message":"Token cannot be null or empty","error":"INVALID_TOKEN","timestamp":1656058235313}

As the title suggests, the simple_woker.py example in the README.md does not work.

Add authentication layer on api client

Onboard Python Conductor SDK to Orkes Playground

Orkes Playground:

UI: https://play.orkes.io/
API: https://play.orkes.io/api/

Onboard repo to PyPI (Python Package Index)

Use name: conductor-python

Reference

Official documentation: https://packaging.python.org/en/latest/tutorials/packaging-projects/

Refactor unit tests, to be compliant with recent client changes

Latest Conductor Docker image built within GitHub
Client Docker image with integration tests

Add podcast/youtube video forecast

https://www.notion.so/5331e1d0eba641669bfcc934d567dd1e?v=b9c1da4cd6bd42a1b67b426917f22792

Add better method name convention for Resource API calls

Example:

Metadata resource API:
- register_task_def1 could be register_task_def
- create could be register_workflow_def
- Resource: https://github.com/conductor-sdk/conductor-python/blob/main/src/conductor/client/http/api/metadata_resource_api.py#L569
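
The renaming could be introduced without breaking existing callers by adding thin aliases, as in the sketch below. `GeneratedMetadataApi` is a stand-in for the generated client. Only the method names `register_task_def1`, `create`, `register_task_def` and `register_workflow_def` come from this issue; everything else is hypothetical.

```python
class GeneratedMetadataApi:
    """Stand-in for the generated client; bodies are placeholders."""

    def register_task_def1(self, body):
        return ("register_task_def1", body)

    def create(self, body):
        return ("create", body)


class MetadataApi(GeneratedMetadataApi):
    """Adds clearer names that delegate to the generated methods."""

    def register_task_def(self, body):
        return self.register_task_def1(body)

    def register_workflow_def(self, body):
        return self.create(body)
```

The old names remain available on the subclass, so callers can migrate gradually.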

Improve TaskHandler polling strategy

Add new parameter for batchSize
Each worker will have a standalone subprocess to poll indefinitely
- For each polled task, start a new subprocess to execute and update it
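
The proposed strategy can be sketched as follows. To keep the example short and terminating, a thread pool stands in for the per-task subprocesses, an in-memory queue stands in for the server, and `max_polls` bounds the loop that would otherwise run indefinitely. All names are hypothetical.

```python
import queue
from concurrent.futures import ThreadPoolExecutor


def poll_batch(task_queue: "queue.Queue", batch_size: int) -> list:
    """Take up to batch_size tasks from the queue without blocking."""
    batch = []
    for _ in range(batch_size):
        try:
            batch.append(task_queue.get_nowait())
        except queue.Empty:
            break
    return batch


def run_poller(task_queue, execute, batch_size: int, max_polls: int) -> list:
    """Poll in batches and run each polled task in its own pool slot."""
    results = []
    with ThreadPoolExecutor(max_workers=batch_size) as pool:
        for _ in range(max_polls):
            batch = poll_batch(task_queue, batch_size)
            futures = [pool.submit(execute, task) for task in batch]
            # Collect results in submission order before polling again.
            results.extend(future.result() for future in futures)
    return results
```

A production version would likely continue polling while earlier tasks are still running. This sketch waits for each batch to finish so that its behavior is easy to follow.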

Refactor `PollingInterval`, removing from `TaskRunner` to `WorkerInterface`, requiring its implementation on workers

Reference: https://github.com/Netflix/conductor/blob/main/client/src/main/java/com/netflix/conductor/client/worker/Worker.java#L85-L92
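
A sketch of what requiring the polling interval on workers could look like, using an abstract base class. `PollingWorker`, `FastWorker` and the method name `get_polling_interval_in_seconds` are hypothetical and do not describe the SDK's current interface.

```python
from abc import ABC, abstractmethod


class PollingWorker(ABC):
    """Hypothetical worker base class that owns its polling interval."""

    @abstractmethod
    def get_polling_interval_in_seconds(self) -> float:
        """Return how long to wait between two polls."""


class FastWorker(PollingWorker):
    def get_polling_interval_in_seconds(self) -> float:
        return 0.1
```

With the method declared abstract, a worker that forgets to implement it fails at instantiation time instead of at the first poll.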
