Comments (6)

dosubot commented on May 28, 2024

🤖

Hey @spacepirate0001, great to see you back! Ran into another quirky adventure, I see? Let's dive into it.

Thank you for providing detailed information about the issue you're facing. It seems like the inline configuration override for task_resources in the Flyte configuration is not reflecting the expected changes in the task container resources. This could be due to several reasons:

  1. Configuration Not Applied: The changes made to the task_resource_defaults.yaml file might not have been applied to the Flyte cluster. Ensure that after modifying the configuration file, you apply the changes using the appropriate command, such as kubectl apply -k kustomize/base/single_cluster/headless/ if you are using Kubernetes.

  2. Configuration File Location: The configuration file must be in the correct location and correctly referenced by the FlyteAdmin service. Ensure that the file path kustomize/base/single_cluster/headless/config/admin/task_resource_defaults.yaml is correctly mounted into the FlyteAdmin pod and that FlyteAdmin is configured to read from this file.

  3. FlyteAdmin Service Restart: After updating the configuration file, the FlyteAdmin service may need to be restarted for the changes to take effect. Configuration files are often read only at service startup, so changes made while the service is running are not picked up automatically.

  4. Syntax or Indentation Error: YAML files are sensitive to syntax and indentation. An error or misalignment in the task_resource_defaults.yaml file could cause the configuration to be ignored or not parsed correctly. Ensure that the YAML syntax is correct and follows the required structure.

  5. Overrides at Task Registration: Task definitions can specify their own resource requirements, which override the defaults specified in task_resource_defaults.yaml. If the tasks have their resource requirements set at the time of registration, those values will take precedence over the defaults.

  6. Flyte Version Compatibility: Ensure that the version of Flyte you are using supports the configuration overrides as expected. It's possible that the feature behaves differently or has been updated in different versions of Flyte.

To diagnose and fix the issue, you should:

  • Verify that the configuration changes have been applied correctly.
  • Ensure that the FlyteAdmin service is correctly configured to use the updated configuration file and restart the service if necessary.
  • Check the syntax and indentation of the task_resource_defaults.yaml file.
  • Review the task definitions to ensure they are not specifying their own resource requirements that override the defaults.
  • Confirm that the Flyte version you are using supports the configuration overrides as expected.

I also found a similar solved issue, "[Docs] Document how to change default platform conf when setting resources for task", which might be helpful.

If you're trying to override the resources allocated to tasks at runtime, you can use the with_overrides method, as in the example below, where it changes the resource limits of a task call within a workflow. Here's how it's done:

  1. Define your task normally with initial resource requests and limits using the @task decorator and specifying the requests and limits parameters.
  2. In your workflow, when calling the task, use the .with_overrides() method on the task call to specify new resource limits.
  3. Pass the Resources object with the desired cpu and mem overrides to the limits parameter of the with_overrides method.

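Here's the relevant code snippet from the context (square_1 and count_unique_numbers_1 are tasks assumed to be defined elsewhere):

import typing
from flytekit import Resources, workflow

@workflow
def my_pipeline(x: typing.List[int]) -> int:
    return square_1(x=count_unique_numbers_1(x=x)).with_overrides(limits=Resources(cpu="6", mem="500Mi"))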

In this example, the with_overrides method is used to override the resource limits of the square_1 task call within the my_pipeline workflow, setting the CPU limit to "6" and the memory limit to "500Mi".

I hope this helps! If you have any more questions or need further clarification, feel free to ask.

About Dosu

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

from flyte.

eapolinario commented on May 28, 2024

@spacepirate0001 , how did you update the values in the task resources?

spacepirate0001 commented on May 28, 2024

@spacepirate0001 , how did you update the values in the task resources?

I first tried to add it to clusterResourceTemplates as follows:

    013_flyte_task_resource_defaults.yaml: |
      task_resources:
        defaults:
          cpu: 2
          memory: 3Gi
          ephemeralStorage: 0
          gpu: 0
        limits:
          cpu: 2
          memory: 3Gi
          ephemeralStorage: 0
          gpu: 0  

That did not work, so I then added it under configuration.inline:

#inline Specify additional configuration or overrides for Flyte, to be merged with the base configuration
  inline:  
    task_resources:
      limits:
        cpu: 2
        memory: 3Gi
        ephemeralStorage: 0
        gpu: 0
      defaults:
        cpu: 2
        memory: 3Gi
        ephemeralStorage: 0
        gpu: 0

This only partially worked: new tasks now run with the following resources:

resources:
  limits:
    cpu: "1"
    memory: 2Gi
  requests:
    cpu: "1"
    memory: 2Gi

My tasks need more resources than that, so they end up being OOMKilled with exit code 137.
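
For reference, exit code 137 follows the common convention of 128 plus the signal number, so it corresponds to signal 9 (SIGKILL), which is what a container receives when it is killed for exceeding its memory limit. A minimal sketch of that arithmetic using only the Python standard library; the helper name signal_from_exit_code is hypothetical and the lookup assumes a POSIX system:

```python
import signal


def signal_from_exit_code(exit_code: int) -> str:
    """Return the signal name encoded in a 128+N exit code, or "" if there is none."""
    if exit_code <= 128:
        return ""
    try:
        # signal.Signals maps a signal number to its enum member on this platform.
        return signal.Signals(exit_code - 128).name
    except ValueError:
        # The number does not correspond to a known signal here.
        return ""


print(signal_from_exit_code(137))  # SIGKILL on Linux
```

Seeing SIGKILL here is consistent with the 2Gi memory limit shown above being too low for the task.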

eapolinario commented on May 28, 2024

@spacepirate0001 , how are you updating the values? Can you share the commands you used? Also, can you confirm which requests and limits values, if any, you were using in the tasks you tested on?

spacepirate0001 commented on May 28, 2024

The code is deployed via Terraform module updates, and I can see the changes I make being reflected in the manifest. You should try the same in your setup and see that the values don't change beyond what I've mentioned. Finally, I'm running the flyte-binary chart, in which I did not find any values for task_resources at all.

cjidboon94 commented on May 28, 2024

In my flyte-binary values I've set the following, as suggested here: https://github.com/davidmirror-ops/flyte-the-hard-way/blob/main/docs/aws/05-deploy-with-helm.md#time-for-helm

configuration:
  inline:
    task_resources:
      defaults:
        cpu: 500m
        memory: 500Mi
        storage: 500Mi
      limits:
        cpu: "10"
        memory: 20Gi

However, when I try to register a workflow containing a task with its limit set to cpu=4, I get the following error response:

USER:BadInputToAPI: error=None, cause=<_InactiveRpcError of RPC that terminated with:
        status = StatusCode.INVALID_ARGUMENT
        details = "Requested CPU limit [4] is greater than current limit set in the platform configuration [2]. Please contact Flyte Admins to change these limits or consult the configuration"
        debug_error_string = "UNKNOWN:Error received from peer  {grpc_message:"Requested CPU limit [4] is greater than current limit set in the platform configuration [2]. Please contact Flyte Admins to change these limits or consult the configuration", grpc_status:3, created_time:"2024-03-20T11:29:01.232266149+01:00"}"

If I run flytectl get task-resource-attribute -p flytesnacks -d development, I get
{"project":"flytesnacks","domain":"development","defaults":{"cpu":"1","memory":"150Mi"},"limits":{"cpu":"2","memory":"2Gi"}} as a response, which doesn't seem to match my values but rather the default values.


EDIT: my issue was that I had previously set task-resource-attribute via pyflyte update task-resource-attribute for the project/domain. Deleting that allowed Flyte to pick up the default task resources.
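
The behaviour described in the EDIT can be pictured as a precedence rule: a task-resource-attribute stored for a project/domain shadows the limits from the platform configuration. The sketch below is only an illustrative model of that rule and of the check behind the error message above. It is not FlyteAdmin's actual implementation. The helper names (cpu_to_millicores, effective_limits, check_cpu_limit) are hypothetical, and the CPU parser only handles plain and millicore ("500m") values.

```python
from typing import Optional


def cpu_to_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as "2" or "500m" to millicores."""
    quantity = str(quantity).strip()
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(float(quantity) * 1000)


def effective_limits(platform_limits: dict, project_domain_limits: Optional[dict]) -> dict:
    """Assumed precedence: a project/domain attribute, if present, wins."""
    return project_domain_limits if project_domain_limits else platform_limits


def check_cpu_limit(requested: str, limits: dict) -> None:
    """Raise ValueError when the requested CPU limit exceeds the effective one."""
    if cpu_to_millicores(requested) > cpu_to_millicores(limits["cpu"]):
        raise ValueError(
            f"Requested CPU limit [{requested}] is greater than "
            f"current limit [{limits['cpu']}]"
        )


platform = {"cpu": "10", "memory": "20Gi"}        # limits from configuration.inline
stale_attribute = {"cpu": "2", "memory": "2Gi"}   # limits reported by flytectl

# With the stale attribute in place, a task asking for cpu=4 is rejected.
try:
    check_cpu_limit("4", effective_limits(platform, stale_attribute))
    rejected = False
except ValueError as err:
    rejected = True
    print(err)

# Once the attribute is deleted, the platform limit of 10 applies and the check passes.
check_cpu_limit("4", effective_limits(platform, None))
```

Under this model, the values returned by flytectl get task-resource-attribute are the ones to inspect first whenever the platform configuration appears to be ignored.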
