Comments (6)
Hey @spacepirate0001, great to see you back! Ran into another quirky adventure, I see? Let's dive into it.
Thank you for providing detailed information about the issue you're facing. It seems like the inline configuration override for `task_resources` in the Flyte configuration is not reflecting the expected changes in the task container resources. This could be due to several reasons:
- Configuration Not Applied: The changes made to the `task_resource_defaults.yaml` file might not have been applied to the Flyte cluster. Ensure that after modifying the configuration file, you apply the changes using the appropriate command, such as `kubectl apply -k kustomize/base/single_cluster/headless/` if you are using Kubernetes.
- Configuration File Location: The configuration file must be in the correct location and correctly referenced by the FlyteAdmin service. Ensure that the file path `kustomize/base/single_cluster/headless/config/admin/task_resource_defaults.yaml` is correctly mounted into the FlyteAdmin pod and that FlyteAdmin is configured to read from this file.
- FlyteAdmin Service Restart: After updating the configuration file, the FlyteAdmin service may need to be restarted for the changes to take effect. This is because configuration files are often read at the service startup, and changes made while the service is running are not automatically picked up.
- Syntax or Indentation Error: YAML files are sensitive to syntax and indentation. An error or misalignment in the `task_resource_defaults.yaml` file could cause the configuration to be ignored or not parsed correctly. Ensure that the YAML syntax is correct and follows the required structure.
- Overrides at Task Registration: Task definitions can specify their own resource requirements, which override the defaults specified in `task_resource_defaults.yaml`. If the tasks have their resource requirements set at the time of registration, those values will take precedence over the defaults.
- Flyte Version Compatibility: Ensure that the version of Flyte you are using supports the configuration overrides as expected. It's possible that the feature behaves differently or has been updated in different versions of Flyte.
To diagnose and fix the issue, you should:
- Verify that the configuration changes have been applied correctly.
- Ensure that the FlyteAdmin service is correctly configured to use the updated configuration file and restart the service if necessary.
- Check the syntax and indentation of the `task_resource_defaults.yaml` file.
- Review the task definitions to ensure they are not specifying their own resource requirements that override the defaults.
- Confirm that the Flyte version you are using supports the configuration overrides as expected.
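As a rough illustration of the "Overrides at Task Registration" point above, the following sketch models how values declared on a task take precedence over platform defaults. It is a simplified model for intuition only, not FlyteAdmin's actual logic; the function name `effective_requests` and the sample values are hypothetical.

```python
def effective_requests(task_requests: dict, platform_defaults: dict) -> dict:
    """Return the requests a task would run with in this simplified model.

    Values declared on the task win; anything left unset falls back to
    the platform default.
    """
    merged = dict(platform_defaults)
    # Only explicitly set (non-None) task values replace the defaults.
    merged.update({k: v for k, v in task_requests.items() if v is not None})
    return merged


platform_defaults = {"cpu": "2", "memory": "3Gi"}

# A task that declares nothing inherits the defaults.
print(effective_requests({}, platform_defaults))
# A task that declares its own memory request keeps it.
print(effective_requests({"memory": "8Gi"}, platform_defaults))
```

If a task in your project pins its own values like this, changing the platform defaults alone would not be expected to change what that task runs with.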
I also found a similar solved issue [Docs] Document how to change default platform conf when setting resources for task which might be helpful.
If you're trying to override the resources allocated to tasks at runtime, you can use the `with_overrides` method. This is demonstrated in the provided example, where `with_overrides` is used to change the resource limits of a task within a workflow. Here's how it's done:
- Define your task normally with initial resource requests and limits using the `@task` decorator and specifying the `requests` and `limits` parameters.
- In your workflow, when calling the task, use the `.with_overrides()` method on the task call to specify new resource limits.
- Pass the `Resources` object with the desired `cpu` and `mem` overrides to the `limits` parameter of the `with_overrides` method.
Here's the relevant code snippet from the context:
import typing
from flytekit import Resources, workflow

@workflow
def my_pipeline(x: typing.List[int]) -> int:
    return square_1(x=count_unique_numbers_1(x=x)).with_overrides(limits=Resources(cpu="6", mem="500Mi"))
In this example, the `with_overrides` method is used to override the resource limits of the `square_1` task call within the `my_pipeline` workflow, setting the CPU limit to "6" and the memory limit to "500Mi".
I hope this helps! If you have any more questions or need further clarification, feel free to ask.
Sources
- [Docs] Document how to change default platform conf when setting resources for task
- kustomize/base/single_cluster/headless/config/admin/task_resource_defaults.yaml
- docs/user_guide/productionizing/customizing_task_resources.md
from flyte.
@spacepirate0001, how did you update the values in the task resources?
> @spacepirate0001, how did you update the values in the task resources?
I first tried to add it to `clusterResourceTemplates` as follows:
013_flyte_task_resource_defaults.yaml: |
  task_resources:
    defaults:
      cpu: 2
      memory: 3Gi
      ephemeralStorage: 0
      gpu: 0
    limits:
      cpu: 2
      memory: 3Gi
      ephemeralStorage: 0
      gpu: 0
That did not work, so I then added it under `configuration.inline`:
# inline: Specify additional configuration or overrides for Flyte, to be merged with the base configuration
inline:
  task_resources:
    limits:
      cpu: 2
      memory: 3Gi
      ephemeralStorage: 0
      gpu: 0
    defaults:
      cpu: 2
      memory: 3Gi
      ephemeralStorage: 0
      gpu: 0
This partially worked, as new tasks now run with the following resources:
resources:
  limits:
    cpu: "1"
    memory: 2Gi
  requests:
    cpu: "1"
    memory: 2Gi
My tasks need more resources than that, so they end up failing with an OOMKilled error (exit code 137).
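For context on exit code 137: shells and container runtimes conventionally report 128 plus the signal number when a process is terminated by a signal, so 137 corresponds to signal 9 (SIGKILL on Linux), which is what the kernel's OOM killer sends. A small sketch of that arithmetic, with a hypothetical helper name:

```python
SIGKILL = 9  # signal number of SIGKILL on Linux


def signal_from_exit_code(exit_code: int):
    """Return the signal number encoded in a shell-style exit code, or None."""
    # Exit codes above 128 conventionally mean "terminated by signal (code - 128)".
    if exit_code > 128:
        return exit_code - 128
    return None


signum = signal_from_exit_code(137)
print(signum, signum == SIGKILL)
```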
@spacepirate0001, how are you updating the values? Can you share the commands you used? Also, can you confirm which `requests` and `limits` values, if any, you were using in the tasks you tested on?
The code is deployed via Terraform module updates, and I can see the changes I make reflected in the manifest. You should try the same in your setup; you will see that the values don't change beyond what I've mentioned. Finally, I'm running the flyte-binary chart, in which I did not find any values for `task_resources` at all.
In my `flyte-binary` values, I've set the following, as suggested here: https://github.com/davidmirror-ops/flyte-the-hard-way/blob/main/docs/aws/05-deploy-with-helm.md#time-for-helm
configuration:
  inline:
    task_resources:
      defaults:
        cpu: 500m
        memory: 500Mi
        storage: 500Mi
      limits:
        cpu: "10"
        memory: 20Gi
However, when I try to register a workflow containing a task with a limit of `cpu=4`, I get the following error response:
USER:BadInputToAPI: error=None, cause=<_InactiveRpcError of RPC that terminated with:
status = StatusCode.INVALID_ARGUMENT
details = "Requested CPU limit [4] is greater than current limit set in the platform configuration [2]. Please contact Flyte Admins to change these limits or consult the configuration"
debug_error_string = "UNKNOWN:Error received from peer {grpc_message:"Requested CPU limit [4] is greater than current limit set in the platform configuration [2]. Please contact Flyte Admins to change these limits or consult the configuration", grpc_status:3, created_time:"2024-03-20T11:29:01.232266149+01:00"}"
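The error message implies a comparison between the limit declared on the task and the limit currently in effect on the platform. The sketch below reconstructs that kind of check for CPU only; it is an assumption based on the message text, not FlyteAdmin's code, and both function names are hypothetical.

```python
def parse_cpu(quantity: str) -> float:
    """Convert a Kubernetes-style CPU quantity ("500m", "4") to cores."""
    quantity = str(quantity).strip()
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000  # millicores
    return float(quantity)


def check_cpu_limit(requested: str, platform_limit: str) -> None:
    """Raise ValueError if the requested limit exceeds the platform limit."""
    if parse_cpu(requested) > parse_cpu(platform_limit):
        raise ValueError(
            f"Requested CPU limit [{requested}] is greater than current limit "
            f"set in the platform configuration [{platform_limit}]."
        )


check_cpu_limit("4", "10")  # passes: the limit the values file intended

try:
    check_cpu_limit("4", "2")  # the limit that was actually in effect
except ValueError as err:
    print(err)
```

Seen this way, the `[2]` in the message is the useful clue: the platform limit being enforced was not the `"10"` from the Helm values.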
If I run `flytectl get task-resource-attribute -p flytesnacks -d development`, I get
{"project":"flytesnacks","domain":"development","defaults":{"cpu":"1","memory":"150Mi"},"limits":{"cpu":"2","memory":"2Gi"}}
as a response, which doesn't seem to match my values but rather the default values.
EDIT: My issue was that I had previously set a task-resource-attribute via `pyflyte update task-resource-attribute` for the project/domain. Deleting that allowed Flyte to pick up the default task resources.
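The fix described in the EDIT suggests that attributes stored for a project/domain shadow the platform-wide configuration. A minimal sketch of that precedence, using values from this thread; the `resolve` function and the data layout are hypothetical and only model the observed behavior:

```python
import json

# Platform-wide values from the Helm `configuration.inline` block above.
platform_config = {
    "defaults": {"cpu": "500m", "memory": "500Mi", "storage": "500Mi"},
    "limits": {"cpu": "10", "memory": "20Gi"},
}

# Attributes reported for one project/domain (the flytectl output above).
stored = json.loads(
    '{"project":"flytesnacks","domain":"development",'
    '"defaults":{"cpu":"1","memory":"150Mi"},'
    '"limits":{"cpu":"2","memory":"2Gi"}}'
)
overrides = {(stored["project"], stored["domain"]): stored}


def resolve(project: str, domain: str) -> dict:
    """Prefer project/domain attributes; otherwise use the platform config."""
    match = overrides.get((project, domain))
    if match is not None:
        return {"defaults": match["defaults"], "limits": match["limits"]}
    return platform_config


print(resolve("flytesnacks", "development")["limits"])  # the stored override wins
del overrides[("flytesnacks", "development")]  # remove the override
print(resolve("flytesnacks", "development")["limits"])  # platform config applies
```

Under this model, checking for stored project/domain attributes is a sensible first step whenever the platform configuration appears to be ignored.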