
Comments (13)

dosubot commented on May 28, 2024

🤖

Based on the changes made between Flyte 1.10.0 and Flyte 1.10.7, it seems like there has been a shift in how ephemeral storage is handled. Specifically, the term Storage was replaced with Ephemeral Storage in the Helm Chart as per this pull request. Additionally, the option to use storage as a task resource was removed as per this pull request.

These changes could potentially be the reason why your ephemeral storage request is not being respected. However, without more information, it's hard to say for certain.

To help troubleshoot this issue, could you please provide the following information:

  1. The version of the Flytekit you're using.
  2. The configuration of your Flyte deployment, specifically the settings related to resource allocation and limits.
  3. The logs from the Flyte Propeller, which is the component responsible for translating Flyte tasks into Kubernetes pods.

This information will help us understand the context in which the task is being executed and why the ephemeral storage request might not be respected.

In the meantime, you might want to consider downgrading to Flyte 1.10.0 if that version was working as expected for your use case. Please note that this is just a workaround and not a long-term solution. We'll continue to investigate this issue to find a proper fix.

For more information about resource configuration in Flyte, you can refer to the Flyte documentation.


from flyte.

robert-ulbrich-mercedes-benz commented on May 28, 2024

Hi @Future-Outlier,

Okay, I will request help in the Slack channel then.

welcome commented on May 28, 2024

Thank you for opening your first issue here! 🛠

Future-Outlier commented on May 28, 2024

It's fixed here.
#4929
Thank you.

robert-ulbrich-mercedes-benz commented on May 28, 2024

Hi @Future-Outlier,

thanks for your quick response. I am addressing my response to the questions asked by the bot to you.

  1. We were using Flyte 1.10.7.
  2. I am not allowed to share the complete configuration, but I can tell you that we have no resource defaults set for ephemeral storage.
  3. The relevant FlytePropeller log entries are:

{ "insertId": "1qgk89aismv5o2by", "jsonPayload": { "json": { "node": "n0", "res_ver": "254879967", "tasktype": "helix-task", "src": "container_helper.go:329", "wf": "ndas:dev:msp_export_workflow.msp_export_specific_session_workflow", "routine": "worker-24", "exec_id": "fda883097f3dd44f992e", "ns": "ndas-dev" }, "msg": "Adjusted container resources [{map[cpu:{{2 0} {<nil>} 2 DecimalSI} ephemeral-storage:{{20971520 0} {<nil>} 20Mi BinarySI} memory:{{2 9} {<nil>} 2G DecimalSI}] map[cpu:{{2 0} {<nil>} 2 DecimalSI} ephemeral-storage:{{20971520 0} {<nil>} 20Mi BinarySI} memory:{{2 9} {<nil>} 2G DecimalSI}] []}]", "level": "info", "ts": "2024-02-27T13:59:37Z" }, "resource": { "type": "k8s_container", "labels": { "container_name": "flytepropeller", "project_id": "mb-adas-workflow-p-56e1", "pod_name": "flytepropeller-67ff8fdc9f-nbtbs", "cluster_name": "mbadas-prod-gke-workflow", "location": "europe-west4", "namespace_name": "flyte" } }, "timestamp": "2024-02-27T13:59:37.057502579Z", "severity": "INFO", "labels": { "k8s-pod/app_kubernetes_io/managed-by": "Helm", "k8s-pod/app_kubernetes_io/name": "flytepropeller", "k8s-pod/pod-template-hash": "67ff8fdc9f", "k8s-pod/helm_sh/chart": "flyte-core-v1.10.7", "k8s-pod/app_kubernetes_io/instance": "flyte-core", "compute.googleapis.com/resource_name": "gke-mbadas-prod-gke--mbadas-prod-gke--d5166da6-59c5" }, "logName": "projects/mb-adas-workflow-p-56e1/logs/stderr", "receiveTimestamp": "2024-02-27T13:59:39.647297684Z" }

{ "insertId": "a56xdll71u0g5c71", "jsonPayload": { "msg": "The resource requirement for creating Pod [ndas-dev/fda883097f3dd44f992e-n0-6] is [[{[memory]: [2G]} {[cpu]: [2]} {[ephemeral-storage]: [20Mi]}]]\n", "level": "info", "json": { "tasktype": "helix-task", "routine": "worker-24", "res_ver": "254879967", "src": "plugin_manager.go:179", "node": "n0", "exec_id": "fda883097f3dd44f992e", "ns": "ndas-dev", "wf": "ndas:dev:msp_export_workflow.msp_export_specific_session_workflow" }, "ts": "2024-02-27T13:59:37Z" }, "resource": { "type": "k8s_container", "labels": { "location": "europe-west4", "cluster_name": "mbadas-prod-gke-workflow", "pod_name": "flytepropeller-67ff8fdc9f-nbtbs", "namespace_name": "flyte", "project_id": "mb-adas-workflow-p-56e1", "container_name": "flytepropeller" } }, "timestamp": "2024-02-27T13:59:37.058019995Z", "severity": "INFO", "labels": { "k8s-pod/app_kubernetes_io/instance": "flyte-core", "k8s-pod/helm_sh/chart": "flyte-core-v1.10.7", "compute.googleapis.com/resource_name": "gke-mbadas-prod-gke--mbadas-prod-gke--d5166da6-59c5", "k8s-pod/app_kubernetes_io/name": "flytepropeller", "k8s-pod/pod-template-hash": "67ff8fdc9f", "k8s-pod/app_kubernetes_io/managed-by": "Helm" }, "logName": "projects/mb-adas-workflow-p-56e1/logs/stderr", "receiveTimestamp": "2024-02-27T13:59:39.647297684Z" }
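
For readers unfamiliar with how quantities are printed in these log lines: `20Mi BinarySI` uses a power-of-two suffix, while `2G DecimalSI` uses a power-of-ten suffix, and the leading pair such as `{2 9}` appears to be a value and a decimal scale (2 * 10**9). The sketch below is a simplified converter that reproduces the raw numbers from the log. The helper name `quantity_to_bytes` is hypothetical, and it handles only a handful of suffixes, not the full Kubernetes quantity grammar.

```python
# Simplified, illustrative converter; not the Kubernetes implementation.
BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}
DECIMAL = {"k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}


def quantity_to_bytes(quantity: str) -> int:
    """Convert a quantity such as '20Mi' or '2G' to a plain integer."""
    # Binary suffixes are checked first, then the single-letter decimal ones.
    for suffix, factor in {**BINARY, **DECIMAL}.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # no suffix: already a plain number


print(quantity_to_bytes("20Mi"))  # 20971520, the ephemeral-storage value in the log
print(quantity_to_bytes("2G"))    # 2000000000, logged as {2 9}
```

Under this reading, the 20Mi in the log is about 21 MB, far below the 100 GB the task asked for.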

We have already downgraded, and since then everything works fine again. But it is sad that we cannot update to a more recent version.

Looking at the code that generates these logs, I wonder why the resource requests from our task are overwritten at all. That should not happen, and I am concerned that it will still happen even with #4929.

Future-Outlier commented on May 28, 2024

If you changed the configuration so that ephemeral storage has no limit, did you restart the Flyte cluster deployment?

robert-ulbrich-mercedes-benz commented on May 28, 2024

@Future-Outlier, I may not understand your last statement correctly, but what we did was limit the ephemeral storage in a task decorator to 100 GB and then start the workflow with pyflyte. In the end, the limit of the Kubernetes container was set to only 20 MiB.

It is not an option for us to fully lift the limit on the ephemeral storage.

Future-Outlier commented on May 28, 2024

@robert-ulbrich-mercedes-benz, I meant: did you

  1. edit the task resource field in the ConfigMap, and
  2. restart your cluster with a command like
    kubectl rollout restart deployment flyte-sandbox -n flyte?

robert-ulbrich-mercedes-benz commented on May 28, 2024

Hi @Future-Outlier ,

it is hard for me to understand what your actual point is. At least for me it would make things easier if you could please provide a few more sentences about your intentions.

We deployed Flyte using the Flyte Core Helm chart, so we do not have a flyte-sandbox deployment that I could restart. Which exact ConfigMap are you referring to? There are quite a lot of them in the Flyte Core Helm deployment. I could find flyte-admin-base-config, which has a key `task_resource_defaults.yaml`:

task_resource_defaults.yaml:
----
task_resources:
  defaults:
    cpu: 500m
    ephemeralStorage: 500Mi
    memory: 500Mi
    storage: 500Mi
  limits:
    cpu: 128
    ephemeralStorage: 20Mi
    gpu: 16
    memory: 2000Gi
    storage: 10000Gi

This is the config on Flyte v1.10.7 in our dev environment. The question again is: why is the config provided in our task overwritten by the defaults? In my view, it should be the other way around.
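
One thing that stands out in this config is that the ephemeral storage default (500Mi) is larger than its limit (20Mi). A minimal sketch of a sanity check for that kind of inconsistency follows. It assumes the storage-like values have already been converted to bytes; the dictionary layout mirrors the YAML above, but the function name `find_inconsistent_defaults` is hypothetical and not part of Flyte.

```python
MI = 2**20  # bytes in one mebibyte
GI = 2**30  # bytes in one gibibyte

# Storage-like entries from the config above, converted to bytes.
task_resources = {
    "defaults": {"ephemeralStorage": 500 * MI, "memory": 500 * MI, "storage": 500 * MI},
    "limits": {"ephemeralStorage": 20 * MI, "memory": 2000 * GI, "storage": 10000 * GI},
}


def find_inconsistent_defaults(config: dict) -> list[str]:
    """Return the resource names whose default exceeds the configured limit."""
    limits = config["limits"]
    return [
        name
        for name, default in config["defaults"].items()
        if name in limits and default > limits[name]
    ]


print(find_inconsistent_defaults(task_resources))  # ['ephemeralStorage']
```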

Future-Outlier commented on May 28, 2024

Hi @robert-ulbrich-mercedes-benz, sorry for the misunderstanding.
I didn't realize that you are using the Flyte Core Helm deployment.
I haven't used it before, so I can't give you an answer right away.
Can you come to the Slack channel and discuss this with us?
Someone there will know how to fix it. Thank you.

eapolinario commented on May 28, 2024

@robert-ulbrich-mercedes-benz, I was not able to reproduce the issue using the example in the description; it only happened when I used pod templates. For example, let's say a task f is defined like so:

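from flytekit import task
from flytekitplugins.pod import Pod
from kubernetes.client.models import V1Container, V1PodSpec, V1ResourceRequirements

# Pod-template task whose primary container asks for 1Gi of ephemeral storage.
@task(
    task_config=Pod(
        pod_spec=V1PodSpec(
            containers=[
                V1Container(
                    name="primary",
                    resources=V1ResourceRequirements(
                        requests={"ephemeral-storage": "1Gi"},
                        limits={"ephemeral-storage": "1Gi"},
                    ),
                ),
            ],
        ),
    ),
)
def f():
    pass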

If the configured default ephemeral storage is set to any value, then that's what's used (since that's what passes the validation).

robert-ulbrich-mercedes-benz commented on May 28, 2024

Hi @eapolinario,

We also have default pod templates configured for our workflows, but those pod templates do not configure default ephemeral storage. As mentioned in this ticket, however, there are task resource defaults.

As mentioned, we can easily reproduce the issue.

Best regards

Rob Ulbrich

eapolinario commented on May 28, 2024

@robert-ulbrich-mercedes-benz , just to be clear, we have an outstanding bug in Flyte that essentially does not validate ephemeral storage values in pod templates. The moment we added a default value for ephemeral storage as a task resource default, that value was the one used by the task resource validation, regardless of the original value defined in the pod template.

Just to be crystal clear, let's say we have this task:

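from flytekit import task
from flytekitplugins.pod import Pod
from kubernetes.client.models import V1Container, V1PodSpec, V1ResourceRequirements

@task(
    task_config=Pod(
        pod_spec=V1PodSpec(
            containers=[
                V1Container(
                    name="primary",
                    resources=V1ResourceRequirements(
                        requests={"ephemeral-storage": "1Gi"},
                        limits={"ephemeral-storage": "1Gi"},
                    ),
                ),
            ],
        ),
    ),
)
def f():
    ...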

These are the values as shown in the registered task template:
[Screenshot of the registered task template, 2024-03-08]

And task resource defaults are defined as such:

task_resources:
  defaults:
    ephemeralStorage: 500Mi
  limits:
    ephemeralStorage: 20Mi

Notice that we're just using the (nonsensical) values defined in 1.10.7.

Upon running that task f we see the following resources in the pod:

resources:
  limits:
    cpu: "2"
    ephemeral-storage: 20Mi
    memory: 200Mi
  requests:
    cpu: "2"
    ephemeral-storage: 20Mi
    memory: 200Mi

The values for cpu and memory are coming from the defaults.
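
The observed ephemeral storage outcome can be modeled with a small sketch. This is only an illustration of the behavior described above, not FlytePropeller's actual code: the function `effective_ephemeral_storage` and its parameters are hypothetical, and it assumes that the pod template value is ignored, the platform default takes its place, and the result is capped at the platform limit.

```python
MI = 2**20  # bytes in one mebibyte
GI = 2**30  # bytes in one gibibyte


def effective_ephemeral_storage(pod_template_value: int, default: int, limit: int) -> int:
    """Model of the buggy behavior: the pod template value never reaches the pod."""
    # The pod template value is deliberately unused here; that is the bug.
    _ = pod_template_value
    # The platform default takes its place and is capped at the platform limit.
    return min(default, limit)


result = effective_ephemeral_storage(
    pod_template_value=1 * GI, default=500 * MI, limit=20 * MI
)
print(result // MI)  # 20, matching the 20Mi seen in the pod
```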

This bug is being fixed by #5019. After that's released we'll see an error during registration if the values defined in the pod template do not pass task validation. FYI: we are planning a release, 1.11.0, for next week.
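
A rough sketch of what such a registration-time check could look like. This is an assumption about the general shape of the validation, not the code from #5019; `validate_request` and `ResourceValidationError` are hypothetical names.

```python
MI = 2**20  # bytes in one mebibyte
GI = 2**30  # bytes in one gibibyte


class ResourceValidationError(ValueError):
    """Raised when a requested resource exceeds the platform limit."""


def validate_request(name: str, requested: int, limit: int) -> int:
    """Return the requested value unchanged, or raise if it exceeds the limit."""
    if requested > limit:
        raise ResourceValidationError(
            f"{name}: requested {requested} bytes exceeds limit of {limit} bytes"
        )
    return requested


try:
    # 1Gi from the pod template against the 20Mi platform limit.
    validate_request("ephemeral-storage", requested=1 * GI, limit=20 * MI)
except ResourceValidationError as err:
    print(f"registration rejected: {err}")
```

The point of failing loudly is that a too-large request is reported at registration instead of being silently replaced at run time.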
