Comments (6)

Ark-kun commented on May 6, 2024

It might be very useful to be able to query executions or artifacts based on values of some property.

We need a way to look up Executions by cache keys.

from ml-metadata.

hughmiao commented on May 6, 2024

Hi @bhlarson, the current MLMD API provides high-level template queries such as listing and filtering by type or id. It also provides low-level graph traversal primitives, such as getting artifacts from a context or getting executions by events. These APIs give you full access to the underlying data model.
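As a rough illustration of combining listing with a client-side filter to look up executions by a property value, here is a minimal sketch. It assumes `store` is an object exposing `get_executions()` (as an MLMD `MetadataStore` does) and that the value was written as a string-valued custom property; the property name `cache_key` used in the usage note is hypothetical.

```python
def executions_with_property(store, key, value):
    """Return executions whose custom property `key` equals the string `value`.

    This lists every execution and filters client-side, so it is a sketch
    for small stores rather than an efficient query.
    """
    matches = []
    for execution in store.get_executions():
        props = execution.custom_properties
        # Check membership first so that a missing key is never read.
        if key in props and props[key].string_value == value:
            matches.append(execution)
    return matches
```

For example, `executions_with_property(store, "cache_key", "abc")` would return the executions recorded with that (hypothetical) cache key.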

For example, to show a DAG of all related executions and their input and output artifacts for a context, start the traversal from the context id: call get_executions_by_context and get_artifacts_by_context to get the nodes of the DAG, then get_events_by_execution_ids and get_events_by_artifact_ids to find the edges. Also see the examples and utility functions that use the MLMD API to power the notebooks in the TFX tutorial (GitHub repo).
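The traversal described above can be sketched roughly as follows. The function assumes `store` exposes the methods named above (for instance an MLMD `MetadataStore`) and that each event carries `execution_id`, `artifact_id` and `type` fields; the `Edge` tuple and the `context_dag` helper are illustrative names, not part of MLMD.

```python
from collections import namedtuple

# One DAG edge: an event linking an execution to an artifact.
Edge = namedtuple("Edge", ["execution_id", "artifact_id", "event_type"])


def context_dag(store, context_id):
    """Collect the nodes and edges recorded under one context."""
    executions = store.get_executions_by_context(context_id)
    artifacts = store.get_artifacts_by_context(context_id)

    execution_ids = [e.id for e in executions]
    artifact_ids = {a.id for a in artifacts}

    # Events are the edges; skip the call when there is nothing to look up.
    events = store.get_events_by_execution_ids(execution_ids) if execution_ids else []

    # Keep only edges whose artifact also belongs to this context.
    edges = [
        Edge(ev.execution_id, ev.artifact_id, ev.type)
        for ev in events
        if ev.artifact_id in artifact_ids
    ]
    return executions, artifacts, edges
```

Starting from executions is one choice; get_events_by_artifact_ids could be used the same way to start from the artifacts instead.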

There are plans and discussions to add a declarative query language layer on top of MLMD. It would definitely make interactions simpler than using the low-level primitives, but it is not available yet. We also welcome use cases, thoughts and contributions! :)

ntakouris commented on May 6, 2024

Agreed with OP. This is completely undocumented. What @hughmiao describes is out of scope: there is simply no documentation or example showing how to view artifacts. I don't want to browse my cloud storage bucket every time I want to manually inspect an output artifact.

Powering TFX is one good thing; being able to actually use it to view artifacts and store one's own custom ones is another.

benmathes commented on May 6, 2024

As the Product Manager for MLMD, I agree with OP, in a sense. Right now the primary "user" of MLMD querying is orchestration: if we don't write it down during training, there won't be anything there to query.

I want to get MLMD to the point where there's a lightweight, composable query language.

(caveat: this is not a roadmap or RFC or spec. An example of where we are headed)

{
  query TFXModel(id: "aslkaj34LJ3") {
    name
    # If sub-querying for a type, a default graph QL does one-hop queries.
    # We would *extend* this to -*> queries along the training DAG.
    Training: {
      duration
      # a recursive graph QL pattern means the 
      # -*> graph query could occur inside the subquery or...
      Dataset: {
        name
        size
      }
    }
    # … the -*> query could occur outside the Training subquery
    Dataset: {
      name
      size
    }
  }
}

Another potential way to think about "lightweight and composable" with a different ux:

# Hypothetical API, not something MLMD provides today.
pipeline = mlmd.get(type=Context, id="my_pipeline_id")
model = mlmd.get(type=Model, in_=pipeline)  # `in` is a reserved word in Python
sibling_models = model.getAncestor(type=Dataset).getDescendants(type=Model)
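To make the ancestor/descendant chain above concrete, here is a toy in-memory sketch of the same idea. Everything in it (the `Lineage` class, its methods, the `sibling_models` helper and the type labels) is hypothetical and only illustrates the traversal; it is not an MLMD API.

```python
from collections import defaultdict


class Lineage:
    """Toy lineage graph: typed nodes joined by directed edges."""

    def __init__(self):
        self.types = {}                   # node name -> type label
        self.children = defaultdict(set)  # node -> direct downstream nodes
        self.parents = defaultdict(set)   # node -> direct upstream nodes

    def add_node(self, name, type_):
        self.types[name] = type_

    def add_edge(self, src, dst):
        self.children[src].add(dst)
        self.parents[dst].add(src)

    def _reach(self, start, neighbours, type_):
        # Iterative depth-first walk; `seen` prevents revisiting nodes.
        seen, stack = set(), [start]
        while stack:
            for nxt in neighbours[stack.pop()]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return {n for n in seen if self.types.get(n) == type_}

    def ancestors(self, node, type_):
        return self._reach(node, self.parents, type_)

    def descendants(self, node, type_):
        return self._reach(node, self.children, type_)


def sibling_models(graph, model):
    """Models downstream of any Dataset upstream of `model`, excluding `model`."""
    siblings = set()
    for dataset in graph.ancestors(model, "Dataset"):
        siblings |= graph.descendants(dataset, "Model")
    return siblings - {model}
```

The multi-hop walk in `_reach` is the part a one-hop query cannot express, which is what the `-*>` extension in the query sketch above is meant to cover.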

The big puzzles we are iterating through that make this a lot more work than my former-startup-engineer head would quickly assume include (but aren't limited to):

  • all the different training DAG structures we have to support
  • getting all the ML infrastructures built on us (including several custom internal ones) to agree enough on what "model", "dataset", etc. mean
  • supporting multiple backends
  • some large internal scale performance requirements

smthpickboy commented on May 6, 2024

Quoting Ark-kun:

"It might be very useful to be able to query executions or artifacts based on values of some property.

We need a way to look up Executions by cache keys."

Any update on this? Or does KFP implement querying artifacts/executions by properties or custom properties on top of MLMD?

Thanks.

smthpickboy commented on May 6, 2024

(Quoting benmathes's comment above in full.)

Any progress or plans on this? Thanks.
