The list of "Functionality Enabled by MLMD" implies the ability to query MLMD. For ex

What querying capabilities does ml-metadata provide (OPEN)

google commented on May 6, 2024

What querying capabilities does ml-metadata provide

from ml-metadata.

Comments (6)

Ark-kun commented on May 6, 2024

It might be very useful to be able to query executions or artifacts based on values of some property.

We need a way to look up Executions by cache keys.
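
One way to approximate this lookup with the listing API is a client-side scan, sketched below. It assumes the cache key was written as a string-valued custom property; the property name `cache_key`, the helper names, and the `InMemoryStore` stand-in are hypothetical and exist only so the sketch runs without a database. Only `get_executions()` and the `custom_properties[...].string_value` field are meant to mirror the real MLMD objects.

```python
from types import SimpleNamespace


def find_executions_by_custom_property(store, key, value):
    """Return executions whose custom property `key` has string value `value`.

    `store` only needs a MetadataStore-style get_executions() method.
    This scans every execution on the client, so it is linear in store size.
    """
    matches = []
    for execution in store.get_executions():
        props = execution.custom_properties
        if key in props and props[key].string_value == value:
            matches.append(execution)
    return matches


class InMemoryStore:
    """Hypothetical stand-in for a MetadataStore (not part of MLMD)."""

    def __init__(self, executions):
        self._executions = executions

    def get_executions(self):
        return list(self._executions)


def make_execution(execution_id, cache_key):
    # Mimics only the fields used above: id and custom_properties.
    value = SimpleNamespace(string_value=cache_key)
    return SimpleNamespace(id=execution_id, custom_properties={"cache_key": value})


store = InMemoryStore([
    make_execution(1, "abc"),
    make_execution(2, "def"),
    make_execution(3, "abc"),
])
cached = find_executions_by_custom_property(store, "cache_key", "abc")
print([e.id for e in cached])
```

Because every execution is fetched and checked in Python, this is only reasonable for small stores; a property index or server-side filter would be needed at scale.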

hughmiao commented on May 6, 2024

Hi @bhlarson, the current MLMD API provides high-level template queries such as listing and filtering by type, ID, etc. It also provides low-level graph traversal primitives, such as getting artifacts from a context or getting executions by events. These APIs give you full access to the underlying data model.
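
As a rough illustration of the template queries, the sketch below lists artifacts by type and by ID and reduces them to `(id, uri)` rows for inspection. The method names `get_artifacts_by_type` and `get_artifacts_by_id` follow the MLMD Python `MetadataStore`; the `InMemoryStore` class, the type names, and the URIs are hypothetical stand-ins so the example runs on its own.

```python
from types import SimpleNamespace


def artifact_rows(artifacts):
    """Reduce artifacts to (id, uri) rows, sorted by id, for quick inspection."""
    return [(a.id, a.uri) for a in sorted(artifacts, key=lambda a: a.id)]


class InMemoryStore:
    """Hypothetical stand-in exposing two MetadataStore-style queries."""

    def __init__(self, artifacts_by_type):
        self._by_type = artifacts_by_type

    def get_artifacts_by_type(self, type_name):
        return list(self._by_type.get(type_name, []))

    def get_artifacts_by_id(self, artifact_ids):
        wanted = set(artifact_ids)
        return [a for group in self._by_type.values() for a in group if a.id in wanted]


# Made-up artifacts; a real store would return Artifact protos with the same fields.
store = InMemoryStore({
    "Model": [
        SimpleNamespace(id=2, uri="/tmp/models/2"),
        SimpleNamespace(id=1, uri="/tmp/models/1"),
    ],
    "Dataset": [SimpleNamespace(id=3, uri="/tmp/data/3")],
})

models = artifact_rows(store.get_artifacts_by_type("Model"))
picked = artifact_rows(store.get_artifacts_by_id([3]))
print(models)
print(picked)
```

With a real deployment, `store` would be a `MetadataStore` built from a connection config, and `artifact_rows` could be used unchanged.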

For example, to show a DAG of all related executions and their input and output artifacts in a context: given the context ID, start the traversal with get_executions_by_context and get_artifacts_by_context to get the related nodes of the DAG, then use get_events_by_execution_ids and get_events_by_artifact_ids to find the related edges. Also see the examples and utility functions that use the MLMD API to power the notebooks in the TFX tutorial and the GitHub repo.
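
The traversal described above could be sketched roughly as follows. The three store methods are the ones named in this comment; the `InMemoryStore` class, the `event` helper, the string event types, and all IDs are hypothetical stand-ins so the sketch runs without a database (real events carry an enum value in `type`).

```python
from types import SimpleNamespace


def build_context_dag(store, context_id):
    """Collect the nodes and edges of the lineage DAG for one context."""
    executions = store.get_executions_by_context(context_id)
    artifacts = store.get_artifacts_by_context(context_id)
    execution_ids = sorted(e.id for e in executions)
    artifact_ids = sorted(a.id for a in artifacts)

    # Events are the edges: each one links an execution to an artifact.
    events = store.get_events_by_execution_ids(execution_ids)
    in_context = set(artifact_ids)
    edges = sorted(
        (ev.execution_id, ev.artifact_id, ev.type)
        for ev in events
        if ev.artifact_id in in_context
    )
    return {"executions": execution_ids, "artifacts": artifact_ids, "edges": edges}


class InMemoryStore:
    """Hypothetical stand-in for a MetadataStore (not part of MLMD)."""

    def __init__(self, executions_by_context, artifacts_by_context, events):
        self._executions = executions_by_context
        self._artifacts = artifacts_by_context
        self._events = events

    def get_executions_by_context(self, context_id):
        return list(self._executions.get(context_id, []))

    def get_artifacts_by_context(self, context_id):
        return list(self._artifacts.get(context_id, []))

    def get_events_by_execution_ids(self, execution_ids):
        wanted = set(execution_ids)
        return [ev for ev in self._events if ev.execution_id in wanted]


def event(execution_id, artifact_id, event_type):
    return SimpleNamespace(execution_id=execution_id, artifact_id=artifact_id, type=event_type)


store = InMemoryStore(
    executions_by_context={10: [SimpleNamespace(id=1)]},
    artifacts_by_context={10: [SimpleNamespace(id=100), SimpleNamespace(id=101)]},
    # The last event belongs to an execution outside context 10.
    events=[event(1, 100, "INPUT"), event(1, 101, "OUTPUT"), event(2, 101, "INPUT")],
)
dag = build_context_dag(store, 10)
print(dag)
```

The returned dictionary is a plain node and edge list, which can then be handed to whatever graph or plotting library is at hand.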

There are plans and discussions to add declarative query language layers for MLMD. They would definitely make interactions simpler than using the low-level primitives, but they are not available yet. We also welcome use cases, thoughts, and contributions! :)

ntakouris commented on May 6, 2024

Agreed with OP. This is completely undocumented. What @hughmiao describes is out of scope; there is simply no documentation or examples on how to view artifacts, for example. I don't want to browse my cloud storage bucket every time I want to manually inspect an output artifact.

Powering TFX is one good thing; being able to actually use it to view artifacts and store one's custom ones is another.

benmathes commented on May 6, 2024

As the Product Manager for MLMD, I agree with OP, in a sense. Right now the primary ~user of MLMD's querying is orchestration: if we don't write it down during training, there won't be anything there to query.

I want to get MLMD to the point where there's a lightweight, composable query language.

(Caveat: this is not a roadmap, RFC, or spec, just an example of where we are headed.)

{
  query TFXModel(id: "aslkaj34LJ3") {
    name
    # If sub-querying for a type, a default graph QL does one-hop queries.
    # We would *extend* this to -*> queries along the training DAG.
    Training: {
      duration
      # a recursive graph QL pattern means the 
      # -*> graph query could occur inside the subquery or...
      Dataset: {
        name
        size
      }
    }
    # … the -*> query could occur outside the Training subquery
    Dataset: {
      name
      size
    }
  }
}

Another potential way to think about "lightweight and composable" with a different ux:

pipeline = mlmd.get(type=context, id="my_pipeline_id")
model = mlmd.get(type=Model, in=pipeline)
sibling_models = model.getAncestor(type=Dataset).getDescendants(type=Model)
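
Until something like this exists, a similar "sibling models" lookup can be approximated with the existing event primitives, as in the rough sketch below. It hops artifact to execution to artifact using `get_events_by_artifact_ids` and `get_events_by_execution_ids`. The type names `Dataset` and `Model` come from the sketch above; `InMemoryStore`, the string event types, and the IDs are hypothetical stand-ins. With a real store, the two type sets would instead hold `metadata_store_pb2.Event` enum values such as `Event.INPUT` and `Event.OUTPUT`.

```python
from collections import deque
from types import SimpleNamespace

# Assumption: string event types keep the stand-in simple.
INPUT_TYPES = {"INPUT"}
OUTPUT_TYPES = {"OUTPUT"}


def walk(store, start_id, from_types, to_types):
    """Breadth-first walk, hopping artifact -> execution -> artifact."""
    seen = {start_id}
    queue = deque([start_id])
    while queue:
        artifact_id = queue.popleft()
        events = store.get_events_by_artifact_ids([artifact_id])
        exec_ids = [ev.execution_id for ev in events if ev.type in from_types]
        if not exec_ids:
            continue
        for ev in store.get_events_by_execution_ids(exec_ids):
            if ev.type in to_types and ev.artifact_id not in seen:
                seen.add(ev.artifact_id)
                queue.append(ev.artifact_id)
    seen.discard(start_id)
    return seen


def sibling_models(store, model_id):
    """Models sharing an ancestor Dataset with `model_id` (excluding itself)."""
    dataset_ids = {a.id for a in store.get_artifacts_by_type("Dataset")}
    model_ids = {a.id for a in store.get_artifacts_by_type("Model")}
    # Upstream: the artifact is an output of an execution; follow its inputs.
    ancestors = walk(store, model_id, OUTPUT_TYPES, INPUT_TYPES) & dataset_ids
    siblings = set()
    for dataset_id in ancestors:
        # Downstream: the artifact is an input of an execution; follow its outputs.
        siblings |= walk(store, dataset_id, INPUT_TYPES, OUTPUT_TYPES) & model_ids
    siblings.discard(model_id)
    return sorted(siblings)


class InMemoryStore:
    """Hypothetical stand-in for a MetadataStore (not part of MLMD)."""

    def __init__(self, artifacts_by_type, events):
        self._by_type = artifacts_by_type
        self._events = events

    def get_artifacts_by_type(self, type_name):
        return list(self._by_type.get(type_name, []))

    def get_events_by_artifact_ids(self, artifact_ids):
        wanted = set(artifact_ids)
        return [ev for ev in self._events if ev.artifact_id in wanted]

    def get_events_by_execution_ids(self, execution_ids):
        wanted = set(execution_ids)
        return [ev for ev in self._events if ev.execution_id in wanted]


def event(execution_id, artifact_id, event_type):
    return SimpleNamespace(execution_id=execution_id, artifact_id=artifact_id, type=event_type)


store = InMemoryStore(
    {"Dataset": [SimpleNamespace(id=1), SimpleNamespace(id=4)],
     "Model": [SimpleNamespace(id=2), SimpleNamespace(id=3), SimpleNamespace(id=5)]},
    [event(10, 1, "INPUT"), event(10, 2, "OUTPUT"),
     event(11, 1, "INPUT"), event(11, 3, "OUTPUT"),
     event(12, 4, "INPUT"), event(12, 5, "OUTPUT")],
)
print(sibling_models(store, 2))
```

Each hop costs two store calls per artifact, which hints at why a declarative layer with server-side traversal is attractive for large lineage graphs.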

The big puzzles we are iterating through that make this a lot more work than my former-startup-engineer head would quickly assume include (but aren't limited to):

all the different training DAG structures we have to support
getting all the ML infrastructures built on us (including several custom internal ones) to agree enough on what "model", "dataset", etc. mean
supporting multiple backends
some large-scale internal performance requirements

smthpickboy commented on May 6, 2024

It might be very useful to be able to query executions or artifacts based on values of some property.

We need a way to look up Executions by cache keys.

Any update on this? Or does KFP implement querying of artifacts/executions by properties or custom properties on top of MLMD?

Thanks.

smthpickboy commented on May 6, 2024

@benmathes wrote: "I want to get MLMD to the point where there's a lightweight, composable query language."

Any progress or plans on this? Thanks.
