Comments (6)
It might be very useful to be able to query executions or artifacts based on values of some property.
We need a way to look up Executions by cache keys.
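Until a property-based query exists, one workaround is to list the executions and filter on the client. The snippet below is only a sketch under assumptions: `store` is taken to be an MLMD `MetadataStore` (or any object with a compatible `get_executions()` method), the cache key is assumed to live in a string-valued custom property, and the helper name and the property name `cache_key` are hypothetical. It scans every execution, so it will be slow on large stores.

```python
def find_executions_by_custom_property(store, key, expected):
    """Return executions whose string custom property `key` equals `expected`.

    Client-side filter: every execution is fetched and checked locally.
    """
    matches = []
    for execution in store.get_executions():
        props = execution.custom_properties
        # Check membership first so a missing key is never read.
        if key in props and props[key].string_value == expected:
            matches.append(execution)
    return matches
```

A call might look like `find_executions_by_custom_property(store, "cache_key", "abc123")`, where both the property name and the value are placeholders.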
from ml-metadata.
Hi @bhlarson, the current MLMD API provides high-level template queries such as listing and filtering by type or id. It also provides low-level graph traversal primitives, such as getting artifacts from a context or getting executions by events. These APIs give you full access to the underlying data model.
For example, to show a DAG of all related executions and their input and output artifacts in a context: given the context id, you can start the traversal with get_executions_by_context and get_artifacts_by_context to get the related nodes of the DAG, then use get_events_by_execution_ids and get_events_by_artifact_ids to look up the related edges of the DAG. Also see the examples and utility functions that use the MLMD API to power the notebooks in the TFX tutorial, in the GitHub repo.
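The traversal described above can be sketched roughly as follows. This is an illustration, not code from the MLMD repo: the helper name `build_context_dag` is made up, and `store` is assumed to be an MLMD `MetadataStore` (or any object exposing the same three methods). Events are treated as the edges between executions and artifacts.

```python
from collections import defaultdict


def build_context_dag(store, context_id):
    """Collect the nodes and edges of the DAG attached to one context.

    Returns (executions, artifacts, edges), where `edges` maps an
    execution id to a list of (artifact_id, event_type) pairs.
    """
    # Nodes: everything attributed to or associated with the context.
    executions = store.get_executions_by_context(context_id)
    artifacts = store.get_artifacts_by_context(context_id)

    # Edges: each event links one execution to one artifact.
    events = store.get_events_by_execution_ids([e.id for e in executions])

    artifact_ids = {a.id for a in artifacts}
    edges = defaultdict(list)
    for event in events:
        # Keep only edges whose artifact belongs to this context.
        if event.artifact_id in artifact_ids:
            edges[event.execution_id].append((event.artifact_id, event.type))
    return executions, artifacts, dict(edges)
```

The event type on each edge says whether the artifact was an input or an output of the execution, which is enough to draw the DAG with a graph library of your choice.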
There are plans and discussions to add a declarative query language layer on top of MLMD. It would definitely make interactions simpler than using the low-level primitives, but it is not available yet. We also welcome use cases, thoughts and contributions! :)
Agreed with OP. This is completely undocumented. What @hughmiao describes is out of scope; there is simply no documentation or examples on how to view artifacts, for instance. I don't want to browse my cloud storage bucket every time I want to manually inspect an output artifact.
Powering TFX is one good thing; being able to actually use it to view artifacts and store one's own custom ones is another.
As the Product Manager for MLMD, I agree with OP, in a sense. Right now the MLMD querying language is used primarily during orchestration: if we don't write it down during training, there won't be anything there to query.
I want to get MLMD to the point where there's a lightweight, composable query language.
(Caveat: this is not a roadmap, RFC, or spec, just an example of where we are headed.)
{
  query TFXModel(id: "aslkaj34LJ3") {
    name
    # If sub-querying for a type, a default graph QL does one-hop queries.
    # We would *extend* this to -*> queries along the training DAG.
    Training: {
      duration
      # a recursive graph QL pattern means the
      # -*> graph query could occur inside the subquery or...
      Dataset: {
        name
        size
      }
    }
    # … the -*> query could occur outside the Training subquery
    Dataset: {
      name
      size
    }
  }
}
Another potential way to think about "lightweight and composable" with a different ux:
pipeline = mlmd.get(type=context, id="my_pipeline_id")
model = mlmd.get(type=Model, in=pipeline)
sibling_models = model.getAncestor(type=Dataset).getDescendants(type=Model)
The big puzzles we are iterating through that make this a lot more work than my former-startup-engineer head would quickly assume include (but aren't limited to):
- all the different training DAG structures we have to support
- getting all the ML infras (several internal custom such infrastructures) built on us to agree enough on overlap on what "model", "dataset", etc. mean
- supporting multiple backends.
- some large internal scale performance requirements
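For comparison, the `sibling_models` idea from the sketch above can be approximated today with the event primitives, at the cost of more code. The following is a rough sketch under assumptions, not an MLMD feature: the helper name is hypothetical, `store` is assumed to be an MLMD `MetadataStore` (or a compatible object), the caller passes in the event type values that mean "input" and "output", and the walk only goes one hop up and one hop down without checking artifact types.

```python
def sibling_artifact_ids(store, artifact_id, input_type, output_type):
    """Return ids of artifacts produced from the same inputs as `artifact_id`."""
    # 1. Executions that produced the starting artifact.
    producers = {
        ev.execution_id
        for ev in store.get_events_by_artifact_ids([artifact_id])
        if ev.type == output_type
    }
    if not producers:
        return set()

    # 2. Artifacts those executions consumed (one hop up the DAG).
    ancestors = {
        ev.artifact_id
        for ev in store.get_events_by_execution_ids(list(producers))
        if ev.type == input_type
    }
    if not ancestors:
        return set()

    # 3. Every execution that consumed any of those ancestors.
    consumers = {
        ev.execution_id
        for ev in store.get_events_by_artifact_ids(list(ancestors))
        if ev.type == input_type
    }

    # 4. Everything those executions produced, minus the starting artifact.
    siblings = {
        ev.artifact_id
        for ev in store.get_events_by_execution_ids(list(consumers))
        if ev.type == output_type
    }
    siblings.discard(artifact_id)
    return siblings
```

With the real library, the two type arguments would presumably be `metadata_store_pb2.Event.INPUT` and `metadata_store_pb2.Event.OUTPUT`. Restricting the result to models would need an extra filter on artifact type, which this sketch leaves out.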
Any update on this? Or does KFP implement query artifacts/executions by properties or custom properties on top of mlmd?
Thanks.
Any progress or plans on this? Thanks.