Hi @jinnovation, sorry to hear this. At this point, we really want to make sure that MLMD uses a backend with transaction support, so we can ensure data consistency across failures.
So we see a couple of options:
- Is there a possibility of using a storage system that supports transactions? If the system supports SQL, the work to support it would be much smaller; otherwise we can investigate further to understand how much work is necessary;
- MLMD already supports SQLite (on a single machine). Depending on your operational procedures, I wonder whether we can find a workable approach that uses SQLite to back MLMD;
- We have an engineer integrating MLMD into Kubeflow to make this possible on Kubernetes. If you can use Kubernetes, this may be another option.
- If you can use Google's (or any other public cloud's managed SQL), we can look to expand the connection config to support connecting to a hosted MySQL/Spanner/Postgres.
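If the SQLite option turns out to be workable, wiring it up is mostly a matter of the connection config. Below is a rough sketch using MLMD's Python `MetadataStore` client; the `/tmp/mlmd.db` path is just an example, and a hosted MySQL would fill in the `config.mysql` fields instead:

```python
from ml_metadata import metadata_store
from ml_metadata.proto import metadata_store_pb2

# Point MLMD at a local SQLite file. For a hosted MySQL, you would instead
# set config.mysql.host / port / database / user.
config = metadata_store_pb2.ConnectionConfig()
config.sqlite.filename_uri = "/tmp/mlmd.db"  # example path
config.sqlite.connection_mode = (
    metadata_store_pb2.SqliteMetadataSourceConfig.READWRITE_OPENCREATE)

store = metadata_store.MetadataStore(config)
```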
We are happy to further discuss this, and welcome contributions if we can agree on a path listed above which can unblock your team.
from ml-metadata.
Thanks for the suggestions @zhitaoli. I realized that my previous comment was
slightly misleading, so I wanted to clarify my and my team's motivations.
Specifically, we'd like to use Manhattan, our internal NoSQL storage system,
as a backing layer for ML Metadata. As such, what we're looking for is not so
much support for any specific 3rd-party NoSQL system, but rather for generic,
custom backends. To add to that, we currently have a metadata-store component that's backed by Manhattan; our higher goal is to unify metadata storage.
Hope that helps.
@jinnovation, thanks for the pointer.
Given that Manhattan seems to be a closed-source project, I imagine this cannot be done from our side and would have to remain a closed-source extension to MLMD.
The idea of allowing a custom backend handler surfaced in my sync with @hughmiao et al., and he can assess how much work it would take to enable a plugin-style design for injecting a different storage backend. Otherwise, you would have to fork MLMD and maintain the storage integration yourself, which is certainly unpleasant and bears the risk of further drift.
One thing I would suggest checking is whether the storage system you choose has transaction support: many MLMD functionalities, as well as the workflows TFX::OSS builds on top of it, rely on atomically creating/updating multiple entities in MLMD in one transaction. If that is not possible, the system could be left in a corrupt state that is very difficult to self-heal or recover from. You don't need to disclose this information to us, but you should definitely understand the risk.
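To make the risk concrete, here is a small illustration in plain `sqlite3` (not MLMD code): an artifact and the event linking it to an execution must land together or not at all, and a transactional backend gives you that for free.

```python
import sqlite3

# Two tables standing in for MLMD's artifact and event records.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE artifact (id INTEGER PRIMARY KEY, uri TEXT NOT NULL);
    CREATE TABLE event (artifact_id INTEGER NOT NULL,
                        execution_id INTEGER NOT NULL);
""")

def record_output(conn, uri, execution_id):
    # 'with conn' commits on success and rolls back on any exception,
    # so the two inserts land together or not at all.
    with conn:
        cur = conn.execute("INSERT INTO artifact (uri) VALUES (?)", (uri,))
        conn.execute(
            "INSERT INTO event (artifact_id, execution_id) VALUES (?, ?)",
            (cur.lastrowid, execution_id))
        return cur.lastrowid

record_output(conn, "gs://bucket/model", 1)

# Simulate a crash between the two writes: the rollback removes the
# half-written artifact, leaving no dangling row behind.
try:
    with conn:
        conn.execute("INSERT INTO artifact (uri) VALUES (?)",
                     ("gs://bucket/stats",))
        raise RuntimeError("crash before the event insert")
except RuntimeError:
    pass

print(conn.execute("SELECT COUNT(*) FROM artifact").fetchone()[0])  # 1
```

Without the transaction, the simulated crash would leave an artifact row with no event, which is exactly the dirty state a non-transactional backend can accumulate.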
Thanks for your interest, @jinnovation. It's great to learn about your effort to unify the metadata model and storage. We are working toward the same goal here and are excited to get to know your work.
To extend MLMD, here are some general comments about the overall framework and its extensible layers, which may be useful for the community. I have also left some thoughts on your specific case at the end.
At the user-facing layer, MLMD provides a unified data model and a set of APIs, which are defined here, implemented in C++ (server and library), and swigged for different client languages (Python, Go). As long as the API and data model are unified, orchestration and analytics tooling can be shared.
The implementation of the APIs and data model is split across two additional layers:
- a domain object access layer that captures the data model details and
- a backend persistent layer that is used for issuing arbitrary queries.
Each layer is extensible. For example, at the backend persistence layer, supporting a new relational, transactional backend only requires extending metadata_source and fixing some access-layer query dialects. An extension like that is several hundred lines of C++; see, e.g., sqlite and mysql.
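The shape of that extension point can be sketched as follows. This is a hypothetical Python rendering for brevity (the real interface is C++, and these names are illustrative, not the actual signatures): a relational backend only needs to bind a handful of calls.

```python
import sqlite3
from abc import ABC, abstractmethod

# Hypothetical sketch of a metadata_source-style extension point; the names
# and method set are illustrative only, not MLMD's actual C++ interface.
class MetadataSource(ABC):
    @abstractmethod
    def connect(self):
        """Open the connection to the backend."""

    @abstractmethod
    def execute_query(self, query):
        """Run one query and return the result rows."""

    @abstractmethod
    def commit(self):
        """Commit the current transaction."""

    @abstractmethod
    def rollback(self):
        """Abort the current transaction."""

class SqliteMetadataSource(MetadataSource):
    """Binds the calls above to SQLite."""
    def __init__(self, uri=":memory:"):
        self.uri = uri
        self.conn = None

    def connect(self):
        # isolation_level=None leaves transaction control to the caller.
        self.conn = sqlite3.connect(self.uri, isolation_level=None)

    def execute_query(self, query):
        return self.conn.execute(query).fetchall()

    def commit(self):
        self.conn.commit()

    def rollback(self):
        self.conn.rollback()

source = SqliteMetadataSource()
source.connect()
print(source.execute_query("SELECT 1 + 1"))  # [(2,)]
```

The access layer above it issues SQL through `execute_query`, which is why a new relational backend mostly reduces to fixing query dialects.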
If a new backend is not relational and has no declarative query layer (e.g., no SQL support for the NoSQL backend here), then extending the domain object access layer is needed, i.e., implementing the CRUD calls for domain data models such as FindTypeById, CreateArtifact, etc.
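For intuition, those CRUD calls can be backed by a key-value store roughly like this. The sketch below is purely illustrative (the class and method names are made up, and plain dicts stand in for the NoSQL backend):

```python
import itertools

class KVDomainAccess:
    """Illustrative domain object access layer over a key-value store.
    Plain dicts stand in for a NoSQL backend; not MLMD's actual API."""
    def __init__(self):
        self._types = {}       # type_id -> type record
        self._artifacts = {}   # artifact_id -> artifact record
        self._ids = itertools.count(1)

    def create_type(self, name):
        type_id = next(self._ids)
        self._types[type_id] = {"id": type_id, "name": name}
        return type_id

    def find_type_by_id(self, type_id):
        return self._types.get(type_id)

    def create_artifact(self, type_id, uri):
        # Referential checks must be done by hand without SQL constraints.
        if type_id not in self._types:
            raise KeyError(f"unknown type {type_id}")
        artifact_id = next(self._ids)
        self._artifacts[artifact_id] = {"id": artifact_id,
                                        "type_id": type_id, "uri": uri}
        return artifact_id

store = KVDomainAccess()
t = store.create_type("Model")
a = store.create_artifact(t, "s3://bucket/model")
print(store.find_type_by_id(t)["name"])  # Model
```

Note that cross-entity invariants (like the type check above) now live in application code rather than in the database, which is where the transaction concerns discussed earlier come back in.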
If the backend's primitives and data organization do not fit well with the list of calls in the domain object access layer, then reimplementing the APIs by extending the high-level store interface is needed. By doing so, you can at least reuse the tests, the swigging for client libraries, the gRPC server, and the release scripts.
Back to the specific NoSQL backend without transaction support: as illustrated above, it is possible to extend the domain object access layer, or to extend the store directly and drop the atomicity guarantee of the MLMD APIs. One concern is that this may hurt utility due to dirty/partial metadata ingested into the store. It then becomes the metadata ingestion clients' problem to ensure data consistency and clean up when needed. As @zhitaoli mentioned, one use of MLMD is as a backend for the distributed components of TFX pipelines. Ingestion happens during pipeline runs, and the ingested artifact and execution states need to be consistent for the correctness of the orchestrator. If you also intend to use MLMD together with TFX or other workflow orchestrators, the lack of transaction support in the backend should be considered beforehand.