ml-metadata's Introduction

ML Metadata

ML Metadata (MLMD) is a library for recording and retrieving metadata associated with ML developer and data scientist workflows.

NOTE: ML Metadata may be backwards incompatible before version 1.0.

Getting Started

For more background on MLMD and instructions on using it, see the getting started guide.

Installing from PyPI

The recommended way to install ML Metadata is to use the PyPI package:

pip install ml-metadata

Then import the relevant packages:

from ml_metadata import metadata_store
from ml_metadata.proto import metadata_store_pb2
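
As a quick smoke test, you can point the store at a local SQLite file (a minimal sketch taken from the getting started example reproduced further down this page; my.db is just an example path):

connection_config = metadata_store_pb2.ConnectionConfig()
connection_config.sqlite.filename_uri = 'my.db'
connection_config.sqlite.connection_mode = 3  # READWRITE_OPENCREATE
store = metadata_store.MetadataStore(connection_config)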

Nightly Packages

ML Metadata (MLMD) also hosts nightly packages at https://pypi-nightly.tensorflow.org on Google Cloud. To install the latest nightly package, please use the following command:

pip install --extra-index-url https://pypi-nightly.tensorflow.org/simple ml-metadata

Installing with Docker

This is the recommended way to build ML Metadata under Linux, and is continuously tested at Google.

Please first install Docker and Docker Compose by following their official installation directions.

Then, run the following at the project root:

DOCKER_SERVICE=manylinux-python${PY_VERSION}
sudo docker-compose build ${DOCKER_SERVICE}
sudo docker-compose run ${DOCKER_SERVICE}

where PY_VERSION is one of {39, 310, 311}.

A wheel will be produced under dist/, and can be installed as follows:

pip install dist/*.whl

Installing from source

1. Prerequisites

To compile and use ML Metadata, you need to set up some prerequisites.

Install Bazel

If Bazel is not installed on your system, install it now by following these directions.

Install cmake

If cmake is not installed on your system, install it now by following these directions.

2. Clone ML Metadata repository

git clone https://github.com/google/ml-metadata
cd ml-metadata

Note that these instructions will install the latest master branch of ML Metadata. If you want to install a specific branch (such as a release branch), pass -b <branchname> to the git clone command.

3. Build the pip package

ML Metadata uses Bazel to build the pip package from source:

python setup.py bdist_wheel

You can find the generated .whl file in the dist subdirectory.

4. Install the pip package

pip install dist/*.whl

5. (Optional) Build the gRPC server

ML Metadata uses Bazel to build the C++ binary from source:

bazel build -c opt --define grpc_no_ares=true  //ml_metadata/metadata_store:metadata_store_server
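
Once built, the server binary can be started directly (a sketch; the --grpc_port flag is the one used by the Kubernetes deployment example further down this page):

./bazel-bin/ml_metadata/metadata_store/metadata_store_server --grpc_port=8080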

Supported platforms

MLMD is built and tested on the following 64-bit operating systems:

  • macOS 10.14.6 (Mojave) or later.
  • Ubuntu 20.04 or later.
  • [DEPRECATED] Windows 10 or later. For a Windows-compatible library, please refer to MLMD 1.14.0 or earlier versions.

ml-metadata's People

Contributors

atn832, briansong, candiedcode, daikeshi, dreamryx, droctothorpe, fstroth, hughmiao, jingyu-shao, mzinkevi, n-triple-a, npoly, stefanofioravanzo, tarilabs, xinrantang, yupbank, zijianjoy

ml-metadata's Issues

Support for building the server for alternate linux platforms?

Currently the Dockerfile used for building and running the server only supports Ubuntu 18.04. I have a use case where I want to run the server (outside of a container) on a CentOS host. Are there any plans for other supported build targets?

I have a Dockerfile that compiles for centos7 that my team at Twitter is using to build the server. If there's interest, I would be happy to commit it upstream.

Throws error when running example

I just followed the example provided at https://www.tensorflow.org/tfx/guide/mlmd.

It fails at

input_event = metadata_store_pb2.Event()
input_event.artifact_id = data_artifact_id
input_event.execution_id = run_id
input_event.type = metadata_store_pb2.Event.DECLARED_INPUT

and gives this error:

Traceback (most recent call last):
  File "test-mlmetadata.py", line 45, in <module>
    input_event.artifact_id = data_artifact_id
TypeError: [2] has type list, but expected one of: int, long

It seems that artifact_id expects a number, but data_artifact_id is a list of numbers?
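
A likely fix (an assumption based on the getting started snippet further down this page, which indexes the returned lists): put_artifacts and put_executions return lists of ids, so take the first element before assigning it to the event.

data_artifact_id = store.put_artifacts([data_artifact])[0]  # put_artifacts returns a list of ids
run_id = store.put_executions([trainer_run])[0]             # put_executions does as well

input_event = metadata_store_pb2.Event()
input_event.artifact_id = data_artifact_id
input_event.execution_id = run_id
input_event.type = metadata_store_pb2.Event.DECLARED_INPUT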

Using the same MySQL db to run more than 1 tfx pipelines

Let's take an example scenario,

I want to run 1 model per customer. If I have 10 customers, there would be 10 models (10 TFX pipelines).

If I hook up the TFX pipelines in Airflow to run 10 pipelines for 10 customers, the artifacts generated and stored in the MySQL DB get mixed up between customers/pipelines.

For example, when I run the pipeline for 2 customers, the model Evaluator for the 2nd pipeline considers the 1st pipeline's model as the baseline model and the 2nd pipeline's model as the candidate model.

INFO:absl:Using ../../data/artifacts_<customer_2>/Trainer/model/2265/serving_model_dir as candidate model.
INFO:absl:Using ../../data/artifacts_<customer_1>/Trainer/model/1446/serving_model_dir as baseline model.
INFO:absl:Evaluating model.
INFO:absl:Using 1 process(es) for Beam pipeline execution.
WARNING:absl:inputs do not match those expected by the model: input_names=['examples'], found in extracts={}
INFO:absl:Evaluation complete. Results written to ../../data/artifacts_<customer_2>/Evaluator/evaluation/2288.
INFO:absl:Checking validation results.
INFO:absl:Blessing result True written to ../../data/artifacts_<customer_2>/Evaluator/blessing/2288.
INFO:absl:Running publisher for Evaluator
INFO:absl:MetadataStore with DB connection initialized

This issue was not there when I was using a file-based SQLite connection. The artifacts were being stored in a separate directory per customer and the pipeline was also identifying them as separate. But SQLite wasn't scalable, hence I moved to a MySQL connection.

Is there a way to keep the Artifacts of multiple pipelines separate when using a MySQL connection? If so, please let me know. Or, if it's possible to scale file-based SQLite to 1000s of connections, please let me know.
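
For reference, one pattern the raw MLMD API supports (a sketch only, not a TFX-level answer; names like CustomerPipeline and customer_1 are made up for illustration) is to attach each customer's artifacts to a per-customer Context and then scope queries to that Context:

# Sketch: one Context per customer/pipeline, then query within it.
pipeline_ctx_type = metadata_store_pb2.ContextType()
pipeline_ctx_type.name = "CustomerPipeline"
pipeline_ctx_type_id = store.put_context_type(pipeline_ctx_type)

customer_ctx = metadata_store_pb2.Context()
customer_ctx.type_id = pipeline_ctx_type_id
customer_ctx.name = "customer_1"
customer_ctx_id = store.put_contexts([customer_ctx])[0]

attribution = metadata_store_pb2.Attribution()
attribution.artifact_id = model_artifact_id  # hypothetical: an artifact produced for this customer
attribution.context_id = customer_ctx_id
store.put_attributions_and_associations([attribution], [])

# Only this customer's artifacts:
customer_artifacts = store.get_artifacts_by_context(customer_ctx_id)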

What querying capabilities does ml-metadata provide

The list of "Functionality Enabled by MLMD" implies the ability to query MLMD. For example "List all Artifacts of a specific type", "query context of workflow runs", and "Show a DAG of all related executions and their input and output artifacts of a context" described by Functionality Enabled by MLMD From the API, I am unclear how to accomplish this functionality without interacting directly with the database which does not conform to your design.

Could you provide examples of how to achieve this functionality?

Specifically, I would like to identify artifacts, executions, events, or contexts through a query of their properties, and then retrieve the related artifacts, executions, and contexts connected to the queried results through the directed acyclic graph (DAG).

Is this possible? How can it be achieved through the API?

metadata_store.py provides the ability to retrieve an entire list (e.g. get_executions) or a single item (e.g. get_contexts_by_id), but I am unclear how to achieve the stated ml-metadata functionality through this interface.
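
For reference, a minimal sketch of the kind of traversal the current MetadataStore interface allows (method names as in metadata_store.py; the "DataSet" type name is just an example):

# Start from artifacts of a type, then walk the lineage graph through Events.
artifacts = store.get_artifacts_by_type("DataSet")
artifact_ids = [a.id for a in artifacts]

events = store.get_events_by_artifact_ids(artifact_ids)      # artifact <-> execution edges
execution_ids = list({e.execution_id for e in events})
executions = store.get_executions_by_id(execution_ids)

# Follow the executions' other events to neighboring artifacts,
# and use contexts to scope the walk to one run/experiment.
more_events = store.get_events_by_execution_ids(execution_ids)
related_artifact_ids = list({e.artifact_id for e in more_events})
related_artifacts = store.get_artifacts_by_id(related_artifact_ids)
contexts = store.get_contexts_by_execution(execution_ids[0]) if execution_ids else []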

PutAttributionsAndAssociations() error: Context id not found.

ml-metadata version: 0.22.0

attribution = metadata_store_pb2.Attribution()
attribution.artifact_id = model_artifact_id
attribution.context_id = experiment_id

association = metadata_store_pb2.Association()
association.execution_id = run_id
attribution.context_id = experiment_id

request = metadata_store_service_pb2.PutAttributionsAndAssociationsRequest()
request.attributions.append(attribution)
request.associations.append(association)
stub.PutAttributionsAndAssociations(request)

The experiment_id is 20 here, and when I inspect the database, there is a context with id 20. So why does the error occur, and how can I solve it?
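
Looking at the snippet above, the likely cause is that association.context_id is never set: the second attribution.context_id = experiment_id line appears to be a typo, so the association is sent with a default context_id of 0, which does not exist. A corrected sketch:

association = metadata_store_pb2.Association()
association.execution_id = run_id
association.context_id = experiment_id  # was mistakenly written as attribution.context_id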

Building with bazel 1.1.0 requires several incompatible flags

Hi! When I tried to build ml-metadata with bazel 1.1.0 (bazel run -c opt --define grpc_no_ares=true ml_metadata:build_pip_package), I found that these incompatible flags are required:

--incompatible_disable_deprecated_attr_params=false 
--incompatible_no_support_tools_in_action_inputs=false 
--incompatible_new_actions_api=false 
--incompatible_string_join_requires_strings=false 
--incompatible_no_rule_outputs_param=false
--incompatible_require_ctx_in_configure_features=false

Is there a plan to better support Bazel 1.1.0? Thanks.

No way to set max receive message size in MetadataStore

We use the ml-metadata Python lib to access MLMD. When there are many executions or some large properties stored in MLMD, we get a "Received message larger than max" error. It would be great if the max receive message size could be made configurable in MetadataStore.

Relevant stack trace:

Traceback (most recent call last):
 ...
  File "/usr/local/lib/python3.6/dist-packages/ml_metadata/metadata_store/metadata_store.py", line 697, in get_executions_by_type
    self._call('GetExecutionsByType', request, response)
  File "/usr/local/lib/python3.6/dist-packages/ml_metadata/metadata_store/metadata_store.py", line 131, in _call
    return self._call_method(method_name, request, response)
  File "/usr/local/lib/python3.6/dist-packages/ml_metadata/metadata_store/metadata_store.py", line 162, in _call_method
    raise _make_exception(e.details(), e.code().value[0])
tensorflow.python.framework.errors_impl.ResourceExhaustedError: Received message larger than max (4333469 vs. 4194304)
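
Until the limit is configurable in MetadataStore, one possible workaround (a sketch only, not an officially supported path; host and port are placeholders) is to build the gRPC stub yourself with a larger receive limit, since gRPC channels accept a grpc.max_receive_message_length option:

import grpc
from ml_metadata.proto import metadata_store_service_pb2
from ml_metadata.proto import metadata_store_service_pb2_grpc

# Placeholder host/port; raise the receive limit from the 4 MB default to 32 MB.
channel = grpc.insecure_channel(
    "metadata-grpc-service:8080",
    options=[("grpc.max_receive_message_length", 32 * 1024 * 1024)],
)
stub = metadata_store_service_pb2_grpc.MetadataStoreServiceStub(channel)

request = metadata_store_service_pb2.GetExecutionsByTypeRequest(type_name="Trainer")
response = stub.GetExecutionsByType(request)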

PutContexts failed: Given node already exists

Hi team, I ran a simple pipeline on JupyterLab that consists of InteractiveContext, ExampleGen, StatisticsGen, SchemaGen, ExampleValidator and Trainer but ml-metadata gRPC server keeps outputting this warning:

2020-07-31 05:48:17.165649: W ml_metadata/metadata_store/metadata_store_service_impl.cc:403] PutContexts failed: Given node already exists: type_id: 2
name: "interactive-2020-07-31T05_00_24.026542"
properties {
  key: "pipeline_name"
  value {
    string_value: "interactive-2020-07-31T05_00_24.026542"
  }
}
Internal: mysql_query failed: errno: 1062, error: Duplicate entry '2-interactive-2020-07-31T05_00_24.026542' for key 'type_id'

There are 5 components and this warning appears 4 times in a row and each time the name is identical.

My understanding is that pipeline_name is used in both Artifacts and Contexts, and that a Context with an existing name should be updated rather than re-created. If the name already exists, it would be an update, but I'm not sure why put_contexts is repeatedly called. Has anyone seen this behavior? Is it expected, or is something wrong?

Thank you.

Exception on put_contexts

Hello. Help me, please.

Environment

OS: Ubuntu 16.04
Python: 3.7.3
Tensorflow: 0.14.0
ml_metadata: 0.14.0

ml-metadata installation

pip install ml_metadata==0.14.0

Code (from https://github.com/google/ml-metadata/blob/master/g3doc/get_started.md, with some edits)

from ml_metadata.metadata_store import metadata_store
from ml_metadata.metadata_store.metadata_store import metadata_store_pb2

connection_config = metadata_store_pb2.ConnectionConfig()
connection_config.sqlite.filename_uri = 'my.db'
connection_config.sqlite.connection_mode = 3 # READWRITE_OPENCREATE
store = metadata_store.MetadataStore(connection_config)

# Create ArtifactTypes, e.g., Data and Model
data_type = metadata_store_pb2.ArtifactType()
data_type.name = "DataSet"
data_type.properties["day"] = metadata_store_pb2.INT
data_type.properties["split"] = metadata_store_pb2.STRING
data_type_id = store.put_artifact_type(data_type)

model_type = metadata_store_pb2.ArtifactType()
model_type.name = "SavedModel"
model_type.properties["version"] = metadata_store_pb2.INT
model_type.properties["name"] = metadata_store_pb2.STRING
model_type_id = store.put_artifact_type(model_type)

# Create ExecutionType, e.g., Trainer
trainer_type = metadata_store_pb2.ExecutionType()
trainer_type.name = "Trainer"
trainer_type.properties["state"] = metadata_store_pb2.STRING
trainer_type_id = store.put_execution_type(trainer_type)

# Declare input artifact of type DataSet
data_artifact = metadata_store_pb2.Artifact()
data_artifact.uri = 'path/to/data'
data_artifact.properties["day"].int_value = 1
data_artifact.properties["split"].string_value = 'train'
data_artifact.type_id = data_type_id
data_artifact_id = store.put_artifacts([data_artifact])

# Register the Execution of a Trainer run
trainer_run = metadata_store_pb2.Execution()
trainer_run.type_id = trainer_type_id
trainer_run.properties["state"].string_value = "RUNNING"
run_id = store.put_executions([trainer_run])

# Declare the input event
input_event = metadata_store_pb2.Event()
input_event.artifact_id = data_artifact_id[0]
input_event.execution_id = run_id[0]
input_event.type = metadata_store_pb2.Event.DECLARED_INPUT

# Submit input event to the Metadata Store
store.put_events([input_event])

# Declare output artifact of type SavedModel
model_artifact = metadata_store_pb2.Artifact()
model_artifact.uri = 'path/to/model/file'
model_artifact.properties["version"].int_value = 1
model_artifact.properties["name"].string_value = 'MNIST-v1'
model_artifact.type_id = model_type_id
model_artifact_id = store.put_artifacts([model_artifact])


# Declare the output event
output_event = metadata_store_pb2.Event()
output_event.artifact_id = model_artifact_id[0]
output_event.execution_id = run_id[0]
output_event.type = metadata_store_pb2.Event.DECLARED_OUTPUT

# Submit output event to the Metadata Store
store.put_events([output_event])

trainer_run.id = run_id[0]
trainer_run.properties["state"].string_value = "COMPLETED"
store.put_executions([trainer_run])

# Similarly, create a ContextType, e.g., Experiment with a `note` property
experiment_type = metadata_store_pb2.ContextType()
experiment_type.name = "Experiment"
experiment_type.properties["note"] = metadata_store_pb2.STRING
experiment_type_id = store.put_context_type(experiment_type)

# Group the model and the trainer run to an experiment.
my_experiment = metadata_store_pb2.Context()
my_experiment.type_id = experiment_type_id
my_experiment.properties["note"].string_value = "My first experiment."
experiment_id = store.put_contexts([my_experiment])

Error:

InvalidArgumentError                      Traceback (most recent call last)
/usr/local/lib/python3.7/site-packages/IPython/core/interactiveshell.py in run_code(self, code_obj, result, async_)
   3325                 else:
-> 3326                     exec(code_obj, self.user_global_ns, self.user_ns)
   3327             finally:

<ipython-input-8-f54113fb23b7> in <module>
     82 my_experiment.properties["note"].string_value = "My first experiment."
---> 83 experiment_id = store.put_contexts([my_experiment])

/usr/local/lib/python3.7/site-packages/ml_metadata/metadata_store/metadata_store.py in put_contexts(self, contexts)
    291     response = metadata_store_service_pb2.PutContextsResponse()
--> 292     self._swig_call(metadata_store_serialized.PutContexts, request, response)
    293     result = []

/usr/local/lib/python3.7/site-packages/ml_metadata/metadata_store/metadata_store.py in _swig_call(self, method, request, response)
     75     if status_code != 0:
---> 76       raise _make_exception(error_message, status_code)
     77     response.ParseFromString(response_str)

<class 'str'>: (<class 'TypeError'>, TypeError('__str__ returned non-string (type bytes)'))

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
/usr/local/lib/python3.7/site-packages/IPython/core/interactiveshell.py in run_code(self, code_obj, result, async_)
   3341             if result is not None:
   3342                 result.error_in_exec = sys.exc_info()[1]
-> 3343             self.showtraceback(running_compiled_code=True)
   3344         else:
   3345             outflag = False

/usr/local/lib/python3.7/site-packages/IPython/core/interactiveshell.py in showtraceback(self, exc_tuple, filename, tb_offset, exception_only, running_compiled_code)
   2043                                             value, tb, tb_offset=tb_offset)
   2044 
-> 2045                     self._showtraceback(etype, value, stb)
   2046                     if self.call_pdb:
   2047                         # drop into debugger

/usr/local/lib/python3.7/site-packages/ipykernel/zmqshell.py in _showtraceback(self, etype, evalue, stb)
    544             u'traceback' : stb,
    545             u'ename' : unicode_type(etype.__name__),
--> 546             u'evalue' : py3compat.safe_unicode(evalue),
    547         }
    548 

/usr/local/lib/python3.7/site-packages/ipython_genutils/py3compat.py in safe_unicode(e)
     63     """
     64     try:
---> 65         return unicode_type(e)
     66     except UnicodeError:
     67         pass

TypeError: __str__ returned non-string (type bytes)

One observation

The same error occurs when I try to fill a non-existent (undeclared in the entity type) property of some entity (e.g. Artifact). But here experiment_type has the property note, so it seems correct.

Thank you in advance.
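
One likely cause (an assumption, based on the later version of this example on this page, which adds a name): put_contexts requires each Context to have a name, and the snippet above never sets one. Naming the context before the call may resolve the error:

my_experiment.name = "exp1"  # contexts need a name, unique within their type
experiment_id = store.put_contexts([my_experiment])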

Retries on GRPC errors?

I have a TFX pipeline on Kubeflow using BigQueryExampleGen that is frequently failing because of a GRPC error:

	details = "mysql_query failed: errno: 2013, error: Lost connection to MySQL server during query"
	debug_error_string = "{"created":"@1596203848.918267726","description":"Error received from peer ipv4:172.28.140.89:8080","file":"src/core/lib/surface/call.cc","file_line":1056,"grpc_message":"mysql_query failed: errno: 2013, error: Lost connection to MySQL server during query","grpc_status":13}"
>

Unfortunately, it looks like the retry logic only checks for a specific AbortedError when deciding whether or not to retry. Is it possible to extend it to other error types to handle possible transient errors like this?

Below is the full stack trace:

Traceback (most recent call last):
  File "/tfx-src/tfx/orchestration/kubeflow/container_entrypoint.py", line 378, in <module>
    main()
  File "/tfx-src/tfx/orchestration/kubeflow/container_entrypoint.py", line 371, in main
    execution_info = launcher.launch()
  File "/tfx-src/tfx/orchestration/launcher/base_component_launcher.py", line 197, in launch
    self._exec_properties)
  File "/tfx-src/tfx/orchestration/launcher/base_component_launcher.py", line 166, in _run_driver
    component_info=self._component_info)
  File "/tfx-src/tfx/components/base/base_driver.py", line 239, in pre_execution
    pipeline_info)
  File "/tfx-src/tfx/orchestration/metadata.py", line 1062, in register_pipeline_contexts_if_not_exists
    _CONTEXT_TYPE_KEY_PIPELINE_NAME: pipeline_info.pipeline_name
  File "/tfx-src/tfx/orchestration/metadata.py", line 980, in _register_context_if_not_exist
    properties=properties)
  File "/tfx-src/tfx/orchestration/metadata.py", line 945, in _prepare_context
    (k, property_type_mapping[type(k)]) for k, v in properties.items()))
  File "/tfx-src/tfx/orchestration/metadata.py", line 922, in _register_context_type_if_not_exist
    context_type, can_add_fields=True)
  File "/opt/venv/lib/python3.6/site-packages/ml_metadata/metadata_store/metadata_store.py", line 463, in put_context_type
    self._call('PutContextType', request, response)
  File "/opt/venv/lib/python3.6/site-packages/ml_metadata/metadata_store/metadata_store.py", line 131, in _call
    return self._call_method(method_name, request, response)
  File "/opt/venv/lib/python3.6/site-packages/ml_metadata/metadata_store/metadata_store.py", line 162, in _call_method
    raise _make_exception(e.details(), e.code().value[0])
tensorflow.python.framework.errors_impl.InternalError: mysql_query failed: errno: 2013, error: Lost connection to MySQL server during query

MySQL - no index on EventPath table

It seems that when creating the EventPath table in MySQL there is no index created.

This causes performance to degrade as the amount of data grows in the table (e.g. > 100k rows).

The performance issues were resolved by adding an index:
ALTER TABLE EventPath ADD INDEX event_id_index (event_id);

declarative query api to support property queries

Proposal
We would like to use an Artifact Type for Datasets, and another one for Datafiles, then query all DataFiles associated with a particular Dataset. However, right now there is no way for me to perform this query.

We need the ability to query artifacts based on their property values; specifically, I'm looking for a get_artifacts_by_property_value where I can pass the key and value I'm looking for.

The reasoning behind this request is that Datasets can be composed of files, and in restricted projects we might be asked to query and explain the use of each individual file depending on its contents.

class DataFileType(object):
    data_type = metadata_store_pb2.ArtifactType()
    data_type.name = "DataFile_V1.0"
    data_type.properties["data_file_name"] = metadata_store_pb2.STRING
    data_type.properties["data_file_size"] = metadata_store_pb2.STRING
    data_type.properties["data_file_created_by"] = metadata_store_pb2.STRING
    data_type.properties["data_file_created_date"] = metadata_store_pb2.STRING
    data_type.properties["data_file_parent_dataset"] = metadata_store_pb2.STRING

Note: Another approach would be to allow for artifact hierarchies, though the approach above would solve the problem faster.
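
Until a property-based query API exists, one workaround (a sketch only; the type and property names follow the example above) is to fetch artifacts by type and filter on properties client-side:

def get_artifacts_by_property_value(store, type_name, key, value):
    # Client-side stand-in for the requested query (sketch only).
    return [
        artifact
        for artifact in store.get_artifacts_by_type(type_name)
        if key in artifact.properties
        and artifact.properties[key].string_value == value
    ]

# e.g. all DataFiles belonging to a particular Dataset:
datafiles = get_artifacts_by_property_value(
    store, "DataFile_V1.0", "data_file_parent_dataset", "my_dataset")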

pip install does not work with 0.21.1

When I run pip install for the version I need (because of kubeflow-metadata) I get an error:

pip install ml-metadata==0.21.1
ERROR: Could not find a version that satisfies the requirement ml-metadata==0.21.1 (from versions: 0.12.0.dev0, 0.13.0.dev0, 0.13.1.dev0)
ERROR: No matching distribution found for ml-metadata==0.21.1

It's supposed to exist, but I have no luck making it work: https://pypi.org/project/ml-metadata/0.21.1/

Other packages work but it fails on kubeflow-metadata because of the aforementioned error.

What is MLMD's ROADMAP

What is the ROADMAP for MLMD?

Would you consider publishing a ROADMAP.md file in the repository?

Context.type missing in metadata_store_pb2.Context Python class

Hello team,
I am working with metadata_store_pb2.Context and am trying to get ContextType name from context.

When I do get_contexts_by_execution after putting context types and contexts, I do not get context.type in the return structure.
context.type_id is populated and correct. Is this expected behavior?

In my code I have to filter for a specific ContextType in the returned list of Contexts. The workaround I am using is to create a dictionary mapping context type_id to context type name and then use this map to match with the ContextType name:

context_types = get_context_types_by_id([context.type_id for context in contexts])
context_types_map = {context_type.id: context_type.name for context_type in context_types}

To put context type I am calling

context_type = metadata_store_pb2.ContextType()
context_type.name = "NewContext"
put_context_type([context_type])

To put in context I am using

put_contexts(list of contexts)

Consider dropping tensorflow as dependency

Tensorflow is a huge package with many dependencies. It only supports modern CPUs and fails to even import on older ones.

In my opinion, Metadata should be generic and not TF-specific.

No Python 3.8 version?

Hi!

I get this error when I'm trying to install the requirements of TFX on local.
ERROR: Could not find a version that satisfies the requirement ml-metadata<0.22,>=0.21.2 (from tfx==0.22.0.dev0->-r requirements.txt (line 3)) (from versions: 0.12.0.dev0, 0.13.0.dev0, 0.13.1.dev0)

As I interpret it, it doesn't find any version of ml-metadata above 0.13.1.dev0, although 0.21.2 was released on 29 Feb 2020. Again, I'm no expert, but this could be because there doesn't seem to be a release for Python 3.8.

Could you release a version for Python 3.8? When I try to install the requirements with Python 3.6 it works.

How to delete an artifact?

I see that you can delete an artifact type by passing in a put_artifact_type request with can_delete_fields set to true.

However, I don't see how I could delete an artifact without going into the database and manually deleting rows.

Am I missing something?

Thanks!

metadata_store.py can't connect to gRPC via secure channel.

Problem
Get the following error when trying to connect to the metadata gRPC server via SSL. I'm using the Python ml_metadata package 0.15.0.

 File "/Users/zhenghui/kubeflow/metadata/.env/lib/python3.6/site-packages/ml_metadata/metadata_store/metadata_store.py", line 103, in _get_channel
    certificate_chain)
  File "/Users/zhenghui/kubeflow/metadata/.env/lib/python3.6/site-packages/grpc/__init__.py", line 1593, in ssl_channel_credentials
    certificate_chain))
  File "src/python/grpcio/grpc/_cython/_cygrpc/credentials.pyx.pxi", line 133, in grpc._cython.cygrpc.SSLChannelCredentials.__cinit__
TypeError: expected certificate to be bytes, got <class 'str'>

It seems SSL config has three string fields while the gRPC method expects bytes.

The code which calls the metadata_store.py is

def __init__(self,
               grpc_host: str = "metadata-grpc-service.kubeflow",
               grpc_port: int = 8080,
               root_certificates = None,
               private_key=None,
               certificate_chain= None):
    config = mlpb.MetadataStoreClientConfig()
    config.host = grpc_host
    config.port = grpc_port
    if private_key:
      config.ssl_config.client_key = private_key
    if root_certificates:
      config.ssl_config.custom_ca = root_certificates
    if certificate_chain:
      config.ssl_config.server_cert = certificate_chain

    self.store = metadata_store.MetadataStore(config,
                                              disable_upgrade_migration=False)

Compilation Issue: crosses boundary of subpackage

There seems to be an issue with the TensorFlow dependency which prevents me from compiling ml-metadata. It seems to be similar to kubeflow/pipelines#1288, which is hopefully fixed with the next TF release.
Is there a way I can avoid that issue in the meantime?

$ bazel run -c opt --define grpc_no_ares=true ml_metadata:build_pip_package
Starting local Bazel server and connecting to it...
INFO: An error occurred during the fetch of repository 'io_bazel_rules_closure'
INFO: Call stack for the definition of repository 'io_bazel_rules_closure':
 - /Users/jarango/ml-metadata/WORKSPACE:25:1
ERROR: error loading package '': in /Users/jarango/ml-metadata/ml_metadata/workspace.bzl: in /private/var/tmp/_bazel_jarango/3250bda47e0c37e69ea58b486f066546/external/org_tensorflow/tensorflow/workspace.bzl: Label '@org_tensorflow//third_party:nccl/nccl_configure.bzl' crosses boundary of subpackage '@org_tensorflow//third_party/nccl' (perhaps you meant to put the colon here: '@org_tensorflow//third_party/nccl:nccl_configure.bzl'?)
ERROR: error loading package '': in /Users/jarango/ml-metadata/ml_metadata/workspace.bzl: in /private/var/tmp/_bazel_jarango/3250bda47e0c37e69ea58b486f066546/external/org_tensorflow/tensorflow/workspace.bzl: Label '@org_tensorflow//third_party:nccl/nccl_configure.bzl' crosses boundary of subpackage '@org_tensorflow//third_party/nccl' (perhaps you meant to put the colon here: '@org_tensorflow//third_party/nccl:nccl_configure.bzl'?)
INFO: Elapsed time: 19.671s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (0 packages loaded)
$ bazel version
Build label: 0.25.2
Build target: bazel-out/darwin-opt/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Fri May 10 20:50:40 2019 (1557521440)
Build timestamp: 1557521440
Build timestamp as int: 1557521440
$ clang --version
Apple LLVM version 10.0.1 (clang-1001.0.46.4)
Target: x86_64-apple-darwin18.6.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin

NotFoundError: No type found for query when connecting over gRPC

Following the suggestion from #76, I tried to explore a k8s deployment similar to the one done in the Kubeflow project.

Based on these manifests https://github.com/kubeflow/pipelines/tree/master/manifests/kustomize/base/metadata I created a single deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: metadata-grpc-deployment
  namespace: mlmd
  labels:
    component: metadata-grpc-server
spec:
  replicas: 1
  selector:
    matchLabels:
      component: metadata-grpc-server
  template:
    metadata:
      labels:
        component: metadata-grpc-server
    spec:
      containers:
      - name: container
        image: gcr.io/tfx-oss-public/ml_metadata_store_server:0.25.0
        command: ["/bin/metadata_store_server"]
        args: ["--grpc_port=8080",
               "--enable_database_upgrade=true"
               ]
        ports:
        - name: grpc-api
          containerPort: 8080
        livenessProbe:
          tcpSocket:
            port: grpc-api
          initialDelaySeconds: 3
          periodSeconds: 5
          timeoutSeconds: 2
        readinessProbe:
          tcpSocket:
            port: grpc-api
          initialDelaySeconds: 3
          periodSeconds: 5
          timeoutSeconds: 2

and service

kind: Service
apiVersion: v1
metadata:
  labels:
    app: metadata
  name: metadata-grpc-service
  namespace: mlmd
spec:
  selector:
    component: metadata-grpc-server
  type: ClusterIP
  ports:
  - port: 8080
    protocol: TCP
    name: grpc-api

In order to communicate with the k8s-deployed microservice I forwarded its port with

kubectl port-forward -n mlmd svc/metadata-grpc-service 8080:8080

I also found these helper files in the Kubeflow project https://github.com/kubeflow/pipelines/blob/master/backend/metadata_writer/src/metadata_helpers.py that provide a useful level of abstraction.

However, I first tried the code snippets from the MLMD documentation and noticed that both created artifacts get the same type_id: 1. There is also nothing appearing in the pod's logs.

Then, inspired by the Kubeflow metadata helpers, I did

import sys
from time import sleep

from ml_metadata.proto import metadata_store_pb2
from ml_metadata.metadata_store import metadata_store

def connect_to_mlmd(metadata_service_host, metadata_service_port) -> metadata_store.MetadataStore:

    mlmd_connection_config = metadata_store_pb2.MetadataStoreClientConfig(
        host=metadata_service_host,
        port=metadata_service_port,
    )

    # Checking the connection to the Metadata store.
    for _ in range(100):
        try:
            mlmd_store = metadata_store.MetadataStore(mlmd_connection_config)
            # All get requests fail when the DB is empty, so we have to use a put request.
            # TODO: Replace with _ = mlmd_store.get_context_types() when https://github.com/google/ml-metadata/issues/28 is fixed
            _ = mlmd_store.put_execution_type(
                metadata_store_pb2.ExecutionType(
                    name="DummyExecutionType",
                )
            )
            return mlmd_store
        except Exception as e:
            print('Failed to access the Metadata store. Exception: "{}"'.format(str(e)), file=sys.stderr)
            sys.stderr.flush()
            sleep(1)

    raise RuntimeError('Could not connect to the Metadata store.')


def get_or_create_artifact_type(store, type_name, properties: dict = None) -> metadata_store_pb2.ArtifactType:
    try:
        artifact_type = store.get_artifact_type(type_name=type_name)
        return artifact_type
    except:
        artifact_type = metadata_store_pb2.ArtifactType(
            name=type_name,
            properties=properties,
        )
        artifact_type.id = store.put_artifact_type(artifact_type) # Returns ID
        return artifact_type


mlmd = connect_to_mlmd("localhost", 8080)    

artifact_type = get_or_create_artifact_type(
    mlmd, 
    type_name="MyArtifactType",
)

print(artifact_type)

which outputs

id: 1
name: "MyArtifactType"

with

2020-11-17 12:06:30.157710: W ml_metadata/metadata_store/rdbms_metadata_access_object.cc:588] No property is defined for the Type
2020-11-17 12:06:31.432750: W ml_metadata/metadata_store/metadata_store_service_impl.cc:79] GetArtifactType failed: No type found for query: MyArtifactType
2020-11-17 12:06:31.489764: W ml_metadata/metadata_store/rdbms_metadata_access_object.cc:588] No property is defined for the Type

being logged in the Pod.

However, trying

mlmd.get_artifact_type(type_name="MyArtifactType")

leads to

---------------------------------------------------------------------------
_InactiveRpcError                         Traceback (most recent call last)
~/.cache/pypoetry/virtualenvs/mlmd-Ux9x4Ar2-py3.8/lib/python3.8/site-packages/ml_metadata/metadata_store/metadata_store.py in _call_method(self, method_name, request, response)
    174       try:
--> 175         response.CopyFrom(grpc_method(request, timeout=self._grpc_timeout_sec))
    176       except grpc.RpcError as e:

~/.cache/pypoetry/virtualenvs/mlmd-Ux9x4Ar2-py3.8/lib/python3.8/site-packages/grpc/_channel.py in __call__(self, request, timeout, metadata, credentials, wait_for_ready, compression)
    922                                       wait_for_ready, compression)
--> 923         return _end_unary_response_blocking(state, call, False, None)
    924 

~/.cache/pypoetry/virtualenvs/mlmd-Ux9x4Ar2-py3.8/lib/python3.8/site-packages/grpc/_channel.py in _end_unary_response_blocking(state, call, with_call, deadline)
    825     else:
--> 826         raise _InactiveRpcError(state)
    827 

_InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.NOT_FOUND
	details = "No type found for query: MyArtifactType"
	debug_error_string = "{"created":"@1605614795.789362714","description":"Error received from peer ipv6:[::1]:8080","file":"src/core/lib/surface/call.cc","file_line":1061,"grpc_message":"No type found for query: MyArtifactType","grpc_status":5}"
>

During handling of the above exception, another exception occurred:

NotFoundError                             Traceback (most recent call last)
<ipython-input-29-f3d6193a4029> in <module>
----> 1 mlmd.get_artifact_type(type_name="MyArtifactType")

~/.cache/pypoetry/virtualenvs/mlmd-Ux9x4Ar2-py3.8/lib/python3.8/site-packages/ml_metadata/metadata_store/metadata_store.py in get_artifact_type(self, type_name)
    629     response = metadata_store_service_pb2.GetArtifactTypeResponse()
    630 
--> 631     self._call('GetArtifactType', request, response)
    632     return response.artifact_type
    633 

~/.cache/pypoetry/virtualenvs/mlmd-Ux9x4Ar2-py3.8/lib/python3.8/site-packages/ml_metadata/metadata_store/metadata_store.py in _call(self, method_name, request, response)
    148     while True:
    149       try:
--> 150         return self._call_method(method_name, request, response)
    151       except errors.AbortedError:
    152         num_retries -= 1

~/.cache/pypoetry/virtualenvs/mlmd-Ux9x4Ar2-py3.8/lib/python3.8/site-packages/ml_metadata/metadata_store/metadata_store.py in _call_method(self, method_name, request, response)
    178         # description.
    179         # https://grpc.github.io/grpc/python/_modules/grpc.html#StatusCode
--> 180         raise _make_exception(e.details(), e.code().value[0])  # pytype: disable=attribute-error
    181 
    182   def _swig_call(self, method, request, response) -> None:

NotFoundError: No type found for query: MyArtifactType

with Pod's logs

2020-11-17 12:06:35.782095: W ml_metadata/metadata_store/metadata_store_service_impl.cc:79] GetArtifactType failed: No type found for query: MyArtifactType

[Support] SDK Docs

I've been trying to find the SDK documentation for this project, especially how to retrieve metadata that is stored within an MLMD store, but I'm drawing a blank. The Getting Started guide does a great job of setting up the store and getting data into it, but I'm curious how to utilize the data in further steps!

best practices on production deployment?

I wonder what the recommended way of deploying MLMD in production is?

The only piece of documentation I found about starting the gRPC server is this:

bazel run -c opt --define grpc_no_ares=true  //ml_metadata/metadata_store:metadata_store_server

Did I miss something?

What would be the best approach to deploy mlmd in Kubernetes?
Some example manifests / documentation on this could be very helpful.

Support for NoSQL databases?

Currently, ML Metadata requires transactional databases, which seems to preclude using NoSQL databases such as Cassandra or older versions of MongoDB as storage layers for ML Metadata. This precludes my team at Twitter from using preexisting infrastructure as a storage layer for TFX metadata.

Are there any plans to support NoSQL databases in the future? If so, when?

What is the schema_version in MLMD?

I am confused about the schema version in MLMD. What exactly is the schema about? Does it refer to changes in the relationships between tables in the database? Hoping to get a reply.

Trouble shooting MLMD schema migration

I'm running MLMD in KubeFlow and tried to migrate the schema from version 4 (MLMD v0.21.1) to version 6 (MLMD v0.25.0).

I attempted an upgrade from a Jupyter server running in KF using:
from ml_metadata import metadata_store
from ml_metadata.proto import metadata_store_pb2

connection_config = metadata_store_pb2.ConnectionConfig()
connection_config.mysql.host = 'metadata-db.kubeflow'
connection_config.mysql.port = 3306
connection_config.mysql.database = 'metadb'
connection_config.mysql.user = ''
connection_config.mysql.password = ''
store = metadata_store.MetadataStore(connection_config, enable_upgrade_migration = True)

This successfully migrated the schema version from 4 to 5 (which I could track by the error response of the grpc-deployment).

However, when deploying MLMD 0.25.0 I still get the following error, indicating that the migration to v6 never took place. There is no indication in the mysql container log that any migration has taken place.

2020-11-26 13:22:43.483865: F ml_metadata/metadata_store/metadata_store_server_main.cc:220] Non-OK-status: status status: Failed precondition: MLMD database version 5 is older than library version 6. Schema migration is disabled. Please upgrade the database then use the library version; or switch to a older library version to use the current database.MetadataStore cannot be created with the given connection config.

How can I troubleshoot this?

Received message larger than max (4199881 vs. 4194304)

I have a TFX pipeline that runs in Kubeflow on GCP, and recently one of my pipelines started failing with the following error in ResolverNode.latest_model_resolver and ResolverNode.latest_blessed_model_resolver:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/ml_metadata/metadata_store/metadata_store.py", line 165, in _call_method
    response.CopyFrom(grpc_method(request))
  File "/usr/local/lib/python3.7/dist-packages/grpc/_channel.py", line 826, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/usr/local/lib/python3.7/dist-packages/grpc/_channel.py", line 729, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.RESOURCE_EXHAUSTED
	details = "Received message larger than max (4199881 vs. 4194304)"
	debug_error_string = "{"created":"@1603760693.874743930","description":"Received message larger than max (4199881 vs. 4194304)","file":"src/core/ext/filters/message_size/message_size_filter.cc","file_line":203,"grpc_status":8}"
>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/tfx-src/tfx/orchestration/kubeflow/container_entrypoint.py", line 360, in <module>
    main()
  File "/tfx-src/tfx/orchestration/kubeflow/container_entrypoint.py", line 353, in main
    execution_info = launcher.launch()
  File "/tfx-src/tfx/orchestration/launcher/base_component_launcher.py", line 197, in launch
    self._exec_properties)
  File "/tfx-src/tfx/orchestration/launcher/base_component_launcher.py", line 166, in _run_driver
    component_info=self._component_info)
  File "/tfx-src/tfx/components/common_nodes/resolver_node.py", line 73, in pre_execution
    source_channels=input_dict.copy())
  File "/tfx-src/tfx/dsl/experimental/latest_artifacts_resolver.py", line 56, in resolve
    output_key=c.output_key)
  File "/tfx-src/tfx/orchestration/metadata.py", line 323, in get_qualified_artifacts
    executions = self.store.get_executions_by_context(context.id)
  File "/usr/local/lib/python3.7/dist-packages/ml_metadata/metadata_store/metadata_store.py", line 1080, in get_executions_by_context
    self._call('GetExecutionsByContext', request, response)
  File "/usr/local/lib/python3.7/dist-packages/ml_metadata/metadata_store/metadata_store.py", line 140, in _call
    return self._call_method(method_name, request, response)
  File "/usr/local/lib/python3.7/dist-packages/ml_metadata/metadata_store/metadata_store.py", line 170, in _call_method
    raise _make_exception(e.details(), e.code().value[0])  # pytype: disable=attribute-error
ml_metadata.errors.ResourceExhaustedError: Received message larger than max (4199881 vs. 4194304)

Is there a way to fix this on my side?

Support artifact state with MYSQL DB.

In release 0.21.0, the Artifact proto added a state field. But when connecting with a MySQL DB, there is no state field in the Artifact table.

We can create an Artifact proto object with a state, but when we get this Artifact back from the DB, the state disappears, because the Artifact state is not stored in the DB.

Blogs and Documentation ?

I want to integrate ML Metadata as part of our data pipeline, which is based on AWS SageMaker, Spark, etc. Please provide some relevant documentation, as I couldn't find much online.

Trouble connecting to gRPC server with secure channel

Hi, I want to use the built-in SSL in MetadataStoreServerConfig to secure the channel, but after passing in the cert and launching the server, my client can no longer connect to the server. It fails with the following error:

~/.pex/install/grpcio-1.32.0-cp37-cp37m-linux_x86_64.whl.b02a34f09fd3663c27b2a62a6c1b4ea3ed2c602b/grpcio-1.32.0-cp37-cp37m-linux_x86_64.whl/grpc/_channel.py in __call__(self, request, timeout, metadata, credentials, wait_for_ready, compression)
    824         state, call, = self._blocking(request, timeout, metadata, credentials,
    825                                       wait_for_ready, compression)
--> 826         return _end_unary_response_blocking(state, call, False, None)
    827 
    828     def with_call(self,

~/.pex/install/grpcio-1.32.0-cp37-cp37m-linux_x86_64.whl.b02a34f09fd3663c27b2a62a6c1b4ea3ed2c602b/grpcio-1.32.0-cp37-cp37m-linux_x86_64.whl/grpc/_channel.py in _end_unary_response_blocking(state, call, with_call, deadline)
    727             return state.response
    728     else:
--> 729         raise _InactiveRpcError(state)
    730 
    731 

_InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.UNAVAILABLE
	details = "failed to connect to all addresses"
	debug_error_string = "{"created":"@1603218554.257270012","description":"Failed to pick subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":4133,"referenced_errors":[{"created":"@1603218460.778569046","description":"failed to connect to all addresses","file":"src/core/ext/filters/client_channel/lb_policy/pick_first/pick_first.cc","file_line":397,"grpc_status":14}]}"

Upon changing client values for certs, I get this error:

_InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.UNAVAILABLE
	details = "Empty Update"

On the server side, I get the following error message:
170490 ssl_transport_security.cc:1239] Handshake failed with fatal error SSL_ERROR_SSL: error:100000f7:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER.

I have tried many different things to make it work, and can provide answers to any follow-up questions. I have been struggling with this issue for a while now and am not sure what's going wrong here.

Using openssl s_client -connect host:port, I am able to connect to the server; it's the Python ml-metadata store client that is having trouble connecting.
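
For what it's worth, WRONG_VERSION_NUMBER on the server side usually means a peer spoke plaintext to a TLS endpoint. One thing worth double-checking (a sketch only, reusing the ssl_config fields shown in an earlier issue on this page; the PEM variables are placeholders) is that the Python client is configured with ssl_config too, not just host and port:

client_config = metadata_store_pb2.MetadataStoreClientConfig()
client_config.host = "metadata-grpc-service"  # placeholder host
client_config.port = 8080
client_config.ssl_config.custom_ca = root_ca_pem      # PEM contents, not file paths
client_config.ssl_config.client_key = client_key_pem
client_config.ssl_config.server_cert = client_cert_pem
store = metadata_store.MetadataStore(client_config)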

0.14 contains a breaking change that silently migrates the DB schema by default

My team serves a centralized Kubeflow cluster for various customers at our company, who use TFX to run ML workloads (we are using TFX instead of the KFP DSL specifically because metadata tracking is already included).

We started looking into updating our tools to TFX 0.14, and ran our integration tests with the change on our development cluster, which (as far as we're aware) applied an MLMD schema migration, ultimately renaming the column is_artifact_type to type_kind. We later found out this was breaking the pipelines of other users who were still running TFX 0.13. Since this was a dev cluster, nothing critical was impacted, but it brought up a serious concern: any end user who decided to try a newer version of TFX could effectively take down dev/prod because there's an automated DB migration.

We currently have a solution for the immediate issue, which is to create a limited-privilege DB user to be used by MLMD (we have a wrapper around TFX for internal use where we can enforce this) so that schema migrations would fail unless triggered explicitly by an admin.

This, however, is not the only problem: we're left with the issue of getting all of our teams to "jump at the same time" and upgrade wholesale to TFX 0.14 when we're ready. While we don't have any production pipelines running on Kubeflow, this is a consideration for future breaking changes: if there are existing daily pipelines that are not actively maintained, and they use an older version of TFX, how do we manage them coexisting on the same cluster as newer versions if there is an incompatible schema?

Support for PostgreSQL?

Hi there,

We're wondering about the priority of supporting postgresql. I have a couple questions for the maintainers.

  1. If a PR was submitted for the initial effort, could the project afford to maintain feature-parity with PostgreSQL?
  2. Does anybody know of a reason this wouldn't be fairly easy to port; I mean, over and above the dialect tweaks? For instance, are there any mysql-specific features that might impede support?

Feature request: UI and visualization tools

In the 2019 TF Summit presentation about TFX's usage of MLMD, a couple of visualization and UI tools were shown to deliver better handling of the MLMD database. Such tools help to enable the benefits of MLMD, such as artifact history tracking. As of today, with no such tools published, it is up to the developer to write a program that will turn the .db file into something meaningful. I advise releasing those tools (already shown in the presentation) to the public.

Deleting/Cleanup older TFX runs

I'm using a MySQL DB for storing the artifacts generated via TFX runs.

It's been a while since TFX has been in production. Since many pipelines have run, the MLMD database is getting filled up. Due to the large tables, the performance of the database has decreased as well.

Is there a way to programmatically and gracefully delete older runs to free up storage and improve DB performance?

Issues building locally on a Mac

I followed the guide, installed Bazel, etc., and tried:

$ bazel run -c opt --define grpc_no_ares=true ml_metadata:build_pip_package
Starting local Bazel server and connecting to it...
INFO: Invocation ID: 86e0a52a-beab-4a71-869b-f28cae05bd0c
ERROR: /private/var/tmp/_bazel_dedutta/20c43d1af01a72b975f0b1d572e55dd8/external/protobuf_archive/BUILD:591:1: Traceback (most recent call last):
File "/private/var/tmp/_bazel_dedutta/20c43d1af01a72b975f0b1d572e55dd8/external/protobuf_archive/BUILD", line 591
internal_gen_well_known_protos_java(srcs = WELL_KNOWN_PROTOS)
File "/private/var/tmp/_bazel_dedutta/20c43d1af01a72b975f0b1d572e55dd8/external/protobuf_archive/protobuf.bzl", line 269, in internal_gen_well_known_protos_java
Label(("%s//protobuf_java" % REPOSITOR...))
File "/private/var/tmp/_bazel_dedutta/20c43d1af01a72b975f0b1d572e55dd8/external/protobuf_archive/protobuf.bzl", line 269, in Label
REPOSITORY_NAME
The value 'REPOSITORY_NAME' has been removed in favor of 'repository_name()', please use the latter (https://docs.bazel.build/versions/master/skylark/lib/native.html#repository_name). You can temporarily allow the old name by using --incompatible_package_name_is_a_function=false
ERROR: /private/var/tmp/_bazel_dedutta/20c43d1af01a72b975f0b1d572e55dd8/external/protobuf_archive/BUILD:373:1: Target '@protobuf_archive//:android' contains an error and its package is in error and referenced by '@protobuf_archive//:protoc'
ERROR: /private/var/tmp/_bazel_dedutta/20c43d1af01a72b975f0b1d572e55dd8/external/protobuf_archive/BUILD:373:1: Target '@protobuf_archive//:msvc' contains an error and its package is in error and referenced by '@protobuf_archive//:protoc'
ERROR: /Users/dedutta/work/ml-metadata/ml_metadata/proto/BUILD:42:1: Target '@protobuf_archive//:protobuf_python_genproto' contains an error and its package is in error and referenced by '//ml_metadata/proto:metadata_store_py_pb2_genproto'
ERROR: /Users/dedutta/work/ml-metadata/ml_metadata/proto/BUILD:42:1: Target '@protobuf_archive//:protoc' contains an error and its package is in error and referenced by '//ml_metadata/proto:metadata_store_py_pb2_genproto'
ERROR: Analysis of target '//ml_metadata:build_pip_package' failed; build aborted: Analysis failed
INFO: Elapsed time: 14.508s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (14 packages loaded, 40 targets configured)
currently loading: @org_tensorflow//tensorflow/core
Fetching @swig; fetching
Fetching @local_config_python; fetching
Fetching @local_config_cc; Restarting.

Can't run any version past 0.15.2 on OS X 10.13

I'm running Python 3.7.6 on OSX High Sierra 10.13.6 and I get the following error whenever I try to install a ml-metadata version greater than 0.15.2 (I've tried 0.21.0, 0.21.1, and 0.23.0):

ImportError: dlopen(/Users/nlarusstone/.virtualenvs/foo/lib/python3.7/site-packages/ml_metadata/metadata_store/_pywrap_tf_metadata_store_serialized.so, 2): Symbol not found: ____chkstk_darwin
  Referenced from: /Users/nlarusstone/.virtualenvs/foo/lib/python3.7/site-packages/ml_metadata/metadata_store/_pywrap_tf_metadata_store_serialized.so (which was built for Mac OS X 10.15)

Are recent versions of this package only compatible with OSX 10.15?

Is there a recommended workaround for older versions of the OS?

Why doesn't pip install build the wheel locally on each machine?

ParentContext related functionality does not seem to be implemented

There seems to be a ParentContext type defined in the proto (https://github.com/google/ml-metadata/blob/master/ml_metadata/proto/metadata_store.proto#L325) as well as a couple of RPCs relating to the ParentContext (https://github.com/google/ml-metadata/blob/master/ml_metadata/proto/metadata_store_service.proto#L716)

However, I can't seem to find any implementation of anything related to parent context in the server, database, or the python package.

Is this intentional? Is there a plan to support ParentContext in the future (and if so, where can I find that)?

Thanks!

ml-metadata image deleted from registry, breaks kubeflow deployments

Hi all!
First of all, thanks for your work on ml-metadata! 😄
It's a project we use in Kubeflow for our tracking needs. More specifically, we make use of the following image:
gcr.io/tfx-oss-public/ml_metadata_store_server:v0.21.1

However, this image recently disappeared. Was it deleted from the registry?
Because of this, all Kubeflow installations have a broken metadata component and some won't continue at all, as they detect a broken installation.
Is it possible to restore the image until a more permanent solution can be implemented? (like cloning images to a kubeflow registry). Thanks a lot!

Why not group dataset artifacts to experiments?

In the last step of the official example, it only groups the model artifacts to an experiment. I saw that a few dataset artifacts were created as well.
I assume all the artifacts should be grouped to the experiment. Does that mean there should be as many attributions as artifacts?

# Create a ContextType, e.g., Experiment with a note property
experiment_type = metadata_store_pb2.ContextType()
experiment_type.name = "Experiment"
experiment_type.properties["note"] = metadata_store_pb2.STRING
experiment_type_id = store.put_context_type(experiment_type)

# Group the model and the trainer run to an experiment.
my_experiment = metadata_store_pb2.Context()
my_experiment.type_id = experiment_type_id
# Give the experiment a name
my_experiment.name = "exp1"
my_experiment.properties["note"].string_value = "My first experiment."
experiment_id = store.put_contexts([my_experiment])[0]

# Should we also link all the dataset artifact to the context? 
attribution = metadata_store_pb2.Attribution()
attribution.artifact_id = model_artifact_id
attribution.context_id = experiment_id

association = metadata_store_pb2.Association()
association.execution_id = run_id
association.context_id = experiment_id

store.put_attributions_and_associations([attribution], [association])

Another problem I notice is that there are no separate methods for put_attributions and put_associations. Is there a plan to support them separately? Or do you think that's unnecessary?

Currently, we have to do it this way:

store.put_attributions_and_associations([], [association]) -> put_associations()
store.put_attributions_and_associations([attribution], []) -> put_attribution()
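
In the meantime, thin wrappers make the intent explicit (a sketch only; these helpers are not part of the MLMD API):

def put_attributions(store, attributions):
    # Emulates a dedicated put_attributions() via the combined call.
    store.put_attributions_and_associations(attributions, [])

def put_associations(store, associations):
    # Emulates a dedicated put_associations() via the combined call.
    store.put_attributions_and_associations([], associations)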

Document imports in the Getting started guide

It took me a while to find how to import the dependencies required to run the Getting started code snippets. Can we add this snippet to the guide?

from ml_metadata import metadata_store
from ml_metadata.proto import metadata_store_pb2

Support SSL in MySQL backend?

Hi! It seems like currently, if I want to use MySQL as the backend, there's no way to enable SSL for DB connections? (I just briefly skimmed the source code, so I might be wrong.)
Can we add optional MySQL SSL configs to the connection config and support enabling SSL for MySQL in MLMD? Thanks!
