operatorai / modelstore

modelstore is a Python library that allows you to version, export, and save a machine learning model to your filesystem or a cloud storage provider.

License: Apache License 2.0

Makefile 0.29% Python 98.64% Shell 0.87% Dockerfile 0.21%
data-science keras machine-learning mlops modelstore python-library pytorch s3-storage scikit-learn tensorflow transformer

modelstore's People

Contributors

cdknorow, cpranav93, dependabot[bot], imfaruqi, ionicsolutions, nlathia, rladbrua0207, robertpknight, sspillard, trellixvulnteam

modelstore's Issues

Feature Request: Custom Model Managers

It'd be cool to be able to subclass ModelManager so that I can write custom ones (I'm working on a project with a custom model class). While it's easy to subclass right now, there's no easy way to add a custom ModelManager to ModelStore. So, some method like model_store.register_model_manager(my_custom_manager) would be nice.

Hmm actually after writing the above, I realize I could just append my custom class to model_store._managers which is a bit hacky but feasible. Anywho, I'll leave this issue up as food for thought!
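
To make the request concrete, here is a minimal sketch of what a registration hook could look like. Everything in it is an assumption for illustration: `Registry` stands in for `ModelStore`, the `ModelManager` shown is a stripped-down stand-in for the real base class, and `register_model_manager` is the method name proposed above, not an existing API.

```python
from typing import Any, List


class ModelManager:
    """Stand-in base class; the real modelstore class has more methods."""

    def matches_with(self, **kwargs: Any) -> bool:
        raise NotImplementedError


class MyCustomManager(ModelManager):
    """Hypothetical manager for a project-specific model type (a dict here)."""

    def matches_with(self, **kwargs: Any) -> bool:
        return isinstance(kwargs.get("model"), dict)


class Registry:
    """Hypothetical holder of managers, standing in for ModelStore."""

    def __init__(self) -> None:
        self._managers: List[ModelManager] = []

    def register_model_manager(self, manager: ModelManager) -> None:
        # Reject anything that does not follow the manager interface.
        if not isinstance(manager, ModelManager):
            raise TypeError("manager must subclass ModelManager")
        self._managers.append(manager)

    def matching_managers(self, **kwargs: Any) -> List[ModelManager]:
        matches = [m for m in self._managers if m.matches_with(**kwargs)]
        if not matches:
            raise ValueError("could not find matching manager")
        return matches
```

A public method like this would avoid reaching into the private `_managers` list while keeping the lookup logic unchanged.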

Anonymous access of GCP bucket fails with `ValueError: Anonymous credentials cannot be refreshed.`

Affects modelstore 0.0.74.

To reproduce:

# create a new environment (Python 3.8)
python -m venv env
source env/bin/activate

# install modelstore and the Google Cloud Storage client library
pip install modelstore google-cloud-storage


python
Python 3.8.8 (default, Apr  4 2021, 16:02:17) 
[GCC 10.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from modelstore import ModelStore
>>> model_store = ModelStore.from_gcloud(bucket_name="xai-demo-models")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/kilian/Documents/Ulm/GitHub/modelstorereplica/env/lib/python3.8/site-packages/modelstore/model_store.py", line 90, in from_gcloud
    return ModelStore(
  File "<string>", line 4, in __init__
  File "/home/kilian/Documents/Ulm/GitHub/modelstorereplica/env/lib/python3.8/site-packages/modelstore/model_store.py", line 105, in __post_init__
    if not self.storage.validate():
  File "/home/kilian/Documents/Ulm/GitHub/modelstorereplica/env/lib/python3.8/site-packages/modelstore/storage/gcloud.py", line 128, in validate
    if not self.bucket.exists():
  File "/home/kilian/Documents/Ulm/GitHub/modelstorereplica/env/lib/python3.8/site-packages/google/cloud/storage/bucket.py", line 843, in exists
    client._get_resource(
  File "/home/kilian/Documents/Ulm/GitHub/modelstorereplica/env/lib/python3.8/site-packages/google/cloud/storage/client.py", line 366, in _get_resource
    return self._connection.api_request(
  File "/home/kilian/Documents/Ulm/GitHub/modelstorereplica/env/lib/python3.8/site-packages/google/cloud/storage/_http.py", line 73, in api_request
    return call()
  File "/home/kilian/Documents/Ulm/GitHub/modelstorereplica/env/lib/python3.8/site-packages/google/api_core/retry.py", line 283, in retry_wrapped_func
    return retry_target(
  File "/home/kilian/Documents/Ulm/GitHub/modelstorereplica/env/lib/python3.8/site-packages/google/api_core/retry.py", line 190, in retry_target
    return target()
  File "/home/kilian/Documents/Ulm/GitHub/modelstorereplica/env/lib/python3.8/site-packages/google/cloud/_http/__init__.py", line 482, in api_request
    response = self._make_request(
  File "/home/kilian/Documents/Ulm/GitHub/modelstorereplica/env/lib/python3.8/site-packages/google/cloud/_http/__init__.py", line 341, in _make_request
    return self._do_request(
  File "/home/kilian/Documents/Ulm/GitHub/modelstorereplica/env/lib/python3.8/site-packages/google/cloud/_http/__init__.py", line 379, in _do_request
    return self.http.request(
  File "/home/kilian/Documents/Ulm/GitHub/modelstorereplica/env/lib/python3.8/site-packages/google/auth/transport/requests.py", line 526, in request
    self.credentials.refresh(auth_request)
  File "/home/kilian/Documents/Ulm/GitHub/modelstorereplica/env/lib/python3.8/site-packages/google/auth/credentials.py", line 173, in refresh
    raise ValueError("Anonymous credentials cannot be refreshed.")
ValueError: Anonymous credentials cannot be refreshed.

I remember encountering and resolving this issue while working on #142. We should have a look at the changes introduced by #161.

Output of pip freeze:

cachetools==5.0.0
certifi==2021.10.8
charset-normalizer==2.0.12
click==8.1.3
gitdb==4.0.9
GitPython==3.1.27
google-api-core==2.7.3
google-auth==2.6.6
google-cloud-core==2.3.0
google-cloud-storage==2.3.0
google-crc32c==1.3.0
google-resumable-media==2.3.2
googleapis-common-protos==1.56.0
idna==3.3
joblib==1.1.0
modelstore==0.0.74
numpy==1.22.3
protobuf==3.20.1
pyasn1==0.4.8
pyasn1-modules==0.2.8
requests==2.27.1
rsa==4.8
six==1.16.0
smmap==5.0.0
tqdm==4.64.0
urllib3==1.26.9

Warning log when uploading a file

When loading a raw file model, there's a warning log that says:

`NoneType` object value of non-optional type type detected when decoding ModelType.

This needs investigation and resolution.

Transformers - Failure scenarios

https://huggingface.co/gpt2/tree/main

Here is how to use this model to get the features of a given text in PyTorch:

from transformers import GPT2Tokenizer, GPT2Model
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2Model.from_pretrained('gpt2')
text = "Replace me by any text you'd like."
encoded_input = tokenizer(text, return_tensors='pt')
output = model(**encoded_input)

and in TensorFlow (updated with config):

from transformers import GPT2Tokenizer, TFGPT2Model, PretrainedConfig
tokenizertf = GPT2Tokenizer.from_pretrained('gpt2')
model_tf = TFGPT2Model.from_pretrained('gpt2')
configtf = PretrainedConfig.from_pretrained('gpt2')
text = "Replace me by any text you'd like."
encoded_input = tokenizertf(text, return_tensors='tf')
outputtf = model_tf(encoded_input)

    return kwargs["model"].optimizer.get_config() - Seems the optimizer is None

Both model uploads to the modelstore fail.

xgb.core.Booster object not supported

When trying to load an xgb.core.Booster object I get a "ValueError: could not find matching manager" error. So it appears this lower-level xgboost object (which doesn't implement the sklearn API) is not currently supported.

`numpy` is not specified as a dependency

Encountered with modelstore version 0.0.74

To reproduce:

# create a new environment (Python 3.8)
python -m venv env
source env/bin/activate

# install modelstore and the Google Cloud Storage client library
pip install modelstore google-cloud-storage


python
Python 3.8.8 (default, Apr  4 2021, 16:02:17) 
[GCC 10.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from modelstore import ModelStore
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/kilian/Documents/Ulm/GitHub/modelstorereplica/env/lib/python3.8/site-packages/modelstore/__init__.py", line 3, in <module>
    from modelstore.model_store import ModelStore
  File "/home/kilian/Documents/Ulm/GitHub/modelstorereplica/env/lib/python3.8/site-packages/modelstore/model_store.py", line 23, in <module>
    from modelstore.models.managers import iter_libraries, matching_managers, get_manager
  File "/home/kilian/Documents/Ulm/GitHub/modelstorereplica/env/lib/python3.8/site-packages/modelstore/models/managers.py", line 17, in <module>
    from modelstore.models.annoy import AnnoyManager
  File "/home/kilian/Documents/Ulm/GitHub/modelstorereplica/env/lib/python3.8/site-packages/modelstore/models/annoy.py", line 18, in <module>
    from modelstore.models.model_manager import ModelManager
  File "/home/kilian/Documents/Ulm/GitHub/modelstorereplica/env/lib/python3.8/site-packages/modelstore/models/model_manager.py", line 22, in <module>
    import numpy as np
ModuleNotFoundError: No module named 'numpy'

Output of pip freeze:

cachetools==5.0.0
certifi==2021.10.8
charset-normalizer==2.0.12
click==8.1.3
gitdb==4.0.9
GitPython==3.1.27
google-api-core==2.7.3
google-auth==2.6.6
google-cloud-core==2.3.0
google-cloud-storage==2.3.0
google-crc32c==1.3.0
google-resumable-media==2.3.2
googleapis-common-protos==1.56.0
idna==3.3
joblib==1.1.0
modelstore==0.0.74
protobuf==3.20.1
pyasn1==0.4.8
pyasn1-modules==0.2.8
requests==2.27.1
rsa==4.8
six==1.16.0
smmap==5.0.0
tqdm==4.64.0
urllib3==1.26.9

Missing "Transformers" example

The README lists "Transformers" as having an example file at path examples/examples-by-ml-library/libraries/transformers_example.py, but this path does not exist on the main branch. Thank you!

Deleting artifacts.tar.gz after it is uploaded

If the user creates a file called artifacts.tar.gz, then modelstore creates another file with the same name and overwrites it. modelstore will also go on to delete that file after uploading it, which is confusing because the author expects the original file to remain.

Feature request: override model_id

Hi there,

Currently, a model is effectively versioned when it is uploaded with a UUID. I'd like it if we could pass model_id to the upload method so that we can set it to a human-readable string without the need for another layer of indirection.

Thanks!

Make filesystem modelstore independent of creator's root path

Problem

I was trying to run a service in a Docker container with a modelstore I created on my own machine attached as a mounted volume. This is not possible right now because the models have a fixed absolute path to the system of the user that created them.

It should not be hard to circumvent this by storing only the path starting from operatorai-model-store/domain/date/artifacts.tar.gz in the files within the operatorai directory. The path leading up to it can be added after the modelstore instance is initialized, since the path to the root directory has to be given anyway.

Example

For example, I initiate a modelstore like this:

self.__storage = ModelStore.from_file_system(
  root_directory="/my/local/storage",
  create_directory=False
)

The directory mounts just fine and an existing operator model store exists at the path. However, all the model artifact files are irretrievable because the absolute path recorded in the generated files points to another filesystem and not the one I chose when I initialized the modelstore.

9b03e4f4-72d4-4c9b-894d-41a7c646295a.json

{"storage": {
  "type": "file_system", 
  "path": "/home/some1/Docs/stuff/ml-service/localstorage/operatorai-model-store/lr_prediction/2022.06.21-10.51.20/artifacts.tar.gz"}
}

The problem here is the leading path, /home/some1/Docs/stuff/ml-service/localstorage, which should not be there if I initialize the modelstore on a different machine, for instance if I were to share a modelstore on GitHub with multiple participants running on different systems.

Resolution

  1. Take the path from the initialization and store it (already being kept in local.py):
    /my/local/storage

  2. Store relative paths for artifacts generated:

9b03e4f4-72d4-4c9b-894d-41a7c646295a.json

{"storage": {
  "type": "file_system", 
  "path": "operatorai-model-store/lr_prediction/2022.06.21-10.51.20/artifacts.tar.gz"}
}

Note the removal of the leading path that depends on the user's filesystem.

  3. Append the initialization path to the artifacts path when needed:
source = f"{self.root_dir}/{storage.path}"
print(source)

>>> /my/local/storage/operatorai-model-store/lr_prediction/2022.06.21-10.51.20/artifacts.tar.gz

It may be as simple as changing local.py to the following, but I am not sure:

    def _storage_location(self, prefix: str) -> metadata.Storage:
        """Returns a dict of the location the artifact was stored"""
        return metadata.Storage.from_path(
            storage_type="file_system",
            path=self.relative_dir(prefix)
        )

    def _get_storage_location(self, meta_data: metadata.Storage) -> str:
        """Extracts the storage location from a meta data dictionary"""
        return f"{self.root_prefix}/{meta_data.path}"
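
The round trip described in the resolution steps can be sketched independently of modelstore. The helper names below (`to_relative`, `to_absolute`) are hypothetical, and `posixpath` is used on the assumption that the metadata files should always record forward-slash paths regardless of the host OS.

```python
import posixpath


def to_relative(root_dir: str, absolute_path: str) -> str:
    """Strip the store's root so only the portable part is recorded."""
    return posixpath.relpath(absolute_path, start=root_dir)


def to_absolute(root_dir: str, stored_path: str) -> str:
    """Rebuild the full path from whatever root this machine uses."""
    return posixpath.join(root_dir, stored_path)


# Path as written by the machine that created the model.
created = (
    "/home/some1/Docs/stuff/ml-service/localstorage/"
    "operatorai-model-store/lr_prediction/2022.06.21-10.51.20/artifacts.tar.gz"
)
relative = to_relative("/home/some1/Docs/stuff/ml-service/localstorage", created)

# Path as resolved by a different machine mounting the same store.
print(to_absolute("/my/local/storage", relative))
```

Existing stores would still contain absolute paths, so a real change would likely need to accept both forms when reading.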

Model Store Path uses a colon (:), which is a reserved character in Windows

The model store uses a colon (:) in the pathname as a convention. This causes an issue when copying the model store over to a Windows machine.

For example:
operatorai-model-store/c4d99af6-363a-4d57-8934-6f0af2e4c211/2022/02/22/21:20:22

fatal: cannot create directory at 'operatorai-model-store/c4d99af6-363a-4d57-8934-6f0af2e4c211/2022/02/22/21:20:22': Invalid argument

Fixing this would most likely be a breaking change, but it is probably worth doing to maintain compatibility.
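
One possible shape for a Windows-safe layout is sketched below. The function name and the choice of `.` as the time separator are assumptions for illustration, not the format modelstore actually adopted.

```python
from datetime import datetime

# Characters that Windows does not allow in file or directory names.
WINDOWS_RESERVED = set('<>:"|?*')


def timestamp_dir(now: datetime) -> str:
    """Build a date/time path segment that avoids the colon separator."""
    return now.strftime("%Y/%m/%d/%H.%M.%S")


def is_windows_safe(path: str) -> bool:
    """Check each path component for reserved characters."""
    return not any(ch in WINDOWS_RESERVED for part in path.split("/") for ch in part)


print(timestamp_dir(datetime(2022, 2, 22, 21, 20, 22)))
```

Stores written with the old layout would still need a migration step or a reader that understands both formats.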

Storing model parameters, lineage data and maybe even pre- or post-processing methods?

Adding this issue here for visibility (I received it via email):


Currently, our models are deployed by being baked into a docker image, partly because of legacy, but also because our experiment tracking system is not stable enough to be always up, so it is nice to just dump the model into the docker image and then not have the need to connect to multiple services at serving time. So basically I just need to be able to dump my model onto the filesystem.

But, what constitutes a model? If I use this package I would only be able to dump the model file, but information about how the input and output should be interpreted/transformed for this specific model will not be there. This information is often a subset of the training parameters.

Information for traceability would also be nice, e.g. (1) the training run ID (the MLFlow run ID in my case) and (2) epoch and/or step information.

Lastly, in my abstract understanding of what a model really is, I would say that it would even be nice to be able to package pre- and post-processing functions (a model class actually) with the model as well, but this is a different discussion :)

Using MinIO s3 buckets

Is it possible to make this work with MinIO S3 buckets? I've been trying to set it up with no success. Or do I have to write my own wrapper around AWSStorage?

ModuleNotFoundError when using modelstore in Colab

Colab is using fastai==1.0.61; when trying to upload an sklearn model, it fails with:

/usr/local/lib/python3.7/dist-packages/modelstore/models/fastai.py in matches_with(self, **kwargs)
     60     def matches_with(self, **kwargs) -> bool:
     61         # pylint: disable=import-outside-toplevel
---> 62         from fastai.learner import Learner
     63 
     64         return isinstance(kwargs.get("learner"), Learner)

ModuleNotFoundError: No module named 'fastai.learner'
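
A guarded import is one way a check like this could tolerate an installed but incompatible library. The sketch below is not the modelstore implementation; `optional_class` is a hypothetical helper, and `matches_with` is written as a free function for brevity.

```python
import importlib
from typing import Any, Optional


def optional_class(module_name: str, class_name: str) -> Optional[type]:
    """Return the named class, or None if the module or class is unavailable."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Also covers ModuleNotFoundError, e.g. fastai 1.x has no fastai.learner.
        return None
    return getattr(module, class_name, None)


def matches_with(**kwargs: Any) -> bool:
    """Report a match only when the Learner class can actually be imported."""
    learner_cls = optional_class("fastai.learner", "Learner")
    if learner_cls is None:
        return False
    return isinstance(kwargs.get("learner"), learner_cls)
```

With this shape, an sklearn upload would simply skip the fastai manager instead of raising.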

CLI upload command

I recognize modelstore as a way to outsource the overhead of managing a server entirely to S3. However, if I wanted to model in a non-python language, I'd have to write and execute a python wrapper to perform the raw file upload. Perfectly doable, but I think with that in mind, it would be useful to add an upload command to the CLI that takes in JSON metadata by env var. I plan on trying to wrap my head around the codebase to see if I can come up with something, but I'd love to chat with someone about how best to achieve that functionality to better serve non-python languages.

Cannot download latest model version

Very small bug: if you call model_store.download without a model_id (in order to get the latest model version), an error is raised since the logging code is erroneously trying to format a string as a number.

I was going to fix this myself. As part of the fix, I thought I'd add a test to capture this bug. But then I saw that there's no test for this function and testing seemed nontrivial, so now I'm lazily creating this issue.

Stack trace:

$ model_store.download("some_dir", "my_domain", model_id=None)
--- Logging error ---
Traceback (most recent call last):
  File "/Users/erosenthal/.pyenv/versions/3.7.8/lib/python3.7/logging/__init__.py", line 1025, in emit
    msg = self.format(record)
  File "/Users/erosenthal/.pyenv/versions/3.7.8/lib/python3.7/logging/__init__.py", line 869, in format
    return fmt.format(record)
  File "/Users/erosenthal/.pyenv/versions/3.7.8/lib/python3.7/logging/__init__.py", line 608, in format
    record.message = record.getMessage()
  File "/Users/erosenthal/.pyenv/versions/3.7.8/lib/python3.7/logging/__init__.py", line 369, in getMessage
    msg = msg % self.args
TypeError: must be real number, not str
Call stack:
  File "/Users/erosenthal/.pyenv/versions/modelstore_tests/bin/ipython", line 8, in <module>
    sys.exit(start_ipython())
  File "/Users/erosenthal/.pyenv/versions/3.7.8/envs/modelstore_tests/lib/python3.7/site-packages/IPython/__init__.py", line 126, in start_ipython
    return launch_new_instance(argv=argv, **kwargs)
  File "/Users/erosenthal/.pyenv/versions/3.7.8/envs/modelstore_tests/lib/python3.7/site-packages/traitlets/config/application.py", line 846, in launch_instance
    app.start()
  File "/Users/erosenthal/.pyenv/versions/3.7.8/envs/modelstore_tests/lib/python3.7/site-packages/IPython/terminal/ipapp.py", line 356, in start
    self.shell.mainloop()
  File "/Users/erosenthal/.pyenv/versions/3.7.8/envs/modelstore_tests/lib/python3.7/site-packages/IPython/terminal/interactiveshell.py", line 563, in mainloop
    self.interact()
  File "/Users/erosenthal/.pyenv/versions/3.7.8/envs/modelstore_tests/lib/python3.7/site-packages/IPython/terminal/interactiveshell.py", line 554, in interact
    self.run_cell(code, store_history=True)
  File "/Users/erosenthal/.pyenv/versions/3.7.8/envs/modelstore_tests/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 2902, in run_cell
    raw_cell, store_history, silent, shell_futures)
  File "/Users/erosenthal/.pyenv/versions/3.7.8/envs/modelstore_tests/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 2947, in _run_cell
    return runner(coro)
  File "/Users/erosenthal/.pyenv/versions/3.7.8/envs/modelstore_tests/lib/python3.7/site-packages/IPython/core/async_helpers.py", line 68, in _pseudo_sync_runner
    coro.send(None)
  File "/Users/erosenthal/.pyenv/versions/3.7.8/envs/modelstore_tests/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3173, in run_cell_async
    interactivity=interactivity, compiler=compiler, result=result)
  File "/Users/erosenthal/.pyenv/versions/3.7.8/envs/modelstore_tests/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3364, in run_ast_nodes
    if (await self.run_code(code, result,  async_=asy)):
  File "/Users/erosenthal/.pyenv/versions/3.7.8/envs/modelstore_tests/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3444, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-5-c8bdaf682d10>", line 1, in <module>
    model_store.download("some_dir", domain, model_id=None)
  File "/Users/erosenthal/.pyenv/versions/3.7.8/envs/modelstore_tests/lib/python3.7/site-packages/modelstore/model_store.py", line 145, in download
    archive_path = self.storage.download(local_path, domain, model_id)
  File "/Users/erosenthal/.pyenv/versions/3.7.8/envs/modelstore_tests/lib/python3.7/site-packages/modelstore/storage/blob_storage.py", line 121, in download
    logger.info("Latest model is: %f", model_meta["model"]["model_id"])
Message: 'Latest model is: %f'
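
The failure comes down to the `%f` placeholder, which only accepts real numbers, while the model ID is a string. A small reproduction, with hypothetical function names, shows the difference between the two placeholders:

```python
import logging

logger = logging.getLogger("modelstore.sketch")


def latest_model_message(model_id: str) -> str:
    """Format the log line with %s, which accepts any value."""
    return "Latest model is: %s" % model_id


def broken_message(model_id: str) -> str:
    """Reproduces the bug: %f raises TypeError for a string argument."""
    return "Latest model is: %f" % model_id


# With the logging module, pass the argument separately so formatting is lazy.
logger.info("Latest model is: %s", "9b03e4f4-72d4-4c9b-894d-41a7c646295a")
```

Because the logging module catches formatting errors inside the handler, the download itself may still complete; the traceback above is printed as a "Logging error".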

Regression: Model store created with Keras models breaks with the latest merge

@nlathia it looks like the change made to merge the Keras and TensorFlow managers may have broken older model stores. This is the error I see when loading a model that was previously saved.


def get_manager(name: str, storage: CloudStorage = None) -> ModelManager:
       manager = _LIBRARIES[name](storage)
E       KeyError: 'keras'

This actually brings up a good regression test.

Create a model store that has one of each type of model, then the test can try and load all the models to validate they are still working after any changes.
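
One way to keep older stores loadable after a rename is an alias table consulted before the lookup. This is only a sketch: the dictionary contents, the manager names as strings, and `resolve_library` are all hypothetical and do not mirror the real `_LIBRARIES` mapping.

```python
from typing import Dict

# Hypothetical registry: library name -> manager name.
_LIBRARIES: Dict[str, str] = {
    "sklearn": "SKLearnManager",
    "tensorflow": "TensorflowManager",
}

# Names that older metadata may still contain, mapped to their replacement.
_LEGACY_ALIASES: Dict[str, str] = {"keras": "tensorflow"}


def resolve_library(name: str) -> str:
    """Look up a manager, translating legacy names first."""
    canonical = _LEGACY_ALIASES.get(name, name)
    if canonical not in _LIBRARIES:
        raise KeyError(name)
    return _LIBRARIES[canonical]
```

A regression test along the lines described above could then loop over every name found in a fixture store and assert that `resolve_library` succeeds for each.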

Archives being overwritten if generated too quickly

I originally put this issue into the discussion area, but thought it might be a better fit as an issue.

I am migrating an existing database that is storing models to the modelstore and saw that if I added the models from my database to the modelstore too quickly (within the same second), the versions directory gets updated correctly BUT the timestamped archives get overwritten.

I can resolve this for my use case by putting a 1 second sleep in my import code to make sure there are no collisions, but the modelstore should detect that models might be overwritten and fix it by checking for an existing timestamped versioned entry with the same timestamp and modifying the saved timestamp so that nothing gets overwritten.

Since the timestamps are not in UTC time, I imagine that a directory containing the model data being accessed from another server in a different time zone may have the same issue with timestamp collisions.

I guess to resolve this the modelstore would need to:

  • Timestamp in UTC time so that all timestamps would only increase over time (so no possibility of TS collision)
  • Detect an existing archive with the same (within the same second) timestamp and pause or modify the generated timestamp to avoid a collision

Not sure if the new release will resolve this, please let me know if I can help or if there is some configuration setting that would resolve this.
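
The two bullet points above can be combined in a small helper. This is a sketch under assumptions: `unique_timestamp` is a hypothetical name, the set of existing timestamps is passed in rather than read from storage, and the format string mirrors the second-resolution directory names seen in file system stores.

```python
from datetime import datetime, timedelta, timezone
from typing import Set

TIMESTAMP_FORMAT = "%Y.%m.%d-%H.%M.%S"


def unique_timestamp(existing: Set[str], now: datetime) -> str:
    """Return a UTC timestamp string that is not already in `existing`."""
    candidate = now.astimezone(timezone.utc).replace(microsecond=0)
    # Bump by one second until the name is free.
    while candidate.strftime(TIMESTAMP_FORMAT) in existing:
        candidate += timedelta(seconds=1)
    return candidate.strftime(TIMESTAMP_FORMAT)


taken = {"2022.06.21-10.51.20"}
now = datetime(2022, 6, 21, 10, 51, 20, tzinfo=timezone.utc)
print(unique_timestamp(taken, now))
```

Bumping the timestamp keeps the layout unchanged but makes the recorded time slightly inaccurate; adding sub-second precision or the model ID to the path would be an alternative.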

Feature: Allow adding additional information to metadata on model upload

There is, to my knowledge, no straightforward way of retrieving additional data sent on model upload other than downloading the entire artifact and knowing the exact name of the file that it was stored in. It would be nice to be able to add additional information to the model metadata when uploading a new model in order to have direct access to any important information needed for further processing of models.

This could be an optional parameter to the upload method which provides an easy way to add something to the metadata. This could accept a python dictionary and would then be placed in the metadata under a specific key such as "extra".

Use case

# Custom information that a user wants to have available as metadata when calling `get_model_info`
important_info = {
    'required_columns': ["yay", "nay"],
    'data_transforms': ["std", "mean"],
    'training_data_marker': {
        'index_column': 'some_id',
        'index_value': 'some_value',
    },
    'replication_storage_information': {
        "actual_creation_date": "2021-11-23T10:10:23",
        "archived_date": "2022-1-14T12:14:23",
    }
}

metadata = model_store.upload(
       domain="my-domain", 
       state_name="archived", 
       model=lr_model, 
       extra_metadata=important_info
)

print(metadata)
>> 
{
    'model': {
        'domain': {...}, 
        'data': {...}, 
        'storage': {...},
        'code': {...}, 
        'git': {...}, 
        'extra': {
            'required_columns': ["yay", "nay"],
            'data_transforms': ["std", "mean"],
            'training_data_marker': {
                'index_column': 'some_id',
                'index_value': 'some_value',
            },
            'replication_storage_information': {
                "actual_creation_date": "2021-11-23T10:10:23",
                "archived_date": "2022-1-14T12:14:23",
            }
        }
    }
}

The extra parameter would have to be validated, which could be done by checking whether the object is JSON serializable in the upload method:

if extra_metadata:
    try:
        json.dumps(extra_metadata)
    except Exception as exc:
        raise ValueError("extra_metadata field must be json serializable") from exc

The value of the field could default to an empty dict, i.e. 'extra': {}, and should not break any existing functionality.

Any opinions on this?

ValueError: could not find matching manager

I am trying to run the following code using v0.0.80

import tensorflow as tf
from modelstore import ModelStore

model_store = ModelStore.from_azure(
        container_name="xyz",
        root_prefix="xyz",
    )


def tf_model():
    model = tf.keras.models.Sequential(
        [
            tf.keras.layers.Dense(5, activation="relu", input_shape=(10,)),
            tf.keras.layers.Dropout(0.2),
            tf.keras.layers.Dense(1),
        ]
    )
    model.compile(optimizer="adam", loss="mean_squared_error")
    return model
# Upload model 
model = tf_model()
meta_data = model_store.upload(model_domain, model=model)

I get the following error:

Traceback (most recent call last):
  File "/tmp/core/trainer.py", line 140, in <module>
    meta_data = model_store.upload(model_domain, model=model)
  File "/tmp/.venv/lib/python3.10/site-packages/modelstore/model_store.py", line 283, in upload
    managers = matching_managers(self._libraries, **kwargs)
  File "/tmp/.venv/lib/python3.10/site-packages/modelstore/models/managers.py", line 87, in matching_managers
    raise ValueError("could not find matching manager")
ValueError: could not find matching manager

When I tried to log all the managers in this function:

def matching_managers(managers: list, **kwargs) -> List[ModelManager]:

I get this:

[<modelstore.models.missing_manager.MissingDepManager object at 0x12da03610>, <modelstore.models.missing_manager.MissingDepManager object at 0x12da03d60>, <modelstore.models.missing_manager.MissingDepManager object at 0x12da34ee0>, <modelstore.models.model_file.ModelFileManager object at 0x12da34940>, <modelstore.models.missing_manager.MissingDepManager object at 0x12da34a30>, <modelstore.models.missing_manager.MissingDepManager object at 0x12da348e0>, <modelstore.models.missing_manager.MissingDepManager object at 0x12d44df00>, <modelstore.models.missing_manager.MissingDepManager object at 0x12da34b80>, <modelstore.models.missing_manager.MissingDepManager object at 0x12da34ac0>, <modelstore.models.missing_manager.MissingDepManager object at 0x12da34970>, <modelstore.models.missing_manager.MissingDepManager object at 0x12da34bb0>, <modelstore.models.missing_manager.MissingDepManager object at 0x12da34be0>, <modelstore.models.missing_manager.MissingDepManager object at 0x12da347f0>, <modelstore.models.missing_manager.MissingDepManager object at 0x12da34e80>, <modelstore.models.missing_manager.MissingDepManager object at 0x12da34e20>, <modelstore.models.missing_manager.MissingDepManager object at 0x12da34dc0>, <modelstore.models.missing_manager.MissingDepManager object at 0x12da34d60>, <modelstore.models.missing_manager.MissingDepManager object at 0x12da34d00>]

colab notebook example error

Hi, great concept! Trying to get this example to work:

https://colab.research.google.com/drive/1yEY6wy68k7TlHzm8iJMKKBG_Pl-MGZUe?usp=sharing

All cells run fine up to this one:

modelstore = ModelStore.from_gcloud(
    project_name=gcp_project_id,
    bucket_name=gcp_bucket_name
)

GCP project name: testmodstore
Cloud Storage bucket name: modstore

---------------------------------------------------------------------------

TransportError                            Traceback (most recent call last)

/usr/local/lib/python3.7/dist-packages/google/auth/compute_engine/credentials.py in refresh(self, request)
    110         try:
--> 111             self._retrieve_info(request)
    112             self.token, self.expiry = _metadata.get_service_account_token(

17 frames

TransportError: ("Failed to retrieve http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/?recursive=true from the Google Compute Enginemetadata service. Status: 404 Response:\nb''", <google.auth.transport.requests._Response object at 0x7feb2aa2f110>)

Guidance appreciated!!

Can I refactor your code into pyscaffold?

I've used PyScaffold before to maintain structured Python libraries. It offers

  1. Machinery over documentation
  2. Best practices around setup.cfg instead of setup.py
  3. Development builds (so I can do something like pip install -e .[dev])
  4. Pre-commits

In addition, I could

  1. Set up workflows to push .whl packages to GH (and optionally to PyPI later)

TypeError when uploading complex sklearn pipelines

For example:

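import xgboost as xgb
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# One-hot encode the categorical columns and pass the rest through unchanged
categorical_transformer = OneHotEncoder(handle_unknown="ignore")
preprocessor = ColumnTransformer(
    transformers=[
      ("cat", categorical_transformer, ["a", "b", "c"])
    ],
    remainder="passthrough"
)
clf = xgb.XGBClassifier()

model = Pipeline(
    steps=[("preprocessor", preprocessor),
           ("classifier", clf)]
)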

Raises this error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-5-50ab474b647e> in <module>()
----> 1 model_store.upload("testing", model=model)

6 frames
/usr/lib/python3.7/json/encoder.py in default(self, o)
    177 
    178         """
--> 179         raise TypeError(f'Object of type {o.__class__.__name__} '
    180                         f'is not JSON serializable')
    181 

TypeError: Object of type OneHotEncoder is not JSON serializable
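
The error is what `json.dumps` raises for any value it has no encoder for, which here is an estimator nested inside the pipeline's parameters. The sketch below reproduces that with a stand-in class instead of scikit-learn, and shows one possible workaround using the `default` hook of `json.dumps`. Falling back to `repr` is an assumption for illustration, not how modelstore handles it.

```python
import json
from typing import Any, Dict


class OneHotEncoderStandIn:
    """Stand-in for an estimator object nested inside pipeline parameters."""

    def __init__(self, handle_unknown: str = "ignore") -> None:
        self.handle_unknown = handle_unknown

    def __repr__(self) -> str:
        return f"OneHotEncoderStandIn(handle_unknown={self.handle_unknown!r})"


def to_json_safe(params: Dict[str, Any]) -> str:
    """Serialize parameters, replacing unsupported objects with their repr."""
    return json.dumps(params, default=repr, sort_keys=True)


params = {"remainder": "passthrough", "cat": OneHotEncoderStandIn()}
print(to_json_safe(params))
```

The trade-off is that nested objects are recorded as descriptive strings and cannot be reconstructed from the metadata alone.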

Unsetting Model State

Is it possible to unset a model state? I'm interested in changing a model from a shadow state to a production state such that the model no longer shows up if I query

model_store.list_versions("my-domain", state_name="shadow")

I see that I can set a model state, and that a model can be set to multiple different states at the same time, but I don't see any support for removing a model from a given state.

Bug report: uploading sklearn-onnx models

A bug has been reported to me, where modelstore does not find a matching manager when trying to upload an onnx model. This needs to be investigated to see if we can replicate it, and then fix it.

In this specific instance, the type of the model is:

<class 'onnx.onnx_ml_pb2.ModelProto'>

Which looks like it's different from the type that modelstore checks against.

The dependencies that were being used by the reporter are modelstore>=0.0.73 with:

google-cloud-bigquery==2.28.0
google-cloud-core==2.1.0
imblearn==0.0
jinja2==2.11.3
lightgbm==3.2.1
onnx==1.7.0
onnxconverter-common==1.7.0
onnxmltools==1.7.0
onnxruntime==1.6.0
optuna==2.8.0
pandas==1.1.4
pyarrow==4.0.1
pytest==3.6.3
scikit-learn==0.24.1
skl2onnx==1.7.0
tqdm==4.54.1

And a report that:

skl2onnx is the problem, anything above 1.7.0 will cause that manager not found error
