mlops-with-vertex-ai's People

Contributors

kardiff18 · ksalama · lvaylet


mlops-with-vertex-ai's Issues

02-Experimentation: Create classifier fails w/layer requires matching shapes

ValueError: A Concatenate layer requires inputs with matching shapes except for the concat axis. Got inputs shapes: [(None, 2), (None, 4), (7,), (None, 3), (None, 1), (None, 1), (6,), (None, 3), (None, 3), (None, 1), (None, 10)]

Input Features
dropoff_grid_xf <dtype: 'int64'>: [0, 0, 0]
euclidean_xf <dtype: 'float32'>: [0.669279932975769, -0.8318284749984741, -0.8318284749984741]
loc_cross_xf <dtype: 'int64'>: [0, 0, 0]
payment_type_xf <dtype: 'int64'>: [2, 0, 0]
pickup_grid_xf <dtype: 'int64'>: [0, 0, 0]
trip_day_of_week_xf <dtype: 'int64'>: [0, 6, 2]
trip_day_xf <dtype: 'int64'>: [8, 30, 1]
trip_hour_xf <dtype: 'int64'>: [1, 13, 6]
trip_miles_xf <dtype: 'float32'>: [2.3255326747894287, -0.22459185123443604, -0.4029441475868225]
trip_month_xf <dtype: 'int64'>: [3, 1, 0]
trip_seconds_xf <dtype: 'float32'>: [0.9550504088401794, -0.2630620300769806, -0.24356801807880402]
target: [0, 0, 0]
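The shapes (7,) and (6,) in the error above are missing the batch dimension, and Keras's Concatenate requires every input to have the same rank. A minimal sketch reproducing the failure and the shape fix (illustrative sizes, not the repo's actual model code):

import tensorflow as tf
from tensorflow import keras

x1 = keras.Input(shape=(2,))        # shape (None, 2): batch dimension present
x2 = keras.Input(batch_shape=(7,))  # shape (7,): batch dimension missing

try:
    keras.layers.Concatenate()([x1, x2])
except ValueError as err:
    print(err)  # A `Concatenate` layer requires inputs with matching shapes...

# Giving every input the batch dimension (rank 2) lets the concatenation work:
x2_ok = keras.Input(shape=(7,))                   # shape (None, 7)
joined = keras.layers.Concatenate()([x1, x2_ok])  # shape (None, 9)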

Access Denied in BQ

Hi,

When I run the BigQuery statement in the first notebook example (01-dataset-management), I get the following error:

Forbidden: 403 POST https://bigquery.googleapis.com/bigquery/v2/projects/<project_id>/jobs?prettyPrint=false: Access Denied: Project <project_id>: User does not have bigquery.jobs.create permission in project <project_id>.

The bq command works from the notebook terminal.

I followed all the required steps from README.md.

I also got the following errors, but I don't think they are related:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tfx 1.2.0 requires google-cloud-aiplatform<0.8,>=0.5.0, but you have google-cloud-aiplatform 1.4.2 which is incompatible.
tfx 1.2.0 requires google-cloud-bigquery<2.21,>=1.28.0, but you have google-cloud-bigquery 2.26.0 which is incompatible.
tfx-bsl 1.2.0 requires google-cloud-bigquery<2.21,>=1.28.0, but you have google-cloud-bigquery 2.26.0 which is incompatible.
tensorflow 2.5.1 requires grpcio~=1.34.0, but you have grpcio 1.41.1 which is incompatible.
tensorflow-transform 1.2.0 requires google-cloud-bigquery<2.21,>=1.28.0, but you have google-cloud-bigquery 2.26.0 which is incompatible.
tensorflow-model-analysis 0.33.0 requires google-cloud-bigquery<2.21,>=1.28.0, but you have google-cloud-bigquery 2.26.0 which is incompatible.
tensorflow-data-validation 1.2.0 requires google-cloud-bigquery<2.21,>=1.28.0, but you have google-cloud-bigquery 2.26.0 which is incompatible.
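Since the bq CLI works from the terminal while the Python client is denied, the two may be running as different principals. A hedged way to check which identity the notebook's client libraries pick up, and to reproduce the failing call (the project id is a placeholder):

import google.auth
from google.cloud import bigquery

# Which identity do the client libraries use in this notebook runtime?
credentials, default_project = google.auth.default()
print(getattr(credentials, "service_account_email", credentials))

# Running a query requires bigquery.jobs.create on the project; granting the
# identity printed above the roles/bigquery.jobUser role should clear the 403.
client = bigquery.Client(project="<project_id>")
print(list(client.query("SELECT 1").result()))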

`Concatenate` layer requires inputs with matching shapes

Problem

03-training-formalization.ipynb

ValueError: A `Concatenate` layer requires inputs with matching shapes except for the concatenation axis. Received: input_shape=[(None, 2), (None, 4), (7,), (None, 3), (None, 1), (None, 1), (6,), (None, 3), (None, 3), (None, 1), (None, 10)]

At:

_train_module_file = 'src/model_training/runner.py'

trainer = tfx.components.Trainer(
    module_file=_train_module_file,
    examples=transform.outputs['transformed_examples'],
    schema=schema_importer.outputs['result'],
    base_model=latest_model_resolver.outputs['latest_model'],
    transform_graph=transform.outputs['transform_graph'],
    hyperparameters=hyperparams_gen.outputs['hyperparameters'],
)

context.run(trainer, enable_cache=False)

Output:

running bdist_wheel
running build
running build_py
creating build
creating build/lib
copying trainer.py -> build/lib
copying runner.py -> build/lib
copying defaults.py -> build/lib
copying task.py -> build/lib
copying model.py -> build/lib
copying exporter.py -> build/lib
copying data.py -> build/lib
installing to /tmp/tmpaet8vft8
running install
running install_lib
copying build/lib/task.py -> /tmp/tmpaet8vft8
copying build/lib/model.py -> /tmp/tmpaet8vft8
copying build/lib/data.py -> /tmp/tmpaet8vft8
copying build/lib/runner.py -> /tmp/tmpaet8vft8
copying build/lib/defaults.py -> /tmp/tmpaet8vft8
copying build/lib/trainer.py -> /tmp/tmpaet8vft8
copying build/lib/exporter.py -> /tmp/tmpaet8vft8
running install_egg_info
running egg_info
creating tfx_user_code_Trainer.egg-info
writing tfx_user_code_Trainer.egg-info/PKG-INFO
writing dependency_links to tfx_user_code_Trainer.egg-info/dependency_links.txt
writing top-level names to tfx_user_code_Trainer.egg-info/top_level.txt
writing manifest file 'tfx_user_code_Trainer.egg-info/SOURCES.txt'
reading manifest file 'tfx_user_code_Trainer.egg-info/SOURCES.txt'
writing manifest file 'tfx_user_code_Trainer.egg-info/SOURCES.txt'
Copying tfx_user_code_Trainer.egg-info to /tmp/tmpaet8vft8/tfx_user_code_Trainer-0.0+5bd9ac13044cc46c88b21ccdcff2b74d85a1c15be76527e33008964fa7da7d12-py3.7.egg-info
running install_scripts
creating /tmp/tmpaet8vft8/tfx_user_code_Trainer-0.0+5bd9ac13044cc46c88b21ccdcff2b74d85a1c15be76527e33008964fa7da7d12.dist-info/WHEEL
creating '/tmp/tmpv9boslhs/tfx_user_code_Trainer-0.0+5bd9ac13044cc46c88b21ccdcff2b74d85a1c15be76527e33008964fa7da7d12-py3-none-any.whl' and adding '/tmp/tmpaet8vft8' to it
adding 'data.py'
adding 'defaults.py'
adding 'exporter.py'
adding 'model.py'
adding 'runner.py'
adding 'task.py'
adding 'trainer.py'
adding 'tfx_user_code_Trainer-0.0+5bd9ac13044cc46c88b21ccdcff2b74d85a1c15be76527e33008964fa7da7d12.dist-info/METADATA'
adding 'tfx_user_code_Trainer-0.0+5bd9ac13044cc46c88b21ccdcff2b74d85a1c15be76527e33008964fa7da7d12.dist-info/WHEEL'
adding 'tfx_user_code_Trainer-0.0+5bd9ac13044cc46c88b21ccdcff2b74d85a1c15be76527e33008964fa7da7d12.dist-info/top_level.txt'
adding 'tfx_user_code_Trainer-0.0+5bd9ac13044cc46c88b21ccdcff2b74d85a1c15be76527e33008964fa7da7d12.dist-info/RECORD'
removing /tmp/tmpaet8vft8
/opt/conda/lib/python3.7/site-packages/setuptools/command/install.py:37: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
  setuptools.SetuptoolsDeprecationWarning,
WARNING:absl:Examples artifact does not have payload_format custom property. Falling back to FORMAT_TF_EXAMPLE
WARNING:absl:Examples artifact does not have payload_format custom property. Falling back to FORMAT_TF_EXAMPLE
WARNING:absl:Examples artifact does not have payload_format custom property. Falling back to FORMAT_TF_EXAMPLE
WARNING: Ignoring invalid distribution -umpy (/opt/conda/lib/python3.7/site-packages)
Processing /tmp/tmp31cqmx79/tfx_user_code_Trainer-0.0+5bd9ac13044cc46c88b21ccdcff2b74d85a1c15be76527e33008964fa7da7d12-py3-none-any.whl
WARNING: Ignoring invalid distribution -umpy (/opt/conda/lib/python3.7/site-packages)
Installing collected packages: tfx-user-code-Trainer
Successfully installed tfx-user-code-Trainer-0.0+5bd9ac13044cc46c88b21ccdcff2b74d85a1c15be76527e33008964fa7da7d12
WARNING: Ignoring invalid distribution -umpy (/opt/conda/lib/python3.7/site-packages)
WARNING: Ignoring invalid distribution -umpy (/opt/conda/lib/python3.7/site-packages)
WARNING: Ignoring invalid distribution -umpy (/opt/conda/lib/python3.7/site-packages)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/tmp/ipykernel_10107/3433963696.py in <module>
     10 )
     11 
---> 12 context.run(trainer, enable_cache=False)

~/.local/lib/python3.7/site-packages/tfx/orchestration/experimental/interactive/notebook_utils.py in run_if_ipython(*args, **kwargs)
     29       # __IPYTHON__ variable is set by IPython, see
     30       # https://ipython.org/ipython-doc/rel-0.10.2/html/interactive/reference.html#embedding-ipython.
---> 31       return fn(*args, **kwargs)
     32     else:
     33       logging.warning(

~/.local/lib/python3.7/site-packages/tfx/orchestration/experimental/interactive/interactive_context.py in run(self, component, enable_cache, beam_pipeline_args)
    162         telemetry_utils.LABEL_TFX_RUNNER: runner_label,
    163     }):
--> 164       execution_id = launcher.launch().execution_id
    165 
    166     return execution_result.ExecutionResult(

~/.local/lib/python3.7/site-packages/tfx/orchestration/launcher/base_component_launcher.py in launch(self)
    201                          copy.deepcopy(execution_decision.input_dict),
    202                          execution_decision.output_dict,
--> 203                          copy.deepcopy(execution_decision.exec_properties))
    204 
    205     absl.logging.info('Running publisher for %s',

~/.local/lib/python3.7/site-packages/tfx/orchestration/launcher/in_process_component_launcher.py in _run_executor(self, execution_id, input_dict, output_dict, exec_properties)
     72     # output_dict can still be changed, specifically properties.
     73     executor.Do(
---> 74         copy.deepcopy(input_dict), output_dict, copy.deepcopy(exec_properties))

~/.local/lib/python3.7/site-packages/tfx/components/trainer/executor.py in Do(self, input_dict, output_dict, exec_properties)
    176     # Train the model
    177     absl.logging.info('Training model.')
--> 178     run_fn(fn_args)
    179 
    180     # Note: If trained with multi-node distribution workers, it is the user

/tmp/tmp7ilh_lit/runner.py in run_fn(fn_args)
     51         hyperparams=hyperparams,
     52         log_dir=log_dir,
---> 53         base_model_dir=fn_args.base_model,
     54     )
     55 

~/home/repositories/git/oonisim/python-programs/courses/mlops-with-vertex-ai/src/model_training/trainer.py in train(train_data_dir, eval_data_dir, tft_output_dir, hyperparams, log_dir, base_model_dir)
     57     tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir)
     58 
---> 59     classifier = model.create_binary_classifier(tft_output, hyperparams)
     60     if base_model_dir:
     61         try:

~/home/repositories/git/oonisim/python-programs/courses/mlops-with-vertex-ai/src/model_training/model.py in create_binary_classifier(tft_output, hyperparams)
     83         )
     84 
---> 85     return _create_binary_classifier(feature_vocab_sizes, hyperparams)

~/home/repositories/git/oonisim/python-programs/courses/mlops-with-vertex-ai/src/model_training/model.py in _create_binary_classifier(feature_vocab_sizes, hyperparams)
     62             pass
     63 
---> 64     joined = keras.layers.Concatenate(name="combines_inputs")(layers)
     65     feedforward_output = keras.Sequential(
     66         [

/opt/conda/lib/python3.7/site-packages/keras/utils/traceback_utils.py in error_handler(*args, **kwargs)
     65     except Exception as e:  # pylint: disable=broad-except
     66       filtered_tb = _process_traceback_frames(e.__traceback__)
---> 67       raise e.with_traceback(filtered_tb) from None
     68     finally:
     69       del filtered_tb

/opt/conda/lib/python3.7/site-packages/keras/layers/merge.py in build(self, input_shape)
    517       ranks = set(len(shape) for shape in shape_set)
    518       if len(ranks) != 1:
--> 519         raise ValueError(err_msg)
    520       # Get the only rank for the set.
    521       (rank,) = ranks

ValueError: A `Concatenate` layer requires inputs with matching shapes except for the concatenation axis. Received: input_shape=[(None, 2), (None, 4), (7,), (None, 3), (None, 1), (None, 1), (6,), (None, 3), (None, 3), (None, 1), (None, 10)]

Environment

tfx                                   1.8.0

Solution

import tfx.v1

Add Hyperparameter tuning

Show how to submit a hyperparameter tuning job to Vertex AI Training in the experimentation notebook.
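A minimal sketch of what this could look like with the google-cloud-aiplatform SDK (the container image, metric name, and parameter names are placeholders; the training code is assumed to report the metric via the hypertune library):

from google.cloud import aiplatform
from google.cloud.aiplatform import hyperparameter_tuning as hpt

aiplatform.init(project="<project_id>", location="us-central1",
                staging_bucket="gs://<bucket>")

custom_job = aiplatform.CustomJob(
    display_name="taxi-tips-training",
    worker_pool_specs=[{
        "machine_spec": {"machine_type": "n1-standard-4"},
        "replica_count": 1,
        "container_spec": {"image_uri": "gcr.io/<project_id>/trainer:latest"},
    }],
)

hp_job = aiplatform.HyperparameterTuningJob(
    display_name="taxi-tips-hp-tuning",
    custom_job=custom_job,
    metric_spec={"val_accuracy": "maximize"},
    parameter_spec={
        "learning-rate": hpt.DoubleParameterSpec(min=1e-4, max=1e-1, scale="log"),
        "hidden-units": hpt.DiscreteParameterSpec(values=[64, 128, 256], scale=None),
    },
    max_trial_count=12,
    parallel_trial_count=3,
)
hp_job.run()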

module 'tfx.dsl.components' has no attribute 'common'

03-training-formalization.ipynb

schema_importer = tfx.dsl.components.common.importer.Importer(
    source_uri=RAW_SCHEMA_DIR,
    artifact_type=tfx.types.standard_artifacts.Schema,
    reimport=False
)

context.run(schema_importer)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/tmp/ipykernel_8454/2100786518.py in <module>
      1 #import tfx.v1
      2 
----> 3 schema_importer = tfx.dsl.components.common.importer.Importer(
      4 #schema_importer = tfx.v1.dsl.Importer(
      5     source_uri=RAW_SCHEMA_DIR,

AttributeError: module 'tfx.dsl.components' has no attribute 'common'

Environment

tfx                                   1.8.0

Solution

import tfx.v1
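For reference, a hedged sketch of the same importer against the public TFX 1.x API surface (RAW_SCHEMA_DIR and context as defined earlier in the notebook):

from tfx import v1 as tfx  # public TFX 1.x API

schema_importer = tfx.dsl.Importer(
    source_uri=RAW_SCHEMA_DIR,
    artifact_type=tfx.types.standard_artifacts.Schema,
    reimport=False,
)

context.run(schema_importer)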

Model Versioning using Vertex AI Model

How do I register a model and load different versions of it using Vertex AI Models?

For instance, I have trained a scikit-learn model and I want to register it (automatically as v1). After some time, I retrain the model with new features and want to register it as a new version (automatically as v2).

Now I want to load my model with something like aiplatform.Model.load_model("model_name", "model_version") (not a current feature).

How can I do this using Vertex AI Models?
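A hedged sketch of how this can work with Vertex AI Model Registry (requires a google-cloud-aiplatform version with Model Registry support; bucket paths and the display name are placeholders):

from google.cloud import aiplatform

aiplatform.init(project="<project_id>", location="us-central1")

# First upload creates the registry entry and becomes version 1:
v1 = aiplatform.Model.upload(
    display_name="sklearn-model",
    artifact_uri="gs://<bucket>/model/",  # directory containing model.pkl
    serving_container_image_uri=(
        "us-docker.pkg.dev/vertex-ai/prediction/sklearn-cpu.1-0:latest"
    ),
)

# Re-uploading with parent_model registers version 2 of the same model:
v2 = aiplatform.Model.upload(
    display_name="sklearn-model",
    artifact_uri="gs://<bucket>/model_v2/",
    serving_container_image_uri=(
        "us-docker.pkg.dev/vertex-ai/prediction/sklearn-cpu.1-0:latest"
    ),
    parent_model=v1.resource_name,
    is_default_version=True,
)

# A specific version can then be loaded with the "@<version>" suffix:
model_v1 = aiplatform.Model(f"{v1.resource_name}@1")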

Getting started instructions fail at "sudo apt-get install google-cloud-sdk"

Step 6 in the getting started instructions fails because google-cloud-sdk tries to overwrite a LICENSE file installed by google-cloud-cli. This seems like it probably isn't a problem with this repo; I think you'll have to chase the SDK folks to get it fixed.

$ sudo apt-get install google-cloud-sdk
Reading package lists... Done
Building dependency tree       
Reading state information... Done
Suggested packages:
  google-cloud-sdk-app-engine-java google-cloud-sdk-app-engine-python google-cloud-sdk-pubsub-emulator google-cloud-sdk-bigtable-emulator google-cloud-sdk-datastore-emulator
The following NEW packages will be installed:
  google-cloud-sdk
0 upgraded, 1 newly installed, 0 to remove and 21 not upgraded.
Need to get 0 B/153 MB of archives.
After this operation, 755 MB of additional disk space will be used.
(Reading database ... 138034 files and directories currently installed.)
Preparing to unpack .../google-cloud-sdk_444.0.0-0_all.deb ...
Unpacking google-cloud-sdk (444.0.0-0) ...
dpkg: error processing archive /var/cache/apt/archives/google-cloud-sdk_444.0.0-0_all.deb (--unpack):
 trying to overwrite '/usr/share/google-cloud-sdk/LICENSE', which is also in package google-cloud-cli 438.0.0-0
dpkg-deb: error: paste subprocess was killed by signal (Broken pipe)
Errors were encountered while processing:
 /var/cache/apt/archives/google-cloud-sdk_444.0.0-0_all.deb
E: Sub-process /usr/bin/dpkg returned an error code (1)

I tried apt-get update and apt-get upgrade, which got me to CLI version 444, but that had the same problem:

$ sudo apt-get install google-cloud-sdk
Reading package lists... Done
Building dependency tree       
Reading state information... Done
Suggested packages:
  google-cloud-sdk-app-engine-java google-cloud-sdk-app-engine-python google-cloud-sdk-pubsub-emulator google-cloud-sdk-bigtable-emulator google-cloud-sdk-datastore-emulator
The following NEW packages will be installed:
  google-cloud-sdk
0 upgraded, 1 newly installed, 0 to remove and 1 not upgraded.
Need to get 0 B/153 MB of archives.
After this operation, 755 MB of additional disk space will be used.
(Reading database ... 138797 files and directories currently installed.)
Preparing to unpack .../google-cloud-sdk_444.0.0-0_all.deb ...
Unpacking google-cloud-sdk (444.0.0-0) ...
dpkg: error processing archive /var/cache/apt/archives/google-cloud-sdk_444.0.0-0_all.deb (--unpack):
 trying to overwrite '/usr/share/google-cloud-sdk/LICENSE', which is also in package google-cloud-cli 444.0.0-0
dpkg-deb: error: paste subprocess was killed by signal (Broken pipe)
Errors were encountered while processing:
 /var/cache/apt/archives/google-cloud-sdk_444.0.0-0_all.deb
E: Sub-process /usr/bin/dpkg returned an error code (1)

DataTransformer ModuleNotFoundError: No module named 'user_module_0' error

ModuleNotFoundError: No module named 'user_module_0'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/dataflow_worker/batchworker.py", line 649, in do_work
    work_executor.execute()
  File "/usr/local/lib/python3.7/site-packages/dataflow_worker/executor.py", line 179, in execute
    op.start()
  File "apache_beam/runners/worker/operations.py", line 710, in apache_beam.runners.worker.operations.DoOperation.start
  File "apache_beam/runners/worker/operations.py", line 712, in apache_beam.runners.worker.operations.DoOperation.start
  File "apache_beam/runners/worker/operations.py", line 713, in apache_beam.runners.worker.operations.DoOperation.start
  File "apache_beam/runners/worker/operations.py", line 311, in apache_beam.runners.worker.operations.Operation.start
  File "apache_beam/runners/worker/operations.py", line 317, in apache_beam.runners.worker.operations.Operation.start
  File "apache_beam/runners/worker/operations.py", line 659, in apache_beam.runners.worker.operations.DoOperation.setup
  File "apache_beam/runners/worker/operations.py", line 660, in apache_beam.runners.worker.operations.DoOperation.setup
  File "apache_beam/runners/worker/operations.py", line 292, in apache_beam.runners.worker.operations.Operation.setup
  File "apache_beam/runners/worker/operations.py", line 306, in apache_beam.runners.worker.operations.Operation.setup
  File "apache_beam/runners/worker/operations.py", line 799, in apache_beam.runners.worker.operations.DoOperation._get_runtime_performance_hints
  File "/usr/local/lib/python3.7/site-packages/apache_beam/internal/pickler.py", line 294, in loads
    return dill.loads(s)
  File "/usr/local/lib/python3.7/site-packages/dill/_dill.py", line 275, in loads
    return load(file, ignore, **kwds)
  File "/usr/local/lib/python3.7/site-packages/dill/_dill.py", line 270, in load
    return Unpickler(file, ignore=ignore, **kwds).load()
  File "/usr/local/lib/python3.7/site-packages/dill/_dill.py", line 472, in load
    obj = StockUnpickler.load(self)
  File "/usr/local/lib/python3.7/site-packages/dill/_dill.py", line 826, in _import_module
    return __import__(import_name)
ModuleNotFoundError: No module named 'user_module_0'

Running the transform component of this repository on Vertex AI hits the above error (in the 04-pipeline-deployment.ipynb notebook). Does anyone have a quick fix? I have tried specifying setup.py and "save_main_session": True so far, with no luck.
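Two workarounds commonly suggested for this class of Dataflow import error (both assumptions, not verified fixes for this repo): run the Dataflow workers from the same custom TFX image as the pipeline, or ship the generated module as an extra package. A sketch of the first, expressed as Beam pipeline args (PROJECT, GCS_LOCATION, and TFX_IMAGE_URI as defined in the notebook):

# Hypothetical beam_pipeline_args; TFX_IMAGE_URI is the image the pipeline
# itself was built with, so the packaged user module is importable on workers.
BEAM_PIPELINE_ARGS = [
    "--runner=DataflowRunner",
    f"--project={PROJECT}",
    f"--temp_location={GCS_LOCATION}/temp",
    f"--sdk_container_image={TFX_IMAGE_URI}",  # requires Dataflow Runner v2
    "--experiments=use_runner_v2",
]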

Error: 02 - ML Experimentation with Custom Model

Hi, I'm running the tutorial with TFX 1.4.0 and TF 2.7.0.

When I run the cell

classifier = model.create_binary_classifier(tft_output, hyperparams)
classifier.summary()

I get the error:


ValueError                                Traceback (most recent call last)
in <module>
----> 1 classifier = model.create_binary_classifier(tft_output, hyperparams)
      2 classifier.summary()

~/mlops-with-vertex-ai/src/model_training/model.py in create_binary_classifier(tft_output, hyperparams)
     83         )
     84 
---> 85     return _create_binary_classifier(feature_vocab_sizes, hyperparams)

~/mlops-with-vertex-ai/src/model_training/model.py in _create_binary_classifier(feature_vocab_sizes, hyperparams)
     62             pass
     63 
---> 64     joined = keras.layers.Concatenate(name="combines_inputs")(layers)
     65     feedforward_output = keras.Sequential(
     66         [

/opt/conda/lib/python3.7/site-packages/keras/utils/traceback_utils.py in error_handler(*args, **kwargs)
     65     except Exception as e:  # pylint: disable=broad-except
     66       filtered_tb = _process_traceback_frames(e.__traceback__)
---> 67       raise e.with_traceback(filtered_tb) from None
     68     finally:
     69       del filtered_tb

/opt/conda/lib/python3.7/site-packages/keras/layers/merge.py in build(self, input_shape)
    514       ranks = set(len(shape) for shape in shape_set)
    515       if len(ranks) != 1:
--> 516         raise ValueError(err_msg)
    517       # Get the only rank for the set.
    518       (rank,) = ranks

ValueError: A `Concatenate` layer requires inputs with matching shapes except for the concatenation axis. Received: input_shape=[(None, 2), (None, 4), (7,), (None, 3), (None, 1), (None, 1), (6,), (None, 3), (None, 3), (None, 1), (None, 10)]

Python markupsafe dependency error in 03-training-formalization.ipynb

The latest release of the markupsafe Python 3 package is not compatible with `from tfx.orchestration.experimental.interactive.interactive_context import InteractiveContext`:

import ml_metadata as mlmd
from ml_metadata.proto import metadata_store_pb2
from tfx.orchestration.experimental.interactive.interactive_context import InteractiveContext

Error:

ImportError                               Traceback (most recent call last)
/tmp/ipykernel_1/3645963042.py in <module>
      1 import ml_metadata as mlmd
      2 from ml_metadata.proto import metadata_store_pb2
----> 3 from tfx.orchestration.experimental.interactive.interactive_context import InteractiveContext
      4 
      5 connection_config = metadata_store_pb2.ConnectionConfig()

~/.local/lib/python3.7/site-packages/tfx/orchestration/experimental/interactive/interactive_context.py in <module>
     35 
     36 import absl
---> 37 import jinja2
     38 import nbformat
     39 from tfx import types

~/.local/lib/python3.7/site-packages/jinja2/__init__.py in <module>
     10 from .bccache import FileSystemBytecodeCache
     11 from .bccache import MemcachedBytecodeCache
---> 12 from .environment import Environment
     13 from .environment import Template
     14 from .exceptions import TemplateAssertionError

~/.local/lib/python3.7/site-packages/jinja2/environment.py in <module>
     23 from .compiler import CodeGenerator
     24 from .compiler import generate
---> 25 from .defaults import BLOCK_END_STRING
     26 from .defaults import BLOCK_START_STRING
     27 from .defaults import COMMENT_END_STRING

~/.local/lib/python3.7/site-packages/jinja2/defaults.py in <module>
      1 # -*- coding: utf-8 -*-
      2 from ._compat import range_type
----> 3 from .filters import FILTERS as DEFAULT_FILTERS  # noqa: F401
      4 from .tests import TESTS as DEFAULT_TESTS  # noqa: F401
      5 from .utils import Cycler

~/.local/lib/python3.7/site-packages/jinja2/filters.py in <module>
     11 from markupsafe import escape
     12 from markupsafe import Markup
---> 13 from markupsafe import soft_unicode
     14 
     15 from ._compat import abc

ImportError: cannot import name 'soft_unicode' from 'markupsafe' (/home/jupyter/.local/lib/python3.7/site-packages/markupsafe/__init__.py)

The workaround is to install an older version with pip install markupsafe==2.0.1 and restart the kernel.

01-dataset-management.ipynb fails at "Create Vertex Dataset resource" with "no attribute 'SUPPORTED_REGIONS'"

In 01-dataset-management.ipynb at "Create Vertex Dataset resource" I get the following error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/var/tmp/ipykernel_23085/3064911949.py in <module>
      1 vertex_ai.init(
      2     project=PROJECT,
----> 3     location=REGION
      4 )

/opt/conda/lib/python3.7/site-packages/google/cloud/aiplatform/initializer.py in init(self, project, location, experiment, experiment_description, staging_bucket, credentials, encryption_spec_key_name)
     97             self._project = project
     98         if location:
---> 99             utils.validate_region(location)
    100             self._location = location
    101         if staging_bucket:

/opt/conda/lib/python3.7/site-packages/google/cloud/aiplatform/utils/__init__.py in validate_region(region)
    272 
    273     region = region.lower()
--> 274     if region not in constants.SUPPORTED_REGIONS:
    275         raise ValueError(
    276             f"Unsupported region for Vertex AI, select from {constants.SUPPORTED_REGIONS}"

AttributeError: module 'google.cloud.aiplatform.constants' has no attribute 'SUPPORTED_REGIONS'

It's mentioned here: googleapis/python-aiplatform#1106

Sounds like it used to be in constants.SUPPORTED_REGIONS and now it's in constants.base.SUPPORTED_REGIONS.

A doubt regarding the TFX pipeline associated with Continuous Training

Hi @ksalama.

Thank you very much for this amazing resource. It's a mini-book in itself.

I am referring to the statement written just below the continuous training section:

The end-to-end TFX training pipeline implementation is in the src/pipelines directory, which covers the following steps:

Are these pipelines demonstrated in any of the notebooks?

google.api_core.exceptions.PermissionDenied: 403 Permission denied on resource project {PROJECT}.

Hi, I followed your tutorials and everything was fine until notebook 6, when I tried to run the create-endpoint section:

!python build/utils.py \
    --mode=create-endpoint \
    --project={PROJECT} \
    --region=us-central1 \
    --endpoint-display-name={ENDPOINT_DISPLAY_NAME}

I got this error:

google.api_core.exceptions.PermissionDenied: 403 Permission denied on resource project {PROJECT}.

Any suggestions?

Version an already trained custom model on Model Registry

Hi, everyone!

I trained a model using scikit-learn and saved it as a pickle file stored in a GCS bucket. I was wondering: how can I version this model using Model Registry, since it is already trained?

In my case, I have the option of using Cloud Run for this (a Cloud Run container that runs a retraining task every week), but I want to start doing it on Vertex AI.

After reading some articles about Vertex AI Model Registry, I concluded that the model must first be trained using a custom training job, and only after that can we begin versioning it in Model Registry. Is this correct?
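Not necessarily: a registry entry can be created directly from a pre-trained artifact with Model.upload; no custom training job is required. A minimal hedged sketch (paths are placeholders, and the prebuilt sklearn serving container expects the artifact directory to contain model.pkl or model.joblib):

from google.cloud import aiplatform

aiplatform.init(project="<project_id>", location="us-central1")

# Registers the already-trained pickle from GCS; no training job involved.
model = aiplatform.Model.upload(
    display_name="sklearn-weekly-model",
    artifact_uri="gs://<bucket>/weekly-model/",
    serving_container_image_uri=(
        "us-docker.pkg.dev/vertex-ai/prediction/sklearn-cpu.1-0:latest"
    ),
)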

Error while creating model

In notebook 02, cell:

classifier = model.create_binary_classifier(tft_output, hyperparams)
classifier.summary()

The following error appears:

A `Concatenate` layer requires inputs with matching shapes except for the concatenation axis. Received: input_shape=[(None, 2), (None, 4), (7,), (None, 3), (None, 1), (None, 1), (6,), (None, 3), (None, 3), (None, 1), (None, 10)]

Python IndexError execution error in 03-training-formalization.ipynb

Python error encountered executing the following lines at [Extract train and eval splits]:

sql_query = datasource_utils.get_training_source_query(
    PROJECT, REGION, DATASET_DISPLAY_NAME, ml_use='UNASSIGNED', limit=5000)

Observed error:

IndexError                                Traceback (most recent call last)
/tmp/ipykernel_1/1584844956.py in <module>
      1 print(DATASET_DISPLAY_NAME)
      2 sql_query = datasource_utils.get_training_source_query(
----> 3     PROJECT, REGION, DATASET_DISPLAY_NAME, ml_use='UNASSIGNED', limit=5000)
      4 
      5 output_config = example_gen_pb2.Output(

~/mlops-with-vertex-ai/src/common/datasource_utils.py in get_training_source_query(project, region, dataset_display_name, ml_use, limit)
     55     dataset = vertex_ai.TabularDataset.list(
     56         filter=f"display_name={dataset_display_name}", order_by="update_time"
---> 57     )[-1]
     58     bq_source_uri = dataset.gca_resource.metadata["inputConfig"]["bigquerySource"][
     59         "uri"

IndexError: list index out of range

I also can't find the .list method that datasource_utils.py calls on google.cloud.aiplatform's TabularDataset.
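TabularDataset.list (the classmethod the repo's datasource_utils.py relies on) returns an empty list when no dataset matches the display name, or when vertex_ai.init() points at the wrong project or region, so the trailing [-1] raises IndexError. A hedged guard that makes that failure mode explicit (DATASET_DISPLAY_NAME as defined in the notebook):

from google.cloud import aiplatform as vertex_ai

datasets = vertex_ai.TabularDataset.list(
    filter=f"display_name={DATASET_DISPLAY_NAME}", order_by="update_time"
)
if not datasets:
    raise ValueError(
        f"No Vertex AI dataset named {DATASET_DISPLAY_NAME!r} found; "
        "run 01-dataset-management.ipynb first."
    )
dataset = datasets[-1]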

Give CloudBuild access to BQ and Vertex AI

Update the Terraform scripts to give the Cloud Build service account access to BigQuery and Vertex AI.
This is required for the CI/CD step that runs an e2e pipeline test and deploys the model to Vertex AI.

Missing gcloud command or terraform code to create cloud build triggers

Users would like a way to create Cloud Build triggers, either in the Terraform folder or with gcloud commands. I found that TensorFlow Transform uses the Apache Beam runner in the following Cloud Build file:

# Compile the pipeline.
- name: '$_CICD_IMAGE_URI'
  entrypoint: 'python'
  args: ['build/utils.py',
          '--mode', 'compile-pipeline',
          '--pipeline-name', '$_PIPELINE_NAME'
          ]
  dir: 'mlops-with-vertex-ai'
  env: 
  - 'PROJECT=$_PROJECT'  
  - 'REGION=$_REGION'
  - 'MODEL_DISPLAY_NAME=$_MODEL_DISPLAY_NAME'
  - 'DATASET_DISPLAY_NAME=$_DATASET_DISPLAY_NAME'  
  - 'GCS_LOCATION=$_GCS_LOCATION' 
  - 'TFX_IMAGE_URI=$_TFX_IMAGE_URI' 
  - 'BEAM_RUNNER=$_BEAM_RUNNER'
  - 'TRAINING_RUNNER=$_TRAINING_RUNNER'
  id: 'Compile Pipeline'
  waitFor: ['Local Test E2E Pipeline']

So far I have gathered an example from the notebook section "Run the training pipeline using Vertex Pipelines", which sets the pipeline configuration for the Vertex AI run, but I can't find how to create the Cloud Build trigger itself (see the sketch after this snippet):

os.environ["DATASET_DISPLAY_NAME"] = DATASET_DISPLAY_NAME
os.environ["MODEL_DISPLAY_NAME"] = MODEL_DISPLAY_NAME
os.environ["PIPELINE_NAME"] = PIPELINE_NAME
os.environ["PROJECT"] = PROJECT
os.environ["REGION"] = REGION
os.environ["GCS_LOCATION"] = f"gs://{BUCKET}/{DATASET_DISPLAY_NAME}"
os.environ["TRAIN_LIMIT"] = "85000"
os.environ["TEST_LIMIT"] = "15000"
os.environ["BEAM_RUNNER"] = "DataflowRunner"
os.environ["TRAINING_RUNNER"] = "vertex"
os.environ["TFX_IMAGE_URI"] = f"gcr.io/{PROJECT}/{DATASET_DISPLAY_NAME}:{VERSION}"
os.environ["ENABLE_CACHE"] = "1"

Error: googleapi: Error 409: The requested bucket name is not available. The bucket namespace is shared by all users of the system. Please select a different name and try again.

I literally named the bucket after a random BTC wallet address to make it unique, as the GCP guide told me to, and I am still hitting the same error while applying gcs-bucket.tf:

google_storage_bucket.bc1qxy2kgdygjrsqtzq2n0yrf2493p83kkfjhx0wlh: Creating...

Error: googleapi: Error 409: The requested bucket name is not available. The bucket namespace is shared by all users of the system. Please select a different name and try again., conflict

  with google_storage_bucket.bc1qxy2kgdygjrsqtzq2n0yrf2493p83kkfjhx0wlh,
  on gcs-bucket.tf line 17, in resource "google_storage_bucket" "bc1qxy2kgdygjrsqtzq2n0yrf2493p83kkfjhx0wlh":
  17: resource "google_storage_bucket" "bc1qxy2kgdygjrsqtzq2n0yrf2493p83kkfjhx0wlh" {
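One possible explanation (an assumption, not verifiable from the log above): if an earlier terraform apply created the bucket but failed before recording it in state, the name is now taken by your own project, and every re-apply will report 409. In that case, importing the existing bucket into state with terraform import google_storage_bucket.<resource_name> <bucket-name>, or deleting the stray bucket, clears the conflict; otherwise the 409 means another GCP user already owns that globally unique name.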
