Official Repo for Google Cloud AI Platform. Find samples for Vertex AI, Google Cloud's new unified ML platform at: https://github.com/GoogleCloudPlatform/vertex-ai-samples

Home Page: https://cloud.google.com/ai-platform/docs

License: Apache License 2.0

Shell 0.18% Python 0.49% Jupyter Notebook 99.29% Dockerfile 0.03% HCL 0.01%

samples

ai-platform-samples's Introduction

🔔 Please visit vertex-ai-samples for sample code for Vertex AI.

Vertex AI is our next generation AI Platform, with many new features that are unavailable in the current platform. Migrate your resources to Vertex AI to get the latest machine learning features, simplify end-to-end journeys, and productionize models with MLOps.

Google Cloud AI Platform Products

Welcome to the AI Platform sample code repository. This repository contains samples for how to use AI Platform products.

Overview

The repository is organized by products:

AI Platform Training
- Horovod
- PyTorch
- scikit-learn
- TensorFlow
- XGBoost
AI Platform Prediction
- scikit-learn
- TensorFlow
- XGBoost
- Tools AI Platform Prediction tools
AI Platform Optimizer
AI Platform Pipelines
AI Platform Notebooks
- Samples
- Templates Templates used to contribute to AI Platform samples
- Tools AI Platform Notebooks tools
AI Hub

Getting Started

We highly recommend that you start with our Quick Start Sample.

Navigating this Repository

This repository is organized based on the available products on AI Platform, and the typical Machine Learning problems that developers are trying to solve. For instance, if you are trying to train a model with scikit-learn, you will find the sample under training/sklearn/structured/base directory. AI Platform also supports xgboost, TensorFlow, and PyTorch.

Please refer to the README.md file in each sample directory for more specific instructions.

Google Machine Learning Repositories

If you’re looking for our guides on how to do Machine Learning on Google Cloud Platform (GCP) using other services, please checkout our other repositories:

ML on GCP, which has guides on how to bring your code from various ML frameworks to Google Cloud Platform using things like Google Compute Engine or Kubernetes.
Keras Idiomatic Programmer This repository contains content produced by Google Cloud AI Developer Relations for machine learning and artificial intelligence. The content covers a wide spectrum from educational, training, and research, covering from novices, junior/intermediate to advanced.
Professional Services, common solutions and tools developed by Google Cloud's Professional Services team.

Contributing a notebook

Only Googlers may contribute to this repo. If you are a Googler, please see go/cloudai-notebook-workflow for instructions.

Troubleshooting

For common issues and solutions, please check our troubleshooting page.

Getting help

Please use the issues page to provide feedback or submit a bug report.

Disclaimer

This is not an officially supported Google product. The code in this repository is for demonstrative purposes only.

The content in the AI-Platform-Samples repository is not officially maintained by Google.

ai-platform-samples's People

Contributors

Stargazers

Watchers

Forkers

krishna999 dizcology pratikfalke chriskaspar b0noi izadirridazi prodonjs jitkasempin pjpuneet lukmanr farik sivakumar1604 jkwong888 gdbsd anhle fchouteau katendencies madhukarpatneedi sshrdp tejal29 thompson42 david-licause htappen carlthome denisogr gogasca vchandvankar post2web sirtorry padominguez arc-jung changlan gtrunsec jackpopp125 kevinselhi peopzen ultrons posypanka mura5726 hmj aarondietz234 dmkinney yufengg chevalpartners harishmalan fodrh1201 genebflores nicolizamacorrea fjp param17 ksalama romeritomorais chongyouquan venugopalai thedriftofwords rpasricha keriabermudez keriber ai-hub-deep-learning-fundamental vuppalli longtle verakai kadjulio ricardo-graciano br1970 franck-dernoncourt olepno brianchunkang rakesh283343 tmatsuo leahecole ivanmkc agniji jsgalpin-umbc dlminvestments ladangol shiftleftco fatehbenazir muskanmahajan37 heungseok leguepar bookendus jlam-cubert dwangarc belapyc nikhilsanghi remylouisew sandeshprabhu02 kweinmeister kaistha23 djouani datastark jcmleon22 aaronnbrock tammy228 fauzannu nicholasmcelroy cbunraj halio-g aribray

ai-platform-samples's Issues

Repo configuration

Configure PR and contributions similar as we do with Cloud ML samples.

Bug Template
GitHub contribution settings
Owners group

text_classification_using_pytorch_and_ai_platform.ipynb Notebook error

nbconvert.preprocessors.execute.CellExecutionError: An error occurred while executing the following cell:
------------------
import pandas as pd
import numpy as np
from preprocess import TextPreprocessor

def load_data(train_data_path, eval_data_path):
    # Parse CSV using pandas
    column_names = ('label', 'text')
    
    df_train = pd.read_csv(train_data_path, names=column_names, sep='\t')
    df_train = df_train.sample(frac=1)
    
    df_eval = pd.read_csv(eval_data_path, names=column_names, sep='\t')

    return ((list(df_train['text']), np.array(df_train['label'].map(CLASSES))),
            (list(df_eval['text']), np.array(df_eval['label'].map(CLASSES))))


((train_texts, train_labels), (eval_texts, eval_labels)) = load_data(
       'train.tsv', 'eval.tsv')

# Create vocabulary from training corpus.
processor = TextPreprocessor(VOCAB_SIZE, MAX_SEQUENCE_LENGTH)
processor.fit(train_texts)

# Preprocess the data
train_texts_vectorized = processor.transform(train_texts)
eval_texts_vectorized = processor.transform(eval_texts)
------------------

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/tmpfs/src/tf_docs_env/lib/python3.5/site-packages/IPython/extensions/autoreload.py in post_execute_hook(self)
    536         newly_loaded_modules = set(sys.modules) - self.loaded_modules
    537         for modname in newly_loaded_modules:
--> 538             _, pymtime = self._reloader.filename_and_mtime(sys.modules[modname])
    539             if pymtime is not None:
    540                 self._reloader.modules_mtimes[modname] = pymtime

/tmpfs/src/tf_docs_env/lib/python3.5/site-packages/IPython/extensions/autoreload.py in filename_and_mtime(self, module)
    182 
    183     def filename_and_mtime(self, module):
--> 184         if not hasattr(module, '__file__') or module.__file__ is None:
    185             return None, None
    186 

/usr/local/lib/python3.5/dist-packages/py/_vendored_packages/apipkg.py in __getattribute__(self, name)
    193         def __getattribute__(self, name):
    194             try:
--> 195                 return getattr(getmod(), name)
    196             except ImportError:
    197                 return None

/usr/local/lib/python3.5/dist-packages/py/_vendored_packages/apipkg.py in getmod()
    177     def getmod():
    178         if not mod:
--> 179             x = importobj(modpath, None)
    180             if attrname is not None:
    181                 x = getattr(x, attrname)

/usr/local/lib/python3.5/dist-packages/py/_vendored_packages/apipkg.py in importobj(modpath, attrname)
     67 
     68 def importobj(modpath, attrname):
---> 69     module = __import__(modpath, None, None, ['__doc__'])
     70     if not attrname:
     71         return module

/usr/local/lib/python3.5/dist-packages/pytest.py in <module>
     11     hookspec, hookimpl
     12 )
---> 13 from _pytest.fixtures import fixture, yield_fixture
     14 from _pytest.assertion import register_assert_rewrite
     15 from _pytest.freeze_support import freeze_includes

/usr/local/lib/python3.5/dist-packages/_pytest/fixtures.py in <module>
    843 
    844 @attr.s(frozen=True)
--> 845 class FixtureFunctionMarker(object):
    846     scope = attr.ib()
    847     params = attr.ib(convert=attr.converters.optional(tuple))

/usr/local/lib/python3.5/dist-packages/_pytest/fixtures.py in FixtureFunctionMarker()
    845 class FixtureFunctionMarker(object):
    846     scope = attr.ib()
--> 847     params = attr.ib(convert=attr.converters.optional(tuple))
    848     autouse = attr.ib(default=False)
    849     ids = attr.ib(default=None, convert=_ensure_immutable_ids)

TypeError: attrib() got an unexpected keyword argument 'convert'
TypeError: attrib() got an unexpected keyword argument 'convert'

notebooks/samples/pytorch/text_classification/text_classification_using_pytorch_and_ai_platform.ipynb

Training | Standard | Base | TensorFlow

Product: AI Platform
Feature: Standard Training
Framework: TensorFlow

gcloud ai-platform local predict requires tensorflow even when model was trained with scikit learn

Describe the bug

Trying to run gcloud ai-platform local predict with a model trained in scikit learn but it throws an error because it did not find Tensorflow. Tensorflow is not in my env given that I am only using scikit learn. If I go to the prediction_utils.py and remove line 29 then things work.

Expected behavior
code should run without Tensorflow being required given that framework is scikit learn

Full error here:

If the signature defined in the model is not serving_default then you must specify it via --signature-name flag, otherwise the command may fail.
ERROR: (gcloud.ai-platform.local.predict) b'Traceback (most recent call last):\n File "/google-cloud-sdk/lib/googlecloudsdk/command_lib/ml_engine/local_predict.py", line 184, in \n main()\n File "/google-cloud-sdk/lib/googlecloudsdk/command_lib/ml_engine/local_predict.py", line 167, in main\n from cloud.ml.prediction import prediction_lib\n File "/google-cloud-sdk/lib/third_party/ml_sdk/cloud/ml/prediction/init.py", line 20, in \n from .custom_code_utils import create_user_model\n File "/google-cloud-sdk/lib/third_party/ml_sdk/cloud/ml/prediction/custom_code_utils.py", line 22, in \n from .prediction_utils import PredictionError\n File "/google-cloud-sdk/lib/third_party/ml_sdk/cloud/ml/prediction/prediction_utils.py", line 29, in \n from tensorflow.python.framework import dtypes # pylint: disable=g-direct-tensorflow-import\nModuleNotFoundError: No module named 'tensorflow'\n'

To Reproduce

Create env with scikit learn that does not have tensorflow
Fit a model with scikit learn and save it locally, simple example:

from sklearn.linear_model import LogisticRegression
import numpy as np
model=LogisticRegression()
x=np.array([1,1,1,1,1,100,100,100,100,100]).reshape(-1,1)
y=['a','a','a','a','a','b','b','b','b','b']
model.fit(x,y)
joblib.dump(model,"path/model.joblib")

Save a json file called input_logistic.json to test prediction, example:

[1]
[100]

Run gcloud ai-platform local predict

gcloud ai-platform local predict --model-dir path/ \
  --json-instances path/input_logistic.json \
  --framework scikit-learn

System Information

OS Platform and Distribution (e.g., Linux Ubuntu 16.04): macOS 10.14.6
Framework and version (Tensorflow, scikit-learn, XGBoost): scikit-learn 0.23.1
Python version: 3.8.3
Google Cloud SDK:312.0.0
Exact command to reproduce:

gcloud ai-platform local predict --model-dir path/ \
  --json-instances path/input_logistic.json \
  --framework scikit-learn

errors when using gsutils

This tends to come up when using gsutil to copy large files into my workspace.

!gsutil  -m   cp   gs://my_bucket/fastq_from_bam/fastq_*    data

Errors include a 'file has changed, want to overwrite'?
Save errors, and a 504 connection error ...
The notebook continues to run however.

text_classification_using_pytorch_and_ai_platform.ipynb error

`pip install -r requirements.txt' shows an error, after another `python setup.py install`

1- Created a new clean virtual env with python3.
2- Went to quickstart sample and ran python setup.py install
3- Went to sklearn base sample and ran pip install -r requirements.txt and got that error again.
4- in sklearn base sample, this time I ran python setup.py install and it worked fine.

It may be a good idea to just choose either setup.py, or requirements.txt and remove the other.

Nova JupyterLab Extension cannot be installed

Could not install this repository on GCP Notebooks.

I get the following error:

jupyter@tensorflow-2-2-20200623-124019:~/ai-platform-samples/notebooks/tools/nova-jupyterlab-extensions$ sudo /opt/conda/bin/jupyter labextension install

An error occured.
ValueError: Please install Node.js and npm before continuing installation. You may be able to install Node.js from your package manager, from conda, or directly from the Node.js website (https://nodejs.org).

Manual installation of Node.js doesn't work either due to version incompatibility issues.

Reproduction steps:

start AI Platform Notebook
Switch to "compute engine / VM instance view"
SSH to respective instance
git clone https://github.com/GoogleCloudPlatform/ai-platform-samples.git
cd notebooks/tools/nova-jupyterlab-extensions

from here we follow install instructions from the repo###

sudo pip3 install .
sudo service jupyter restart
8a. sudo jupyter labextension install (-> "jupyter not found", hence 8b)
8b. sudo /opt/conda/bin/jupyter labextension install

Alternatives that I tried:

Install the repo from the Notebook itself (not SSHing to it but Opening JupyterLab from the Notebooks page).
Upgrade JupyterLab to newest version (2.1.5).
Using Deep Learning VM image with Tensorflow 1.15.
Using TensorFlow:2.2 notebook instance.

None of the above worked around the issue.

TF Cloud requires -u parameter on AI Platform

In an AI Platform Notebook, the command required to install TF cloud is:
pip install -U tensorflow-cloud

Without the -U, the installation will fail.

Data Analysis on Cash payments seems confusing.

This description in the README for Datasets I found a distraction and confusing:

Dataset Analysis
The goal is to train a Binary Classification model that predicts whether a person leaves 20% tips or more (target label) based on the taxi ride information.

We did some analysis of the dataset and realized that over 50% of the payment types are Cash. We also noticed that the majority of cash payments don't have any tips. We believe this is because the tips for cash payments have not been properly recorded, and therefore, the dataset is somewhat incomplete for cash payments.

This will naturally have an impact on any trained model. The model accuracy for non-cash payments will be a bit lower than the general accuracy. On the other hand, any prediction of the model for cash payments is not as reliable as the other payment types.

Recommendation

I recommend either:

Just say the majority of tips for cash payment were not recorded and make no statement on general accuracy; other than it will be biased towards predicting the tip < 20%, or
Make a 3rd dataset that has no cash payments, or
Synthesize values for the cash tips from the non-cash distribution.

text_classification_using_pytorch_and_ai_platform

ai-explanations-tabular.ipynb error

Linting errors location

Is your feature request related to a problem? Please describe.
When linting AI Platform notebook files, the linter shows line numbers for places with issues. However, there are no line numbers present in AI Platform notebooks which makes debugging a little more difficult. Especially for misplaced parenthesis and brackets on their own line, or empty lines with whitespaces.

Describe the solution you'd like
It would be more helpful to also indicate which code block the error is part of and the line number within the code block for more clarity.

Additional context
Add any other context or screenshots about the feature request here.

ai_platform_optimizer_multi_objective.ipynb Notebook failed

+ execute notebooks/samples/optimizer/ai_platform_optimizer_multi_objective.ipynb
+ QUIET_FULL='pip install --quiet'
+ QUIET_SHORT='pip install -q'
+ grep -q 'pip install --quiet' notebooks/samples/optimizer/ai_platform_optimizer_multi_objective.ipynb
+ grep -q 'pip install -q' notebooks/samples/optimizer/ai_platform_optimizer_multi_objective.ipynb
+ sed -i 's/pip *install/pip install -q/g' notebooks/samples/optimizer/ai_platform_optimizer_multi_objective.ipynb
+ mkdir /tmpfs/src/temp
+ cp --parents notebooks/samples/optimizer/ai_platform_optimizer_multi_objective.ipynb /tmpfs/src/temp
+ python /tmpfs/src/gfile/executor.py --input_notebook=/tmpfs/src/temp/notebooks/samples/optimizer/ai_platform_optimizer_multi_objective.ipynb --timeout=15000
I0409 16:04:53.939353 139957312333568 execute.py:404] Executing notebook with kernel: python3
E0409 16:05:02.088410 139957312333568 execute.py:509] Kernel died while waiting for execute reply.
Traceback (most recent call last):
  File "/tmpfs/src/tf_docs_env/lib/python3.5/site-packages/nbconvert/preprocessors/execute.py", line 478, in _poll_for_reply
    msg = self.kc.shell_channel.get_msg(timeout=timeout)
  File "/tmpfs/src/tf_docs_env/lib/python3.5/site-packages/jupyter_client/blocking/channels.py", line 57, in get_msg
    raise Empty
queue.Empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/tmpfs/src/gfile/executor.py", line 115, in <module>
    app.run(main)
  File "/home/kbuilder/.local/lib/python3.5/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/home/kbuilder/.local/lib/python3.5/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "/tmpfs/src/gfile/executor.py", line 110, in main
    timeout=FLAGS.timeout)
  File "/tmpfs/src/gfile/executor.py", line 92, in execute
    'path': str(pathlib.Path(input_notebook).parent)
  File "/tmpfs/src/tf_docs_env/lib/python3.5/site-packages/nbconvert/preprocessors/execute.py", line 408, in preprocess
    self.set_widgets_metadata()
  File "/usr/lib/python3.5/contextlib.py", line 77, in __exit__
    self.gen.throw(type, value, traceback)
  File "/tmpfs/src/tf_docs_env/lib/python3.5/site-packages/nbconvert/preprocessors/execute.py", line 348, in setup_preprocessor
    yield nb, self.km, self.kc
  File "/tmpfs/src/tf_docs_env/lib/python3.5/site-packages/nbconvert/preprocessors/execute.py", line 405, in preprocess
    nb, resources = super(ExecutePreprocessor, self).preprocess(nb, resources)
  File "/tmpfs/src/tf_docs_env/lib/python3.5/site-packages/nbconvert/preprocessors/base.py", line 69, in preprocess
    nb.cells[index], resources = self.preprocess_cell(cell, resources, index)
  File "/tmpfs/src/gfile/executor.py", line 52, in preprocess_cell
    return super().preprocess_cell(cell, resources, cell_index, **kwargs)
  File "/tmpfs/src/tf_docs_env/lib/python3.5/site-packages/nbconvert/preprocessors/execute.py", line 438, in preprocess_cell
    reply, outputs = self.run_cell(cell, cell_index, store_history)
  File "/tmpfs/src/tf_docs_env/lib/python3.5/site-packages/nbconvert/preprocessors/execute.py", line 578, in run_cell
    exec_reply = self._poll_for_reply(parent_msg_id, cell, timeout)
  File "/tmpfs/src/tf_docs_env/lib/python3.5/site-packages/nbconvert/preprocessors/execute.py", line 483, in _poll_for_reply
    self._check_alive()
  File "/tmpfs/src/tf_docs_env/lib/python3.5/site-packages/nbconvert/preprocessors/execute.py", line 510, in _check_alive
    raise DeadKernelError("Kernel died")
nbconvert.preprocessors.execute.DeadKernelError: Kernel died

Tensorflow 2 Examples

This repository does not contain any tensorflow 2 examples. Since TF2 seems to be a supported product on GCP, it would be great if at least the quickstart guide could be updated.

covertype_training_serving_logging_bq.ipynb error

Incomplete instructions in the Prerequisites section

The last item in the Prerequisites section in this README is incomplete. There should be something after "Change the directory to this sample and run"

"OPEN JUPYTERLAB" does not show

I am deriving a custom container from a non-google source. I am successfully uploading it. I have two versions of the issue:

if I don't have "EXPOSE 8080" in the dockerfile, then I am seeing "CONNECT" rather than "OPEN JUPYTERLAB" and when I click it I am seeing and error message indicating I need to expose 8080
When I add ""EXPOSE 8080" in the docerfile, then "OPEN JUPYTERLAB" shows up. But when I click it I get 504 error.

How I can resolve this issue?

Hyperparameter tuning using Cloud ML hypertune for TF keras

Use cloudml-hypertune with TF Keras

sklearn/structured/base/predict.py predicts all zero's

Describe the bug

sklearn/structured/base/predict.py predicts all zero's

see here:

python ./prediction/predict.py
[0, 0, 0]

What sample is this bug related to?

i-platform-samples/prediction/sklearn/structured/base/prediction/predict.py

Source code / logs

I did a minimal amount of changes to make it run at all:

--- a/prediction/sklearn/structured/base/prediction/predict.py
+++ b/prediction/sklearn/structured/base/prediction/predict.py
-service = googleapiclient.discovery.build('ml', 'v1')
+service = googleapiclient.discovery.build('ml', 'v1', cache_discovery=False)

To Reproduce

Just follow guide in proposed order.

https://github.com/GoogleCloudPlatform/ai-platform-samples/tree/master/prediction/sklearn/structured/base

Expected behavior

Other output data than [0,0,0] for any input data.

System Information
OS Platform and Distribution (e.g., Linux Ubuntu 16.04):

It's all run on gcloud.

Framework and version (Tensorflow, scikit-learn, XGBoost):

scikit-learn
0.20.4
1.15

Python version:

3.7

Exact command to reproduce:

python ./prediction/predict.py
[0, 0, 0]

Pass arguments in config file

Configure YAML file to pass arguments.
https://stackoverflow.com/questions/57374601/how-to-add-user-specific-arguments-in-the-config-yaml-file-on-google-cloud-ml-en

AI platform notebooks , R, running cloudml_train() cannot find python

I ran a job that had already run on RStudio a few months ago, on AI platform R notebook.

When I ran cloudml_train("training.R")

I got the following error:

[errmsg]
ERROR: (gcloud.ai-platform.jobs.submit.training) INVALID_ARGUMENT: Field: runtime_version Error: The specified runtime version '1.9' with the Python version '' is not supported or is deprecated. Please specify a different runtime version. See https://cloud.google.com/ml-engine/docs/runtime-version-list for a list of supported versions

'@type': type.googleapis.com/google.rpc.BadRequest
fieldViolations:
- description: The specified runtime version '1.9' with the Python version '' is
  not supported or is deprecated. Please specify a different runtime version.
  See https://cloud.google.com/ml-engine/docs/runtime-version-list for a list
  of supported versions
  field: runtime_version

TensorFlow SavedModel with input/output to google cloud storage

Can we have an example of how to use google cloud storage for input/output of a TensorFlow saved model? This is something we can do with a custom predictor (which only runs in legacy machines) but we are having a hard time migrating to a SavedModel because we can't find how to make the SavedModel read the input from a google cloud storage, and more importantly to write back the output.

The model works with images, and pass/receive the payload base64 through the predict API is not working, because it is quite large.

Update GPU version to use T4 in TensorFlow sample

Migrate TF Keras Notebook from 1.15 to 2.x

Migrate and test notebook from 1.15 to 2.x
https://github.com/GoogleCloudPlatform/ai-platform-samples/blob/master/notebooks/samples/tensorflow/keras/getting_started_keras.ipynb

This notebook has older references for old repo, should be updated accordingly

inconsistency of the instances run in ai platform

Here is the situation:
I push a custom container to gcr and create a notebook on ai platform. Everything I have in the dockerfile works fine. Then I stop the instance. Next day, I start it again, in this case, nvdashboard stops working and when I check out the developer tools, I am seeing a 500 error message. Somehow, on my runs after the 1st one npm or nodejs crashes.

training_an_xgboost_model_with_ai_hub.ipynb Notebook error

# Import data from Kaggle
# For documentation on using the Kaggle API for Python refer to the official repo: https://github.com/Kaggle/kaggle-api
!~/.local/bin/kaggle competitions download -c house-prices-advanced-regression-techniques

# Unzip the training and test datasets
with zipfile.ZipFile('house-prices-advanced-regression-techniques.zip', 'r') as data_zip:
    data_zip.extractall('data')
# Remove the downloaded compressed file
tf.io.gfile.remove('house-prices-advanced-regression-techniques.zip')
------------------

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-9-89bb9eac3073> in <module>
      4 
      5 # Unzip the training and test datasets
----> 6 with zipfile.ZipFile('house-prices-advanced-regression-techniques.zip', 'r') as data_zip:
      7     data_zip.extractall('data')
      8 # Remove the downloaded compressed file

/usr/lib/python3.5/zipfile.py in __init__(self, file, mode, compression, allowZip64)
   1007             while True:
   1008                 try:
-> 1009                     self.fp = io.open(file, filemode)
   1010                 except OSError:
   1011                     if filemode in modeDict:

FileNotFoundError: [Errno 2] No such file or directory: 'house-prices-advanced-regression-techniques.zip'
FileNotFoundError: [Errno 2] No such file or directory: 'house-prices-advanced-regression-techniques.zip'

Launching jupyter lab using custom container

Describe the bug
A clear and concise description of what the bug is. Be sure to convey here whether it occurred locally or on the server (AI Platform, Google Dataflow)

I am trying to run an ai-platform notebook using a custom container. I would like to run jupyter lab inside this custom container. Do I need to specify an ENTRYPOINT in my Dockerfile like below to run jupyter lab?

ENTRYPOINT ["jupyter lab", "--ip=0.0.0.0", "--allow-root", "--no-browser"]

I have exposed port 8080 and 8888 in my Dockerfile.
Not sure, if this the right place to ask this question. Please let me know if I should post this somewhere else.

Add sklearn Test case

ai_platform_sentiment_analysis.ipynb Notebook error

nbconvert.preprocessors.execute.CellExecutionError: An error occurred while executing the following cell:
------------------
import tensorflow as tf

print(tf.__version__)
tf.logging.set_verbosity(tf.logging.INFO)
------------------

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-4-6e70ca0b64fc> in <module>
      2 
      3 print(tf.__version__)
----> 4 tf.logging.set_verbosity(tf.logging.INFO)

AttributeError: module 'tensorflow' has no attribute 'logging'
AttributeError: module 'tensorflow' has no attribute 'logging'

Serving | Standard | Base | TensorFlow

Product: AI Platform
Feature: Standard Serving
Framework: TensorFlow

ai-platform-samples/training/tensorflow/structured/base/trainer/experiment.py doesn't work for distributed training

Describe the bug
All the workers will call export_eval_savedmodel resulting in the following error :

tensorflow.python.framework.errors_impl.AlreadyExistsError: file already exists

What sample is this bug related to?
training/tensorflow/structured/base/trainer/distributed

To Reproduce
Just run the distributed sample above.

Expected behavior
the export should only be done by master node.

ai_platform_optimizer_tuner.ipynb error

# enhance code readability in quickstart/prediction/predict.py

Increase code readability in lines 34 & 35 of predict.py

Currently, the lines are as follows:

name = 'projects/{}/models/{}/versions/{}'.format(PROJECT_ID, MODEL_NAME,
                                                 MODEL_VERSION)

Recommended changes:

name = 'projects/{}/models/{}/versions/{}'.format(PROJECT_ID,
                                                  MODEL_NAME,
                                                  MODEL_VERSION)

or, name = 'projects/{}/models/{}/versions/{}'.format(PROJECT_ID, MODEL_NAME, MODEL_VERSION)

I would like to submit a PR for this issue as my first contribution.

serving_pytorch_models_in_ai_platform.ipynb Notebook error

Error when executing this notebook:
serving_pytorch_models_in_ai_platform.ipynb

This line is the error:

from urllib.request import urlretrieve
urlretrieve("https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data", LOCAL_DATA_DIR)

Error:

nbconvert.preprocessors.execute.CellExecutionError: An error occurred while executing the following cell:
------------------
from urllib.request import urlretrieve

urlretrieve("https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data", LOCAL_DATA_DIR)
------------------

---------------------------------------------------------------------------
SSLError                                  Traceback (most recent call last)
/usr/lib/python3.5/urllib/request.py in do_open(self, http_class, req, **http_conn_args)
   1253             try:
-> 1254                 h.request(req.get_method(), req.selector, req.data, headers)
   1255             except OSError as err: # timeout error

/usr/lib/python3.5/http/client.py in request(self, method, url, body, headers)
   1105         """Send a complete request to the server."""
-> 1106         self._send_request(method, url, body, headers)
   1107 

/usr/lib/python3.5/http/client.py in _send_request(self, method, url, body, headers)
   1150             body = _encode(body, 'body')
-> 1151         self.endheaders(body)
   1152 

/usr/lib/python3.5/http/client.py in endheaders(self, message_body)
   1101             raise CannotSendHeader()
-> 1102         self._send_output(message_body)
   1103 

/usr/lib/python3.5/http/client.py in _send_output(self, message_body)
    933 
--> 934         self.send(msg)
    935         if message_body is not None:

/usr/lib/python3.5/http/client.py in send(self, data)
    876             if self.auto_open:
--> 877                 self.connect()
    878             else:

/usr/lib/python3.5/http/client.py in connect(self)
   1259             self.sock = self._context.wrap_socket(self.sock,
-> 1260                                                   server_hostname=server_hostname)
   1261             if not self._context.check_hostname and self._check_hostname:

/usr/lib/python3.5/ssl.py in wrap_socket(self, sock, server_side, do_handshake_on_connect, suppress_ragged_eofs, server_hostname)
    376                          server_hostname=server_hostname,
--> 377                          _context=self)
    378 

/usr/lib/python3.5/ssl.py in __init__(self, sock, keyfile, certfile, server_side, cert_reqs, ssl_version, ca_certs, do_handshake_on_connect, family, type, proto, fileno, suppress_ragged_eofs, npn_protocols, ciphers, server_hostname, _context)
    751                         raise ValueError("do_handshake_on_connect should not be specified for non-blocking sockets")
--> 752                     self.do_handshake()
    753 

/usr/lib/python3.5/ssl.py in do_handshake(self, block)
    987                 self.settimeout(None)
--> 988             self._sslobj.do_handshake()
    989         finally:

/usr/lib/python3.5/ssl.py in do_handshake(self)
    632         """Start the SSL/TLS handshake."""
--> 633         self._sslobj.do_handshake()
    634         if self.context.check_hostname:

SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:645)

During handling of the above exception, another exception occurred:

URLError                                  Traceback (most recent call last)
<ipython-input-8-7c9b008399c2> in <module>
      1 from urllib.request import urlretrieve
      2 
----> 3 urlretrieve("https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data", LOCAL_DATA_DIR)

/usr/lib/python3.5/urllib/request.py in urlretrieve(url, filename, reporthook, data)
    186     url_type, path = splittype(url)
    187 
--> 188     with contextlib.closing(urlopen(url, data)) as fp:
    189         headers = fp.info()
    190 

/usr/lib/python3.5/urllib/request.py in urlopen(url, data, timeout, cafile, capath, cadefault, context)
    161     else:
    162         opener = _opener
--> 163     return opener.open(url, data, timeout)
    164 
    165 def install_opener(opener):

/usr/lib/python3.5/urllib/request.py in open(self, fullurl, data, timeout)
    464             req = meth(req)
    465 
--> 466         response = self._open(req, data)
    467 
    468         # post-process response

/usr/lib/python3.5/urllib/request.py in _open(self, req, data)
    482         protocol = req.type
    483         result = self._call_chain(self.handle_open, protocol, protocol +
--> 484                                   '_open', req)
    485         if result:
    486             return result

/usr/lib/python3.5/urllib/request.py in _call_chain(self, chain, kind, meth_name, *args)
    442         for handler in handlers:
    443             func = getattr(handler, meth_name)
--> 444             result = func(*args)
    445             if result is not None:
    446                 return result

/usr/lib/python3.5/urllib/request.py in https_open(self, req)
   1295         def https_open(self, req):
   1296             return self.do_open(http.client.HTTPSConnection, req,
-> 1297                 context=self._context, check_hostname=self._check_hostname)
   1298 
   1299         https_request = AbstractHTTPHandler.do_request_

/usr/lib/python3.5/urllib/request.py in do_open(self, http_class, req, **http_conn_args)
   1254                 h.request(req.get_method(), req.selector, req.data, headers)
   1255             except OSError as err: # timeout error
-> 1256                 raise URLError(err)
   1257             r = h.getresponse()
   1258         except:

URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:645)>
URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:645)>

autoshutdown script unable to parse multiple CPUs

Problem is only happening in an AI Notebook which had 16 cores in line 30:

We have implemented AI Notebook auto shutdown using this script:

https://github.com/GoogleCloudPlatform/ai-platform-samples/blob/ec96ce8267511ea77ffa9f5c5fe1aa8601bf10e6/notebooks/tools/auto-shutdown/ashutdown

This version of script is a bit old but I believe that issue I ran into will impact the latest version as well.
Problem is only happening in an AI Notebook which had 16 cores in line 30

ai-platform-samples/notebooks/tools/auto-shutdown/ashutdown

Line 30 in ec96ce8

 CPU_PERCENT=$(mpstat -P ALL 1 1 | awk '/Average:/ && $2 ~ /[0–9]/ {print $3}') 

For multicore VM CPU_PERCENT returns a value of 100.00 0.00 0.99 It has new lines in it and
It breaks the numeric comparison.

/var/log$ mpstat -P ALL 1 1 | awk '/Average:/ && $2 ~ /[0–9]/ {print $3}'
100.00
0.00
0.99

training_an_xgboost_model_with_ai_hub.ipynb error

add no execute to template

make sure other weird requirements are met as easily as possible

open jupyter lab does not point to the custom container instance

Describe the bug
I created a custom container and push it. Then I created a notebook instance using that image. When I click the "Open JupyterLab" link it does give me a jupyter notebook. So far great but this jupyter notebook is not recognizing any of the dependencies I installed through the custom container. How would I bridge this "open juypterlab" thing to use everything from my container (or maybe the jupyter version from my container).

ai-explanations-image.ipynb error

Error importing scanpy package ... missing f-strings support?

Describe the bug
Hello. I depend on a python package scanpy (https://scanpy.readthedocs.io/en/stable/) and I can install it no problem in colaboratory, but the AI platform jupyter lab needs a python update.

What sample is this bug related to?
installing scanpy (pip3 install --user scanpy)

Source code / logs
Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached. Try to provide a reproducible test case that is the bare minimum necessary to generate the problem.

Traceback (most recent call last):

File "/usr/local/lib/python3.5/dist-packages/IPython/core/interactiveshell.py", line 3326, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)

File "", line 1, in
import scanpy

File "/home/jupyter/.local/lib/python3.5/site-packages/scanpy/init.py", line 3, in
from .utils import check_versions, annotate_doc_types

File "/home/jupyter/.local/lib/python3.5/site-packages/scanpy/utils.py", line 18, in
from ._settings import settings

File "/home/jupyter/.local/lib/python3.5/site-packages/scanpy/_settings.py", line 351
f'{k} = {v!r}'

To Reproduce
Steps to reproduce the behavior:

start a jupyterlab notebook
!pip3 install --user scanpy
restart kernel
import scanpy
error

Expected behavior
package should be imported

System Information
default python selection

pip freeze |grep tensorflow
pip freeze |grep apache-beam

Additional context
Add any other context about the problem here.

How are Kaggle Notebooks selected?

Is your feature request related to a problem? Please describe.
I am curious if Kaggle NB requests can be taken or not. Obviously it will be subject to review.

Describe the solution you'd like
Propose a community entry for Kaggle NBs for AI Hub.

Key technologies to include

Workflow (Training, Serving, Complete Guide):
AI Platform specfic features (CPU, GPU, HP Tuning, TPU):
Framework (Tensorflow, Keras, scikit-learn, XGBoost, ...):
Model:
Dataset:

Additional context
None

serving_pytorch_models_in_ai_platform.ipynb error

ai_platform_optimizer_conditional_parameters.ipynb Notebook error

+ execute notebooks/samples/optimizer/ai_platform_optimizer_conditional_parameters.ipynb
+ QUIET_FULL='pip install --quiet'
+ QUIET_SHORT='pip install -q'
+ grep -q 'pip install --quiet' notebooks/samples/optimizer/ai_platform_optimizer_conditional_parameters.ipynb
+ grep -q 'pip install -q' notebooks/samples/optimizer/ai_platform_optimizer_conditional_parameters.ipynb
+ sed -i 's/pip *install/pip install -q/g' notebooks/samples/optimizer/ai_platform_optimizer_conditional_parameters.ipynb
+ mkdir /tmpfs/src/temp
+ cp --parents notebooks/samples/optimizer/ai_platform_optimizer_conditional_parameters.ipynb /tmpfs/src/temp
+ python /tmpfs/src/gfile/executor.py --input_notebook=/tmpfs/src/temp/notebooks/samples/optimizer/ai_platform_optimizer_conditional_parameters.ipynb --timeout=15000
I0409 16:03:48.866024 140071264257792 execute.py:404] Executing notebook with kernel: python3
E0409 16:03:58.055611 140071264257792 execute.py:509] Kernel died while waiting for execute reply.
Traceback (most recent call last):
  File "/tmpfs/src/tf_docs_env/lib/python3.5/site-packages/nbconvert/preprocessors/execute.py", line 478, in _poll_for_reply
    msg = self.kc.shell_channel.get_msg(timeout=timeout)
  File "/tmpfs/src/tf_docs_env/lib/python3.5/site-packages/jupyter_client/blocking/channels.py", line 57, in get_msg
    raise Empty
queue.Empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/tmpfs/src/gfile/executor.py", line 115, in <module>
    app.run(main)
  File "/home/kbuilder/.local/lib/python3.5/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/home/kbuilder/.local/lib/python3.5/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "/tmpfs/src/gfile/executor.py", line 110, in main
    timeout=FLAGS.timeout)
  File "/tmpfs/src/gfile/executor.py", line 92, in execute
    'path': str(pathlib.Path(input_notebook).parent)
  File "/tmpfs/src/tf_docs_env/lib/python3.5/site-packages/nbconvert/preprocessors/execute.py", line 408, in preprocess
    self.set_widgets_metadata()
  File "/usr/lib/python3.5/contextlib.py", line 77, in __exit__
    self.gen.throw(type, value, traceback)
  File "/tmpfs/src/tf_docs_env/lib/python3.5/site-packages/nbconvert/preprocessors/execute.py", line 348, in setup_preprocessor
    yield nb, self.km, self.kc
  File "/tmpfs/src/tf_docs_env/lib/python3.5/site-packages/nbconvert/preprocessors/execute.py", line 405, in preprocess
    nb, resources = super(ExecutePreprocessor, self).preprocess(nb, resources)
  File "/tmpfs/src/tf_docs_env/lib/python3.5/site-packages/nbconvert/preprocessors/base.py", line 69, in preprocess
    nb.cells[index], resources = self.preprocess_cell(cell, resources, index)
  File "/tmpfs/src/gfile/executor.py", line 52, in preprocess_cell
    return super().preprocess_cell(cell, resources, cell_index, **kwargs)
  File "/tmpfs/src/tf_docs_env/lib/python3.5/site-packages/nbconvert/preprocessors/execute.py", line 438, in preprocess_cell
    reply, outputs = self.run_cell(cell, cell_index, store_history)
  File "/tmpfs/src/tf_docs_env/lib/python3.5/site-packages/nbconvert/preprocessors/execute.py", line 578, in run_cell
    exec_reply = self._poll_for_reply(parent_msg_id, cell, timeout)
  File "/tmpfs/src/tf_docs_env/lib/python3.5/site-packages/nbconvert/preprocessors/execute.py", line 483, in _poll_for_reply
    self._check_alive()
  File "/tmpfs/src/tf_docs_env/lib/python3.5/site-packages/nbconvert/preprocessors/execute.py", line 510, in _check_alive
    raise DeadKernelError("Kernel died")
nbconvert.preprocessors.execute.DeadKernelError: Kernel died

Runtime version 1.14 update

Update to runtime version and TF 1.14 https://cloud.google.com/ml-engine/docs/tensorflow/runtime-version-list

Running setup.sh

Everytime I run setup.sh my terminal exits

[ai-platform-samples-venv] ~/Documents/Development/dpe/ai-platform-samples/setup tensorflow-sample*$ source ./setup.sh 

export RUNTIME_VERSION=1.13
export PYTHON_VERSION=3.5
export REGION=us-central1

# Replace "your-gcp-project-id" with your gcp PROJECT ID
export PROJECT_ID="dpe-cloud-mle"

# Replace "your-gcp-bucket-name" with a universally unique name for a GCS bucket
export BUCKET_NAME="dpe-sandbox"

# Replace "path/to/service/account/key" with the full path to the
# service account key file which you created and downloaded
export GOOGLE_APPLICATION_CREDENTIALS="/Users/gogasca/Documents/Development/dpe/keys/dpe-cloud-mle.json"


if [[ ${PROJECT_ID} == "your-gcp-project-id" ]]
then
  echo "Error: Please set PROJECT_ID to your gcp Project ID"
fi

if [[ ${BUCKET_NAME} == "your-gcp-bucket-name" ]]
then
  echo "Error: Please set BUCKET_NAME to an existing GCS bucket"
fi

if [[ -z ${GOOGLE_APPLICATION_CREDENTIALS} == "path/to/service/account/key" ]]
-bash: ./setup.sh: line 44: syntax error in conditional expression
shell_session_update
Saving session.../bin/date +%s
completed.
/usr/bin/find "$SHELL_SESSION_TIMESTAMP_FILE" -mtime -1d

[Process completed]

Scikit-learn version

Getting warning:

  warnings.warn(msg, category=DeprecationWarning)
INFO:root:Arguments: Namespace(input='/tmp/datasets/small/taxi_trips_train.csv', job_dir='./trained/structured-taxi', log_level='DEBUG', max_depth=3, n_estimators=20)```

1. In `setup.py`
sciki-learn is defined as 'scikit-learn>=0.19.1' should we upgrade to a most recent one?

2. Typo in note:

# Notes:
# TAXI_TRAIN_SMALL is set by datasets/downlaod-taxi.sh script

ai_platform_transfer_learning.ipynb Notebook error

+ python /tmpfs/src/gfile/executor.py --input_notebook=/tmpfs/src/temp/notebooks/samples/tensorflow/transfer_learning/ai_platform_transfer_learning.ipynb --timeout=15000
I0409 16:09:50.662054 139698518648576 execute.py:404] Executing notebook with kernel: python3
E0409 16:09:56.078719 139698518648576 execute.py:509] Kernel died while waiting for execute reply.
Traceback (most recent call last):
  File "/tmpfs/src/tf_docs_env/lib/python3.5/site-packages/nbconvert/preprocessors/execute.py", line 478, in _poll_for_reply
    msg = self.kc.shell_channel.get_msg(timeout=timeout)
  File "/tmpfs/src/tf_docs_env/lib/python3.5/site-packages/jupyter_client/blocking/channels.py", line 57, in get_msg
    raise Empty
queue.Empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/tmpfs/src/gfile/executor.py", line 115, in <module>
    app.run(main)
  File "/home/kbuilder/.local/lib/python3.5/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/home/kbuilder/.local/lib/python3.5/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "/tmpfs/src/gfile/executor.py", line 110, in main
    timeout=FLAGS.timeout)
  File "/tmpfs/src/gfile/executor.py", line 92, in execute
    'path': str(pathlib.Path(input_notebook).parent)
  File "/tmpfs/src/tf_docs_env/lib/python3.5/site-packages/nbconvert/preprocessors/execute.py", line 408, in preprocess
    self.set_widgets_metadata()
  File "/usr/lib/python3.5/contextlib.py", line 77, in __exit__
    self.gen.throw(type, value, traceback)
  File "/tmpfs/src/tf_docs_env/lib/python3.5/site-packages/nbconvert/preprocessors/execute.py", line 348, in setup_preprocessor
    yield nb, self.km, self.kc
  File "/tmpfs/src/tf_docs_env/lib/python3.5/site-packages/nbconvert/preprocessors/execute.py", line 405, in preprocess
    nb, resources = super(ExecutePreprocessor, self).preprocess(nb, resources)
  File "/tmpfs/src/tf_docs_env/lib/python3.5/site-packages/nbconvert/preprocessors/base.py", line 69, in preprocess
    nb.cells[index], resources = self.preprocess_cell(cell, resources, index)
  File "/tmpfs/src/gfile/executor.py", line 52, in preprocess_cell
    return super().preprocess_cell(cell, resources, cell_index, **kwargs)
  File "/tmpfs/src/tf_docs_env/lib/python3.5/site-packages/nbconvert/preprocessors/execute.py", line 438, in preprocess_cell
    reply, outputs = self.run_cell(cell, cell_index, store_history)
  File "/tmpfs/src/tf_docs_env/lib/python3.5/site-packages/nbconvert/preprocessors/execute.py", line 578, in run_cell
    exec_reply = self._poll_for_reply(parent_msg_id, cell, timeout)
  File "/tmpfs/src/tf_docs_env/lib/python3.5/site-packages/nbconvert/preprocessors/execute.py", line 483, in _poll_for_reply
    self._check_alive()
  File "/tmpfs/src/tf_docs_env/lib/python3.5/site-packages/nbconvert/preprocessors/execute.py", line 510, in _check_alive
    raise DeadKernelError("Kernel died")
nbconvert.preprocessors.execute.DeadKernelError: Kernel died
+ delete_or_move notebooks/samples/tensorflow/transfer_learning/ai_platform_transfer_learning.ipynb
+ EXIT_STATUS=1

ai-platform-samples/notebooks/

I don't understand the relevance of the section on notebooks. Not clear from reading the README whom the targeted audience would be. Seems to the ordinary MLE and Data Scientist would not use anything in this section.

googlecloudplatform / ai-platform-samples Goto Github PK