
struct2tensor's Introduction

Struct2Tensor


Introduction

struct2tensor is a library for parsing structured data inside TensorFlow. In particular, it makes it easy to manipulate structured data (for example, slicing, flattening, and copying substructures) as part of a TensorFlow model graph. The notebook in 'examples/prensor_playground.ipynb' provides a few examples of struct2tensor in action and an introduction to the main concepts. You can run the notebook in your browser through Google's Colab environment, or download the file and run it in your own Jupyter environment.

There are two main use cases of this repo:

  1. To create a PIP package. The PIP package contains plug-ins (OpKernels) for an existing TensorFlow installation.
  2. To statically link with tensorflow-serving.

As these processes are independent, one can follow either set of directions below.

Use a pre-built Linux PIP package.

From a virtual environment, run:

pip install struct2tensor

Nightly Packages

Struct2Tensor also hosts nightly packages at https://pypi-nightly.tensorflow.org on Google Cloud. To install the latest nightly package, please use the following command:

pip install --extra-index-url https://pypi-nightly.tensorflow.org/simple struct2tensor

This will install the nightly packages for the major dependencies of struct2tensor such as TensorFlow Metadata (TFMD).

Creating a PIP package.

The struct2tensor PIP package is useful for creating models. It works with TensorFlow 2.x.

In order to unify the process, we recommend compiling struct2tensor inside a docker container.

Downloading the Code

Go to your home directory.

Download the source code.

git clone https://github.com/google/struct2tensor.git
cd ~/struct2tensor

Use docker-compose

Install docker-compose.

Use it to build a pip wheel for Python 3.8 with tensorflow version 2:

docker-compose build --build-arg PYTHON_VERSION=3.8 manylinux2014
docker-compose run -e TF_VERSION=RELEASED_TF_2 manylinux2014

This will create a manylinux package in the ~/struct2tensor/dist directory.
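
The two docker-compose steps above can also be assembled from a script. The following is a minimal sketch and not part of the repository: the helper name `build_wheel_commands` is hypothetical, and it only builds the argument lists quoted above without running them. Whether values other than Python 3.8 and RELEASED_TF_2 are accepted by the build is not stated here.

```python
import shlex


def build_wheel_commands(python_version="3.8", tf_version="RELEASED_TF_2"):
    """Return the two docker-compose invocations as argument lists.

    Hypothetical helper: it assembles the commands shown in this README
    and does not execute anything.
    """
    service = "manylinux2014"
    build = [
        "docker-compose", "build",
        "--build-arg", f"PYTHON_VERSION={python_version}",
        service,
    ]
    run = ["docker-compose", "run", "-e", f"TF_VERSION={tf_version}", service]
    return [build, run]


if __name__ == "__main__":
    # Print the commands for inspection before running them by hand.
    for cmd in build_wheel_commands():
        print(shlex.join(cmd))
```

To execute the steps, each list could be passed to `subprocess.run(cmd, check=True)` from the repository root.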

Creating a static library

In order to construct a static library for tensorflow-serving, we run:

bazel build -c opt struct2tensor:struct2tensor_kernels_and_ops

This can also be linked into another library.

TensorFlow Serving docker image

struct2tensor needs a couple of custom TensorFlow ops to function. If you train a model with struct2tensor and want to serve it with TensorFlow Serving, the TensorFlow Serving binary needs to link with those custom ops. We have a pre-built Docker image that contains such a binary. The Dockerfile is available at tools/tf_serving_docker/Dockerfile. The image is available at gcr.io/tfx-oss-public/s2t_tf_serving.

Please see the Dockerfile for details. In brief, the image exposes port 8500 as the gRPC endpoint and port 8501 as the REST endpoint. You can set two environment variables, MODEL_BASE_PATH and MODEL_NAME, to point it to your model (either mount the model into the container, or put it on GCS). The server looks for a saved model at ${MODEL_BASE_PATH}/${MODEL_NAME}/${VERSION_NUMBER}, where VERSION_NUMBER is an integer.
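
As an illustration of the layout described above, the sketch below computes the directory the server would look in and the REST predict URL. The function names, the host `localhost`, and the example values are assumptions made for this sketch; the URL shape follows the standard TensorFlow Serving REST API (`/v1/models/<name>:predict`), which this image is assumed to expose unchanged on port 8501.

```python
import posixpath


def saved_model_dir(model_base_path, model_name, version_number):
    """Build ${MODEL_BASE_PATH}/${MODEL_NAME}/${VERSION_NUMBER}.

    posixpath is used so that the result uses forward slashes on any OS,
    which also suits GCS-style paths.
    """
    # int() enforces that the version directory is an integer.
    return posixpath.join(model_base_path, model_name, str(int(version_number)))


def rest_predict_url(model_name, host="localhost", port=8501):
    """Predict URL in the standard TensorFlow Serving REST layout."""
    return f"http://{host}:{port}/v1/models/{model_name}:predict"


if __name__ == "__main__":
    # "/models" and "my_model" are placeholder values, not names from the image.
    print(saved_model_dir("/models", "my_model", 1))
    print(rest_predict_url("my_model"))
```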

Compatibility

struct2tensor   tensorflow
0.45.0          2.13.0
0.44.0          2.12.0
0.43.0          2.11.0
0.42.0          2.10.0
0.41.0          2.9.0
0.40.0          2.9.0
0.39.0          2.8.0
0.38.0          2.8.0
0.37.0          2.7.0
0.36.0          2.7.0
0.35.0          2.6.0
0.34.0          2.6.0
0.33.0          2.5.0
0.32.0          2.5.0
0.31.0          2.5.0
0.30.0          2.4.0
0.29.0          2.4.0
0.28.0          2.4.0
0.27.0          2.4.0
0.26.0          2.3.0
0.25.0          2.3.0
0.24.0          2.3.0
0.23.0          2.3.0
0.22.0          2.2.0
0.21.1          2.1.0
0.21.0          2.1.0
0.0.1.dev*      1.15
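
The table can be turned into a quick lookup when checking an environment. This is a sketch only: `COMPATIBILITY`, `expected_tensorflow`, and `check_installed` are hypothetical names, the wildcard `0.0.1.dev*` row is left out, and comparing only the major.minor part of the installed TensorFlow version is an assumption rather than a documented rule.

```python
from importlib import metadata

# Pairs copied from the compatibility table (struct2tensor -> tensorflow).
COMPATIBILITY = {
    "0.45.0": "2.13.0", "0.44.0": "2.12.0", "0.43.0": "2.11.0",
    "0.42.0": "2.10.0", "0.41.0": "2.9.0", "0.40.0": "2.9.0",
    "0.39.0": "2.8.0", "0.38.0": "2.8.0", "0.37.0": "2.7.0",
    "0.36.0": "2.7.0", "0.35.0": "2.6.0", "0.34.0": "2.6.0",
    "0.33.0": "2.5.0", "0.32.0": "2.5.0", "0.31.0": "2.5.0",
    "0.30.0": "2.4.0", "0.29.0": "2.4.0", "0.28.0": "2.4.0",
    "0.27.0": "2.4.0", "0.26.0": "2.3.0", "0.25.0": "2.3.0",
    "0.24.0": "2.3.0", "0.23.0": "2.3.0", "0.22.0": "2.2.0",
    "0.21.1": "2.1.0", "0.21.0": "2.1.0",
}


def expected_tensorflow(struct2tensor_version):
    """Return the TensorFlow version listed for a release, or None."""
    return COMPATIBILITY.get(struct2tensor_version)


def check_installed():
    """Describe whether the installed pair matches the table."""
    try:
        s2t = metadata.version("struct2tensor")
        tf = metadata.version("tensorflow")
    except metadata.PackageNotFoundError as err:
        return f"package not installed: {err}"
    expected = expected_tensorflow(s2t)
    if expected is None:
        return f"struct2tensor {s2t} is not in the table"
    # Assumption: only major.minor needs to agree.
    ok = tf.split(".")[:2] == expected.split(".")[:2]
    verdict = "matches" if ok else "does not match"
    return f"struct2tensor {s2t} with tensorflow {tf} {verdict} the table ({expected})"


if __name__ == "__main__":
    print(check_installed())
```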

struct2tensor's People

Contributors

andylou2, brills, iindyk, mzinkevi, tfx-copybara, zoyahav


struct2tensor's Issues

Usage with pyarrow parquet

Hello, I'm very interested in using the library, but I'm struggling to apply it to a Parquet file other than the dremel example.

from struct2tensor import expression_impl
import struct2tensor as s2t
import pyarrow as pa
import pyarrow.parquet as pq

# Write a one-column table to a Parquet file.
tbl = pa.table([pa.array([0, 1])], names=['a'])
pq.ParquetWriter('/tmp/test', tbl.schema).write_table(tbl)
filenames = ["/tmp/test"]
batch_size = 2

# Build an expression from the file and project column 'a'.
exp = s2t.expression_impl.parquet.create_expression_from_parquet_file(filenames)
ps = exp.project(['a'])

val = s2t.expression_impl.parquet.calculate_parquet_values(
    [ps], exp, filenames, batch_size)
# Fetch only the first batch.
for h in val:
    break

This segfaults with the following output:
2021-04-15 15:30:40.254237: E struct2tensor/kernels/parquet/parquet_reader.cc:198]
The repetition type of the root node was 0, but should be 2. There may be something wrong with your supplied parquet schema. We will treat it as a repeated field.

2021-04-15 15:31:46.428109: W tensorflow/core/framework/dataset.cc:477]
Input of ParquetDatasetOp::Dataset will not be optimized because the dataset does not implement the AsGraphDefInternal() method needed to apply optimizations.

I also tried loading the dremel file with PyArrow and writing it straight back out, and I can reproduce the error with that file too.

How do you recommend writing the Parquet file?

Thanks for your help!
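
For readers decoding the log message in this issue: assuming the numbers it prints are the Parquet format's FieldRepetitionType codes (REQUIRED = 0, OPTIONAL = 1, REPEATED = 2), the message says the root node was written as REQUIRED while the reader expects REPEATED. The sketch below only translates the codes; `describe_root_repetition` is a hypothetical helper and not part of struct2tensor or PyArrow.

```python
from enum import IntEnum


class FieldRepetitionType(IntEnum):
    """Repetition codes as defined by the Parquet format."""
    REQUIRED = 0
    OPTIONAL = 1
    REPEATED = 2


def describe_root_repetition(found, expected=2):
    """Translate the numeric codes from the log line into names."""
    found_name = FieldRepetitionType(found).name
    expected_name = FieldRepetitionType(expected).name
    return f"root node is {found_name}, reader expects {expected_name}"


if __name__ == "__main__":
    # Values taken from the log line: "was 0, but should be 2".
    print(describe_root_repetition(0, 2))
```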

DecodeProtoSparseV4 not registered

I'm having issues loading a model produced by GCP Vertex AI:

import tensorflow as tf
tf.saved_model.load(MY_MODEL_PATH)

I receive the following error:

Traceback (most recent call last):
  File "/home/walter/yin-model/py38/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 4177, in _get_op_def
    return self._op_def_cache[type]
KeyError: 'DecodeProtoSparseV4'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "test.py", line 6, in <module>
    imported = tf.saved_model.load(NORMAL)
  File "/home/walter/yin-model/py38/lib/python3.8/site-packages/tensorflow/python/saved_model/load.py", line 936, in load
    result = load_internal(export_dir, tags, options)["root"]
  File "/home/walter/yin-model/py38/lib/python3.8/site-packages/tensorflow/python/saved_model/load.py", line 994, in load_internal
    root = load_v1_in_v2.load(export_dir, tags)
  File "/home/walter/yin-model/py38/lib/python3.8/site-packages/tensorflow/python/saved_model/load_v1_in_v2.py", line 282, in load
    result = loader.load(tags=tags)
  File "/home/walter/yin-model/py38/lib/python3.8/site-packages/tensorflow/python/saved_model/load_v1_in_v2.py", line 209, in load
    functions = function_deserialization.load_function_def_library(
  File "/home/walter/yin-model/py38/lib/python3.8/site-packages/tensorflow/python/saved_model/function_deserialization.py", line 406, in load_function_def_library
    func_graph = function_def_lib.function_def_to_graph(
  File "/home/walter/yin-model/py38/lib/python3.8/site-packages/tensorflow/python/framework/function_def_to_graph.py", line 70, in function_def_to_graph
    graph_def, nested_to_flat_tensor_name = function_def_to_graph_def(
  File "/home/walter/yin-model/py38/lib/python3.8/site-packages/tensorflow/python/framework/function_def_to_graph.py", line 239, in function_def_to_graph_def
    op_def = default_graph._get_op_def(node_def.op)  # pylint: disable=protected-access
  File "/home/walter/yin-model/py38/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 4181, in _get_op_def
    pywrap_tf_session.TF_GraphGetOpDef(self._c_graph, compat.as_bytes(type),
tensorflow.python.framework.errors_impl.NotFoundError: Op type not registered 'DecodeProtoSparseV4' in binary running on rocky. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.

My running environment:

$ pip list | grep tensor
struct2tensor                0.39.0
tensorboard                  2.8.0
tensorboard-data-server      0.6.1
tensorboard-plugin-wit       1.8.1
tensorflow                   2.8.0
tensorflow-addons            0.16.1
tensorflow-estimator         2.8.0
tensorflow-io-gcs-filesystem 0.32.0
tensorflow-metadata          1.8.0

This is meant to match the environment that Google used to create the model:

$ jq . environment.json
{
  "container_uri": "europe-docker.pkg.dev/vertex-ai/automl-tabular/prediction-server:20230630_0325",
  "tensorflow": "2.8.0",
  "struct2tensor": "0.39.0",
  "tensorflow-addons": "0.16.1"
}

This issue suggests that Google knows of a fix, which they have added to their Docker containers.
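
One thing that may be worth checking, based on the error text ("Make sure the Op and Kernel are registered in the binary running in this process") and on the README's description of the PIP package as a set of OpKernel plug-ins: the snippet in this issue never imports struct2tensor before loading the model. The sketch below imports the op-providing package first and then calls the loader. It is an assumption, not confirmed in this issue, that importing struct2tensor registers `DecodeProtoSparseV4` in the installed version; `load_with_ops_registered` is a hypothetical helper.

```python
import importlib


def load_with_ops_registered(load_fn, model_path, op_packages=("struct2tensor",)):
    """Import packages that provide custom ops, then call load_fn(model_path).

    Raises ModuleNotFoundError if one of the packages is not installed.
    """
    for name in op_packages:
        # Importing the package is assumed to load its custom op library.
        importlib.import_module(name)
    return load_fn(model_path)
```

With TensorFlow installed, this would be called as `load_with_ops_registered(tf.saved_model.load, MY_MODEL_PATH)`.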
