graph_def_editor's Issues

Replicate and extend TF fold_batch_norms rewrites

The rewrites fold_batch_norms and fold_old_batch_norms in the TensorFlow Graph Transform Tool do not work when the batch normalization layer comes immediately after a DepthwiseConv2D layer. As a result, these rewrites do not work with MobileNetV2 or any model that embeds MobileNetV2. This seems like a rather significant oversight, given that MobileNet and MobileNet-derived models are the most common use case for these kinds of graph-simplifying rewrites. This problem affects several models in the Model Asset Exchange.

Folding batch normalization into depthwise convolution is a bit tricky because each coefficient in a depthwise convolution participates in every output. In particular, the formula for a depthwise convolution is:

output[b, i, j, k * channel_multiplier + q] = sum_{di, dj}
     filter[di, dj, k, q] * input[b, strides[1] * i + rate[0] * di,
                                     strides[2] * j + rate[1] * dj, k]

(see https://www.tensorflow.org/api_docs/python/tf/nn/depthwise_conv2d).
This reuse of filter elements means that we can't fold a batch normalization that happens after the depthwise convolution into the convolution. Batch normalization multiplies each channel by a different amount (1/stdev of the channel), and there's no one place in the filters where those amounts could be added.

Instead, we need to fuse batch normalization into a Conv2D or DepthwiseConv2D that happens after the normalization. This fusion is a bit trickier, because batch normalization breaks down into a multiply followed by an add, and there is typically a ReLU before the next convolution. For example, the basic building block of MobileNet v1 is

3x3 DepthwiseConv2D -> BN -> ReLU -> 1x1 Conv2D -> BN -> ReLU

The second BN in this chain is covered by the existing rewrite. We need to fold the first BN into the 1x1 Conv2D that happens after it. That chunk

BN-> ReLU -> 1x1 Conv2D

breaks down to

Multiply -> Add -> ReLU -> 1x1 Conv2D

So we need to pull the multiply into the Conv2D. Another way to write the above sequence of ops is:

Conv2D(ReLU(mx + b))
    == Conv2D(ReLU(m(x + b/m)))
    == Conv2D(m * ReLU(x + b/m)) iff m >= 0 (element-wise)

As it happens, m is always >= 0, since it's equal to 1/stdev. So, switching back to operator notation, we just need to turn

Multiply -> Conv2D

into a single Conv2D and rewrite the Add(b) to Add(b/m).

The equation for Conv2D is:

output[b, i, j, k] =
    sum_{di, dj, q} input[b, strides[1] * i + di, strides[2] * j + dj, q] *
                    filter[di, dj, q, k]

(see https://www.tensorflow.org/api_docs/python/tf/nn/conv2d). Collapsing down the striding parts to f_i and f_j, we have:

output[b, i, j, k] =
    sum_{di, dj, q} input[b, f_i(i, di), f_j(j, dj), q] * filter[di, dj, q, k]

So the equation for a Conv2D on top of a multiplication by m is:

output[b, i, j, k] =
    sum_{di, dj, q} (input[b, f_i(i, di), f_j(j, dj), q] * m[q]) * filter[di, dj, q, k]
    = sum_{di, dj, q} input[b, f_i(i, di), f_j(j, dj), q] * (m[q] * filter[di, dj, q, k])

So we just need to multiply every filter element in filter[_, _, q, _] by m[q] for each value of q. The same principle applies to DepthwiseConv2D.
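
As a concrete illustration of that last step, here is a minimal NumPy sketch of the filter update. The function names are illustrative; the actual rewrite would replace the value of the Const node that feeds the convolution's filter input.

    import numpy as np

    def fold_scale_into_conv2d_filter(conv_filter, scale):
      """Fold a per-input-channel multiply into a Conv2D filter.

      conv_filter: array of shape [height, width, in_channels, out_channels]
      scale:       array of shape [in_channels], e.g. 1/stdev of each channel
                   from the batch normalization decomposition above

      Returns a new filter with filter[:, :, q, :] scaled by scale[q].
      """
      return conv_filter * scale[np.newaxis, np.newaxis, :, np.newaxis]

    def fold_scale_into_depthwise_filter(dw_filter, scale):
      """Same trick for DepthwiseConv2D, whose filter layout is
      [height, width, in_channels, channel_multiplier]; the per-input-channel
      axis is again the third dimension."""
      return dw_filter * scale[np.newaxis, np.newaxis, :, np.newaxis]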

Description of work to address this problem:

  • Create a new rewrite in rewrite.py that replicates the current functionality of the fold_batch_norms and fold_old_batch_norms rewrites in the Graph Transform Tool.
  • Port the tests of the original rewrites from C++ to Python and make sure that they still work.
  • Make a new version of the rewrite that folds the batch normalization from the other side.
  • Create tests for the new version of the rewrite.
  • Create an example script that applies the new rewrite to MobileNetV2.

Implement SavedModel I/O

Add support for reading and writing TensorFlow SavedModel files to and from the GraphDef Editor's Graph class. See https://www.tensorflow.org/guide/extend/model_files for a description of the file format.

This support will involve giving the Graph class additional fields to cover the portions of SavedModel that are not already modeled in Graph -- notably the "signatures" for invoking the model in TensorFlow Serving.

Add regression tests that use the tf.Saver APIs to create temporary SavedModel files (see the Save and Restore guide for more information).

Add an example script in the examples folder that generates a SavedModel file, rewrites it into a second SavedModel file, then loads the second file into a TensorFlow graph and performs some inference.
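
For reference, here is a minimal TF 1.x sketch of the round trip that such an example script would exercise. The export paths, tensor names, and the trivial graph are placeholders, and the gde rewrite step in the middle is elided.

    import tensorflow as tf

    # Build a trivial graph and export it as a SavedModel (TF 1.x style).
    # Note that the export directory must not already exist.
    with tf.Graph().as_default() as g:
      x = tf.placeholder(tf.float32, shape=[None, 4], name="x")
      y = tf.identity(x * 2.0, name="y")
      with tf.Session(graph=g) as sess:
        tf.saved_model.simple_save(sess, "/tmp/original_model",
                                   inputs={"x": x}, outputs={"y": y})

    # ... load /tmp/original_model into a gde.Graph, rewrite it, and write the
    # result out as a second SavedModel (the functionality this issue adds) ...

    # Load a SavedModel back into a fresh TensorFlow graph and run inference.
    with tf.Session(graph=tf.Graph()) as sess:
      tf.saved_model.loader.load(sess, [tf.saved_model.tag_constants.SERVING],
                                 "/tmp/original_model")
      print(sess.run("y:0", feed_dict={"x:0": [[1.0, 2.0, 3.0, 4.0]]}))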

Graph copying doesn't handle collections properly

If you call gde.copy(g1, g2) and graph g1 contains collections, then GDE attempts to call the add_to_collection() TensorFlow API on g2, as if g2 were a tf.Graph instead of a gde.Graph.

The function assign_renamed_collections_handler() in transform.py needs to be rewritten to use GDE APIs to manage collections in the target graph.

Get tests in transform_test.py working

Update the code in transform.py such that the regression tests pass. Do not attempt to improve the APIs for now; just implement something as close as possible to the original so as to get the tests working.

Reach out to owner of original contrib.graph_editor

Hello there @purpledog! Just a friendly FYI: We at IBM's CODAIT center have been working on a new version of your code in TensorFlow's contrib.graph_editor module. See the parent project of this issue for more information. Please feel free to contact us if you have any questions or would like to be involved!

Refactor select.py to use object-oriented API

The file graph_def_editor/select.py contains various routines for selecting subsets of a graph's nodes and/or tensors.

A few issues with the current organization of this file:

  • Adds 18 functions to the top-level gde namespace
  • Entry points are monolithic -- see, for example, select_s()
  • Return values are not in a single consistent format
  • No easy way to represent a selection expression as an object

To address the above issues and to facilitate the development of the "pattern" part of pattern-action rewrite rules, we should refactor these functions into a more object-oriented design:

  • Add a base class, Selector, for selection expressions over a graph (see the sketch after this list). The base class will have methods to:
    • Get information about what types of things (tensors, nodes, or both) the expression can select
    • Retrieve the current set of selected graph objects as a Python list
    • Retrieve the current set of selected objects as a SubgraphView
  • Make the existing functions return subclasses of Selector
  • Put existing functions under a new namespace gde.select
  • Remove unnecessary subroutines from the top-level package
  • Refactor existing test cases to use the new APIs
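
To make the proposal above concrete, here is a rough sketch of what the base class and one subclass might look like. All names and method signatures are illustrative only (not a committed API), and the sketch assumes that gde.Graph exposes an iterable nodes collection.

    import re


    class Selector(object):
      """Illustrative base class for selection expressions over a gde.Graph."""

      def selection_types(self):
        """Return the kinds of objects (nodes, tensors, or both) that this
        expression can select."""
        raise NotImplementedError()

      def select(self, graph):
        """Evaluate the expression against `graph` and return the selected
        objects as a Python list."""
        raise NotImplementedError()

      def select_subgraph(self, graph):
        """Evaluate the expression and wrap the result in a SubgraphView."""
        raise NotImplementedError()


    class NameRegexSelector(Selector):
      """Example subclass: select nodes whose names match a regular expression."""

      def __init__(self, pattern):
        self._pattern = re.compile(pattern)

      def selection_types(self):
        return ("nodes",)

      def select(self, graph):
        # Assumes graph.nodes is an iterable of node objects with a .name field.
        return [n for n in graph.nodes if self._pattern.match(n.name)]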

Get tests in subgraph_test.py working

Update the code in subgraph.py such that the regression tests pass. Do not attempt to improve the APIs for now; just implement something as close as possible to the original so as to get the tests working.

Remove references to `tf.Graph` from API docs.

There are still some references here and there in docstrings to tf.Graph, tf.Tensor, etc. from the days when this code base was contrib.graph_editor and directly modified TensorFlow's Python graph objects. Find the offending docstrings and fix them. Double-check that these references to TensorFlow internal classes are not attached to chunks of code that haven't been properly ported to operate over GraphDef protobufs instead of tf.Graph objects.

Set up CI

Set up continuous integration for the project. Run a PEP8 linter and all tests when PRs are created. Add a script to the project so that devs can easily run the linter themselves and get the same results as the CI server would.
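
One way to give developers and the CI server the same entry point is a small Python wrapper around the linter. This sketch assumes the pycodestyle package (the PEP 8 checker, formerly named pep8) is the linter of choice; the script name and the directories it checks are placeholders.

    # run_linter.py -- illustrative helper, not an existing project script
    import sys

    import pycodestyle

    def main():
      style = pycodestyle.StyleGuide()
      report = style.check_files(["graph_def_editor", "examples"])
      # A non-zero exit code lets CI treat lint failures as build failures.
      sys.exit(1 if report.total_errors else 0)

    if __name__ == "__main__":
      main()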

ValueError("Operation {} does not belong to given graph".format(op)) when running get walk ops functions

Hello,
I'm currently using your library to do some operations on the graph of a TensorFlow 2 model, and I'm having some trouble figuring out the proper way to convert a tensor to either a gde.Node or gde.Tensor object for use in the library's functions. I'm converting my tensors as follows:
[screenshot of the conversion code, attached in the original issue]
gra is the name of my gde.Graph object, for reference. After converting the tensors this way, running get_backward_walk_ops on my ys_g gives me a placeholder operation, and running get_forward_walk_ops on xs_g raises ValueError("Operation {} does not belong to given graph".format(op)). Looking at the code in the util file, I see that this error is raised after checking whether the op has a value for its graph attribute, so I'm guessing this is what's causing issues with my code. How can I make sure that this attribute gets a value when converting? Any help is appreciated, thank you!

Handle graphs with queue_runners and savers collections

Currently, a graph that has a queue_runners collection fails with the following error:

NotImplementedError: Can't serialize item '<tensorflow.python.estimator.inputs.queues.feeding_queue_runner._FeedingQueueRunner object at 0x7feddd39cf60>' in collection 'queue_runners' because it is a '_FeedingQueueRunner'.

Add more robust error checking to rewrite.change_batch_size()

The change_batch_size rewrite (see #4) works by putting the new batch size in place at the input nodes, then propagating the batch size through the rest of the graph by shape inference. If the user does not specify all the input nodes, then the remaining nodes will produce conflicting batch sizes. This can result in an error (if a node ends up with two mutually inconsistent input batch sizes) or in the rewrite having no apparent effect on output batch sizes (if the user changes the batch size to None). As I noted in #13, the script batch_size_example.py has the latter problem. The batch size changes to None, but implicit inputs inside the batch normalization layers change the output batch size to 64.

The proper fix for this problem is as follows:

  • Add error-checking code to the change_batch_size rewrite. If the batch size of a node doesn't change, or if type inference fails, then the rewrite should output a detailed error message. The message should contain the name of the node, the node's input shapes, and the names of the nodes that produced those shapes.
  • Use the error-checking code to track down all the hidden inputs in the batch_size_example.py script and add them to the inputs set. Note that it may be necessary to run the input graph through the freeze graph script to remove variables. You can invoke the freeze graph script from Python by adding from tensorflow.python.tools import freeze_graph to the beginning of your script and calling freeze_graph.freeze_graph_with_def_protos() directly, as sketched below.
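
For reference, here is a sketch of that freeze step. The graph file, checkpoint path, and output node name are placeholders, and the keyword arguments follow the freeze_graph_with_def_protos() signature as of TF 1.12/1.13; double-check them against the TensorFlow version you are running.

    import tensorflow as tf
    from tensorflow.python.tools import freeze_graph

    # Load the GraphDef that the example script starts from (placeholder path).
    model_graph_def = tf.GraphDef()
    with tf.gfile.GFile("/tmp/my_model/graph.pb", "rb") as f:
      model_graph_def.ParseFromString(f.read())

    # Convert all variables in the graph into constants, using the values
    # stored in the named checkpoint.
    frozen_graph_def = freeze_graph.freeze_graph_with_def_protos(
        input_graph_def=model_graph_def,
        input_saver_def=None,
        input_checkpoint="/tmp/my_model/model.ckpt",
        output_node_names="MobilenetV2/Predictions/Reshape_1",  # placeholder
        restore_op_name="save/restore_all",   # conventional default
        filename_tensor_name="save/Const:0",  # conventional default
        output_graph="/tmp/my_model/frozen_graph.pb",
        clear_devices=True,
        initializer_nodes="")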

Refactor reroute_ts() to be less error-prone.

Every time that I've used the reroute_ts() function in reroute.py, I've ended up putting the arguments in the wrong order. Mixing up the ts0 and ts1 args to this function doesn't generally lead to errors, but to an incorrect modification of the graph that in turn leads to strange problems downstream in the user's code. We should update the documentation, the API, and the error-handling to make this kind of coding error less likely.

Support TensorFlow 1.12, 1.13, and 2.0

There are some pretty major API changes between TensorFlow 1.12, 1.13, and 2.0. But fortunately the APIs that we use haven't changed too much. We should make GDE support all three versions of TF.

Major TODOs to make this happen:

  • Add a flag to the env.sh script to choose TensorFlow version. In the short term, versions newer than 1.12 will need to be installed via pip. When new versions of TF become available on Anaconda, switch to pulling them from Anaconda instead of PyPI.
  • Set up some CI to ensure that future changes are tested against multiple versions of TF.
  • Where there are incompatibilities between TF APIs across versions, add a layer of shims to our Python code (see the sketch below) so that our code runs with all three versions.
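
As an illustration of the last bullet, here is a minimal version-shim sketch. The module name tf_compat.py and the symbol tf_v1 are placeholders, not existing GDE code.

    # tf_compat.py -- illustrative shim module
    import tensorflow as tf

    # tf.VERSION exists in TF 1.x; TF 2.0 replaces it with tf.version.VERSION.
    TF_VERSION = getattr(tf, "VERSION", None) or tf.version.VERSION

    if TF_VERSION.startswith("2."):
      # TF 2.x keeps the 1.x-style graph APIs under tf.compat.v1.
      tf_v1 = tf.compat.v1
    else:
      tf_v1 = tf

    # The rest of the code base would import tf_v1 from this module instead of
    # using tf directly, e.g.:
    #   graph_def = tf_v1.GraphDef()
    #   with tf_v1.Session(graph=some_graph) as sess: ...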

Contributing to graph_def_editor

Hi @BryanCutler @frreiss

I sent you an email earlier this week; not sure if you saw it.

I'd like to use graph_def_editor for one of Google's projects, but I need to make a bunch of improvements, such as support for TF 2.x functions. I already have a prototype of this running locally and can start integrating the changes into master.

At Google we have an internal process for integration with GitHub (similar to what TensorFlow has), but in order for me to set up Google -> GitHub contributions, I need to have write access to this repo. Would you mind adding me as a maintainer?

Please let me know if this is not acceptable, whether because the project is under the IBM org or for any other reason. In that case I'll fork the project somewhere under the TensorFlow org and continue development there.

Thanks,
Aleksey

Make library pip-installable directly from source

Make it possible to pip install a copy of graph_def_editor from the source code repository directly into a local virtualenv. This work does not include generating a pip-installable tarball or posting such a tarball on PyPI.
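
A sketch of what the packaging side could look like; the metadata and the dependency list below are placeholders rather than the project's actual configuration.

    # setup.py -- illustrative skeleton only
    from setuptools import find_packages, setup

    setup(
        name="graph_def_editor",
        version="0.1.0",  # placeholder
        description="GraphDef Editor: a port of TensorFlow's "
                    "contrib.graph_editor that operates on GraphDef protos",
        packages=find_packages(exclude=["tests", "examples"]),
        # TensorFlow is deliberately left unpinned here; see the
        # multi-version support issue above.
        install_requires=["six"],  # placeholder dependency list
    )

With a file like this at the repository root, pip install -e /path/to/graph_def_editor inside an activated virtualenv installs the package in editable mode, and no tarball or PyPI upload is involved.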

Make GDE Python 2.7 compatible again

IntelliJ thinks that all the code under graph_def_editor is Python 2.7-compatible, but Python 2.7 disagrees. We'll need to take some steps to rectify this issue:

  • Replace the Python 3.x type annotation syntax with the backwards-compatible type comment syntax described in PEP 484. That is:
    def _validate_output_shapes_attr(value):
      # type: (Any) -> List[tf.TensorShape]
    instead of:
    def _validate_output_shapes_attr(value: Any) -> List[tf.TensorShape]:
    
  • Replace the calls to urllib.request in the three example scripts with something that works in both Python 2.7 and 3.x (see the sketch after this list).
  • Add an option to the "env.sh" script to create a Python 2.7 environment if requested.
  • Run all the tests and examples with Python 2.7 to verify that they work.
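
Here is a sketch of the urllib replacement mentioned in the second bullet, using only the standard library (a six.moves-based import would work just as well); the helper name is illustrative.

    try:
      # Python 3.x
      from urllib.request import urlretrieve
    except ImportError:
      # Python 2.7
      from urllib import urlretrieve

    def download_file(url, local_path):
      """Download url to local_path; works on both Python 2.7 and 3.x."""
      urlretrieve(url, local_path)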

Add script to generate API documentation

Add a script to the project that generates a set of API documentation for GDE by invoking pydoc or a similar utility. The docs should come out in markdown and HTML format.

Once #19 is implemented, create a trigger so that API docs for the master branch will be posted to a web site somewhere -- GitHub Pages, for example.

Add CONTRIBUTING.md

Create contribution guidelines for the project, and create additional issues as necessary to cover bringing existing code up to the standards described therein.

General outline:

  • CONTRIBUTING.md file at root of project
  • Pull requests encouraged
  • Feel free to open issues with bugs/feature requests
  • CLAs with IBM required for external PRs
  • Copyright notices on new files
  • Code style: TensorFlow standard, but with type hints. Code must run with Python 3.6 and parse with Python 2.7.
  • Tests for new features strongly encouraged

Implement "change batch size" rewrite.

Implement a graph rewrite that changes the batch size of a trained model that hard-codes this batch size. For bonus points, see if it's possible to change to a batch size of "any". That is, set the input placeholder's first dimension to None.

Recommended approach:

  • Identify input placeholder(s) (and/or allow the user to specify them)
  • Modify the shape of the input placeholders (see the sketch at the end of this issue)
  • Walk through the graph performing shape inference

Make the graph-level shape/type inference from the last step a method of the Graph class instead of a stand-alone utility.

Add tests for the rewrite and an example under the examples folder.
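
For the placeholder-modification step, here is a minimal sketch of what the edit looks like at the GraphDef protobuf level. The placeholder name and file path are illustrative, and the real rewrite would go through gde's Graph and Node APIs rather than touching the protobuf directly.

    def set_placeholder_batch_size(graph_def, placeholder_name, new_batch_size):
      """Overwrite dimension 0 of a Placeholder node's shape attribute.

      Pass new_batch_size = -1 to mean "any batch size" (the serialized
      form of None).
      """
      for node in graph_def.node:
        if node.name == placeholder_name and node.op == "Placeholder":
          node.attr["shape"].shape.dim[0].size = new_batch_size
          return
      raise ValueError("No Placeholder named {}".format(placeholder_name))

    # Example usage on a frozen model:
    #   graph_def = tf.GraphDef()
    #   with tf.gfile.GFile("frozen_model.pb", "rb") as f:
    #     graph_def.ParseFromString(f.read())
    #   set_placeholder_batch_size(graph_def, "input", 1)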

Clean up copyright notices

Go systematically through the copyright notices and make sure that

# Copyright 2019 IBM. All Rights Reserved.

is present everywhere.
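
A small helper script could drive that sweep; this is a sketch, with the directory to scan and the number of header lines examined as adjustable assumptions.

    import os

    NOTICE = "# Copyright 2019 IBM. All Rights Reserved."

    def files_missing_notice(root_dir):
      """Return all .py files under root_dir whose first lines lack the notice."""
      missing = []
      for dirpath, _, filenames in os.walk(root_dir):
        for name in filenames:
          if not name.endswith(".py"):
            continue
          path = os.path.join(dirpath, name)
          with open(path) as f:
            head = "".join(f.readlines()[:5])
          if NOTICE not in head:
            missing.append(path)
      return missing

    if __name__ == "__main__":
      for path in files_missing_notice("graph_def_editor"):
        print(path)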

Is the following operation possible?

I have a TensorFlow SavedModel .pb file.

I would like to replace part of that graph (a subgraph) with a custom operator.

Is something like this feasible using this framework? If it isn't currently possible, I am willing to help contribute if this can be achieved in general.
