codait / graph_def_editor
GraphDef Editor: A port of the TensorFlow contrib.graph_editor package that operates over serialized graphs
License: Apache License 2.0
The rewrites fold_batch_norms and fold_old_batch_norms in the TensorFlow Graph Transform Tool do not work when the batch normalization layer is immediately after a DepthwiseConv2D layer. As a result, these rewrites do not work with MobileNetV2 or any model that embeds MobileNetV2. This seems like a rather significant oversight, given that MobileNet and MobileNet-derived models are the most common use case for these kinds of graph-simplifying rewrites. This problem affects several models in the Model Asset Exchange.
Folding batch normalization into depthwise convolution is a bit tricky because each coefficient in a depthwise convolution participates in every output. In particular, the formula for a depthwise convolution is:
output[b, i, j, k * channel_multiplier + q] = sum_{di, dj}
filter[di, dj, k, q] * input[b, strides[1] * i + rate[0] * di,
strides[2] * j + rate[1] * dj, k]
(see https://www.tensorflow.org/api_docs/python/tf/nn/depthwise_conv2d).
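For concreteness, the formula above can be transcribed directly into (slow) pure-Python loops. This is an illustrative sketch of the indexing, not the GDE or TensorFlow implementation; tensors are nested lists indexed [b][i][j][c] and filters [di][dj][k][q]:

```python
def depthwise_conv2d(inp, filt, strides=(1, 1, 1, 1), rate=(1, 1)):
    """Direct transcription of the tf.nn.depthwise_conv2d formula
    ('VALID' padding). Output channel index is k * channel_multiplier + q."""
    B = len(inp)
    H, W, K = len(inp[0]), len(inp[0][0]), len(inp[0][0][0])
    FH, FW = len(filt), len(filt[0])
    Q = len(filt[0][0][0])  # channel_multiplier
    out_h = (H - (FH - 1) * rate[0] - 1) // strides[1] + 1
    out_w = (W - (FW - 1) * rate[1] - 1) // strides[2] + 1
    out = [[[[0.0] * (K * Q) for _ in range(out_w)]
            for _ in range(out_h)] for _ in range(B)]
    for b in range(B):
        for i in range(out_h):
            for j in range(out_w):
                for k in range(K):
                    for q in range(Q):
                        s = 0.0
                        for di in range(FH):
                            for dj in range(FW):
                                s += (filt[di][dj][k][q] *
                                      inp[b][strides[1] * i + rate[0] * di]
                                         [strides[2] * j + rate[1] * dj][k])
                        out[b][i][j][k * Q + q] = s
    return out

# A 1x1 all-ones filter with channel_multiplier 1 is the identity:
inp = [[[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]]  # [1,2,2,2]
filt = [[[[1.0], [1.0]]]]                                     # [1,1,2,1]
assert depthwise_conv2d(inp, filt) == inp
```

Note how filter element [di, dj, k, q] contributes only to output channels derived from input channel k, which is the structure the folding discussion below relies on.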
This reuse of filter elements means that we can't fold a batch normalization that happens after the depthwise convolution into the convolution. Batch normalization multiplies each channel by a different amount (1/stdev of the channel), and there's no one place in the filters where those amounts could be added.
Instead, we need to fuse batch normalization into a Conv2D or DepthwiseConv2D that happens after the normalization. This fusion is a bit trickier, because batch normalization breaks down into a multiply followed by an add, and there is typically a ReLU before the next convolution. For example, the basic building block of MobileNet v1 is
3x3 DepthwiseConv2D -> BN -> ReLU -> 1x1 Conv2D -> BN -> ReLU
The second BN in this chain is covered by the existing rewrite. We need to fold the first BN into the 1x1 Conv2D that happens after it. That chunk
BN-> ReLU -> 1x1 Conv2D
breaks down to
Multiply -> Add -> ReLU -> 1x1 Conv2D
So we need to pull the multiply into the Conv2D. Another way to write the above sequence of ops is:
Conv2D(ReLU(mx + b))
== Conv2D(ReLU(m(x + b/m)))
== Conv2D(m * ReLU(x + b/m)) iff m >= 0 (element-wise)
As it happens, m is always >= 0, since it's equal to 1/stdev. So, switching back to operator notation, we just need to turn
Multiply -> Conv2D
into a single Conv2D and rewrite the Add(b) to Add(b/m).
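The scaling identity above is easy to sanity-check numerically. A minimal pure-Python sketch (no TensorFlow required):

```python
def relu(x):
    return max(x, 0.0)

# Check the scalar identity behind the rewrite:
# relu(m*x + b) == m * relu(x + b/m) whenever m > 0.
for m in (0.5, 1.0, 3.0):          # m = 1/stdev, so always positive
    for x in (-2.0, -0.1, 0.0, 1.5):
        for b in (-1.0, 0.0, 2.0):
            lhs = relu(m * x + b)
            rhs = m * relu(x + b / m)
            assert abs(lhs - rhs) < 1e-12, (m, x, b)
```

The identity fails for m < 0 (the ReLU would flip which side gets clipped), which is why the m >= 0 condition matters.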
The equation for Conv2D is:
output[b, i, j, k] =
sum_{di, dj, q} input[b, strides[1] * i + di, strides[2] * j + dj, q] *
filter[di, dj, q, k]
(see https://www.tensorflow.org/api_docs/python/tf/nn/conv2d). Collapsing down the striding parts to f_i and f_j, we have:
output[b, i, j, k] =
sum_{di, dj, q} input[b, f_i(i, di), f_j(j, dj), q] * filter[di, dj, q, k]
So the equation for a Conv2D on top of a multiplication by m is:
output[b, i, j, k] =
sum_{di, dj, q} (input[b, f_i(i, di), f_j(j, dj), q] * m[q]) * filter[di, dj, q, k]
= sum_{di, dj, q} input[b, f_i(i, di), f_j(j, dj), q] * (m[q] * filter[di, dj, q, k])
So we just need to multiply every filter element in filter[_, _, q, _] by m[q] for each value of q. The same principle applies to DepthwiseConv2D.
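A quick numerical check of this folding rule, using a toy pure-Python Conv2D (stride 1, 'VALID' padding; the helper names are illustrative, not GDE APIs): scaling the input channel-wise by m is equivalent to scaling filter[_, _, q, _] by m[q].

```python
def conv2d(inp, filt):
    """Toy Conv2D: inp is [b][i][j][q], filt is [di][dj][q][k]."""
    B, H, W = len(inp), len(inp[0]), len(inp[0][0])
    FH, FW = len(filt), len(filt[0])
    Cout = len(filt[0][0][0])
    out_h, out_w = H - FH + 1, W - FW + 1
    out = [[[[0.0] * Cout for _ in range(out_w)]
            for _ in range(out_h)] for _ in range(B)]
    for b in range(B):
        for i in range(out_h):
            for j in range(out_w):
                for k in range(Cout):
                    s = 0.0
                    for di in range(FH):
                        for dj in range(FW):
                            for q in range(len(filt[di][dj])):
                                s += inp[b][i + di][j + dj][q] * filt[di][dj][q][k]
                    out[b][i][j][k] = s
    return out

def scale_input(inp, m):
    # Multiply input channel q by m[q] (what the BN Multiply does).
    return [[[[v * m[q] for q, v in enumerate(px)] for px in row]
             for row in img] for img in inp]

def scale_filter(filt, m):
    # Fold m into the filter: scale filter[_, _, q, _] by m[q].
    return [[[[v * m[q] for v in col] for q, col in enumerate(fj)]
             for fj in fi] for fi in filt]

inp = [[[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]]  # [1,2,2,2]
filt = [[[[1.0, -1.0], [0.5, 2.0]]]]                          # [1,1,2,2]
m = [2.0, 0.5]
assert conv2d(scale_input(inp, m), filt) == conv2d(inp, scale_filter(filt, m))
```

The assert holds for any m: the multiplication simply commutes into the filter sum, exactly as in the derivation above.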
Description of work to address this problem: add a rewrite in rewrite.py that replicates the current functionality of the fold_batch_norms and fold_old_batch_norms rewrites in the Graph Transform Tool.

Add support for reading and writing TensorFlow SavedModel files to and from the GraphDef Editor's Graph class. See https://www.tensorflow.org/guide/extend/model_files for a description of the file format.
This support will involve giving the Graph class additional fields to support the portions of SavedModel that are not already modeled in Graph -- notably the "signatures" for invoking the model in TensorFlow Serving.
Add regression tests that use the tf.train.Saver APIs to create temporary SavedModel files (see the Save and Restore guide for more information).
Add an example script in the examples folder that generates a SavedModel file, rewrites it into a second SavedModel file, then loads the second file into a TensorFlow graph and performs some inference.
If you call gde.copy(g1, g2) and graph g1 contains collections, then GDE attempts to call the add_to_collection() TensorFlow API on g2, as if g2 were a tf.Graph instead of a gde.Graph.
The function assign_renamed_collections_handler() in transform.py needs to be rewritten to use GDE APIs to manage collections in the target graph.
Currently, an error is raised if an item is added to a collection that it is already a member of. TensorFlow does not treat collections as sets (see https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/framework/ops.py#L3917); it allows duplicate items in collections. GraphDef Editor should be able to handle this.
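A minimal sketch of the intended list-based semantics. The method names mirror the TensorFlow collection API, but this toy class is not the actual gde.Graph:

```python
class Graph:
    """Toy stand-in for gde.Graph, showing list-backed collections."""

    def __init__(self):
        self._collections = {}  # collection name -> list (not a set!)

    def add_to_collection(self, name, value):
        # TensorFlow collections are lists: duplicates are allowed and
        # insertion order is preserved, so re-adding must not raise.
        self._collections.setdefault(name, []).append(value)

    def get_collection(self, name):
        # Return a copy so callers can't mutate internal state.
        return list(self._collections.get(name, []))

g = Graph()
g.add_to_collection("train_op", "op1")
g.add_to_collection("train_op", "op1")  # duplicate: allowed, no error
assert g.get_collection("train_op") == ["op1", "op1"]
```

Matching TensorFlow's list semantics here is what lets gde.copy() faithfully reproduce a source graph's collections, duplicates and all.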
Update the code in transform.py
such that the regression tests pass. Do not attempt to improve the APIs for now; just implement something as close as possible to the original so as to get the tests working.
Hello there @purpledog! Just a friendly FYI: we at IBM's CODAIT center have been working on a new version of your code in TensorFlow's contrib.graph_editor module. See the parent project of this issue for more information. Please feel free to contact us if you have any questions or would like to be involved!
The file graph_def_editor/select.py
contains various routines for selecting subsets of a graph's nodes and/or tensors.
A few issues with the current organization of this file:
gde namespace
select_s()
To address the above issues and to facilitate the development of the "pattern" part of pattern-action rewrite rules, we should refactor these functions into a more object-oriented design:
Selector, for selection expressions over a graph. The base class will have methods to:
Selector
gde.select
Update the code in subgraph.py
such that the regression tests pass. Do not attempt to improve the APIs for now; just implement something as close as possible to the original so as to get the tests working.
There are still some references here and there in docstrings to tf.Graph, tf.Tensor, etc. from the days when this code base was contrib.graph_editor and directly modified TensorFlow's Python graph objects. Find the offending docstrings and fix them. Double-check that these references to TensorFlow internal classes are not attached to chunks of code that haven't been properly ported to operate over GraphDef protobufs instead of tf.Graph objects.
Set up continuous integration for the project. Run a PEP8 linter and all tests when PRs are created. Add a script to the project so that devs can easily run the linter themselves and get the same results as the CI server would.
Hello,
I'm currently using your library to do some operations on the graph of a model in TensorFlow 2, and I'm having some issues with figuring out the proper way to convert a tensor to either a gde.Node or gde.Tensor object to use in the library's functions. I'm converting my tensors as follows:
gra is the name of my gde.Graph object, for reference. After converting the tensors this way, when I run gde.get_backward_walk_ops() on my ys_g I get a placeholder operation, and when I run gde.get_forward_walk_ops() on the xs_g I get ValueError("Operation {} does not belong to given graph".format(op)) as an error. Looking at the code in the util file, I see that this error is raised after checking whether the op has a value for its graph attribute, so I'm guessing this is what's causing issues with my code. How can I make sure that this attribute gets a value when converting? Any help is appreciated, thank you!
Currently, a graph that has a queue_runners collection will fail with the error:
NotImplementedError: Can't serialize item '<tensorflow.python.estimator.inputs.queues.feeding_queue_runner._FeedingQueueRunner object at 0x7feddd39cf60>' in collection 'queue_runners' because it is a '_FeedingQueueRunner'.
The change_batch_size rewrite (see #4) works by putting the new batch size in place at the input nodes, then propagating the batch size through the rest of the graph by shape inference. If the user does not specify all the input nodes, then the remaining nodes will produce conflicting batch sizes. This can result in an error (if a node ends up with two mutually inconsistent input batch sizes) or in the rewrite having no apparent effect on output batch sizes (if the user changes the batch size to None). As I noted in #13, the script batch_size_example.py has the latter problem. The batch size changes to None, but implicit inputs inside the batch normalization layers change the output batch size to 64.
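To make the failure mode concrete, here is a toy shape-propagation pass over a dict-based graph. The node and seed structure is invented for illustration (the real rewrite operates on GraphDef protos), but the conflict it detects is exactly the one described above:

```python
def propagate_batch_size(graph, seeds):
    """graph: node name -> list of input node names, in topological order.
    seeds: batch size for each source node (graph inputs and implicit inputs).
    Returns node name -> batch size; raises ValueError with a detailed
    message when a node's inputs disagree on the batch size."""
    batch = dict(seeds)
    for name, preds in graph.items():
        if not preds:
            continue  # source node: batch size comes from seeds
        sizes = {p: batch[p] for p in preds}
        if len(set(sizes.values())) > 1:
            raise ValueError(
                "Node '%s' has conflicting input batch sizes: %s" % (name, sizes))
        batch[name] = next(iter(sizes.values()))
    return batch

# The user rewrites the placeholder to batch size None, but an implicit
# input (e.g. a constant inside a batch-norm layer) still hard-codes 64:
graph = {"input": [], "bn_const": [], "bn": ["input", "bn_const"]}
try:
    propagate_batch_size(graph, {"input": None, "bn_const": 64})
except ValueError as e:
    print(e)  # names the node and each conflicting input's batch size
```

Seeding the implicit inputs too (step 2 of the fix below) removes the conflict and lets the new batch size flow through the whole graph.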
The proper fix for this problem is as follows:
1. Add error checking to the change_batch_size rewrite. If the batch size of a node doesn't change, or if type inference fails, the rewrite should output a detailed error message. The message should contain the name of the node, the node's input shapes, and the names of the nodes that produced those shapes.
2. Identify the implicit inputs in the batch_size_example.py script and add them to the inputs set. Note that it may be necessary to run the input graph through the freeze graph script to remove variables. You can invoke the freeze graph script from Python by adding from tensorflow.python.tools import freeze_graph to the beginning of your script and calling freeze_graph.freeze_graph_with_def_protos() directly.

Every time that I've used the reroute_ts() function in reroute.py, I've ended up putting the arguments in the wrong order. Mixing up the ts0 and ts1 args to this function doesn't generally lead to errors, but to an incorrect modification of the graph that in turn leads to strange problems downstream in the user's code. We should update the documentation, the API, and the error-handling to make this kind of coding error less likely.
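One possible hardening is to make the tensor lists keyword-only, so a positional mixup fails immediately instead of silently mangling the graph. The signature and body below are a sketch, not the actual reroute.py API:

```python
def reroute_ts(*, from_ts, to_ts, can_modify=None):
    """Rewire every consumer of a tensor in `from_ts` to read the
    corresponding tensor in `to_ts`. The bare `*` makes both lists
    keyword-only, so callers must name which tensors are which."""
    if len(from_ts) != len(to_ts):
        raise ValueError("from_ts and to_ts must have the same length")
    # Placeholder for the real graph rewiring: just pair them up.
    return list(zip(from_ts, to_ts))

# Positional calls now fail fast instead of swapping the tensor lists:
try:
    reroute_ts(["a"], ["b"])
except TypeError as e:
    print("rejected:", e)

pairs = reroute_ts(from_ts=["a"], to_ts=["b"])
assert pairs == [("a", "b")]
```

Directional names like from_ts/to_ts (instead of the symmetric ts0/ts1) also document the data flow at every call site.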
There are some pretty major API changes between TensorFlow 1.12, 1.13, and 2.0. But fortunately the APIs that we use haven't changed too much. We should make GDE support all three versions of TF.
Major TODOs to make this happen:
Update the env.sh script to choose the TensorFlow version. In the short term, versions newer than 1.12 will need to be installed via pip. When new versions of TF become available on Anaconda, switch to pulling them from Anaconda instead of PyPI.

Sent you an email earlier this week, not sure if you saw that.
I'd like to use graph_def_editor for one of Google's projects, but I need to make a bunch of improvements, like support for TF 2.x functions. I already have a prototype for this running locally, and can start integrating changes into master.
At Google we have an internal process for integration with GitHub (similar to what TensorFlow has), but in order for me to set up Google -> GitHub contributions, I need to have write access to this repo. Would you mind adding me as a maintainer?
Please let me know if this is not acceptable because the project is under the IBM org, or for any other reason. In that case I'll fork the project somewhere under the TensorFlow org and continue development there.
Thanks,
Aleksey
Make it possible to pip install a copy of graph_def_editor from the source code repository directly into a local virtualenv. This work does not include generating a pip-installable tarball or posting such a tarball on PyPI.
IntelliJ thinks that all the code under graph_def_editor is Python 2.7-compatible, but Python 2.7 disagrees. We'll need to take some steps to rectify this issue:
Convert Python 3 function annotations such as
def _validate_output_shapes_attr(value: Any) -> List[tf.TensorShape]:
to Python 2.7-compatible type comments:
def _validate_output_shapes_attr(value):
    # type: (Any) -> List[tf.TensorShape]
Replace urllib.request in the three example scripts with something that works in both Python 2.7 and 3.x.

Add a script to the project that generates a set of API documentation for GDE by invoking pydoc or a similar utility. The docs should come out in Markdown and HTML format.
Once #19 is implemented, create a trigger so that API docs for the master branch will be posted to a web site somewhere -- say, Github pages maybe.
Create contribution guidelines for the project, and create additional issues as necessary to cover bringing existing code up to the standards described therein.
General outline:
Create an experimental fork of the TensorFlow Large Model Support project. Port the LMS library to use GraphDef Editor in place of contrib.graph_editor.
Implement a graph rewrite that changes the batch size of a trained model that hard-codes this batch size. For bonus points, see if it's possible to change to a batch size of "any"; that is, set the input placeholder's first dimension to None.
Recommended approach:
Make graph-level type inference a method of the Graph class instead of a stand-alone utility.
Add tests for the rewrite and an example under the examples folder.
Hello, do you know how to change an operation's name in the graph? Which API should I use?
Go systematically through the copyright notices and make sure that
# Copyright 2019 IBM. All Rights Reserved.
is present everywhere.
I have a tensorflow saved model pb.
I would like to replace part of that pb (a subgraph) with a custom operator.
Is something like this feasible using this framework? If not, I am willing to help contribute if, in general, this can be achieved.