Comments (12)
It works now. Thanks for looking into this and fixing it!
from java.
It seems the tensorflow python libraries are built with GCC4 and is using the old ABI as well, and tensorflow-text works with that so I don't think the ABI is the problem here.
Through debugging I found the error is caused by different symbols being referenced at run time for the hash_code check at https://github.com/tensorflow/tensorflow/blob/2b96f3662bd776e277f86997659e61046b56c315/tensorflow/core/framework/resource_mgr.h#L721
In my example the hash_code in the resource handle is the address of the hash_bit symbol in libtensorflow.so.2 but the newly created TypeIndex hash_code resolves to the address of hash_bit symbol in libtensorflow_framework.so.2.
All libraries are loaded up front so naively I would have thought it would consistently resolve to the same hash_bit symbol but evidently that is not the case here.
from java.
It's probably because we're building against CentOS, which uses the old C++ ABI of GCC:
https://bugzilla.redhat.com/show_bug.cgi?id=1546704
libstdc++ is backward compatible, so we can run TensorFlow compiled that way on Ubuntu no problem, but anything that interacts with the C++ API also needs to be compiled that way.
from java.
It seems to come down to how bindings are set up when loading the shared objects.
Using LD_DEBUG=bindings
when running the model via java it shows the op library binds to libtensorflow_framework instead of libtensorflow:
binding file /usr/local/lib/python3.6/dist-packages/tensorflow_text/python/ops/_wordpiece_tokenizer.so [0] to /root/.javacpp/cache/tensorflow-core-api-0.1.0-SNAPSHOT-linux-x86_64.jar/org/tensorflow/internal/c_api/linux-x86_64/libtensorflow_framework.so.2 [0]: normal symbol `_ZZN10tensorflow9TypeIndex4MakeINS_6lookup15LookupInterfaceEEES0_PKcE8hash_bit'
When running the model via python it doesn't bind to libtensorflow_framework but to _pywrap_tensorflow_internal.so (which basically serves the role of libtensorflow for python I guess):
binding file /usr/local/lib/python3.6/dist-packages/tensorflow_text/python/ops/_wordpiece_tokenizer.so [0] to /usr/local/lib/python3.6/dist-packages/tensorflow/python/_pywrap_tensorflow_internal.so [0]: normal symbol `_ZZN10tensorflow9TypeIndex4MakeINS_6lookup15LookupInterfaceEEES0_PKcE8hash_bit'
One difference seems to be that the tensorflow libraries are loaded with RTLD_GLOBAL in python, which doesn't seem to be the case currently with the java libs. I tested this by appending '!' to the library names in the tensorflow javacpp preset properties and rebuilding but that didn't help (I also tried to load the libraries directly with Loader.loadGlobal
just to be sure). So RTLD_GLOBAL is probably not the cause for this difference.
from java.
It seems like the difference is that libtensorflow only exports TF_* and TFE_* symbols globally while the python library exports many more symbols (including the hash_bit symbols). When modifying the build of libtensorflow to export the same symbols as the python shared library it works.
This seems to be more of a general issue with libtensorflow since I guess anyone using that directly might run into similar problems with custom ops and exporting all symbols doesn't feel like the best solution.
But until that is fixed upstream it is trivial to apply this change as a patch when building the java libraries:
diff --git a/tensorflow/c/version_script.lds b/tensorflow/c/version_script.lds
index c352a1440d..7716dcb231 100644
--- a/tensorflow/c/version_script.lds
+++ b/tensorflow/c/version_script.lds
@@ -1,8 +1,16 @@
VERS_1.0 {
# Export symbols in c_api.h.
global:
+ *tensorflow*;
+ *toco*;
+ *perftools*gputools*;
+ *tf_*;
*TF_*;
+ *Eager*;
*TFE_*;
+ *nsync_*;
+ *stream_executor*;
+ *xla*;
# Hide everything else.
local:
I can put together a PR for it as well of course if you want
from java.
We should find out why those symbols aren't exported. I suspect it's because the python API is using things that aren't considered part of the C API. It is a little odd to me that other native libraries depend on those symbols. Maybe the modular tensorflow people know more?
from java.
While on the Java side, we restrict our development using the C API, we cannot prevent maintainers of these custom op libraries from trespassing it. So if we want to support them, it looks like we need to export these symbols. What are the side effects of this @samuelotter , does it increases significantly the TF binary size?
from java.
It shouldn't increase the size of the binaries in any material way and while I haven't built and tested all targets the ones I have built show no significant difference.
I'm not an expert on custom ops but it seems they are usually implemented in C++ directly towards the C++ API right now, which is not ABI stable. I'm not even sure the C API exposes everything you need to implement many custom ops right now. While there seems to be ongoing work to provide an ABI stable way of doing this by wrapping the C APIs that doesn't seem to be fully complete yet. I think chances are high that many custom ops would not work out of the box with libtensorflow right now which is unfortunate.
Working with this has made me think about how using custom ops with the java bindings could be improved long term. Getting custom ops to work with the java bindings won't be a great experience even if this problem is solved since most custom ops are primarily distributed as python packages you either have to also have python libraries installed and try to find the location of the custom ops library paths on your system, or do what I've done and extract the shared objects from the python wheels and repackage them in jar. Neither option is very elegant.
I guess the ideal scenario would be for popular custom op libraries to also be wrapped and distributed as java libraries. I'm not sure how to get there though, but a start may be to provide some sort of template project which could be used as a starting point for wrapping a custom op library to at least reduce the barrier somewhat.
Tensorflow Serving comes with a bunch of commonly used/popular custom ops supported out of the box, in a similar vein it might be worth considering maintaining packages for a few of the more popular ones (such as tensorflow-text and tensorflow-io which are also tensorflow projects but not in the core repo) under the umbrella of this project?
from java.
I guess the ideal scenario would be for popular custom op libraries to also be wrapped and distributed as java libraries.
I share your point of view here. Ideally, it should be distributed directly by the providers of these ops and we should make it easy for them to do it. Basically, we could just the providers define an API def of their op like those ones and let them use the same generator as we use to create concrete Java wrappers that could then be packaged with Maven.
There has been great progress to support custom ops from the C API with the addition of the Kernel endpoints. But probably these libraries were never updated to make use of them, could also be a contribution we do for them? This way, we won't have these symbols export issues. But I don't know how complex is this task.
from java.
@samuelotter BTW, we would also need to modify exported_symbols.lds for Mac and something else for this to work on Windows.
from java.
It looks like the tensorflow_cc
target is still there and supported, for example, see tensorflow/tensorflow#41904.
Maybe we could try to use that instead of tensorflow
?
from java.
@samuelotter , now that we have applied these changes, can you please validate that you can now successfully load tensorflow-text
when using the latest snapshots of TF Java?
from java.
Related Issues (20)
- protobuf-java 3.19.4 contains 3 high vulnerabilities HOT 2
- Complex Tensor Implementation Missing? HOT 5
- ivy dependency not working on windows or linux, native TF code not found on classpath HOT 5
- Could not load dynamic library 'xxxxx'; dlerror: xxxxx.dll not found HOT 5
- org.tensorflow.TensorFlowException: Can't parse /<modelPath>/<somePathToFolder>/saved_model.pb as binary proto - JDK 17 HOT 15
- Compiling from source, cuDNN version is not compatible? How can I change the cuDNN compile version? HOT 2
- SavedModelBundle Unable to Load Models with coo_sparse Encoded Input HOT 1
- Tensor type issue HOT 3
- Unable to build the project using 'mvn install ' command HOT 4
- how to use importGraphDef to load model.pb file? HOT 2
- Modular Java app can't create tensor object HOT 3
- Read/Write method of DataBuffer is against intuition HOT 1
- Distributing an Apple Silicon binary HOT 2
- Error when using tensorflow-text on tensorflow-core HOT 8
- Reductions on losses that have dynamic size
- No documentation for 1.0.0 HOT 5
- The loss CatagoricalCrossEntropy is currently unusable in framework
- Build native codes current master fails HOT 4
- Native artifacts are pulling transitive dependencies
- Q: How to add Regularizer and Constraint effect correctly to a weight/bias variable or activity? HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from java.