We refactored the --num_modules flag to be set in the gin config for each specific problem, because it is a pretty critical value for reproducibility in the regalloc case. It looks like I forgot to update the documentation in that regard. You can just omit the flag. I'd recommend not setting the --num_workers flag unless you have a compelling case to do so: it sets a completely different parameter than what --num_modules used to modify. Regarding the specific error that you're seeing, it seems like the script isn't able to pick up the BC model. Did you perform the behavioral cloning step? And if so, what files are present in the directory mentioned by the gin binding flag setting that variable?
from ml-compiler-opt.
Is the BC model the LLVM bytecode model?
I ran train_bc.py successfully. The problem is in train_locally.py.
The log shows that the model loads successfully.
This is the full log:
performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
W0925 04:03:23.532205 140184883676992 ppo_agent.py:342] Only tf.keras.optimizers.Optimiers are well supported, got a non-TF2 optimizer: <tensorflow.python.training.adam.AdamOptimizer object at 0x7f7e9dd49460>
I0925 04:03:24.762801 140184883676992 common.py:1009] No checkpoint available at /code/model
I0925 04:03:26.191171 140184883676992 train_locally.py:101] Loading module specs from corpus at /code/corpus.
I0925 04:03:30.300293 140184883676992 train_locally.py:107] Done loading module specs from corpus.
I0925 04:03:30.300908 140184883676992 train_locally.py:133] Loaded Reward Stat Map from disk, containing 0 modules
I0925 04:03:30.514247 140184883676992 train_locally.py:152] Last iteration took: 0.004603
W0925 04:03:32.547599 140184883676992 save.py:271] Found untraced functions such as ActorDistributionNetwork_layer_call_fn, ActorDistributionNetwork_layer_call_and_return_conditional_losses, ConstantValueNetwork_layer_call_fn, ConstantValueNetwork_layer_call_and_return_conditional_losses, EncodingNetwork_layer_call_fn while saving (showing 5 of 92). These functions will not be directly callable after loading.
/root/.local/lib/python3.8/site-packages/tensorflow/python/saved_model/nested_structure_coder.py:521: UserWarning: Encoding a StructuredValue with type tfp.distributions.Deterministic_ACTTypeSpec; loading this StructuredValue will require that this type be imported and registered.
warnings.warn("Encoding a StructuredValue with type %s; loading this "
INFO:tensorflow:Assets written to: /code/model/policy/0/saved_policy/assets
I0925 04:03:33.073540 140184883676992 builder_impl.py:779] Assets written to: /code/model/policy/0/saved_policy/assets
2022-09-25 04:03:34.994831: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:362] Ignored output_format.
2022-09-25 04:03:34.994904: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:365] Ignored drop_control_dependency.
2022-09-25 04:03:34.995828: I tensorflow/cc/saved_model/reader.cc:45] Reading SavedModel from: /code/model/policy/0/saved_policy
2022-09-25 04:03:35.000722: I tensorflow/cc/saved_model/reader.cc:89] Reading meta graph with tags { serve }
2022-09-25 04:03:35.000781: I tensorflow/cc/saved_model/reader.cc:130] Reading SavedModel debug info (if present) from: /code/model/policy/0/saved_policy
2022-09-25 04:03:35.017182: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:365] MLIR V1 optimization pass is not enabled
2022-09-25 04:03:35.023192: I tensorflow/cc/saved_model/loader.cc:229] Restoring SavedModel bundle.
2022-09-25 04:03:35.092413: I tensorflow/cc/saved_model/loader.cc:213] Running initialization op on SavedModel bundle at path: /code/model/policy/0/saved_policy
2022-09-25 04:03:35.147566: I tensorflow/cc/saved_model/loader.cc:305] SavedModel load for tags { serve }; Status: success: OK. Took 151744 microseconds.
2022-09-25 04:03:35.242257: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:268] disabling MLIR crash reproducer, set env var MLIR_CRASH_REPRODUCER_DIRECTORY to enable.
2022-09-25 04:03:35.444218: I tensorflow/compiler/mlir/lite/flatbuffer_export.cc:2078] Estimated count of arithmetic ops: 0.011 M ops, equivalently 0.005 M MACs
W0925 04:03:37.566624 140184883676992 save.py:271] Found untraced functions such as ActorDistributionNetwork_layer_call_fn, ActorDistributionNetwork_layer_call_and_return_conditional_losses, ConstantValueNetwork_layer_call_fn, ConstantValueNetwork_layer_call_and_return_conditional_losses, EncodingNetwork_layer_call_fn while saving (showing 5 of 92). These functions will not be directly callable after loading.
/root/.local/lib/python3.8/site-packages/tensorflow/python/saved_model/nested_structure_coder.py:521: UserWarning: Encoding a StructuredValue with type tfp.distributions.Categorical_ACTTypeSpec; loading this StructuredValue will require that this type be imported and registered.
warnings.warn("Encoding a StructuredValue with type %s; loading this "
INFO:tensorflow:Assets written to: /code/model/policy/0/saved_collect_policy/assets
I0925 04:03:38.054838 140184883676992 builder_impl.py:779] Assets written to: /code/model/policy/0/saved_collect_policy/assets
2022-09-25 04:03:40.066622: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:362] Ignored output_format.
2022-09-25 04:03:40.066686: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:365] Ignored drop_control_dependency.
2022-09-25 04:03:40.066882: I tensorflow/cc/saved_model/reader.cc:45] Reading SavedModel from: /code/model/policy/0/saved_collect_policy
2022-09-25 04:03:40.071930: I tensorflow/cc/saved_model/reader.cc:89] Reading meta graph with tags { serve }
2022-09-25 04:03:40.071989: I tensorflow/cc/saved_model/reader.cc:130] Reading SavedModel debug info (if present) from: /code/model/policy/0/saved_collect_policy
2022-09-25 04:03:40.093924: I tensorflow/cc/saved_model/loader.cc:229] Restoring SavedModel bundle.
2022-09-25 04:03:40.173268: I tensorflow/cc/saved_model/loader.cc:213] Running initialization op on SavedModel bundle at path: /code/model/policy/0/saved_collect_policy
2022-09-25 04:03:40.228462: I tensorflow/cc/saved_model/loader.cc:305] SavedModel load for tags { serve }; Status: success: OK. Took 161578 microseconds.
2022-09-25 04:03:40.557391: I tensorflow/compiler/mlir/lite/flatbuffer_export.cc:2078] Estimated count of arithmetic ops: 0.011 M ops, equivalently 0.005 M MACs
I0925 04:03:40.805665 140184883676992 local_data_collector.py:78] Waiting for pending work from last iteration took 0.000003
Could not find SavedModel .pb or .pbtxt at supplied export directory path: /tmp/tmpgjfr19pm/policy
Could not find TF_Output named: StatefulPartitionedCall
error: Failed to create saved model evaluator
error: Could not load or create model evaluator.
error: Could not setup Inlining Advisor for the requested mode and/or options
3 errors generated.
Could not find SavedModel .pb or .pbtxt at supplied export directory path: /tmp/tmpw8v7dxdu/policy
Could not find TF_Output named: StatefulPartitionedCall
error: Failed to create saved model evaluator
error: Could not load or create model evaluator.
error: Could not setup Inlining Advisor for the requested mode and/or options
3 errors generated.
Could not find SavedModel .pb or .pbtxt at supplied export directory path: /tmp/tmp_qg7173y/policy
Could not find TF_Output named: StatefulPartitionedCall
error: Failed to create saved model evaluator
error: Could not load or create model evaluator.
error: Could not setup Inlining Advisor for the requested mode and/or options
Could not find SavedModel .pb or .pbtxt at supplied export directory path: /tmp/tmps93tsj2r/policy
Could not find TF_Output named: StatefulPartitionedCall
error: Failed to create saved model evaluator
error: Could not load or create model evaluator.
error: Could not setup Inlining Advisor for the requested mode and/or options
Could not find SavedModel .pb or .pbtxt at supplied export directory path: /tmp/tmpndhn0nu2/policy
Could not find TF_Output named: StatefulPartitionedCall
error: Failed to create saved model evaluator
error: Could not load or create model evaluator.
error: Could not setup Inlining Advisor for the requested mode and/or options
3 errors generated.
Could not find SavedModel .pb or .pbtxt at supplied export directory path: /tmp/tmpne14xdzf/policy
Could not find TF_Output named: StatefulPartitionedCall
error: Failed to create saved model evaluator
error: Could not load or create model evaluator.
error: Could not setup Inlining Advisor for the requested mode and/or options
Could not find SavedModel .pb or .pbtxt at supplied export directory path: /tmp/tmp5doda41v/policy
Could not find TF_Output named: StatefulPartitionedCall
error: Failed to create saved model evaluator
error: Could not load or create model evaluator.
error: Could not setup Inlining Advisor for the requested mode and/or options
3 errors generated.
3 errors generated.
3 errors generated.
Could not find SavedModel .pb or .pbtxt at supplied export directory path: /tmp/tmpupt7jlc5/policy
Could not find TF_Output named: StatefulPartitionedCall
error: Failed to create saved model evaluator
error: Could not load or create model evaluator.
error: Could not setup Inlining Advisor for the requested mode and/or options
Could not find SavedModel .pb or .pbtxt at supplied export directory path: /tmp/tmpl5mm4t7i/policy
Could not find TF_Output named: StatefulPartitionedCall
error: Failed to create saved model evaluator
error: Could not load or create model evaluator.
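Every failure group above starts from the same condition: the directory handed to clang does not contain a SavedModel proto. A minimal sanity check for that condition can be sketched as follows (the helper name is invented for illustration, it is not part of the repo):

```python
import os

def looks_like_saved_model(path):
    """Mirrors the error text: a TF SavedModel directory must contain
    a saved_model.pb or saved_model.pbtxt file at its top level."""
    return any(
        os.path.isfile(os.path.join(path, name))
        for name in ('saved_model.pb', 'saved_model.pbtxt'))
```

Running this on one of the /tmp/.../policy paths from the errors would tell you whether the policy was ever materialized there.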
So I tried printing the command line in inline_runner.py with the following code:
try:
  command_line = []
  if self._launcher_path:
    command_line.append(self._launcher_path)
  command_line.extend([self._clang_path] + list(module_spec.exec_cmd) + [
      '-mllvm', '-enable-ml-inliner=development', '-mllvm',
      '-training-log=' + log_path, '-o', output_native_path
  ])
  if tf_policy_path:
    command_line.extend(
        ['-mllvm', '-ml-inliner-model-under-training=' + tf_policy_path])
  print("command_line1\n", command_line)
  compilation_runner.start_cancellable_process(command_line,
                                               self._compilation_timeout,
                                               self._cancellation_manager)
  command_line = [self.llvm_size_path, output_native_path]
  print("command_line2\n", command_line)
I ran the printed command, which looks like this:
'/code/llvm-install/bin/clang' '-cc1' '-triple' 'x86_64-unknown-fuchsia' '-emit-obj' '-massembler-fatal-warnings' '--mrelax-relocations' '-disable-free' '-clear-ast-before-backend' '-disable-llvm-verifier' '-discard-value-names' '-main-file-name' 'block-device-manager.cc' '-mrelocation-model' 'pic' '-pic-level' '2' '-pic-is-pie' '-mframe-pointer=all' '-ffp-contract=off' '-fno-rounding-math' '-mconstructor-aliases' '-funwind-tables=2' '-target-cpu' 'x86-64-v2' '-mllvm' '-x86-branches-within-32B-boundaries' '-tune-cpu' 'generic' '-mllvm' '-treat-scalable-fixed-error-as-warning' '-debug-info-kind=constructor' '-dwarf-version=5' '-debugger-tuning=gdb' '-mllvm' '-crash-diagnostics-dir=clang-crashreports' '-ffunction-sections' '-fdata-sections' '-fcoverage-compilation-dir=.' '-resource-dir' '../../../llvm-install/lib/clang/15.0.1' '-dependency-file' 'obj/src/storage/fshost/block-watcher.block-device-manager.cc.o.d' '-MT' 'obj/src/storage/fshost/block-watcher.block-device-manager.cc.o' '-sys-header-deps' '-D' '_LIBCPP_DISABLE_VISIBILITY_ANNOTATIONS' '-D' '_LIBCPP_REMOVE_TRANSITIVE_INCLUDES' '-D' '_LIBCPP_ENABLE_THREAD_SAFETY_ANNOTATIONS=1' '-D' 'ZX_ASSERT_LEVEL=2' '-D' 'ALL_SOURCE' '-D' 'FIDL_TRACE_LEVEL=0' '-I' '../..' 
'-I' 'gen' '-I' 'obj' '-I' '../../sdk' '-I' 'gen/sdk' '-I' 'fidling/gen/sdk/fidl/fuchsia.inspect/fuchsia.inspect/hlcpp' '-I' '../../sdk/lib/fidl_base/include' '-I' 'gen/include' '-I' '../../src/zircon/lib/zircon/include' '-I' 'fidling/gen/sdk/fidl/fuchsia.mem/fuchsia.mem/hlcpp' '-I' '../../sdk/lib/fit/include' '-I' '../../sdk/lib/stdcompat/include' '-I' '../../sdk/lib/fit-promise/include' '-I' '../../sdk/lib/fidl/include' '-I' '../../zircon/system/ulib/zx/include' '-I' '../../zircon/system/ulib/async/include' '-I' '../../zircon/system/ulib/async-default/include' '-I' '../../zircon/system/ulib/inspect/include' '-I' 'fidling/gen/sdk/fidl/fuchsia.io/fuchsia.io/hlcpp' '-I' 'fidling/gen/sdk/fidl/fuchsia.unknown/fuchsia.unknown/hlcpp' '-I' 'fidling/gen/sdk/fidl/fuchsia.sys/fuchsia.sys/hlcpp' '-I' '../../sdk/lib/fdio/include' '-I' 'fidling/gen/sdk/fidl/fuchsia.boot/fuchsia.boot/cpp' '-I' 'fidling/gen/sdk/fidl/fuchsia.io/fuchsia.io/cpp' '-I' 'fidling/gen/sdk/fidl/fuchsia.unknown/fuchsia.unknown/cpp' '-I' '../../sdk/lib/fidl/cpp/wire/include' '-I' '../../zircon/system/ulib/zxc/include' '-I' '../../zircon/system/ulib/sync/include' '-I' '../../zircon/system/ulib/fbl/include' '-I' '../../zircon/system/ulib/fzl/include' '-I' 'fidling/gen/sdk/fidl/fuchsia.hardware.block.volume/fuchsia.hardware.block.volume/c' '-I' 'fidling/gen/sdk/fidl/fuchsia.hardware.block/fuchsia.hardware.block/c' '-I' 'fidling/gen/sdk/fidl/fuchsia.io/fuchsia.io/c' '-I' 'fidling/gen/sdk/fidl/fuchsia.unknown/fuchsia.unknown/c' '-I' 'fidling/gen/zircon/vdso/zx/zx/c' '-I' 'fidling/gen/sdk/fidl/fuchsia.storage.metrics/fuchsia.storage.metrics/c' '-I' 'fidling/gen/sdk/fidl/fuchsia.hardware.block.partition/fuchsia.hardware.block.partition/c' '-I' 'fidling/gen/sdk/fidl/fuchsia.device/fuchsia.device/c' '-I' 'fidling/gen/sdk/fidl/fuchsia.hardware.block.volume/fuchsia.hardware.block.volume/cpp' '-I' 'fidling/gen/sdk/fidl/fuchsia.hardware.block/fuchsia.hardware.block/cpp' '-I' '../../src/lib/fidl/cpp/include' '-I' 
'x64-shared/gen/sdk' '-I' 'fidling/gen/sdk/fidl/fuchsia.storage.metrics/fuchsia.storage.metrics/cpp' '-I' 'fidling/gen/sdk/fidl/fuchsia.hardware.block.partition/fuchsia.hardware.block.partition/cpp' '-I' 'fidling/gen/sdk/fidl/fuchsia.device/fuchsia.device/cpp' '-I' 'fidling/gen/src/storage/fidl/fuchsia.fs.startup/fuchsia.fs.startup/cpp' '-I' 'fidling/gen/sdk/fidl/fuchsia.fs/fuchsia.fs/cpp' '-I' '../../zircon/system/ulib/fidl-async/include' '-I' '../../zircon/system/ulib/trace/include' '-I' '../../zircon/system/ulib/trace-engine/include' '-I' 'fidling/gen/sdk/fidl/fuchsia.feedback/fuchsia.feedback/cpp' '-I' 'fidling/gen/sdk/fidl/fuchsia.math/fuchsia.math/cpp' '-I' 'fidling/gen/sdk/fidl/fuchsia.mem/fuchsia.mem/cpp' '-I' 'fidling/gen/sdk/fidl/fuchsia.fshost/fuchsia.fshost/cpp' '-I' 'fidling/gen/sdk/fidl/fuchsia.process.lifecycle/fuchsia.process.lifecycle/cpp' '-I' 'fidling/gen/sdk/fidl/fuchsia.ldsvc/fuchsia.ldsvc/cpp' '-I' 'fidling/gen/src/storage/fxfs/fuchsia.fxfs/cpp' '-I' '../../zircon/system/ulib/async-loop/include' '-I' '../../zircon/system/ulib/fdio-caller/include' '-I' '../../zircon/system/ulib/service/include' '-I' 'fidling/gen/sdk/fidl/fuchsia.fs/fuchsia.fs/hlcpp' '-I' '../../zircon/system/public' '-I' '../../zircon/system/ulib/storage/buffer/include' '-I' '../../zircon/system/ulib/storage/operation/include' '-I' '../../src/lib/storage/block_client/cpp/include' '-I' '../../zircon/system/ulib/range/include' '-I' '../../zircon/system/ulib/storage-metrics/include' '-I' '../../src/storage/lib/disk_inspector/include' '-I' '../../src/storage/lib/watchdog/include' '-I' '../../zircon/system/ulib/syslog/include' '-I' '../../zircon/system/ulib/bitmap/include' '-I' '../../zircon/system/ulib/id_allocator/include' '-I' '../../zircon/third_party/ulib/safemath/include' '-I' 'fidling/gen/src/storage/blobfs/fuchsia.blobfs.internal/hlcpp' '-I' 'fidling/gen/src/storage/blobfs/fuchsia.blobfs.internal/cpp' '-I' 'fidling/gen/sdk/fidl/fuchsia.blobfs/fuchsia.blobfs/cpp' '-I' 
'fidling/gen/sdk/fidl/fuchsia.device.manager/fuchsia.device.manager/cpp' '-I' 'fidling/gen/sdk/fidl/fuchsia.driver.framework/fuchsia.driver.framework/cpp' '-I' 'fidling/gen/sdk/fidl/fuchsia.component/fuchsia.component/cpp' '-I' 'fidling/gen/sdk/fidl/fuchsia.component.decl/fuchsia.component.decl/cpp' '-I' 'fidling/gen/sdk/fidl/fuchsia.data/fuchsia.data/cpp' '-I' 'fidling/gen/sdk/fidl/fuchsia.url/fuchsia.url/cpp' '-I' 'fidling/gen/sdk/fidl/fuchsia.process/fuchsia.process/cpp' '-I' 'fidling/gen/sdk/fidl/fuchsia.component.runner/fuchsia.component.runner/cpp' '-I' 'fidling/gen/sdk/fidl/fuchsia.diagnostics.types/fuchsia.diagnostics.types/cpp' '-I' 'fidling/gen/sdk/fidl/fuchsia.driver.host/fuchsia.driver.host/cpp' '-I' 'fidling/gen/sdk/fidl/fuchsia.hardware.power.statecontrol/fuchsia.hardware.power.statecontrol/cpp' '-I' 'fidling/gen/src/sys/pkg/fidl/fuchsia.update.verify/fuchsia.update.verify/cpp' '-I' 'fidling/gen/sdk/fidl/fuchsia.hardware.block.encrypted/fuchsia.hardware.block.encrypted/cpp' '-I' 'fidling/gen/sdk/fidl/fuchsia.hardware.block.verified/fuchsia.hardware.block.verified/cpp' '-I' '../../src/lib/storage/ramdevice_client/cpp/include' '-I' 'fidling/gen/sdk/fidl/fuchsia.hardware.nand/fuchsia.hardware.nand/c' '-I' '../../src/storage/gpt/include' '-I' '../../zircon/system/ulib/zircon-internal/include' '-I' '../../zircon/system/ulib/explicit-memory/include' '-D' 'FIDL_ALLOW_DEPRECATED_C_BINDINGS' '-D' 'FIDL_ALLOW_DEPRECATED_C_BINDINGS' '-isysroot' 'gen/zircon/public/sysroot/cpp' '-internal-isystem' '../../../llvm-install/bin/../include/x86_64-unknown-fuchsia/c++/v1' '-internal-isystem' '../../../llvm-install/bin/../include/c++/v1' '-internal-isystem' '../../../llvm-install/lib/clang/15.0.1/include' '-internal-externc-isystem' 'gen/zircon/public/sysroot/cpp/include' '-Os' '-ffuchsia-api-level=4294967295' '-std=c++17' '-fdeprecated-macro' '-fdebug-compilation-dir=.' 
'-ferror-limit' '19' '-fvisibility' 'hidden' '-fvisibility-inlines-hidden' '-fsanitize=safe-stack' '-stack-protector' '2' '-ftrivial-auto-var-init=pattern' '-fno-rtti' '-fgnuc-version=4.2.1' '-fcolor-diagnostics' '-vectorize-loops' '-vectorize-slp' '-fembed-bitcode=all' '-debug-info-kind=constructor' '-faddrsig' '-D' '__GCC_HAVE_DWARF2_CFI_ASM=1' '' '-x' 'ir' '/code/corpus/obj/src/storage/fshost/block-watcher.block-device-manager.cc.o.bc' '-mllvm' '-enable-ml-inliner=development' '-mllvm' '-training-log=/tmp/tmp6dd7o0lh/log' '-o' '/tmp/test.aa'
I get the error:
fatal error: error in backend: IO failure on output stream: Bad file descriptor
But if I delete '-mllvm' '-enable-ml-inliner=development' '-mllvm' '-training-log=/tmp/tmp6dd7o0lh/log' '-o' '/tmp/test.aa', the command runs successfully.
I use LLVM 15; this is the commit ID:
commit b73d2c8c720a8c8e6e73b11be4e27afa6cb75bdf (HEAD -> release/15.x, tag: llvmorg-15.0.1, origin/release/15.x)
Author: Florian Hahn [email protected]
Date: Mon Sep 19 18:14:34 2022 +0100
[LV] Keep track of cost-based ScalarAfterVec in VPWidenPointerInd.
Epilogue vectorization uses isScalarAfterVectorization to check if
widened versions for inductions need to be generated and bails out in
those cases.
At the moment, there are scenarios where isScalarAfterVectorization
returns true but VPWidenPointerInduction::onlyScalarsGenerated would
return false, causing widening.
This can lead to widened phis with incorrect start values being created
in the epilogue vector body.
This patch addresses the issue by storing the cost-model decision in
VPWidenPointerInductionRecipe and restoring the behavior before 151c144.
This effectively reverts 151c144, but the long-term fix is to properly
support widened inductions during epilogue vectorization
Fixes #57712.
The reason you get Bad file descriptor when trying to debug is that /tmp/tmp6dd7o0lh/log doesn't exist (more specifically, the first part of the path, i.e. /tmp/tmp6dd7o0lh - it's a tempfile-created (from Python) directory). Try pointing -training-log to output somewhere else, like /tmp/this_is_the.log, i.e. under an existing dir.
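The vanishing path can be demonstrated in a few lines: Python's tempfile module deletes the directory once the training code is done with it, so a command copied from the logs and re-run later points at a location that no longer exists.

```python
import os
import tempfile

# The directory exists while the context manager is open ...
with tempfile.TemporaryDirectory() as tmpdir:
    log_path = os.path.join(tmpdir, 'log')
    assert os.path.isdir(tmpdir)

# ... and is removed on exit, so -training-log would now point at a
# directory that no longer exists.
print(os.path.isdir(tmpdir))  # False
```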
Now for the first part. That seems to be about the model passed to clang during training being invalid. I'm assuming you're at or near HEAD of this (ml-compiler-opt) repo. Under your $OUTPUT_DIR, do you see a bunch of saved model directories? You should see a policy dir, under which you should see a bunch of numbered dirs. Pick one of the latter; under it you should see a saved_policy and a saved_collect_policy. What do you see under it?
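One quick way to answer that is a small directory walk over the expected layout (a sketch; the helper name is invented, not from the repo):

```python
import os

def list_policies(output_dir):
    """Map '<step>/<name>' -> files in each saved policy under
    <output_dir>/policy, e.g. '0/saved_collect_policy'."""
    policy_root = os.path.join(output_dir, 'policy')
    layout = {}
    for step in sorted(os.listdir(policy_root)):
        for name in ('saved_policy', 'saved_collect_policy'):
            d = os.path.join(policy_root, step, name)
            if os.path.isdir(d):
                layout[f'{step}/{name}'] = sorted(os.listdir(d))
    return layout
```

Printing list_policies(os.environ['OUTPUT_DIR']) should show the numbered dirs and, inside each saved policy, the SavedModel files.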
Yes, you are right. For the Bad file descriptor problem, I used /tmp/test.log and the command ran successfully.
This is my $OUTPUT_DIR.
This is my policy dir.
How do I debug the Python or C++ code to test the trained model?
What happens if you use the same command line that works, and add -mllvm -ml-inliner-model-under-training=/code/model/policy/0/saved_collect_policy?
It shows Status: success!
2022-09-26 16:50:22.840280: I tensorflow/cc/saved_model/reader.cc:31] Reading SavedModel from: /code/model/policy/0/saved_collect_policy
2022-09-26 16:50:22.847266: I tensorflow/cc/saved_model/reader.cc:54] Reading meta graph with tags { serve }
2022-09-26 16:50:22.860368: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2022-09-26 16:50:22.870020: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3700070000 Hz
2022-09-26 16:50:22.872757: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x555d0f8a0370 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2022-09-26 16:50:22.872793: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2022-09-26 16:50:22.906265: I tensorflow/cc/saved_model/loader.cc:202] Restoring SavedModel bundle.
2022-09-26 16:50:22.957064: I tensorflow/cc/saved_model/loader.cc:151] Running initialization op on SavedModel bundle at path: /code/model/policy/0/saved_collect_policy
2022-09-26 16:50:22.996176: I tensorflow/cc/saved_model/loader.cc:311] SavedModel load for tags { serve }; Status: success. Took 155897 microseconds.
OK, and it compiles, I assume. Hmm. Ah, I see what happens. You're building the compiler with the tensorflow C APIs, not tflite, right? We haven't updated the documentation yet - but here's how to build with tflite:
- make a directory somewhere, e.g. /tmp/tflitebuild && cd /tmp/tflitebuild
- run buildbot/build_tflite.sh (this takes a bit - it git clones a bunch of repos and builds them)
- notice a /tmp/tflitebuild/tflite.cmake was created
- for your cmake (best to wipe out the build dir and re-issue cmake): instead of passing -DTENSORFLOW_C_LIB_PATH, pass -C /tmp/tflitebuild/tflite.cmake.

That's it!
I've now updated the demo - @boomanaiden154 had a PR open (#131) for a while and we forgot to merge. Sorry.
Thanks, I tried it: I recompiled my LLVM project and Fuchsia project, but the problem still happens. Is there something I can check or debug in the code? I have no idea.
Command is:
rm -rf $OUTPUT_DIR && \
  PYTHONPATH=$PYTHONPATH:. python3 \
  compiler_opt/rl/train_locally.py \
  --root_dir=$OUTPUT_DIR \
  --data_path=$CORPUS \
  --gin_bindings=clang_path="'$LLVM_INSTALLDIR/bin/clang'" \
  --gin_bindings=llvm_size_path="'$LLVM_INSTALLDIR/bin/llvm-size'" \
  --gin_files=compiler_opt/rl/inlining/gin_configs/ppo_nn_agent.gin \
  --gin_bindings=train_eval.warmstart_policy_dir="$WARMSTART_OUTPUT_DIR/saved_policy"
Log is:
Parameters for train_eval:
==============================================================================
train_eval.agent_name = %compiler_opt.rl.constant.AgentName.PPO
train_eval.batch_size = 256
train_eval.deploy_policy_name = 'saved_collect_policy'
train_eval.moving_average_decay_rate = 0.8
train_eval.num_iterations = 300
train_eval.num_modules = 100
train_eval.num_policy_iterations = 3000
train_eval.train_sequence_length = 16
train_eval.use_random_network_distillation = False
train_eval.warmstart_policy_dir = '/code/warmstart/saved_policy'
2022-09-26 17:53:44.895495: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-09-26 17:53:45.034631: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-09-26 17:53:45.034828: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-09-26 17:53:45.035015: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-09-26 17:53:45.035185: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-09-26 17:53:45.035344: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-09-26 17:53:45.035499: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-09-26 17:53:45.625412: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-09-26 17:53:45.625633: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-09-26 17:53:45.625834: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-09-26 17:53:45.626000: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-09-26 17:53:45.626171: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-09-26 17:53:45.626332: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 10082 MB memory: -> device: 0, name: NVIDIA GeForce RTX 3080 Ti, pci bus id: 0000:24:00.0, compute capability: 8.6
2022-09-26 17:53:45.626555: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-09-26 17:53:45.626697: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 10188 MB memory: -> device: 1, name: NVIDIA GeForce RTX 3060, pci bus id: 0000:2d:00.0, compute capability: 8.6
2022-09-26 17:53:46.251521: I tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:629] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.
W0926 17:53:46.293635 140348014933824 ppo_agent.py:342] Only tf.keras.optimizers.Optimiers are well supported, got a non-TF2 optimizer: <tensorflow.python.training.adam.AdamOptimizer object at 0x7fa49a380970>
I0926 17:53:46.903522 140348014933824 common.py:1009] No checkpoint available at /code/model
I0926 17:53:47.646316 140348014933824 train_locally.py:101] Loading module specs from corpus at /code/corpus.
I0926 17:53:51.522883 140348014933824 train_locally.py:107] Done loading module specs from corpus.
I0926 17:53:52.110074 140348014933824 local_data_collector.py:73] prefetching took 0
I0926 17:53:52.122872 140348014933824 train_locally.py:152] Last iteration took: 0.012367
W0926 17:53:53.189572 140348014933824 save.py:271] Found untraced functions such as ActorDistributionNetwork_layer_call_fn, ActorDistributionNetwork_layer_call_and_return_conditional_losses, ConstantValueNetwork_layer_call_fn, ConstantValueNetwork_layer_call_and_return_conditional_losses, EncodingNetwork_layer_call_fn while saving (showing 5 of 92). These functions will not be directly callable after loading.
/root/.local/lib/python3.8/site-packages/tensorflow/python/saved_model/nested_structure_coder.py:521: UserWarning: Encoding a StructuredValue with type tfp.distributions.Deterministic_ACTTypeSpec; loading this StructuredValue will require that this type be imported and registered.
warnings.warn("Encoding a StructuredValue with type %s; loading this "
INFO:tensorflow:Assets written to: /code/model/policy/0/saved_policy/assets
I0926 17:53:53.458599 140348014933824 builder_impl.py:779] Assets written to: /code/model/policy/0/saved_policy/assets
2022-09-26 17:53:54.306021: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:362] Ignored output_format.
2022-09-26 17:53:54.306056: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:365] Ignored drop_control_dependency.
2022-09-26 17:53:54.306634: I tensorflow/cc/saved_model/reader.cc:45] Reading SavedModel from: /code/model/policy/0/saved_policy
2022-09-26 17:53:54.308837: I tensorflow/cc/saved_model/reader.cc:89] Reading meta graph with tags { serve }
2022-09-26 17:53:54.308854: I tensorflow/cc/saved_model/reader.cc:130] Reading SavedModel debug info (if present) from: /code/model/policy/0/saved_policy
2022-09-26 17:53:54.314542: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:365] MLIR V1 optimization pass is not enabled
2022-09-26 17:53:54.315878: I tensorflow/cc/saved_model/loader.cc:229] Restoring SavedModel bundle.
2022-09-26 17:53:54.345653: I tensorflow/cc/saved_model/loader.cc:213] Running initialization op on SavedModel bundle at path: /code/model/policy/0/saved_policy
2022-09-26 17:53:54.365923: I tensorflow/cc/saved_model/loader.cc:305] SavedModel load for tags { serve }; Status: success: OK. Took 59290 microseconds.
2022-09-26 17:53:54.404422: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:268] disabling MLIR crash reproducer, set env var MLIR_CRASH_REPRODUCER_DIRECTORY to enable.
2022-09-26 17:53:54.513431: I tensorflow/compiler/mlir/lite/flatbuffer_export.cc:2078] Estimated count of arithmetic ops: 0.011 M ops, equivalently 0.005 M MACs
W0926 17:53:55.616633 140348014933824 save.py:271] Found untraced functions such as ActorDistributionNetwork_layer_call_fn, ActorDistributionNetwork_layer_call_and_return_conditional_losses, ConstantValueNetwork_layer_call_fn, ConstantValueNetwork_layer_call_and_return_conditional_losses, EncodingNetwork_layer_call_fn while saving (showing 5 of 92). These functions will not be directly callable after loading.
/root/.local/lib/python3.8/site-packages/tensorflow/python/saved_model/nested_structure_coder.py:521: UserWarning: Encoding a StructuredValue with type tfp.distributions.Categorical_ACTTypeSpec; loading this StructuredValue will require that this type be imported and registered.
warnings.warn("Encoding a StructuredValue with type %s; loading this "
INFO:tensorflow:Assets written to: /code/model/policy/0/saved_collect_policy/assets
I0926 17:53:55.860256 140348014933824 builder_impl.py:779] Assets written to: /code/model/policy/0/saved_collect_policy/assets
2022-09-26 17:53:56.730252: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:362] Ignored output_format.
2022-09-26 17:53:56.730288: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:365] Ignored drop_control_dependency.
2022-09-26 17:53:56.730407: I tensorflow/cc/saved_model/reader.cc:45] Reading SavedModel from: /code/model/policy/0/saved_collect_policy
2022-09-26 17:53:56.732630: I tensorflow/cc/saved_model/reader.cc:89] Reading meta graph with tags { serve }
2022-09-26 17:53:56.732646: I tensorflow/cc/saved_model/reader.cc:130] Reading SavedModel debug info (if present) from: /code/model/policy/0/saved_collect_policy
2022-09-26 17:53:56.737559: I tensorflow/cc/saved_model/loader.cc:229] Restoring SavedModel bundle.
2022-09-26 17:53:56.766411: I tensorflow/cc/saved_model/loader.cc:213] Running initialization op on SavedModel bundle at path: /code/model/policy/0/saved_collect_policy
2022-09-26 17:53:56.786771: I tensorflow/cc/saved_model/loader.cc:305] SavedModel load for tags { serve }; Status: success: OK. Took 56365 microseconds.
2022-09-26 17:53:56.948630: I tensorflow/compiler/mlir/lite/flatbuffer_export.cc:2078] Estimated count of arithmetic ops: 0.011 M ops, equivalently 0.005 M MACs
I0926 17:53:57.091879 140348014933824 local_data_collector.py:134] resolving prefetched sample took: 0 seconds
I0926 17:53:57.092738 140348014933824 local_data_collector.py:73] prefetching took 0
I0926 17:53:57.092979 140348014933824 local_data_collector.py:91] Waiting for pending work from last iteration took 0.000001
Could not find SavedModel .pb or .pbtxt at supplied export directory path: /tmp/tmpbxfbj34s/policy
Could not find TF_Output named: StatefulPartitionedCall
error: Failed to create saved model evaluator
error: Could not load or create model evaluator.
error: Could not setup Inlining Advisor for the requested mode and/or options
Could not find SavedModel .pb or .pbtxt at supplied export directory path: /tmp/tmpx5fxbs2c/policy
Could not find TF_Output named: StatefulPartitionedCall
error: Failed to create saved model evaluator
error: Could not load or create model evaluator.
error: Could not setup Inlining Advisor for the requested mode and/or options
Could not find SavedModel .pb or .pbtxt at supplied export directory path: /tmp/tmpm2gq9m6x/policy
Could not find TF_Output named: StatefulPartitionedCall
error: Failed to create saved model evaluator
error: Could not load or create model evaluator.
error: Could not setup Inlining Advisor for the requested mode and/or options
3 errors generated.
Could not find SavedModel .pb or .pbtxt at supplied export directory path: /tmp/tmpa_m6lua9/policy
Could not find TF_Output named: StatefulPartitionedCall
error: Failed to create saved model evaluator
error: Could not load or create model evaluator.
error: Could not setup Inlining Advisor for the requested mode and/or options
3 errors generated.
Could not find SavedModel .pb or .pbtxt at supplied export directory path: /tmp/tmpiec123gk/policy
Could not find TF_Output named: StatefulPartitionedCall
error: Failed to create saved model evaluator
error: Could not load or create model evaluator.
error: Could not setup Inlining Advisor for the requested mode and/or options
3 errors generated.
Could not find SavedModel .pb or .pbtxt at supplied export directory path: /tmp/tmpjfas9ul9/policy
Could not find TF_Output named: StatefulPartitionedCall
error: Failed to create saved model evaluator
error: Could not load or create model evaluator.
error: Could not setup Inlining Advisor for the requested mode and/or options
Could not find SavedModel .pb or .pbtxt at supplied export directory path: /tmp/tmpx83gfm17/policy
Could not find TF_Output named: StatefulPartitionedCall
error: Failed to create saved model evaluator
error: Could not load or create model evaluator.
error: Could not setup Inlining Advisor for the requested mode and/or options
3 errors generated.
3 errors generated.
3 errors generated.
Could not find SavedModel .pb or .pbtxt at supplied export directory path: /tmp/tmpd6fx2uw3/policy
Could not find TF_Output named: StatefulPartitionedCall
error: Failed to create saved model evaluator
error: Could not load or create model evaluator.
error: Could not setup Inlining Advisor for the requested mode and/or options
3 errors generated.
Could not find SavedModel .pb or .pbtxt at supplied export directory path: /tmp/tmpqer841ol/policy
Could not find TF_Output named: StatefulPartitionedCall
error: Failed to create saved model evaluator
error: Could not load or create model evaluator.
error: Could not setup Inlining Advisor for the requested mode and/or options
Could not find SavedModel .pb or .pbtxt at supplied export directory path: /tmp/tmp56v46hye/policy
Could not find TF_Output named: StatefulPartitionedCall
error: Failed to create saved model evaluator
error: Could not load or create model evaluator.
error: Could not setup Inlining Advisor for the requested mode and/or options
3 errors generated.
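The "Could not find SavedModel .pb or .pbtxt" errors above mean clang was handed a policy directory that does not contain a `saved_model.pb` (or `saved_model.pbtxt`) file. As a quick sanity check before pointing clang at a policy path, a small helper like the one below can verify the directory layout. This is a sketch; the function name `looks_like_saved_model` is mine, not part of ml-compiler-opt.

```python
import os


def looks_like_saved_model(path):
    """Return True if `path` contains the saved_model.pb or
    saved_model.pbtxt file that clang's saved-model evaluator expects."""
    return any(
        os.path.isfile(os.path.join(path, name))
        for name in ("saved_model.pb", "saved_model.pbtxt")
    )
```

For example, running it on `/code/model/policy/0/saved_policy` (the directory the earlier log shows loading successfully) should return True, while an empty temp directory returns False.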
from ml-compiler-opt.
I did not delete my llvm-project build directory; I just re-ran the cmake command directly and rebuilt with ninja. Maybe that caused the problem. I will delete it and try again.
from ml-compiler-opt.
You may need to delete the build directory, re-create it, and re-issue the correct (new) cmake command. After that, and after rebuilding clang, try the clang invocation we ran in isolation earlier (the one that included the path to the training model).
from ml-compiler-opt.
OK, thanks!
from ml-compiler-opt.