google / ruy
License: Apache License 2.0
I am trying to figure out the whole flow of ruy.
I am using the example below:
const float lhs_data[] = {1, 2, 3, 4, 5 ,6, 1, 2, 3, 4, 5, 6};
const float rhs_data[] = {1, 2, 3, 4, 5, 6};
float dst_data[8];
ruy::Matrix<float> lhs;
ruy::MakeSimpleLayout(4, 3, ruy::Order::kRowMajor, lhs.mutable_layout());
lhs.set_data(lhs_data);
ruy::Matrix<float> rhs;
ruy::MakeSimpleLayout(3, 2, ruy::Order::kColMajor, rhs.mutable_layout());
rhs.set_data(rhs_data);
ruy::Matrix<float> dst;
ruy::MakeSimpleLayout(4, 2, ruy::Order::kColMajor, dst.mutable_layout());
dst.set_data(dst_data);
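To sanity-check the layouts above, here is a minimal, ruy-free reference multiply using the same data and orders (a sketch, assuming the standard dst = lhs * rhs semantics of `ruy::Mul`):

```cpp
#include <cassert>

// Same data and layouts as the snippet above: lhs is 4x3 row-major,
// rhs is 3x2 col-major, dst is 4x2 col-major. Plain reference multiply,
// no ruy dependency.
void ReferenceMul(const float* lhs, const float* rhs, float* dst) {
  for (int r = 0; r < 4; ++r) {
    for (int c = 0; c < 2; ++c) {
      float acc = 0.0f;
      for (int k = 0; k < 3; ++k) {
        acc += lhs[r * 3 + k] * rhs[c * 3 + k];  // row-major lhs, col-major rhs
      }
      dst[c * 4 + r] = acc;  // col-major dst
    }
  }
}
```

With the data above this yields dst columns {14, 32, 14, 32} and {32, 77, 32, 77}, a useful sanity check against what ruy returns.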
I have a question about rows/cols & order.
In my case, I run on arm_v8, so the FixedKernel is 1x8, row-major.
packed_matrix = {elem_ = {{data_type = {is_signed = true, is_floating_point = true, size = 4 '\004'},
data = 0x0, sums_type = {is_signed = true, is_floating_point = true, size = 4 '\004'}, sums = 0x0,
layout = {rows = 3, cols = 8, stride = 3, order = ruy::Order::kColMajor, kernel = {
order = ruy::Order::kRowMajor, rows = 1 '\001', cols = 8 '\b'}}, zero_point = 0}, {data_type = {
is_signed = true, is_floating_point = true, size = 4 '\004'}, data = 0x0, sums_type = {is_signed = true,
is_floating_point = true, size = 4 '\004'}, sums = 0x0, layout = {rows = 3, cols = 8, stride = 3,
order = ruy::Order::kColMajor, kernel = {order = ruy::Order::kRowMajor, rows = 1 '\001',
cols = 8 '\b'}}, zero_point = 0}}}, is_prepacked = {elem_ = {false, false}},
mul_params_bytes = "\300\206H", '\000' <repeats 11 times>
| | rows | cols | stride | order |
|---|---|---|---|---|
| SRC LHS | 3 | 4 | 3 | C |
| SRC RHS | 3 | 2 | 3 | C |
| Dst | 4 | 2 | 4 | C |
| Packed LHS | 3 | 8 | 3 | C |
| Packed RHS | 3 | 8 | 3 | C |
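The packed cols = 8 (vs. the sources' 4 and 2) comes from zero-padding the column count up to the kernel width. A toy sketch of that padding (illustration only, not ruy's actual pack routines):

```cpp
#include <vector>

// Pad the column count of a col-major matrix up to a multiple of the
// kernel width (8 for a 1x8 kernel), filling the extra columns with zeros.
std::vector<float> PackColMajor(const float* src, int rows, int cols,
                                int kernel_cols) {
  const int packed_cols = (cols + kernel_cols - 1) / kernel_cols * kernel_cols;
  std::vector<float> packed(rows * packed_cols, 0.0f);
  for (int c = 0; c < cols; ++c)
    for (int r = 0; r < rows; ++r)
      packed[c * rows + r] = src[c * rows + r];
  return packed;
}
```

With rows = 3 and cols = 2 (the RHS above), the packed buffer comes out 3x8, matching the debugger dump.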
I've gone through the build process (bazel build :all) and it looks like it completed successfully. Now, though, how do I install the output of the build? I was expecting something like a .so file, but I don't see one, so I'm not sure how to install. I don't see a bazel "install" option...
I'm seeing a strange compiler error when building the face_mesh_cpu example from mediapipe using VS2019.
ERROR: C:/users/will/_bazel_will/mvh33bjd/external/ruy/ruy/BUILD:295:11: Compiling ruy/block_map.cc failed: (Exit 2): cl.exe failed: error executing command C:/Program Files (x86)/Microsoft Visual Studio/2019/Community/VC/Tools/MSVC/14.28.29910/bin/HostX64/x64/cl.exe /nologo /DCOMPILER_MSVC /DNOMINMAX /D_WIN32_WINNT=0x0601 /D_CRT_SECURE_NO_DEPRECATE ... (remaining 28 argument(s) skipped)
cl : Command line warning D9002 : ignoring unknown option '-O3'
external/ruy/ruy/block_map.cc(334): error C2059: syntax error: ')'
external/ruy/ruy/block_map.cc(334): error C2676: binary '==': 'const ruy::CpuCacheParams' does not define this operator or a conversion to a type acceptable to the predefined operator
At block_map.cc:334, if I remove the newline before cpu_cache_params
and put the return statement all on one line, the code compiles fine. Why on earth would whitespace cause this compilation error to happen?
For reference, the command used to build the mediapipe example is bazel build -c opt --define MEDIAPIPE_DISABLE_GPU=1 --action_env PYTHON_BIN_PATH="[path to python3.exe]" //mediapipe/examples/desktop/face_mesh:face_mesh_cpu
I've built the TensorFlow Lite C API as a static lib.
I need to link with its sub-dependencies too, one of which is ruy.
However, there are ~30 .a libs for ruy.
Do I need to link with all of them? What should I do?
Take matmul for example: does ruy bind each kernel to a specific data format?
What I know is that the LHS is transposed to column-major, so the LHS, RHS, and Dst are all column-major.
The LHS and RHS are then packed into a specific memory layout to accelerate computation.
So what exactly is the memory layout for a specific kernel?
I have two matrices, A=[[1,2],[3,4]] and B=[[1,3],[2,4]], but the result AB is the zero matrix [[0,0],[0,0]].
I don't know how ruy::Mul works. Is there any available information? Thanks!!
void ExampleMulInt8PerChannelQuantized(ruy::Context *context) {
const std::int8_t lhs_data[] = {1, 2, 3, 4};
const std::int8_t rhs_data[] = {1, 2, 3, 4};
std::int8_t dst_data[4];
ruy::Matrix<std::int8_t> lhs;
ruy::MakeSimpleLayout(2, 2, ruy::Order::kRowMajor, lhs.mutable_layout());
lhs.set_data(lhs_data);
ruy::Matrix<std::int8_t> rhs;
ruy::MakeSimpleLayout(2, 2, ruy::Order::kColMajor, rhs.mutable_layout());
rhs.set_data(rhs_data);
ruy::Matrix<std::int8_t> dst;
ruy::MakeSimpleLayout(2, 2, ruy::Order::kColMajor, dst.mutable_layout());
dst.set_data(dst_data);
ruy::MulParams<std::int32_t, std::int8_t> mul_params;
// Note: with no multipliers set, the default fixed-point multiplier is 0,
// which makes every quantized output zero. Per-channel multipliers must be
// set; these values are only an illustration:
const std::int32_t multiplier_data[] = {1 << 30, 1 << 30};
const int exponent_data[] = {1, 1};
mul_params.set_multiplier_fixedpoint_perchannel(multiplier_data);
mul_params.set_multiplier_exponent_perchannel(exponent_data);
ruy::Mul(lhs, rhs, mul_params, context, &dst);
std::cout << "Example Mul, int8 quantized with per-channel multipliers\n";
std::cout << "LHS:\n" << lhs;
std::cout << "RHS:\n" << rhs;
std::cout << "Result:\n" << dst << "\n";
}
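As a rough mental model of what the quantized path computes per output element, here is a floating-point sketch of the rescaling step (an approximation for intuition, not ruy's actual fixed-point rounding-doubling-high-mul arithmetic):

```cpp
#include <cmath>
#include <cstdint>

// The int32 accumulator is scaled by roughly
// multiplier_fixedpoint / 2^31 * 2^multiplier_exponent.
// With a fixedpoint of 0 (the default when none is set), every output is 0.
std::int32_t ApplyMultiplier(std::int32_t acc, std::int32_t fixedpoint,
                             int exponent) {
  const double scale =
      static_cast<double>(fixedpoint) / (1ll << 31) * std::pow(2.0, exponent);
  return static_cast<std::int32_t>(std::lround(acc * scale));
}
```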
Hi,
I wonder whether the kernels for the aarch64 architecture use NEON instructions. The assembly code in https://github.com/google/ruy/blob/master/ruy/kernel_arm64.cc doesn't have NEON instructions like VADD or VMUL. How is vectorization performed for 64-bit Arm architectures?
Using pycoral's build.sh
Compile command:
/usr/bin/arm-linux-gnueabihf-gcc -fPIC -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer '-march=armv7-a' '-mfpu=neon-vfpv4' -g0 -O3 -DNDEBUG '-D_FORTIFY_SOURCE=2' -ffunction-sections -fdata-sections -funsafe-math-optimizations -ftree-vectorize '-std=c++14' -MD -MF bazel-out/armv7a-opt/bin/external/ruy/ruy/_objs/pack_arm/pack_arm.d '-frandom-seed=bazel-out/armv7a-opt/bin/external/ruy/ruy/_objs/pack_arm/pack_arm.o' -iquote external/ruy -iquote bazel-out/armv7a-opt/bin/external/ruy -iquote external/cpuinfo -iquote bazel-out/armv7a-opt/bin/external/cpuinfo -iquote external/clog -iquote bazel-out/armv7a-opt/bin/external/clog -Ibazel-out/armv7a-opt/bin/external/cpuinfo/_virtual_includes/cpuinfo -Ibazel-out/armv7a-opt/bin/external/clog/_virtual_includes/clog '-DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION' '-ffp-contract=off' -Wall -Wextra -Wc++14-compat -Wundef '-mfpu=neon' -O3 -no-canonical-prefixes -fno-canonical-system-headers -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -c external/ruy/ruy/pack_arm.cc -o bazel-out/armv7a-opt/bin/external/ruy/ruy/_objs/pack_arm/pack_arm.o)
Error:
external/ruy/ruy/pack_arm.cc:469:72: error: 'asm' operand has impossible constraints
469 |           "q4", "q5", "q6", "q7", "q8", "q9", "q10", "q11", "q12", "q13");
Hi,
I am using ruy as a dependency of tensorflow, and their new quantized conv2d implementation relies on ruy::TrMul. The issue is that ruy does not activate X86_ENHANCEMENTS on macOS by default. When I tried forcing it with -DRUY_FORCE_ENABLE_X86_ENHANCEMENTS, it runs faster, but the output is wrong. A similar result can be observed if I run with the RUY_PATHS=0x20 environment variable (suggested by @talumbau in a tensorflow issue).
My computer is a MacBook Pro 2018 with an Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz (ark page), hence it supports AVX2.
platform.h has a comment about disabling it on Apple, but I cannot access the comment referenced under b/138922878. My question is: is it possible to build tensorflow with AVX instructions enabled for the ruy backend on Apple?
Please add support for cmake install.
This will allow easier packaging of this library with Conan.
Would you consider adding support for a reentrant task?
Currently, tasks are not reentrant; in ThreadPool::ExecuteImpl, each task can run on only one thread:
void ThreadPool::ExecuteImpl(int task_count, int stride, Task* tasks);
If we use a single reentrant task with an atomic variable, performance may be higher, and the overhead is light because there is only one task. The code:
void ThreadPool::ExecuteReentrantImpl(int thread_count, Task* reentrant_task) {
  RUY_DCHECK_GE(thread_count, 1);
  // Case of 1 thread: just run the single task on the current thread.
  if (thread_count == 1) {
    reentrant_task->Run();
    return;
  }
  // Task #0 will be run on the current thread.
  CreateThreads(thread_count - 1);
  counter_to_decrement_when_ready_.Reset(thread_count - 1);
  for (int i = 1; i < thread_count; i++) {
    // Every worker receives the same task; no per-thread stride is needed.
    threads_[i - 1]->StartWork(reentrant_task);
  }
  // Execute task #0 immediately on the current thread.
  reentrant_task->Run();
  // Wait for the threads submitted above to finish.
  counter_to_decrement_when_ready_.Wait(spin_duration_);
}
Hi. Is there any documentation for ruy somewhere?
I'm having trouble understanding how ruy works. I'm trying to compare the performance of different GEMM libraries (like ruy) on mobile devices using tflite, but I'm having trouble understanding ruy and how to replace it.
Can you point me to any documentation or guide for ruy?
Trying to build Chromium with NEON on Raspberry pi 4 with Yocto and GCC (using mcpu=cortex-a7, mfpu=neon-vfpv4, mthumb), compilation of ruy fails:
arm-poky-linux-gnueabi-g++ -mthumb -mfpu=neon-vfpv4 -mfloat-abi=hard -mcpu=cortex-a7 -fstack-protector-strong -D_FORTIFY_SOURCE=2 -Wformat -Wformat-security -Werror=format-security -Wdate-time --sysroot=/home/dape/Development/rpi/poky-warrior/build/tmp/work/cortexa7t2h
f-neon-vfpv4-poky-linux-gnueabi/chromium-dev/97.0.4682.3-r0/recipe-sysroot -MMD -MF obj/third_party/ruy/ruy/pack_arm.o.d -DUSE_UDEV -DUSE_AURA=1 -DUSE_GLIB=1 -DUSE_NSS_CERTS=1 -DUSE_OZONE=1 -DUSE_X11=1 -DOFFICIAL_BUILD -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -D_LARGEFI
LE64_SOURCE -DNO_UNWIND_TABLES -DNDEBUG -DNVALGRIND -DDYNAMIC_ANNOTATIONS_ENABLED=0 -I../chromium-97.0.4682.3 -Igen -I../chromium-97.0.4682.3/third_party/ruy/src -fno-ident -fno-strict-aliasing --param=ssp-buffer-size=4 -fstack-protector -fno-unwind-tables -fno-asynchrono
us-unwind-tables -fPIC -pipe -pthread -march=armv7ve -mfloat-abi=hard -mtune=generic-armv7-a -mfpu=neon -mthumb -O2 -fdata-sections -ffunction-sections -fno-omit-frame-pointer -g1 -fvisibility=hidden -Wno-inline-asm -Wno-psabi -Wno-unused-local-typedefs -Wno-maybe-uniniti
alized -Wno-deprecated-declarations -Wno-comments -Wno-packed-not-aligned -Wno-missing-field-initializers -Wno-unused-parameter -std=gnu++14 -fno-exceptions -fno-rtti -fvisibility-inlines-hidden -Wno-narrowing -Wno-class-memaccess -feliminate-unused-debug-types -fmacro-
prefix-map=/home/dape/Development/rpi/poky-warrior/build/tmp/work/cortexa7t2hf-neon-vfpv4-poky-linux-gnueabi/chromium-dev/97.0.4682.3-r0=/usr/src/debug/chromium-dev/97.0.4682.3-r0 -fdebug-prefix-map=/home/dape/Development/rpi/poky-warrior/build/tmp/wo
rk/cortexa7t2hf-neon-vfpv4-poky-linux-gnueabi/chromium-dev/97.0.4682.3-r0=/usr/src/debug/chromium-dev/97.0.4682.3-r0 -fdebug-prefix-map=/home/dape/Development/rpi/poky-warrior/build/tmp/work/cortexa7t2hf-neon-vfpv4-poky-linux-gnueabi/chromium-dev/97.0
.4682.3-r0/recipe-sysroot= -fdebug-prefix-map=/home/dape/Development/rpi/poky-warrior/build/tmp/work/cortexa7t2hf-neon-vfpv4-poky-linux-gnueabi/chromium-dev/97.0.4682.3-r0/recipe-sysroot-native= -fvisibility-inlines-hidden -c ../chromium-97.0.4682.3/
third_party/ruy/src/ruy/pack_arm.cc -o obj/third_party/ruy/ruy/pack_arm.o
../chromium-97.0.4682.3/third_party/ruy/src/ruy/pack_arm.cc: In function 'void ruy::Pack8bitColMajorForNeon4Cols(const ruy::PackParams8bit&)':
../chromium-97.0.4682.3/third_party/ruy/src/ruy/pack_arm.cc:264:3: error: 'asm' operand has impossible constraints
264 | asm volatile(
| ^~~
At global scope:
According to the CMake documentation, cmake_minimum_required needs to be called before the first call to project() (see the notes here: https://cmake.org/cmake/help/latest/command/cmake_minimum_required.html, and at the bottom of this page: https://cmake.org/cmake/help/latest/command/project.html).
This can cause problems if users wish to use CMake functionality like setting their own policy defaults, or if code is injected from CMake toolchain files (e.g. when cross-building) or via project code injection (https://cmake.org/cmake/help/latest/command/project.html#code-injection). Both mechanisms are useful for setting up C++ package managers to provide dependencies.
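For concreteness, the ordering the CMake documentation requires is simply (a minimal sketch):

```cmake
# cmake_minimum_required() must precede project() so that policy defaults
# and toolchain/code-injection hooks are in place before the project is set up.
cmake_minimum_required(VERSION 3.13)
project(ruy CXX)
```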
Is there any documentation about the design of ruy? It might help us understand the source code.
Thanks in advance.
Since class Matrix is templatized:
template <typename Scalar>
class Matrix final {
......
private:
...........
// The zero_point, i.e. which Scalar value is to be interpreted as zero.
// When Scalar is floating-point, this must be 0.
Scalar zero_point_ = 0;
};
I could have something like Matrix<Eigen::half> myMatrix;
but then at compilation I get error: no viable conversion from 'int' to 'Eigen::half' Scalar zero_point_ = 0;
since the zero isn't templatized.
Integer 0 and floating-point zero are interchangeable, so the above code works for built-in types, but that is not the generic case for templates. I believe an implementation like Scalar zero_point_ = Scalar{0}; is more generic.
Similar fixes may apply to other classes and parts of the code.
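A minimal illustration of the difference, using a hypothetical half-like type with an explicit constructor (a stand-in for Eigen::half, not the real type):

```cpp
// Hypothetical stand-in for a half-precision type: only explicitly
// constructible from int, so implicit conversion from 0 fails.
struct Half {
  explicit Half(int v) : value(static_cast<float>(v)) {}
  float value;
};

template <typename Scalar>
struct MatrixSketch {
  // Scalar zero_point_ = 0;       // ill-formed for Half (copy-initialization)
  Scalar zero_point_ = Scalar{0};  // OK: direct-list-initialization
};
```

Both MatrixSketch<Half> and MatrixSketch<float> compile with the Scalar{0} form.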
Hi.
I was looking for a performance comparison between ruy and OpenBLAS, and I came across this.
But when I benchmark ruy (for almost any shape, with single-threaded execution on a Raspberry Pi 4), my results are far behind the reported ones.
For example, for the 512x512x512 Int8 benchmark I can only get ~10 GOPs, but the spreadsheet reports 40 GOPs.
I know the Raspberry Pi 4 CPU frequency maxes out at 1.5 GHz while the Pixel 4's max frequency is 2.84 GHz, but that does not justify the 30 GOPs gap.
So I thought it might be better to ask it here.
How did you measure GOPs for ruy?
I calculate the GOPs with the formula ((2 * N * K * M * iterations) / time) / 10e+9, where time is the sum of the execution times of ruy::Mul over all iterations (I pack the RHS matrix beforehand).
Am I doing anything wrong?
When I am trying to build benchmark with the command:
bazel --output_user_root=$build_dir build -c dbg --copt=-march=native //ruy:benchmark_f32_f32_f32_f32
It fails with the error:
./ruy/test.h:166: error: undefined reference to 'std::basic_ostringstream<char, std::char_traits<char>, std::allocator<char> >::basic_ostringstream()'
./ruy/test.h:166: error: undefined reference to 'std::basic_ostringstream<char, std::char_traits<char>, std::allocator<char> >::basic_ostringstream()'
./ruy/test.h:2210: error: undefined reference to 'std::basic_ostringstream<char, std::char_traits<char>, std::allocator<char> >::basic_ostringstream()'
collect2: error: ld returned 1 exit status
The above command works perfectly with GCC 8.3. I upgraded to GCC 9.3 to use AVX-512 as defined in:
Lines 117 to 121 in be065e4
BTW: I am using Bazel 3.1.
Is there any performance data about ruy, compared with gemmlowp or other optimized libraries? I see that tflite uses both gemmlowp and ruy. Which performs better?
I'm testing the performance of Eigen vs. ruy on an Intel machine and on a Raspberry Pi, and in my benchmarking tests I consistently see Eigen perform much faster than ruy. Is there something I'm doing wrong in these benchmarks?
I'm pasting my test code below :
class RuyMultiplier {
public:
RuyMultiplier(size_t stateSize, size_t outputSize, const std::vector<float>& weightData, int numThreads)
:_weight(weightData) {
context.set_max_num_threads(numThreads);
ruy::MakeSimpleLayout(1, stateSize, ruy::Order::kColMajor, A.mutable_layout());
ruy::MakeSimpleLayout(outputSize, stateSize, ruy::Order::kColMajor, B.mutable_layout());
ruy::MakeSimpleLayout(1, outputSize, ruy::Order::kColMajor, C.mutable_layout());
B.set_data(_weight.data());
B.set_cache_policy(ruy::CachePolicy::kAlwaysCache);
}
void multiply(const float* state, std::vector<float>& output) {
A.set_data(state);
C.set_data(output.data());
ruy::Mul(A, B, mul_params, &context, &C);
}
private:
std::vector<float> _weight;
ruy::Matrix<float> A;
ruy::Matrix<float> B;
ruy::Matrix<float> C;
ruy::MulParams<float, float> mul_params;
ruy::Context context;
};
My function for benchmarking (using google benchmark) is:
static void RuyBenchmark(benchmark::State& state) {
std::random_device random_device;
auto rng = std::mt19937(random_device());
auto f32rng = std::bind(std::uniform_real_distribution<float>(-1.0f, +1.0f), std::ref(rng));
size_t inputSize = state.range(0);
size_t outputSize = state.range(1);
std::vector<float> weight(inputSize * outputSize);
std::generate(weight.begin(), weight.end(), std::ref(f32rng));
RuyMultiplier testMul(inputSize, outputSize, weight, 4);
std::vector<float> input(inputSize, 0.0f);
std::vector<float> output(outputSize);
float sampleValue(.0f);
for (auto _ : state) {
std::generate(input.begin(), input.end(), std::ref(f32rng));
testMul.multiply(input.data(), output);
sampleValue = output[0];
}
}
The Eigen setup is similar, but I get much faster results. In the sample run here, my input vector has size 80 and the output vector has size 128 (so a 1x80x128 multiplication, or a 128x80x1 multiplication, depending on whether I treat it as row-major or column-major).
-----------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations
-----------------------------------------------------------------------------------------------
RuyBenchmark/I:80/H:128/process_time/real_time 62.8 us 62.9 us 10469
EigenBenchmark/I:80/H:128/process_time/real_time 1.66 us 1.66 us 416765
This repo currently contains compile-time errors.
Line 58 in 600d1ec: according to https://en.cppreference.com/w/cpp/container/vector/emplace_back, emplace_back returns void until C++17, so you cannot call get on a void type. There is no instruction in the repo saying the project requires C++17 support.
string.h, which declares memcpy, is not included, which results in another compile-time error in ruy/ruy/profiler/instrumentation.cc, line 111 in 600d1ec.
profiler is no longer called profiling, but the readme.md still says profiling: ruy/ruy/profiler/instrumentation.h, line 26 in 600d1ec.
Fuller investigation has been documented in tensorflow/tensorflow#39509.
It looks like q7 was removed from the clobber list in tensorflow/tensorflow@2359c4e#diff-ca44636122d5fd4fe9600903ebf461b9L665.
I honestly don't know why it would expose this behavior only under NodeJS and only on the ARMv7 platform, but re-instating q7 as in tensorflow/tensorflow#39951 fixes the issue.
Since q7 is cleared at the other places where q6-q15 are cleared, and since there's no specific comment regarding the removal of q7 at this place, is it possible it's just a slight typo and I have been lucky in finding it?
Are there any reliable benchmark results comparing ruy with other GEMM libraries such as gemmlowp and Eigen? I am really interested in this (in the context of TensorFlow Lite performance), but the only tiny piece of information I've found so far is a post on the TensorFlow blog mentioning that TF Lite with ruy enabled outperforms regular TF Lite (the "Better CPU performance" section) when inferring on a single CPU core.
Following the TF issue, we found that the code here calls memalign (as SystemAlignedAlloc) without any assert. That is a little dangerous and really unfriendly for developers to debug: if some phones emit no warning such as W/libc: memalign(64, 411042816) failed: returning null pointer
from their Android, we will go crazy. BTW, memory alignment is indeed good for efficiency, but we may need to consider other approaches for large aligned allocations. Looking forward to your reply, thanks.
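A sketch of the kind of loud failure mode being asked for, using standard aligned_alloc rather than ruy's internal SystemAlignedAlloc (illustration only, not ruy's code):

```cpp
#include <cstdio>
#include <cstdlib>

// Fail loudly with the requested alignment/size instead of silently
// returning a null pointer that crashes somewhere far away later.
void* CheckedAlignedAlloc(std::size_t alignment, std::size_t size) {
  // Note: aligned_alloc requires size to be a multiple of alignment.
  void* p = std::aligned_alloc(alignment, size);
  if (p == nullptr) {
    std::fprintf(stderr, "aligned_alloc(%zu, %zu) failed\n", alignment, size);
    std::abort();
  }
  return p;
}
```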
Hi, I'm trying to build PyCoral with Tensorflow 2.7.0 for arm7a and it fails as follows:
$ git clone https://github.com/oberluz/pycoral.git
$ git checkout 2_7_0
$ git submodule update --init --recursive
$ DOCKER_CPUS="armv7a" ./scripts/build.sh --python_versions "310"
...
(cd /home/eyeot-demo/.cache/bazel/_bazel_eyeot-demo/eab0d61a99b6696edb3d2aff87b585e8/execroot/pycoral && \
exec env - \
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin \
PWD=/proc/self/cwd \
/usr/bin/arm-linux-gnueabihf-gcc -fPIC -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer '-march=armv7-a' '-mfpu=neon-vfpv4' -g0 -O3 -DNDEBUG '-D_FORTIFY_SOURCE=2' -ffunction-sections -fdata-sections -funsafe-math-optimizations -ftree-vectorize '-std=c++17' -MD -MF bazel-out/armv7a-opt/bin/external/org_tensorflow/tensorflow/lite/_objs/minimal_logging/minimal_logging_default.d '-frandom-seed=bazel-out/armv7a-opt/bin/external/org_tensorflow/tensorflow/lite/_objs/minimal_logging/minimal_logging_default.o' -iquote external/org_tensorflow -iquote bazel-out/armv7a-opt/bin/external/org_tensorflow '-DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION' '-ffp-contract=off' -Wall -DFARMHASH_NO_CXX_STRING -Wno-sign-compare -O3 -fno-exceptions -no-canonical-prefixes -fno-canonical-system-headers -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -c external/org_tensorflow/tensorflow/lite/minimal_logging_default.cc -o bazel-out/armv7a-opt/bin/external/org_tensorflow/tensorflow/lite/_objs/minimal_logging/minimal_logging_default.o)
INFO: From Compiling tensorflow/lite/minimal_logging_default.cc:
external/org_tensorflow/tensorflow/lite/minimal_logging_default.cc:28: warning: ignoring '#pragma clang diagnostic' [-Wunknown-pragmas]
28 | #pragma clang diagnostic push
|
external/org_tensorflow/tensorflow/lite/minimal_logging_default.cc:29: warning: ignoring '#pragma clang diagnostic' [-Wunknown-pragmas]
29 | #pragma clang diagnostic ignored "-Wformat-nonliteral"
|
external/org_tensorflow/tensorflow/lite/minimal_logging_default.cc:31: warning: ignoring '#pragma clang diagnostic' [-Wunknown-pragmas]
31 | #pragma clang diagnostic pop
|
SUBCOMMAND: # @org_tensorflow//tensorflow/lite:minimal_logging [action 'Linking external/org_tensorflow/tensorflow/lite/libminimal_logging.a', configuration: 11cae9684ea823b3911201bf8ead584031fd2c07b224432188557a50a44d29ae, execution platform: @local_execution_config_platform//:platform]
(cd /home/eyeot-demo/.cache/bazel/_bazel_eyeot-demo/eab0d61a99b6696edb3d2aff87b585e8/execroot/pycoral && \
exec env - \
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin \
PWD=/proc/self/cwd \
/usr/bin/arm-linux-gnueabihf-ar @bazel-out/armv7a-opt/bin/external/org_tensorflow/tensorflow/lite/libminimal_logging.a-2.params)
SUBCOMMAND: # @com_google_absl//absl/time/internal/cctz:civil_time [action 'Compiling absl/time/internal/cctz/src/civil_time_detail.cc', configuration: 11cae9684ea823b3911201bf8ead584031fd2c07b224432188557a50a44d29ae, execution platform: @local_execution_config_platform//:platform]
(cd /home/eyeot-demo/.cache/bazel/_bazel_eyeot-demo/eab0d61a99b6696edb3d2aff87b585e8/execroot/pycoral && \
exec env - \
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin \
PWD=/proc/self/cwd \
/usr/bin/arm-linux-gnueabihf-gcc -fPIC -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer '-march=armv7-a' '-mfpu=neon-vfpv4' -g0 -O3 -DNDEBUG '-D_FORTIFY_SOURCE=2' -ffunction-sections -fdata-sections -funsafe-math-optimizations -ftree-vectorize '-std=c++17' -MD -MF bazel-out/armv7a-opt/bin/external/com_google_absl/absl/time/internal/cctz/_objs/civil_time/civil_time_detail.d '-frandom-seed=bazel-out/armv7a-opt/bin/external/com_google_absl/absl/time/internal/cctz/_objs/civil_time/civil_time_detail.o' -iquote external/com_google_absl -iquote bazel-out/armv7a-opt/bin/external/com_google_absl '-DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION' '-ffp-contract=off' -no-canonical-prefixes -fno-canonical-system-headers -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -c external/com_google_absl/absl/time/internal/cctz/src/civil_time_detail.cc -o bazel-out/armv7a-opt/bin/external/com_google_absl/absl/time/internal/cctz/_objs/civil_time/civil_time_detail.o)
ERROR: /home/eyeot-demo/.cache/bazel/_bazel_eyeot-demo/eab0d61a99b6696edb3d2aff87b585e8/external/ruy/ruy/BUILD:585:11: Compiling ruy/pack_arm.cc failed: (Exit 1): arm-linux-gnueabihf-gcc failed: error executing command
(cd /home/eyeot-demo/.cache/bazel/_bazel_eyeot-demo/eab0d61a99b6696edb3d2aff87b585e8/sandbox/processwrapper-sandbox/375/execroot/pycoral && \
exec env - \
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin \
PWD=/proc/self/cwd \
/usr/bin/arm-linux-gnueabihf-gcc -fPIC -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer '-march=armv7-a' '-mfpu=neon-vfpv4' -g0 -O3 -DNDEBUG '-D_FORTIFY_SOURCE=2' -ffunction-sections -fdata-sections -funsafe-math-optimizations -ftree-vectorize '-std=c++17' -MD -MF bazel-out/armv7a-opt/bin/external/ruy/ruy/_objs/pack_arm/pack_arm.d '-frandom-seed=bazel-out/armv7a-opt/bin/external/ruy/ruy/_objs/pack_arm/pack_arm.o' -iquote external/ruy -iquote bazel-out/armv7a-opt/bin/external/ruy -iquote external/cpuinfo -iquote bazel-out/armv7a-opt/bin/external/cpuinfo -iquote external/clog -iquote bazel-out/armv7a-opt/bin/external/clog -Ibazel-out/armv7a-opt/bin/external/cpuinfo/_virtual_includes/cpuinfo -Ibazel-out/armv7a-opt/bin/external/clog/_virtual_includes/clog '-DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION' '-ffp-contract=off' -Wall -Wextra -Wc++14-compat -Wundef '-mfpu=neon' -O3 -no-canonical-prefixes -fno-canonical-system-headers -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -c external/ruy/ruy/pack_arm.cc -o bazel-out/armv7a-opt/bin/external/ruy/ruy/_objs/pack_arm/pack_arm.o)
Execution platform: @local_execution_config_platform//:platform
Use --sandbox_debug to see verbose messages from the sandbox arm-linux-gnueabihf-gcc failed: error executing command
(cd /home/eyeot-demo/.cache/bazel/_bazel_eyeot-demo/eab0d61a99b6696edb3d2aff87b585e8/sandbox/processwrapper-sandbox/375/execroot/pycoral && \
exec env - \
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin \
PWD=/proc/self/cwd \
/usr/bin/arm-linux-gnueabihf-gcc -fPIC -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer '-march=armv7-a' '-mfpu=neon-vfpv4' -g0 -O3 -DNDEBUG '-D_FORTIFY_SOURCE=2' -ffunction-sections -fdata-sections -funsafe-math-optimizations -ftree-vectorize '-std=c++17' -MD -MF bazel-out/armv7a-opt/bin/external/ruy/ruy/_objs/pack_arm/pack_arm.d '-frandom-seed=bazel-out/armv7a-opt/bin/external/ruy/ruy/_objs/pack_arm/pack_arm.o' -iquote external/ruy -iquote bazel-out/armv7a-opt/bin/external/ruy -iquote external/cpuinfo -iquote bazel-out/armv7a-opt/bin/external/cpuinfo -iquote external/clog -iquote bazel-out/armv7a-opt/bin/external/clog -Ibazel-out/armv7a-opt/bin/external/cpuinfo/_virtual_includes/cpuinfo -Ibazel-out/armv7a-opt/bin/external/clog/_virtual_includes/clog '-DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION' '-ffp-contract=off' -Wall -Wextra -Wc++14-compat -Wundef '-mfpu=neon' -O3 -no-canonical-prefixes -fno-canonical-system-headers -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -c external/ruy/ruy/pack_arm.cc -o bazel-out/armv7a-opt/bin/external/ruy/ruy/_objs/pack_arm/pack_arm.o)
Execution platform: @local_execution_config_platform//:platform
Use --sandbox_debug to see verbose messages from the sandbox
In file included from external/ruy/ruy/pack_arm.cc:16:
external/ruy/ruy/pack_arm.h:492:9: warning: multi-line comment [-Wcomment]
492 | #endif // (RUY_PLATFORM_NEON_64 || RUY_PLATFORM_NEON_32) && \
| ^
external/ruy/ruy/pack_arm.cc: In function 'void ruy::Pack8bitColMajorForNeon4Cols(const ruy::PackParams8bit&)':
external/ruy/ruy/pack_arm.cc:264:3: error: 'asm' operand has impossible constraints
264 | asm volatile(
| ^~~
Target //src:_pywrap_coral failed to build
INFO: Elapsed time: 234.588s, Critical Path: 88.61s
INFO: 576 processes: 204 internal, 372 processwrapper-sandbox.
FAILED: Build did NOT complete successfully
make: *** [Makefile:152: pybind] Error 1
make: Leaving directory '/workspace'
Building works for other versions of Python (3.6, 3.7, 3.8, 3.9; see scripts/build.sh), but it fails for 3.10, which uses Ubuntu 22.04.
Any ideas?
Hi! Recently I've been focusing on performance profiling on Android, and I learned that the PMU can record useful information about caches, instructions, memory, and so on.
My question: should I compile the Linux kernel with CONFIG_HW_PERF_EVENTS=ON / CONFIG_ARM_SPE_PMU=ON if I want the PMU to work?
Because the toolchains inside Google are set up to ignore many warnings, Google-owned projects tend to compile with many warnings for open-source users. Ruy is currently no exception. We should fix that, at least for some recent-enough Clang and GCC versions, either by changing code or by adding warning-disabling flags to ruy_copts.
I would like to determine which multiplier fixedpoint and exponent to use for multiplication with a particular scale, and the QuantizeMultiplier function seems to be exactly what I need. But I noticed it is located in test.h and not somewhere better for public exposure.
What is the proper way for me to determine multiplier fixedpoints and exponents? If it is to call QuantizeMultiplier, should QuantizeMultiplier be moved out of test.h or be given some public API?
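For reference, the decomposition itself is small. A sketch of the standard approach (frexp into a Q31 fixed-point value plus a power-of-two exponent); this is not necessarily identical to the QuantizeMultiplier in test.h:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Decompose a positive real multiplier so that
// multiplier ~= fixedpoint * 2^(exponent - 31), with fixedpoint in [2^30, 2^31).
void QuantizeMultiplierSketch(double multiplier, std::int32_t* fixedpoint,
                              int* exponent) {
  assert(multiplier > 0.0);
  const double q = std::frexp(multiplier, exponent);  // q in [0.5, 1)
  std::int64_t fp = static_cast<std::int64_t>(std::round(q * (1ll << 31)));
  if (fp == (1ll << 31)) {  // q rounded up to exactly 1.0
    fp /= 2;
    ++*exponent;
  }
  *fixedpoint = static_cast<std::int32_t>(fp);
}
```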
Maybe I'm missing something, but is there support for C += AB as opposed to C = AB? I'm trying to make a full sgemm replacement, which would mainly be useful for fine-tuning models on device.