Comments (6)
You can use bf-cat to repeatedly download that blob to see whether something in bazel itself is causing this problem, or whether it implicates the cluster conditions for reporting bad content. Running bf-cat ... File 2b0c7b8839bcb322410449916521fab0fd21d1e51558ffa38fb60f2baaf83e47/18563764 | sha256sum on repeat should always yield 2b0c7b8839bcb322410449916521fab0fd21d1e51558ffa38fb60f2baaf83e47.
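A minimal sketch of that repeat check, assuming a POSIX shell with sha256sum available. The helper name tally_hashes is hypothetical and not part of buildfarm; it only wraps whatever fetch command you pass it and counts how often each hash comes back.

```shell
# tally_hashes N command [args...]
# Runs the command N times, hashes its stdout each time, and prints
# one line per distinct SHA-256 together with how often it occurred.
tally_hashes() {
  local n="$1" i=0
  shift
  while [ "$i" -lt "$n" ]; do
    # stderr is discarded so logger warnings do not clutter the terminal;
    # only stdout (the blob content) is hashed.
    "$@" 2>/dev/null | sha256sum | cut -d' ' -f1
    i=$((i + 1))
  done | sort | uniq -c
}
```

For the blob in this issue it could be called as tally_hashes 20 bazel-bin/src/main/java/build/buildfarm/tools/bf-cat localhost:8080 "" SHA256 File 2b0c7b8839bcb322410449916521fab0fd21d1e51558ffa38fb60f2baaf83e47/18563764. A single output line means every download hashed the same; more than one line means the content differs between fetches.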
from bazel-buildfarm.
On repeat it yields differing results between:
2b0c7b8839bcb322410449916521fab0fd21d1e51558ffa38fb60f2baaf83e47/18563764
74739361f7c558e5718aa8ce7cda96063cc17cb75d3b8a1bedd1b4118e948050/357
Does this suggest that both are stored, and that the result depends on which worker the blob comes from?
Local run responses:
cjohnstoniv@Desktop:~/helm-farm/bazel-buildfarm$ bazel-bin/src/main/java/build/buildfarm/tools/bf-cat localhost:8080 "" SHA256 File 2b0c7b8839bcb322410449916521fab0fd21d1e51558ffa38fb60f2baaf83e47/18563764 | sha256sum
SLF4J: No SLF4J providers were found.
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See https://www.slf4j.org/codes.html#noProviders for further details.
74739361f7c558e5718aa8ce7cda96063cc17cb75d3b8a1bedd1b4118e948050 -
cjohnstoniv@Desktop:~/helm-farm/bazel-buildfarm$ bazel-bin/src/main/java/build/buildfarm/tools/bf-cat localhost:8080 "" SHA256 File 2b0c7b8839bcb322410449916521fab0fd21d1e51558ffa38fb60f2baaf83e47/18563764 | sha256sum
SLF4J: No SLF4J providers were found.
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See https://www.slf4j.org/codes.html#noProviders for further details.
74739361f7c558e5718aa8ce7cda96063cc17cb75d3b8a1bedd1b4118e948050 -
cjohnstoniv@Desktop:~/helm-farm/bazel-buildfarm$ bazel-bin/src/main/java/build/buildfarm/tools/bf-cat localhost:8080 "" SHA256 File 2b0c7b8839bcb322410449916521fab0fd21d1e51558ffa38fb60f2baaf83e47/18563764 | sha256sum
SLF4J: No SLF4J providers were found.
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See https://www.slf4j.org/codes.html#noProviders for further details.
2b0c7b8839bcb322410449916521fab0fd21d1e51558ffa38fb60f2baaf83e47 -
from bazel-buildfarm.
Possibly, though all of the writes go through a checksum validation, so the only way to break through that is to change the file contents with an action - are you running with as-nobody or equivalent protection? Checking actual worker file contents will out
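One way to check a worker's on-disk copy against its digest is sketched below. It assumes the digest is written as hash/size with the size in bytes, as the digests in this thread appear to be, and that sha256sum is available. The helper name verify_digest and any file path passed to it are hypothetical; this is not buildfarm's own validation code.

```shell
# verify_digest DIGEST FILE
# Succeeds only if FILE's SHA-256 and byte count both match DIGEST,
# where DIGEST has the form <sha256-hex>/<size-in-bytes>.
verify_digest() {
  local digest="$1" file="$2"
  local want_hash="${digest%/*}"   # part before the slash
  local want_size="${digest#*/}"   # part after the slash
  local got_hash got_size
  got_hash=$(sha256sum < "$file" | cut -d' ' -f1)
  got_size=$(wc -c < "$file" | tr -d ' ')
  [ "$got_hash" = "$want_hash" ] && [ "$got_size" = "$want_size" ]
}
```

A failing check on a file that was validated at write time would be consistent with something modifying the file after it was stored.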
from bazel-buildfarm.
Running with just the plain defaults of Docker Desktop on Windows with k8s and Helm. In the sample above there are no overrides or configs outside the defaults.
Is there anything different about coverage runs, particularly with regard to the need for --experimental_split_coverage_postprocessing --experimental_fetch_all_coverage_outputs?
Is there anything about running in k8s/containers (i.e. something installed on the worker OS)? On prem we run enterprise Linux; however, in the container it is just an Ubuntu image. So maybe there is some missing dependency on the worker OS. It is just odd that this is specific to coverage builds and to k8s.
from bazel-buildfarm.
I'm not predisposed to suspect k8s - I run a k8s buildfarm myself with no obvious behavior like this for any other actions. It seems very specific to some action definition that is munging your inputs.
This has the potential to happen for any action which modifies an input file directly. The thing to check will be any operation executed during the build after an output_file/path of bazel-out/k8-fastbuild/testlogs/test-unit/_coverage/coverage-runtime_merged_instr.jar that then uses it as an input (these things will have to be discovered through --remote_grpc_log, remote_client, and bf-cat). You'll need to remove the offending truncated file (after checking the contents below), restart that worker, and execute in sequence to observe this.
Two questions about the contents of the truncated file. First, have you located it on the filesystem of the worker and confirmed the size difference and the violation? Second, do the contents make sense relative to the input? Are they the first 357 bytes, or maybe an empty jar file consistent with deleting the contents of the heavyweight?
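The "first 357 bytes" question can be answered with a byte-wise prefix comparison once both copies are on hand. This is a sketch only: the helper name is_prefix is hypothetical, and the two file arguments stand for wherever you saved the truncated and the full blob.

```shell
# is_prefix SMALL BIG
# Succeeds if the bytes of SMALL are exactly the leading bytes of BIG.
is_prefix() {
  local small="$1" big="$2" size
  size=$(wc -c < "$small" | tr -d ' ')
  # Take as many bytes from BIG as SMALL contains and compare silently.
  head -c "$size" "$big" | cmp -s - "$small"
}
```

If is_prefix fails, the small file is not a simple truncation of the large one, which points more toward the jar having been rewritten (for example emptied) than cut short.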
from bazel-buildfarm.
This issue is resolved in Bazel 7.0.0, likely due to the switch to --remote_download_outputs=toplevel as the default, so the _runtime_merged_instr.jar files are no longer downloaded to the host machine.
I will close this for now, as it is likely not a BF issue but Bazel-specific.
from bazel-buildfarm.