Comments (6)
- Android configuration hash should be relocatable
- Influential environment variables should invalidate cache (see autoconf's ./configure --help for common ones)
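One way to make influential environment variables invalidate the cache is to hash them into the key. The sketch below is a minimal, hypothetical `environment_fingerprint` helper. It assumes a fixed variable list modeled on what autoconf's ./configure --help reports. It is not Needy's actual hashing code.

```python
import hashlib
import json
import os
from typing import Mapping, Optional

# Assumed list, modeled on the "influential environment variables" autoconf reports.
INFLUENTIAL_VARS = ("CC", "CFLAGS", "CPP", "CPPFLAGS", "CXX", "CXXFLAGS", "LDFLAGS", "LIBS")


def environment_fingerprint(env: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical helper: hash the influential variables so any change alters the key."""
    source = os.environ if env is None else env
    # Unset variables serialize as null, so "unset" and "set to empty" hash differently.
    snapshot = {name: source.get(name) for name in INFLUENTIAL_VARS}
    canonical = json.dumps(snapshot, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Variables outside the list, such as PATH, do not affect the result. Whether that is the right trade-off depends on how much spurious invalidation is acceptable.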
from needy.
We definitely want the directory cache to be cleared aggressively if disk space is low.
from needy.
Before we expand this any further, I want to revise what we currently have. If possible, I want to simplify things a lot to make them more maintainable.
The requirement of storing the policies and manifest in the cache itself complicates things. If we were to remove those concepts from BuildCache, we would have far fewer consistency concerns (e.g. we wouldn't have to worry about the lock-read-write-unlock pattern, which is completely infeasible on S3).
Both of those things are currently only used for garbage collection purposes. So can we do garbage collection some other way?
For filesystems, I think it's fair to limit filesystem cache support to filesystems that record access times, so there's no need for the manifest at all there. But we still need to intelligently decide when to actually perform garbage collection. So maybe we keep a file there that contains some metadata, such as the last garbage collection time, but keep it internal and invisible to BuildCache.
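The filesystem garbage collection described above could look roughly like the sketch below. It evicts by access time, keeps its own metadata file, and also runs early when free disk space is low, as requested earlier in the thread. `META_NAME`, `should_collect`, `collect`, and the size and interval thresholds are all hypothetical. The sketch assumes the filesystem records access times (atime).

```python
import json
import shutil
import time
from pathlib import Path
from typing import List

# Hypothetical internal metadata file; never exposed through BuildCache.
META_NAME = ".gc-meta.json"


def should_collect(root: Path, interval: float, min_free_bytes: int) -> bool:
    """Collect when the interval has elapsed, or immediately when disk space is low."""
    if shutil.disk_usage(root).free < min_free_bytes:
        return True
    try:
        last = json.loads((root / META_NAME).read_text())["last_gc"]
    except (OSError, ValueError, KeyError, TypeError):
        return True  # missing or unreadable metadata: treat as never collected
    return time.time() - last >= interval


def collect(root: Path, max_bytes: int) -> List[Path]:
    """Delete the least recently accessed files until the cache fits in max_bytes."""
    entries = []
    for path in root.rglob("*"):
        if path.is_file() and path.name != META_NAME:
            st = path.stat()
            entries.append((st.st_atime, st.st_size, path))
    entries.sort(key=lambda entry: entry[0])  # oldest access time first
    total = sum(size for _, size, _ in entries)
    removed = []
    for _, size, path in entries:
        if total <= max_bytes:
            break
        try:
            path.unlink()
            removed.append(path)
        except FileNotFoundError:
            pass  # another process already deleted it; the space is freed either way
        total -= size
    (root / META_NAME).write_text(json.dumps({"last_gc": time.time()}))
    return removed
```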
For S3, we can't trivially get access times. But it's trivial to set lifecycle policies instead, so I think it's fine if we don't worry about garbage collection there.
So I think the Cache interface should be reduced to simply get, set, and maintain, with the maintain function performing garbage collection or, in the case of S3, either doing nothing or informing the user that they should set lifecycle policies.
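A reduced interface along those lines might look like the sketch below. The byte-oriented signatures, the in-memory `MemoryCache`, and the `S3Cache` placeholder are illustrative assumptions, not Needy's actual classes. Only `maintain` is fleshed out for S3, and it simply returns advice about lifecycle policies.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class Cache(ABC):
    """Hypothetical reduced interface: get, set, and maintain only."""

    @abstractmethod
    def get(self, key: str) -> Optional[bytes]:
        """Return the object at key, or None if it is missing or unreadable."""

    @abstractmethod
    def set(self, key: str, data: bytes) -> None:
        """Store data at key; stored objects are never modified afterwards."""

    @abstractmethod
    def maintain(self) -> Optional[str]:
        """Run garbage collection, or return a message for the user."""


class MemoryCache(Cache):
    """In-memory stand-in used only to exercise the interface."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        return self._objects.get(key)

    def set(self, key: str, data: bytes) -> None:
        self._objects.setdefault(key, data)  # first write wins: objects are immutable

    def maintain(self) -> Optional[str]:
        return None  # nothing to collect in this toy backend


class S3Cache(Cache):
    """Only maintain() is meaningful here; transfers are out of scope for this sketch."""

    def __init__(self, bucket: str) -> None:
        self.bucket = bucket

    def get(self, key: str) -> Optional[bytes]:
        return None  # placeholder: a real backend would download the object

    def set(self, key: str, data: bytes) -> None:
        pass  # placeholder: a real backend would upload the object

    def maintain(self) -> Optional[str]:
        # S3 has no cheap access times, so leave expiry to bucket lifecycle rules.
        return (f"Needy does not garbage-collect s3://{self.bucket}; "
                "set a lifecycle policy on the bucket to expire old objects.")
```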
from needy.
I also need to make this caching much more transparent. I want it to be something that everyone can benefit from, not just a massive chunk of code with narrow application that, when finished, only benefits those who seek it out.
If we can make the above changes, I'd like to use part of this for the feature we discussed in person: adding a configuration option to the needs file that specifies a team- or organization-wide cache.
That would be a good start towards getting some of the caching code out of the experimental phase and shaped into something more presentable.
from needy.
The position I was attempting to express in person is that I want there to be a 1:1 relationship between the needs definition of a library and fully-qualified cache keys. If I make a modification to a library definition, that change must be reflected in the thing that needy produces--cached or not.
With that in mind, I think setting cache keys per library is a great idea, but I would want them to act as some sort of namespace to the fully-qualified key. Example:
args:
  download: https://github.com/Taywee/args/archive/6.0.2.tar.gz
  checksum: 4be736f11aa2008820d0836bac4595d889048242
  cache-key: s3://my-teams-ci-bucket/Taywee/args
  project:
    build-steps: cp args.hxx {build_directory}/include/
If Needy took that YAML definition, removed the cache-key from it, hashed it up, added whatever Needy qualifiers were required to accurately represent the build parameters, and then searched for a cache object at s3://my-teams-ci-bucket/Taywee/args/macos/x86_64/$sha256/distribution.tar.gz, then it would perfectly satisfy the 1:1 relationship requirement.
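The key derivation proposed above can be sketched as follows. This is a hypothetical `cache_key` helper. It assumes the YAML has already been parsed into a dict (for example with a YAML loader, not shown), that canonical JSON is an acceptable stable serialization, and that platform and architecture are the only qualifiers. Real builds would likely need more qualifiers.

```python
import hashlib
import json
from typing import Any, Dict


def cache_key(definition: Dict[str, Any], platform: str, architecture: str) -> str:
    """Hypothetical helper: derive a fully-qualified key from a parsed library definition."""
    definition = dict(definition)  # shallow copy so the caller's dict is untouched
    namespace = definition.pop("cache-key").rstrip("/")
    # Canonical serialization: equal definitions always produce identical bytes.
    canonical = json.dumps(definition, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"{namespace}/{platform}/{architecture}/{digest}/distribution.tar.gz"
```

Because cache-key is removed before hashing, it acts purely as a namespace. Moving a library to a different bucket leaves the digest unchanged, while any edit to the definition itself produces a new key.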
While I would like Needy to provide a means to assist in uploading (or at least aggregating) newly built objects, I don't think it's a strict requirement. I don't think it's going to be possible to satisfy the 1:1 requirement without Needy at least producing keys for resulting objects. Without at least the aggregation functionality, I have a feeling that users that want to integrate this into CI are going to have a lot of duplicate deployment code in each applicable repo and that's something I want to avoid.
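Aggregation could be as simple as copying each newly built object into a local tree that mirrors its key, so CI needs one generic upload step instead of per-repo deployment code. The sketch below is a hypothetical `aggregate` function. It assumes keys are s3:// URLs and that skipping existing destinations is the desired immutability behavior.

```python
import shutil
from pathlib import Path
from typing import Dict, List
from urllib.parse import urlparse


def aggregate(built: Dict[str, Path], out_dir: Path) -> List[Path]:
    """Copy freshly built objects into out_dir/<bucket>/<key path> for a bulk upload."""
    staged = []
    for key, artifact in built.items():
        parsed = urlparse(key)  # s3://bucket/Taywee/args/... -> netloc == "bucket"
        destination = out_dir / parsed.netloc / parsed.path.lstrip("/")
        if destination.exists():
            continue  # cache objects are immutable; never overwrite one
        destination.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(artifact, destination)
        staged.append(destination)
    return staged
```

A CI job could then upload each out_dir/<bucket> tree with a single sync command (for example aws s3 sync) without knowing anything about individual libraries.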
I really dislike the manifest and policy and the file locking dances that happen around them. I think my original implementation lacked a lot of that, but requirements at the time dictated its necessity.
The cache should be treated as a key-value store and Needy should treat the inability to access a key as if it doesn't exist. I believe we currently do this so very little, if anything, would be lost if the manifest were removed altogether.
To go one step further, I think existing cache objects should be immutable. If those properties apply (and if key paths make sense), there's no need for a manifest at all, nor is there any need for file locking. If a successful open call occurs and then another Needy process deletes the object during garbage collection, or the user rm'ed the cache directory, there still won't be a problem during the current read.
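A read that treats any access failure as a miss, and relies on an already-open handle surviving deletion, might look like the hypothetical `read_object` below. The survive-deletion behavior assumes POSIX unlink semantics. On Windows, deleting an open file typically fails instead.

```python
from pathlib import Path
from typing import Optional


def read_object(cache_root: Path, key: str) -> Optional[bytes]:
    """Return the object's bytes, treating any access failure as a cache miss."""
    try:
        with open(cache_root / key, "rb") as handle:
            # On POSIX, an open handle keeps the data readable even if another
            # process unlinks the file (or the whole directory) mid-read.
            return handle.read()
    except OSError:
        return None
```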
To guarantee that new object creation occurs atomically on a local filesystem, we can just create a temporary staging directory next to the cache, write the object into it under a random name, and then move the file into place. Nothing has to be done to ensure this on S3: all GETs and PUTs are atomic (albeit not transactional). At most, a race condition could occur between an object at a key being deleted, a new object for that key being uploaded, and the object at that key being downloaded. All of those operations would succeed atomically and the entire object would be retrieved each time, but there would be no way to determine whether the downloaded object was the one from before or after the delete. This isn't a problem as long as keys represent their objects exactly.
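The staging-and-move approach above can be sketched with Python's `tempfile.mkstemp` and `os.replace`. `store_object` and its parameters are hypothetical. The rename is only atomic when the staging directory and the cache are on the same filesystem, which is why the staging directory has to sit next to the cache.

```python
import os
import tempfile
from pathlib import Path


def store_object(cache_root: Path, staging_root: Path, key: str, data: bytes) -> bool:
    """Atomically publish data at key; return False if the key already exists."""
    destination = cache_root / key
    if destination.exists():
        return False  # existing objects are immutable
    destination.parent.mkdir(parents=True, exist_ok=True)
    staging_root.mkdir(parents=True, exist_ok=True)
    # Random file name inside a staging directory on the cache's filesystem.
    fd, temp_path = tempfile.mkstemp(dir=str(staging_root))
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(temp_path, destination)  # atomic rename within one filesystem
    except BaseException:
        if os.path.exists(temp_path):
            os.unlink(temp_path)  # never leave partial files behind
        raise
    return True
```

If two processes race past the existence check, the later rename replaces the earlier file. Because keys represent their objects exactly, both files hold the same content, so readers never observe a partial or different object.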
As for the cache policy, I think a policy is important, but this could be done at the repo level in .needyconfig.
TLDR: remove all of the file locking, eliminate the cache policy and manifest, and add some logic to guarantee that cache objects are never modified once stored.
from needy.
#144 simplifies things a lot without compromising features. Now I'm much more comfortable expanding on this.
from needy.
Related Issues (20)
- Cache universal binary builds
- Evaluate configure-steps and build-steps in single shell context
- Add better caching for directory sources
- Add caching of sources
- Graceful handling of git errors
- Warn/error when pkg-config isn't available
- Improving error feedback
- Add needy clean
- Support for "watching" directories or files
- generator documentation
- Source linking doesn't work on Mac
- apple tests
- Race condition in pkgconfig.jam
- Autodetect travis builds and emit section markers
- Dependencies shouldn't be built when present in pkg-config
- Android functional tests
- Add build layers
- Add submodule sources
- Fix caching when using multiple filesystems