Git Product home page Git Product logo

Comments (6)

vmrob avatar vmrob commented on August 16, 2024
  • Android configuration hash should be relocatable
  • Influential environment variables should invalidate cache (see autoconf's ./configure --help for common ones)

from needy.

ccbrown avatar ccbrown commented on August 16, 2024

We definitely want the directory cache to be cleared aggressively if disk space is low.

from needy.

ccbrown avatar ccbrown commented on August 16, 2024

Before we expand this any further, I want to revise what we currently have. If possible I want to simplify things a lot to make things more maintainable.

The requirement of storing the policies and manifest in the cache itself complicates things. If we were to remove those concepts from BuildCache, we would have far fewer consistency concerns (e.g. we wouldn't have to worry about the lock-read-write-unlock pattern that's completely infeasible on S3).

Both of those things are currently only used for garbage collection purposes. So can we do garbage collection some other way?

For filesystems, I think it's fair to limit filesystem cache support to filesystems that record access times. So no need for the manifest at all there. But we still need to intelligently decide when to actually perform garbage collection. So maybe we keep a file there that contains some metadata such as last garbage collection time, but keep it internal and invisible to BuildCache.

For S3, we can't trivially get access times. But it's trivial to set lifecycle policies instead, so I think it's fine if we don't worry about garbage collection there.

So I think the Cache interface should be reduced to simply get, set, and maintain. The maintain function performing garbage collection, or in the case of S3, either doing nothing or informing the user that they should set lifecycle policies.

from needy.

ccbrown avatar ccbrown commented on August 16, 2024

I also need to make this caching much more transparent. I want it to be something that everyone can benefit from. Not just a massive chunk of code with narrow application that when finished only benefits those that seek it out.

If we can make the above changes, I'd like to utilize part of this for the feature mentioned in-person: The addition of a configuration option in the needs file that specifies a team or organization-wide cache.

That would be a good start towards getting some of the caching code out of the experimental phase and shaped into something more presentable.

from needy.

vmrob avatar vmrob commented on August 16, 2024

The position I was attempting to express in person is that I want there to be a 1:1 relationship between the needs definition of a library and fully-qualified cache keys. If I make a modification to a library definition, that change must be reflected in the thing that needy produces--cached or not.

With that in mind, I think setting cache keys per library is a great idea, but I would want them to act as some sort of namespace to the fully-qualified key. Example:

args:
    download: https://github.com/Taywee/args/archive/6.0.2.tar.gz
    checksum: 4be736f11aa2008820d0836bac4595d889048242
    cache-key: s3://my-teams-ci-bucket/Taywee/args
    project:
        build-steps: cp args.hxx {build_directory}/include/

If Needy took that yaml definition, removed the cache-key from it, hashed it up, added whatever Needy qualifiers were required to accurately represent the build parameters, and then searched for a cache object at s3://my-teams-ci-bucket/Taywee/args/macos/x86_64/$sha256/distribution.tar.gz, then it would perfectly satisfy the 1:1 relationship requirement.

While I would like Needy to provide a means to assist in uploading (or at least aggregating) newly built objects, I don't think it's a strict requirement. I don't think it's going to be possible to satisfy the 1:1 requirement without Needy at least producing keys for resulting objects. Without at least the aggregation functionality, I have a feeling that users that want to integrate this into CI are going to have a lot of duplicate deployment code in each applicable repo and that's something I want to avoid.


I really dislike the manifest and policy and the file locking dances that happen around them. I think my original implementation lacked a lot of that, but requirements at the time dictated its necessity.

The cache should be treated as a key-value store and Needy should treat the inability to access a key as if it doesn't exist. I believe we currently do this so very little, if anything, would be lost if the manifest were removed altogether.

To go one step further, I think existing cache objects should be immutable. If those properties apply (and if key paths make sense), there's no need for a manifest at all nor is there any need for file locking. If a successful open call occurs and then another Needy process deletes it during garbage collection or because the user rm'ed the cache directory, there still won't be a problem during the current read.

To guarantee that new object creation occurs atomically on a local filesystem, we can just create a temporary staging directory next to the cache, write the object to that with a random name, and then move the file into place. Nothing has to be done to ensure this takes occurs on S3--all GETS and PUTS are atomic (albeit not transactional). At most, a race condition could occur between an object at a key being deleted, a new object for that key being uploaded, and the object at that key being downloaded. All of those operations would succeed atomically and the entire object retrieved each time, but there would be no way to determine if the downloaded object was the object before or after the delete. This isn't a problem as long as keys represent their objects exactly.

As for the cache policy, I think a policy is important, but this could be done at the repo-level in .needyconfig.


TLDR: remove all of the file locking, eliminate the cache policy and manifest, and add some logic to guarantee that cache objects are never modified once stored.

from needy.

ccbrown avatar ccbrown commented on August 16, 2024

#144 simplifies things a lot without compromising features. Now I'm much more comfortable expanding on this.

from needy.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.