Git Product home page Git Product logo

Comments (12)

Cynical-Optimist avatar Cynical-Optimist commented on September 14, 2024

In GitLab by [Gitlab user @tristanvb] on Feb 16, 2017, 07:57

Agreed, this is a bit of a corner case and needs a bit of thought.

Currently what will happen is that if two git sources are fetched simultaneously and dont have a cache yet (i.e. initial clone is needed), then they will download into separate temporary directories and the first one to complete will "win".

This should not be fixed with flock(), it's just not worth the possibility of dealing with stale locks to cater to this corner case.

Probably one of the following needs to happen:

  • BuildStream needs to recognize that two sources are in fact the same, and delegate appropriately (or rather, allow those sources to find their relatives somehow)
  • I think this is doubtful but seems somewhat plausible
  • Source plugins should share state in memory to handle this possibility
  • Possibly BuildStream's base Source class can help with this by providing access to ongoing source operations to a plugin, only for plugins of the same type (so when GitSource->fetch() is called, it can see other git sources which are currently fetching)
  • This will not help the case of racing, separate BuildStream instances which share the same sources cache (multiple simultaneous invocations of BuildStream), but is still preferable over trying to use filesystem state.

from bst-staging.

Cynical-Optimist avatar Cynical-Optimist commented on September 14, 2024

In GitLab by [Gitlab user @tristanvb] on Feb 16, 2017, 08:06

Actually scratch the above, another simpler and better solution now comes to mind.

Instead of cloning into a temp directory, we can reduce the race period significantly so that the initial clone would just be:

  • Create the tmp directory (for the racy part)
  • In the tmp directory, call git init --bare
  • In the tmp directory, call git remote add origin <url>
  • Still in the tmp directory, ensure that the git repo is configured to be a bare mirror
  • Now move the (empty) git mirror into place in the git source's cache directory (racy part is finish)
  • Do the regular git fetch now

What I expect this to achieve is that no time or bandwidth is lost, the second process calling git fetch on the same repo will rely on git's own locking mechanisms, rendering the second fetch a noop.

from bst-staging.

Cynical-Optimist avatar Cynical-Optimist commented on September 14, 2024

In GitLab by [Gitlab user @palvarez89] on Feb 16, 2017, 09:16

I'd need to try this second option (relying on git fetch) to see how it behaves. This one, though, relies on the filesystem for the state, but I think we can trust git on this.

from bst-staging.

Cynical-Optimist avatar Cynical-Optimist commented on September 14, 2024

In GitLab by [Gitlab user @palvarez89] on Apr 8, 2017, 12:50

I've quickly investigated the "atomic git init" and then multiple git fetch, and it didn't work as you thought it would. The second git fetch will do exactly the same thing as the already-running one, and it won't save any download time.

from bst-staging.

Cynical-Optimist avatar Cynical-Optimist commented on September 14, 2024

In GitLab by [Gitlab user @tristanvb] on Apr 11, 2017, 09:16

That's too bad, I did run into this today and it was indeed a bit painful, but this really is a corner case too.

  • It must be relatively rare that more than one element requires the same source
  • Once you have downloaded it once, it should stay cached and pretty much never be fetched twice again

That said, it would still be interesting to handle this better, but I dont think this should be an area of focus for the time being :)

from bst-staging.

Cynical-Optimist avatar Cynical-Optimist commented on September 14, 2024

In GitLab by [Gitlab user @franred] on Jun 26, 2018, 16:45

mentioned in commit baserock/definitions[Gitlab user @8e8a7a29dd0d5efd4e8f23f03ce5377e09c75942]

from bst-staging.

Cynical-Optimist avatar Cynical-Optimist commented on September 14, 2024

In GitLab by [Gitlab user @franred] on Jun 27, 2018, 13:28

mentioned in commit baserock/definitions[Gitlab user @3e4fe53804019f3918b28a20e10f447e05ff3189]

from bst-staging.

Cynical-Optimist avatar Cynical-Optimist commented on September 14, 2024

In GitLab by [Gitlab user @franred] on Jun 28, 2018, 12:50

mentioned in commit baserock/definitions[Gitlab user @8450b79c705d25af217702164ef11fd6c5974b3c]

from bst-staging.

Cynical-Optimist avatar Cynical-Optimist commented on September 14, 2024

In GitLab by [Gitlab user @richardmaw-codethink] on Aug 9, 2018, 17:10

This can also be triggered by fetches that include submodules of the same repository.

This is common with Google projects including googletest, and GNU projects including gnulib, so it can happen more likely than you might think.

from bst-staging.

Cynical-Optimist avatar Cynical-Optimist commented on September 14, 2024

In GitLab by [Gitlab user @gokcennurlu] on Jul 10, 2019, 16:59

I think potential race conditions for Source.fetch() still exists. Description in the ticket focused on git plugin and mentioned the gain in time spent, but this is also very important for plugins that doesn't necessarily have builtin locking mechanisms. It also looks like buildstream can't provide utils for such locking until we make separately started bst processes to have a consensus mechanism. Maybe this ticket is now a wontfix?

If this is correct, should we close it and document this corner case (limitations of source plugins / guarantees provided by buildstream) together with:

  • It must be relatively rare that more than one element requires the same source
  • Once you have downloaded it once, it should stay cached and pretty much never be fetched twice again

?

If agreed, I suggest https://buildstream.gitlab.io/buildstream/buildstream.source.html#buildstream.source.Source.fetch to add the documentation.

from bst-staging.

Cynical-Optimist avatar Cynical-Optimist commented on September 14, 2024

In GitLab by [Gitlab user @tristanvb] on Jul 10, 2019, 20:59

It also looks like buildstream can't provide utils for such locking until we make separately started bst processes to have a consensus mechanism.

It is currently the source plugin's responsibility to either rely on host tooling which supports parallelism or to use atomic operations and temporary subdirectories to ensure there are no clashes.

I think that this should definitely be documented in Source.fetch() documentation yes, I think we may have some utilities for atomic file writing (and maybe directory swapping ?) which such docs could link to also ?

If this is correct, should we close it and document this corner case...

I'd much prefer to keep it open, even if this bug never gets fixed, it is of value to keep open in the issue database (a bug being open should not be perceived as a negative thing).

I would be happy to tag it as "minor" or such to reflect a low severity, but I rather favor keeping as many issues open as possible, to have a richer bug database of improvements which can potentially be made, no matter how minor they are.

Separately from the above, I would also note that while this appears to be a corner case, it is a particularly annoying one because:

  • It is a typical use case of BuildStream to be able to build a whole system from scratch, which we do use it for (and one of the first requirements of the project - if it can build a system from scratch then it can basically build anything in between)
  • The linux kernel repository is pretty much the most painful one to clone
  • During the bootstrap process, it is necessary to use the kernel header files, and typical to also build a kernel once the bootstrapping is complete

What this adds up to, is that when you start building a minimal full OS from scratch with no sources cached the first time around, you will usually run into a dual fetch of the linux kernel.

This would be a lot less painful of course with #261 resolved (use of shallow git clones).

from bst-staging.

Cynical-Optimist avatar Cynical-Optimist commented on September 14, 2024

In GitLab by [Gitlab user @nanonyme] on Oct 7, 2020, 19:18

Regarding this being corner-case: this is actually so severe issue that freedesktop-sdk ended up mirroring basically everything to avoid getting throttled by its sources. Note, this is just about bst1. With bst 2 remote source caching this is indeed a corner case.

from bst-staging.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.