Comments (5)

joeyh commented on June 1, 2024

I am not sure what change this needs on the git-annex side. You can already run multiple git annex get commands in parallel.

from datalad.

yarikoptic commented on June 1, 2024

Sorry for not being clear. The idea is not to request downloads in parallel independently (e.g. parallel runs of wget requesting different files), but rather to provide the full list of files to be 'get'ed at once, which would then nicely serve requests like 'get files X Y Z from archive blah.tar.gz'. This would differ from running 3 independent processes in parallel.
Also, I have no clue (yet) how such a request should be specified, since each of those files might need its own destination path/filename; e.g. I do not see anything in the tar command line that allows extracting multiple files into arbitrary destination locations.
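
To illustrate the "get files X Y Z from archive blah.tar.gz" case, here is a minimal sketch using Python's standard tarfile module, which reads the archive once and writes each requested member to its own destination. The function name `extract_many` and the `targets` mapping are hypothetical, made up for this illustration; this is not how git-annex or datalad handle archives.

```python
import shutil
import tarfile
from pathlib import Path


def extract_many(archive_path, targets):
    """Extract several members of one tar archive in a single pass.

    `targets` maps a member name inside the archive to the destination
    path it should be written to. Returns the set of member names that
    were found and extracted.
    """
    done = set()
    with tarfile.open(archive_path) as tar:
        # Iterating the TarFile walks the archive sequentially, so a
        # compressed archive is decompressed only once.
        for member in tar:
            dest = targets.get(member.name)
            if dest is None or not member.isfile():
                continue
            dest = Path(dest)
            dest.parent.mkdir(parents=True, exist_ok=True)
            src = tar.extractfile(member)
            with src, open(dest, "wb") as out:
                shutil.copyfileobj(src, out)
            done.add(member.name)
    return done
```

Members that are requested but missing from the archive are simply absent from the returned set, so the caller can decide how to report them.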

joeyh commented on June 1, 2024

Ok, sounds more like caching resources so they can be reused for multiple transfers.

So, there's potentially some overlap with the resource management I recently added to git-annex to allow reuse of, e.g., HTTP connections when downloading multiple chunks of a chunked key.

Expanding that to support reusing connections (or reusing a downloaded tarball in your example) when downloading multiple keys needs a solution to the question: How long should the cached resource be kept around? Certainly only until the end of the git annex get command, but ideally less time than that; if a lot of files are being transferred, we want to be able to examine the set of transfers and reorder the ones that can reuse the same resources, etc. There's a tension here with wanting git annex get to still start the first transfer promptly, as it does now, and not need to buffer a great many transfers in memory.
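
One way to picture that trade-off is a bounded look-ahead window: buffer only a limited number of pending transfers, group those by the resource they share, and hand out the groups before reading further. The sketch below is purely illustrative Python, not git-annex code (git-annex is written in Haskell); `batch_by_resource`, its `(key, resource)` input pairs, and the `window` parameter are all hypothetical.

```python
from itertools import islice


def batch_by_resource(transfers, window=100):
    """Yield (resource, [keys]) batches from an iterable of (key, resource).

    At most `window` transfers are buffered at a time, so the first batch
    is available after reading `window` items at most, and memory use does
    not grow with the total number of transfers.
    """
    it = iter(transfers)
    while True:
        chunk = list(islice(it, window))
        if not chunk:
            return
        groups = {}  # dicts keep insertion order, so first-seen resource goes first
        for key, resource in chunk:
            groups.setdefault(resource, []).append(key)
        for resource, keys in groups.items():
            yield resource, keys
```

A small window starts transfers sooner but misses reuse opportunities that lie further apart in the queue; a cached resource would only need to live for the duration of its batch.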

yarikoptic commented on June 1, 2024

well:

  • git annex supports parallel downloads now with -J switch, so kinda "solved" on annex side (removing git-annex label)
  • there is an outstanding issue to debug/fix for requesting multiple files from the same archive (#451)
  • our install command allows multiple targets for installation ATM; the rest of the logic for working out the most efficient 'annex get' operations is still TODO. See https://git-annex.branchable.com/todo/wishlist__58___--dry-run_option_for_all_commands/ and the particular command would be git annex find --not --in here -j [paths], which would also return the keys in question in its JSON records
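
As a rough sketch of how the output of that command could be consumed, the snippet below runs it via subprocess and collects a file-to-key mapping from the one-JSON-object-per-line output. It assumes each record carries "file" and "key" fields and spells the option as --json; the helper names `parse_find_output` and `missing_keys` are hypothetical and not part of datalad.

```python
import json
import subprocess


def parse_find_output(lines):
    """Map file path -> annex key from JSON-lines output, skipping blanks.

    Assumes every record has "file" and "key" fields.
    """
    result = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        result[record["file"]] = record["key"]
    return result


def missing_keys(paths, repo="."):
    """Ask git-annex which of `paths` lack content here, with their keys."""
    cmd = ["git", "annex", "find", "--not", "--in", "here", "--json", *paths]
    proc = subprocess.run(
        cmd, cwd=repo, check=True, capture_output=True, text=True
    )
    return parse_find_output(proc.stdout.splitlines())
```

The resulting mapping could then feed whatever logic decides which 'annex get' calls to issue, for example grouping keys that live in the same archive.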

yarikoptic commented on June 1, 2024

I think this was largely solved, and it is not clear what else we should do here, thus closing.
