Comments (2)
@karimmamer @michaeltrs I believe you both were experiencing this issue. I'll add a comment with the proposed solution below. Let me know if you have any additional detail on the problem or feedback on the solution.
from radiant-mlhub.
Proposed solution is 2-fold:

- Allow users to resume downloads instead of always starting from scratch

  We could change the signature of the `client.download_archive` function to the following:

      def download_archive(
          archive_id: str,
          output_dir: Path = None,
          *,
          if_exists: Literal['skip', 'overwrite', 'resume'] = 'resume',
          **session_kwargs
      ) -> Path:

  The default behavior would be to resume the download, but users would have the option to overwrite the existing file or skip it if it exists. When resuming a download, the client would check the size of the existing file against the `Content-Length` header for the download. If they are the same, it would skip the download. If the file size is less than the `Content-Length` of the download, the client would make a `Range` request for just the remaining bytes and append them to the existing file. If the existing file size is greater than the `Content-Length`, it would raise an exception (this shouldn't happen unless the remote archive changed).

- Automatically retry failed requests from broken connections

  The client would automatically retry any request that fails due to a connection error, up to a specified maximum number of retries. If a retry succeeds, the attempt count would reset to 0. This would ensure that a larger file doesn't ultimately fail just because it has more chances to hit a connection error over the life of the entire download.

  I'm thinking we will implement this at the `requests.Session` level so we can take advantage of it for all requests.
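
To make the two rules above concrete, here is a minimal sketch in plain Python. It is not the radiant-mlhub implementation: `plan_download`, `fetch_with_retries`, and the `read_chunk` callable are hypothetical names invented for illustration, and the retry loop is written out by hand rather than configured on a `requests.Session` as the proposal intends.

```python
from typing import Callable, Dict, Optional, Tuple


def plan_download(
    existing_size: Optional[int],
    content_length: int,
    if_exists: str = "resume",
) -> Tuple[str, Dict[str, str]]:
    """Hypothetical helper: return (action, extra_headers).

    action is 'skip', 'write' (start from scratch) or 'append' (resume).
    existing_size is None when there is no local file yet.
    """
    if if_exists not in ("skip", "overwrite", "resume"):
        raise ValueError(f"unknown if_exists value: {if_exists!r}")
    if existing_size is None or if_exists == "overwrite":
        return "write", {}
    if if_exists == "skip":
        return "skip", {}
    # From here on we are resuming.
    if existing_size == content_length:
        return "skip", {}  # the local file is already complete
    if existing_size > content_length:
        raise RuntimeError("local file is larger than the remote archive")
    # Open-ended byte range: everything from the first missing byte onward.
    return "append", {"Range": f"bytes={existing_size}-"}


def fetch_with_retries(
    read_chunk: Callable[[int], bytes],
    total: int,
    start: int = 0,
    max_retries: int = 5,
) -> bytes:
    """Collect bytes start..total by calling read_chunk(offset) repeatedly.

    read_chunk is an assumed stand-in for the network layer: it returns the
    next bytes at `offset` and raises ConnectionError on a broken connection.
    """
    data = bytearray()
    offset, attempts = start, 0
    while offset < total:
        try:
            chunk = read_chunk(offset)
        except ConnectionError:
            attempts += 1
            if attempts > max_retries:
                raise  # too many consecutive failures
            continue
        if not chunk:
            raise RuntimeError("no data returned before the download completed")
        data += chunk
        offset += len(chunk)
        attempts = 0  # a success resets the count, as proposed above
    return bytes(data)
```

With this shape, a caller would pass the headers from `plan_download` to the request and open the output file in append mode only when the action is `'append'`. Because the count resets after every successful read, the retry limit bounds consecutive failures rather than total failures over the whole download.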