Comments (7)
HTTPFileSystem might now return an HTTPStreamFile where previously it returned a raw file-like requests response object. I don't think this changes anything from dask's point of view, except that we no longer even try the "let's see if this is smaller than a block" approach. A retry would have to cover the whole request, not each call to read. However, a retry on establishing the connection (here) would make sense.
from dask-examples.
I'm not sure there's much we can do about broken connections, and I can't see how they could be any fault of ours. Retries could be built into the HTTPFileSystem, but perhaps it's better to retry the whole task in such cases.
from dask-examples.
Is there a good reason to avoid retries in HTTPFileSystem?
from dask-examples.
No, but a few things make it tricky:
- it is hard to decide which set of errors should lead to a retry. Perhaps we would have to retry on everything
- some things, like establishing the initial connection, are already retried by requests/urllib
- if it's a timeout, then a set of retries might take a very long time to fail
- in the fsspec implementation, there is a non-seekable fallback mode for when the file size is unavailable, which gives you a requests file-like object rather than an HTTPFile. I don't think we can easily intercept its read methods for the purposes of catching errors.
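To make the first and third points concrete, here is a minimal sketch of a whole-call retry with an explicit error set and an overall time budget. Everything in it is hypothetical: `retry_call`, its parameters and the default exception tuple are illustrative and not part of fsspec or dask. Note that the requests exceptions do not derive from the built-in `ConnectionError`, so the tuple would need adapting in practice.

```python
import time


def retry_call(func, *, retry_on=(ConnectionError, TimeoutError),
               max_attempts=3, deadline=30.0, backoff=0.5,
               sleep=time.sleep, clock=time.monotonic):
    """Call func() and retry the whole call on the listed exceptions.

    Hypothetical helper: the names and defaults are assumptions.
    """
    start = clock()
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            # Give up once attempts or the overall time budget are used up,
            # so that repeated timeouts cannot stall for an unbounded time.
            out_of_time = clock() - start >= deadline
            if attempt == max_attempts or out_of_time:
                raise
            # Exponential backoff between attempts.
            sleep(backoff * 2 ** (attempt - 1))
```

Anything outside `retry_on` propagates immediately, which is the conservative choice when it is unclear which errors are safe to retry.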
from dask-examples.
This SO answer might be the best way to do it globally: https://stackoverflow.com/a/15431343/3821154. It lets you be explicit about retries following a connection error, and the policy applies to all connections within a session.
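A minimal sketch of that session-level approach, assuming requests and urllib3 are installed. The helper name `make_retrying_session` and the retry counts, backoff and status codes are illustrative assumptions, not anything fsspec or dask configure.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_retrying_session(total=3, backoff_factor=0.5):
    """Build a Session whose connections share one retry policy.

    Hypothetical helper: the name and default values are assumptions.
    """
    retry = Retry(
        total=total,                    # overall cap on retries per request
        backoff_factor=backoff_factor,  # sleep grows between attempts
        status_forcelist=(500, 502, 503, 504),  # also retry these responses
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    # Mount the adapter for both schemes so every URL uses the policy.
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session
```

Passing a `timeout` on each `session.get(...)` call would still be needed to bound how long a series of retries can take, which is the concern raised earlier in the thread.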
from dask-examples.
Quite some refactoring of fsspec's HTTP implementation lately.
Are dask tests still flaky?
AFAICS, fsspec now returns an HTTPFile even if range requests are not possible. Does that mean a retry policy in fsspec makes more sense now @martindurant?
from dask-examples.
(feel free to implement that in a PR, in case you have the time)
from dask-examples.