Actually, the exception sounds to me as if the header went in fine, but the content was missing. Maybe this means that the data (which is an in-memory file-like object for `simple_upload`) needs a `seek(0)` before retrying. I can see this happening if the first call sent all of the data but the request then failed to complete (as opposed to an error while setting up the connection, which is probably more common).
```diff
--- a/gcsfs/core.py
+++ b/gcsfs/core.py
@@ -421,6 +421,8 @@ class GCSFileSystem(asyn.AsyncFileSystem):
         self, method, path, *args, headers=None, json=None, data=None, **kwargs
     ):
         await self._set_session()
+        if hasattr(data, "seek"):
+            data.seek(0)
         async with self.session.request(
             method=method,
```
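A minimal, self-contained sketch of the failure mode described in the comment and of the guard the patch adds. Everything here is hypothetical: `flaky_send` stands in for the HTTP call, `DroppedAfterSend` for an error raised after the body was already sent, and `request_with_retries` is a much-simplified retry loop, not gcsfs's actual retry logic.

```python
import asyncio
import io


class DroppedAfterSend(Exception):
    """Hypothetical error: the body went out, but no response came back."""


async def flaky_send(data, attempt, received):
    # Hypothetical transport: consumes the whole body (like an HTTP client
    # streaming a file-like payload), records what the "server" saw, and
    # fails on the first attempt before any response arrives.
    body = data.read()
    received.append(body)
    if attempt == 0:
        raise DroppedAfterSend("connection lost after the body was sent")


async def request_with_retries(data, retries=2, rewind=True):
    """Simplified retry loop that optionally rewinds `data` before each attempt."""
    received = []
    for attempt in range(retries):
        if rewind and hasattr(data, "seek"):
            data.seek(0)  # the same guard the proposed patch adds
        try:
            await flaky_send(data, attempt, received)
            return received
        except DroppedAfterSend:
            continue
    raise DroppedAfterSend(f"gave up after {retries} attempts")
```

With `rewind=True`, `asyncio.run(request_with_retries(io.BytesIO(b"payload")))` returns `[b"payload", b"payload"]`. With `rewind=False` it returns `[b"payload", b""]`: the retry "succeeds" with an empty body, which is consistent with the symptom of the header arriving but the content missing.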
Welp, now I feel stupid for misinterpreting the error message 🤦 Thanks for the reply, @martindurant, and for the suggested code change!
I'm a bit at the edge of my knowledge and understanding here, so I apologize if this doesn't make sense, but I am left wondering: what would cause a request to fail if all the data was sent, and the request had thus effectively finished? Perhaps there is some other root cause that should be fixed to prevent this situation from occurring in the first place? (Sadly, I don't have any debug logs from the gcsfs library itself from when I encountered these errors, and I have been unable to reproduce them on demand.)
Lastly, I also noticed that `simple_upload` wraps the data in an `UnclosableBytesIO` instance:
```python
import io


class UnclosableBytesIO(io.BytesIO):
    """Prevent closing BytesIO to avoid errors during retries."""

    def close(self):
        """Reset stream position for next retry."""
        self.seek(0)
```
This seems to suggest that `seek(0)` should already be called on the data when a retry occurs. So either `close()` is not actually called where this code assumes it is, or `seek(0)` is not the solution for this issue (or, of course, I am missing something else here).
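To make the two possibilities concrete, here is a small sketch that re-declares the quoted `UnclosableBytesIO` so it runs on its own. The helper `retry_read` is hypothetical: it reads a buffer twice, as two upload attempts would, and optionally calls `close()` in between to mimic a client that closes the payload after a failed attempt (an assumption for illustration, not a description of gcsfs or aiohttp internals).

```python
import io


class UnclosableBytesIO(io.BytesIO):
    """Re-declared from the gcsfs snippet quoted above so this runs standalone."""

    def close(self):
        """Reset stream position for next retry."""
        self.seek(0)


def retry_read(buf, close_between_attempts):
    """Hypothetical helper: read `buf` twice, as two upload attempts would.

    If `close_between_attempts` is True, close() is called after the first
    read. Returns the bytes each attempt saw; None means the second read
    raised ValueError because the buffer was genuinely closed.
    """
    first = buf.read()
    if close_between_attempts:
        buf.close()
    try:
        second = buf.read()
    except ValueError:  # io.BytesIO raises ValueError when read after close()
        second = None
    return first, second
```

`retry_read(io.BytesIO(b"payload"), True)` gives `(b"payload", None)`, and `retry_read(UnclosableBytesIO(b"payload"), True)` gives `(b"payload", b"payload")`, which is what the wrapper is for. But `retry_read(UnclosableBytesIO(b"payload"), False)` gives `(b"payload", b"")`: if nothing calls `close()` between attempts, the wrapper never rewinds, so an explicit `seek(0)` is still needed in that case.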
I cannot say why the situation occurs, but it doesn't surprise me that something can go wrong after the data is sent but before a success response comes back. Without that response, we have to assume the data wasn't stored.
The `UnclosableBytesIO` wrapper was created for the case where the initial connection fails. In that case the aiohttp request closes the input file-like anyway, but we still want to read from it in the retry. Calling `seek(0)` seems like a reasonable thing to do in any case. Having to pass a file-like in the first place is strange (to me): apparently it makes for a more responsive event loop, since the data can be sent in chunks.
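A rough sketch of why a file-like payload can help the event loop: a sender that reads a chunk at a time and yields between chunks lets other coroutines run in the meantime. `send_in_chunks`, `main`, the `ticker` task and the tiny chunk size are all hypothetical and chosen for illustration; this is not how aiohttp implements its payload writing, only an instance of the same general idea.

```python
import asyncio
import io


async def send_in_chunks(fileobj, write, chunk_size=4):
    """Hypothetical sender: stream a file-like object chunk by chunk.

    After each chunk it awaits asyncio.sleep(0), handing control back to
    the event loop so other coroutines can run between chunks.
    """
    total = 0
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        write(chunk)
        total += len(chunk)
        await asyncio.sleep(0)
    return total


async def main():
    events = []
    sent = []

    async def ticker():
        # Another task sharing the loop, e.g. a concurrent request.
        for i in range(3):
            events.append(f"tick{i}")
            await asyncio.sleep(0)

    def write(chunk):
        events.append("chunk")
        sent.append(chunk)

    task = asyncio.create_task(ticker())
    total = await send_in_chunks(io.BytesIO(b"0123456789"), write)
    await task
    return total, b"".join(sent), events
```

Running `asyncio.run(main())` sends all 10 bytes in three chunks, and the returned `events` list shows `"tick…"` entries interleaved with the `"chunk"` entries. Sending one large `bytes` object in a single call would give the other task no chance to run until the whole write finished.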
If the `seek(0)` change is indeed all that's required to fix this issue, do you want to make a PR for it, @martindurant, or would you prefer that I try?
Please do make a PR