Comments (6)

martindurant commented on July 17, 2024

Actually, the exception sounds to me as if the header went in fine, but the content was missing. Maybe this means that the data (which is an in-memory file-like object for simple_upload) needs a seek(0) before retrying. I can see this happening if, in the first call, all the data was sent but the request failed to complete afterwards (as opposed to an error setting up the connection, which is probably more common).
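
A minimal sketch of the failure mode described above, using only the standard library. `fake_send` is a hypothetical stand-in for the HTTP request and is not gcsfs or aiohttp code; it only assumes that sending the body means reading the file-like object to the end.

```python
import io


def fake_send(body):
    """Hypothetical stand-in for an HTTP request: consumes the file-like body."""
    return body.read()


data = io.BytesIO(b"payload")

first = fake_send(data)   # first attempt reads everything; position is now at the end
second = fake_send(data)  # retry without rewinding: nothing is left to read

data.seek(0)              # rewind before the next attempt
third = fake_send(data)   # the full payload is available again
```

Under these assumptions, the second attempt sends headers with an empty body, which would match an error complaining about missing content.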

from gcsfs.

martindurant commented on July 17, 2024

--- a/gcsfs/core.py
+++ b/gcsfs/core.py
@@ -421,6 +421,8 @@ class GCSFileSystem(asyn.AsyncFileSystem):
         self, method, path, *args, headers=None, json=None, data=None, **kwargs
     ):
         await self._set_session()
+        if hasattr(data, "seek"):
+            data.seek(0)
         async with self.session.request(
             method=method,

Metamess commented on July 17, 2024

Welp, now I feel stupid for misinterpreting the error message 🤦 Thanks for the reply, @martindurant, and for the suggested code change!

I'm a bit at the edge of my knowledge and understanding here, so I apologize if this doesn't make sense, but I am left wondering: what would cause a request to fail if the data was all sent, and thus the request has effectively finished? Perhaps there is some other root cause that should be fixed to prevent this situation from occurring in the first place? (Sadly, I don't have any debug logs from the gcsfs library itself from when I encountered these errors, and I have been unable to reproduce them on demand ☹️) Furthermore, could we be introducing a problem if we resend the data when it was already fully received (and stored)?

Lastly, I also noticed that simple_upload wraps the data into an UnclosableBytesIO instance:

class UnclosableBytesIO(io.BytesIO):
    """Prevent closing BytesIO to avoid errors during retries."""

    def close(self):
        """Reset stream position for next retry."""
        self.seek(0)

This seems to suggest that seek(0) should already be called on the data when a retry occurs. So either close() is not being called where it is assumed to be, or seek(0) is not the solution for this issue (or, of course, I am missing something else here).
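
To illustrate the first possibility, here is a small sketch that reuses the class quoted above. It only exercises the standard-library `io.BytesIO` behaviour and makes no claim about when the HTTP client actually calls `close()`.

```python
import io


class UnclosableBytesIO(io.BytesIO):
    """Prevent closing BytesIO to avoid errors during retries."""

    def close(self):
        """Reset stream position for next retry."""
        self.seek(0)


buf = UnclosableBytesIO(b"payload")
buf.read()                       # simulate the body being fully sent

pos_without_close = buf.tell()   # still at the end if nobody calls close()

buf.close()                      # the override rewinds instead of closing
pos_after_close = buf.tell()
still_open = not buf.closed
```

If the request fails after the body was read but before `close()` is called, the stream stays at the end, so a retry would see an empty body unless something else rewinds it.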

martindurant commented on July 17, 2024

I cannot say why the situation occurs, but it doesn't surprise me that something can happen even after the data is sent, but before a success response comes back. Without the response, we can assume that the data isn't stored.

The unclosable thing was created for the case where the initial connection fails. The asyncio request function closes the input file-like anyway in this case, but we still want to read from it in the retry. seek(0) seems like a reasonable thing to do in any case. Having to pass a file-like in the first place is strange (to me): apparently it makes for a more responsive event loop, as asyncio can send the data in chunks.
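
A rough sketch of the "seek(0) in any case" idea, written as a standalone retry wrapper. `call_with_retries` and `flaky_send` are hypothetical names for illustration only; this is not how gcsfs structures its retries, and the failure is simulated with a plain `ConnectionError`.

```python
import asyncio
import io


async def call_with_retries(send, data, retries=3):
    """Hypothetical retry wrapper: rewind the body before every attempt.

    Assumes retries >= 1.
    """
    last_exc = None
    for _ in range(retries):
        if hasattr(data, "seek"):
            data.seek(0)  # harmless on the first attempt, essential on retries
        try:
            return await send(data)
        except ConnectionError as exc:
            last_exc = exc
    raise last_exc


attempts = []


async def flaky_send(body):
    """Simulated request: the body is fully read, then the first call fails."""
    payload = body.read()
    attempts.append(payload)
    if len(attempts) == 1:
        raise ConnectionError("no response after body was sent")
    return payload


result = asyncio.run(call_with_retries(flaky_send, io.BytesIO(b"payload")))
```

Because the rewind happens at the top of each attempt, it does not matter whether the previous failure occurred before or after the body was consumed.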

Metamess commented on July 17, 2024

--- a/gcsfs/core.py
+++ b/gcsfs/core.py
@@ -421,6 +421,8 @@ class GCSFileSystem(asyn.AsyncFileSystem):
         self, method, path, *args, headers=None, json=None, data=None, **kwargs
     ):
         await self._set_session()
+        if hasattr(data, "seek"):
+            data.seek(0)
         async with self.session.request(
             method=method,

If this is indeed all that's required to fix this issue, do you want to make a PR for this, @martindurant, or would you prefer that I try to do so?

martindurant commented on July 17, 2024

Please do make a PR.
