Comments (5)

martindurant commented on September 18, 2024

There are an awful lot of GET calls in there that, for a whole-array write operation, are totally unnecessary. Even if the array already exists, a single read of all the metadata pieces should suffice. I wonder where we can provide a better directory-listing caching experience around this. There is, separately, talk of using a transactional in-memory cache specifically for zarr metadata files (upload when finished), which would help a lot too. It is already possible to provide separate metadata and data storage backends in zarr.

I mention this because, while I don't know what the specific problem is, I can only assume that the total number of requests/coroutines is implicated in something deep within asyncio.

One lever you could pull is setting the fsspec config value conf["nofiles_gather_batch_size"] (default given by fsspec.asyn._NOFILES_DEFAULT_BATCH_SIZE=1280) to a smaller value.
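
A minimal sketch of that, assuming fsspec is installed and that the setting is changed before the filesystem operations start. The value 128 is an arbitrary illustration, not a recommendation:

```python
from fsspec.config import conf

# Lower the batch size used when gathering coroutines that do not touch
# local files. 128 is an illustrative value; the default noted above is 1280.
conf["nofiles_gather_batch_size"] = 128

print(conf["nofiles_gather_batch_size"])
```

After this, create the GCSFileSystem and run the write as usual. A smaller batch should mean fewer requests in flight at once, at some cost in throughput.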

If there are really requests being made with zero data, we should be able to find out where that's happening and continue on. Perhaps there is a race condition where all the data of a chunk is sent successfully, but the sending function subsequently errors. This would be in gcsfs.core.simple_upload.

from gcsfs.

martindurant commented on September 18, 2024

"every batch of 10 chunks"

Is this the number of zarr chunks in a dask partition, or where else does this number come from?

bweasels commented on September 18, 2024

Thanks for the fast reply! This number is the number of images you want to hold in memory before writing them to the bucket, so it's user-defined, really.

WRT the large number of GETs for every batch upload - while trying to debug the race condition, I tried reconnecting to the zarr store on the bucket every time I wrote a set of 10 images, to see if the error was related to the connection going stale (idk - I'm a scientist, not a networking guy). Removing that reconnection call removes the stack of GET calls.

Thanks for the pointer on gcsfs.core.simple_upload - I'll see if I can explore it to do some debugging for my weird case. If it'll help, I'll try to make a minimal reproducible example this weekend.

bweasels commented on September 18, 2024

I was able to manually trace it back to _request on line 412 in gcsfs.core. It seems like the:
async with self.session.request( ... ) as r:
command on line 416 may be where it fails prior to going into the race condition. The data object going into that command prior to failure is not empty (<gcsfs.core.UnclosableBytesIO object at 0x0000021B6ACB7BF0>, with a non-zero size from getvalue), so I'm guessing it's something in self.session.request? That said, it seems like self.session.request comes from another package (aiohttp?), so I ran out of steam and stopped pursuing it. Given that this is the first you're seeing of this, it may be specific to my situation, but maybe this issue thread can help if someone else hits the same problem. I chatted with the lab and we'll pursue a different, slower upload scheme to get around this. Thanks again for your help!
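
For anyone picking this up later, one low-effort way to see what happens around that request is to turn on debug logging before running the upload. This is only a sketch using the standard library; it assumes gcsfs emits its request-level messages on a logger named "gcsfs", and the format string is just one reasonable choice:

```python
import logging

# Print log records to stderr with timestamps so the ordering of requests
# and errors is visible.
logging.basicConfig(
    format="%(asctime)s %(name)s %(levelname)s %(message)s",
    level=logging.WARNING,
)

# Assumption: gcsfs logs under the "gcsfs" logger name.
logging.getLogger("gcsfs").setLevel(logging.DEBUG)

# asyncio reports its own problems (e.g. task exceptions that were never
# retrieved) on the "asyncio" logger.
logging.getLogger("asyncio").setLevel(logging.DEBUG)
```

Setting the environment variable PYTHONASYNCIODEBUG=1 additionally switches asyncio into debug mode, which is slower but reports things like slow callbacks.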

martindurant commented on September 18, 2024

I hope you are right, but it's good to have this information here for others anyway.
