For larger zip files it can be prohibitive to provide the entire descriptor in one sho

Consider supporting streaming zip file descriptors (zipstreamer issue, closed)

danlamanna commented on August 18, 2024

Consider supporting streaming zip file descriptors

Comments (6)

scosman commented on August 18, 2024

Got it. I think you can still do this as is.

Change your descriptor-serving code to serve valid JSON in a streaming fashion. Some JSON libraries might not handle it, others do, and you can always hand-code it given how simple the format is. That way zipstreamer reads bytes as they are ready, and the Heroku connection is kept alive. Go might not process the descriptor until the end, but it will read it as it arrives.

Pseudocode:

output.Write('{"suggestedFilename": "tps_reports.zip","files": [')
for i := 0; i < 10000; i++ {
  if i != 0 {
    output.Write(',')  // separator before every entry except the first
  }
  output.Write('{"url":"https://server.com/image${i}.jpg","zipPath":"image${i}.jpg"}')
}
output.Write(']}')

scosman commented on August 18, 2024

How big of a descriptor are you generating? I imagine this would need to be a massive number of files to get anywhere near the 30s Heroku limit.

Also: correct me if I'm wrong, but I thought the Heroku 30s timeout resets when we send a byte? Once generated, we stream as fast as the client can handle. So it would only trigger a timeout if the descriptor generation phase is > 30s, which doesn't seem possible. Golang is pretty fast and could generate even massive descriptors sub-second. If anything, there's a small memory hit here if a client very slowly downloads a huge descriptor.

Can you provide some more details of what you're doing and the issue you encountered? I understand the ask, but I don't quite follow how this could be causing issues.

scosman commented on August 18, 2024

Closing for now, since I don't know how to reproduce a non-streaming descriptor (minus the period we're generating, but that's a few milliseconds). If I'm missing something about the issue, please let me know some more details and I can re-open!

danlamanna commented on August 18, 2024

Sorry, my original description is sorely lacking.

I have an API (not written in Go) on Heroku that is serving the zip descriptor. This does take a while to generate: it searches records, filters permissions, signs URLs, and serializes the entire structure to JSON, ultimately returning a descriptor with ~10k-200k elements. The 30 second timeout is prohibitive in our case because of the JSON serialization step: we can't serve a single byte until we have the entire data structure serialized.

I think the ideal scenario would be a streaming descriptor format like jsonlines or something, where zipstreamer could start fetching/serving bytes before it's received the full descriptor. But for now I've hacked a fork to use a paginated descriptor to support iterating through the files in chunks of 1,000 and it works decently well.

Does that make more sense?

danlamanna commented on August 18, 2024

This does work, thanks!

Do you have any opinions on a non-atomic descriptor format in general? I see 2 primary benefits:

1. Performance: the user and zipstreamer don't idly wait for the API server to generate the entire descriptor, resulting in faster downloads
2. UX: the user sees bytes downloading even before the API and zipstreamer finish their complete dialog, obviating the "is my download working" problem

scosman commented on August 18, 2024

Performance-wise: idle waiting isn't really a perf concern. There's a bit of memory usage, but the same case can happen with a slow client. The fix there would be to use the disk for descriptors, not stream them from input.

UX: I'm not sure it improves it much. Since we aren't setting the size of the downloaded file when we start (since we don't know it), browsers just show a spinner from the start, not a progress bar, and this has the same UX. It would be a bit faster.

Biggest concern: it would be a lot of complexity. I'd have to deal with errors mid stream, streaming JSON parsing, and keeping 3 buffers in sync (descriptor, downloading files, streaming out zips). It could be done but it would be a lot of work. I think the real fix is a faster descriptor source (cache, faster generation, etc).
