

scosman commented on August 18, 2024

Got it. I think you can still do this as is.

Change your descriptor-serving code to emit valid JSON in a streaming fashion. Some JSON libraries can't do this and others can, and you can always hand-code it given how simple the format is. That way zipstreamer reads bytes as they become available, and your Heroku connection is kept alive. Go might not process the descriptor until it has fully arrived, but it will keep reading it.

Pseudocode:

output.Write '{"suggestedFilename": "tps_reports.zip","files": ['
for int i = 0; i < 10000; i++ {
  if i != 0 {
    output.Write ','
  }
  output.Write '{"url":"https://server.com/image${i}.jpg","zipPath":"image${i}.jpg"}'
}
output.Write ']}'

from zipstreamer.

scosman commented on August 18, 2024

How big of a descriptor are you generating? I imagine this would need to be a massive number of files to get anywhere near the 30s Heroku limit.

Also: correct me if I'm wrong, but I thought Heroku's 30s timer resets when we send a byte? Once the descriptor is generated, we stream as fast as the client can handle, so a timeout would only trigger if the descriptor generation phase took more than 30s, which doesn't seem possible. Golang is pretty fast and could generate even massive descriptors in under a second. If anything, there's a small memory hit if a client very slowly downloads a huge descriptor.

Can you provide some more details about what you're doing and the issue you encountered? I understand the ask, but I don't quite follow how this could be causing issues.


scosman commented on August 18, 2024

Closing for now, since I don't know how to reproduce a non-streaming descriptor (apart from the period while we're generating it, but that's a few milliseconds). If I'm missing something about the issue, please share more details and I can re-open!


danlamanna commented on August 18, 2024

Sorry, my original description is sorely lacking.

I have an API (not written in Go) on Heroku that serves the zip descriptor. It takes a while to generate: it searches records, filters permissions, signs URLs, and serializes the entire structure to JSON, ultimately returning a descriptor with ~10k-200k elements. The 30-second timeout is prohibitive in our case because of the JSON serialization step: we can't serve a single byte until the entire data structure has been serialized.

I think the ideal scenario would be a streaming descriptor format like JSON Lines or something similar, where zipstreamer could start fetching/serving bytes before it has received the full descriptor. But for now I've hacked together a fork that uses a paginated descriptor, iterating through the files in chunks of 1,000, and it works decently well.

Does that make more sense?


danlamanna commented on August 18, 2024

This does work, thanks!

Do you have any opinions on a non-atomic descriptor format in general? I see 2 primary benefits:

  1. Performance: the user and zipstreamer don't idly wait for the API server to generate the entire descriptor, resulting in faster downloads
  2. UX: the user sees bytes downloading even before the API and zipstreamer finish their complete dialog, obviating the "is my download working" problem


scosman commented on August 18, 2024

Performance-wise: idle waiting isn't really a perf concern. There's a bit of memory usage, but the same case can happen with a slow client. The fix there would be to use the disk for descriptors, not stream them from input.

UX: I'm not sure it improves it much. Since we don't set the size of the downloaded file when we start (we don't know it), browsers show a spinner from the start rather than a progress bar, so the UX would be the same. It would just be a bit faster.

Biggest concern: it would be a lot of complexity. I'd have to deal with errors mid stream, streaming JSON parsing, and keeping 3 buffers in sync (descriptor, downloading files, streaming out zips). It could be done but it would be a lot of work. I think the real fix is a faster descriptor source (cache, faster generation, etc).
