Comments (6)

tbpg commented on July 28, 2024

This seems reasonable to me (another endpoint to support). But I'm not sure I understand this part:

"hold the buffer with all bytes in memory."

An io.Reader doesn't need to be stored in memory, right? Or is this an issue under the hood with how the request is handled?

from go-tika.

evanfuller commented on July 28, 2024

@tbpg I'm basing my intuition on this Medium post.

The gist is that if we want multipart support, we need something to pass to multipart.NewWriter() when building requests. In the first experiment the author conducted, Go naively built an in-memory buffer containing the entire file. Later in the post, the author improves on this by using an io.Pipe instead, so that the entire file is not held in memory just for the request.

Admittedly, I did not attempt to replicate the findings of this post, so it could be possible that Go has improved buffering for large requests like this, but I'm not sure.

tbpg commented on July 28, 2024

Gotcha. That seems reasonable to me. I think we'd have to play with what the exact API is for the tika package. In general, I'd like to leave as much of the creation of the io.Reader to the caller. But, it might be too cumbersome to expect someone to use the multipart package?

evanfuller commented on July 28, 2024

Yeah, point definitely taken on trying to be implementation/caller-agnostic. That said, the docs for this particular endpoint suggest that it is specifically for use with multipart uploads.

k7en commented on July 28, 2024

Hello,
I'm trying to extract the metadata and body text of a large file, and it's consuming a lot of memory, so I came here in search of a solution. I would like to communicate with Tika's API using the multipart method.
Do you know what the current status of this issue is?
If you know how to solve this problem, I'd appreciate some advice.
Best regards.

tbpg commented on July 28, 2024

I'm very open to a PR here. One warning: we might need to modify the interface a little, keeping it as minimal as possible while still enabling you to do what you need to do.
