Comments (6)
This seems reasonable to me (another endpoint to support). But I'm not sure I understand this part: "hold the buffer with all bytes in memory."
An io.Reader doesn't need to be stored in memory, right? Or is this an issue under the hood with how the request is handled?
from go-tika.
@tbpg I'm basing my intuition on this Medium post.
The gist is that multipart support means passing some buffer to multipart.NewWriter() when building requests. In the author's first experiment, Go naively built an in-memory buffer containing the entire file. Later in the post, the author improves on this by using an io.Pipe instead, so that the entire file is not held in memory just for the request.
Admittedly, I did not attempt to replicate the findings of this post, so it could be possible that Go has improved buffering for large requests like this, but I'm not sure.
Gotcha. That seems reasonable to me. I think we'd have to play with what the exact API is for the tika package. In general, I'd like to leave as much of the creation of the io.Reader to the caller. But it might be too cumbersome to expect someone to use the multipart package?
Yeah, point definitely taken on trying to be implementation/caller-agnostic. That said, the docs for this particular endpoint suggest that it is specifically for use with multipart uploads.
Hello
I'm trying to extract the metadata and body text of a large file, and it's consuming a lot of memory, so I came here in search of a solution. I would like to communicate with Tika's API using the multipart method.
Do you know the current status of this issue?
If you know how to solve this problem, I'd appreciate some advice.
BRGDS.
I'm very open to a PR here. Warning, we might need to modify the interface a little bit, keeping it as minimal as possible while enabling you to do what you need to do.