Comments (3)
First of all, kudos for this library! It has proved very useful to our project at Magnet.
Thank you, I'm glad that you're finding it useful.
However, we need the export-to-image functionality that Apache PDFBox provides.
OK. Is this functionality already present in any of the Java examples here:
https://svn.apache.org/viewvc/pdfbox/trunk/examples/src/main/java/org/apache/pdfbox/examples/
I'm asking because I'm trying to understand what exactly you are trying to do: extract images out of a PDF or ...?
We thought that it would be nice if your library had it as well.
We'd be happy to make a PR with this.
OK, but let me understand what you're trying to do first. After that, if you're willing to do the work, that would be great.
from pdfboxing.
We have PDFs (possibly multi-page) that we need thumbnails for. In our case, each page gets converted into an image. It is similar to Google Drive, which doesn't display the PDF itself in the preview, just a thumbnail image of it.
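To make the thumbnail step concrete, here is a minimal sketch of the scaling half of that workflow, using only the Java standard library. It assumes each page has already been rendered to a `BufferedImage`, for example with PDFBox's `PDFRenderer.renderImageWithDPI(pageIndex, dpi)`. The class name `PageThumbnails`, the method `scaleToFit`, and the `maxSide` parameter are hypothetical names chosen for illustration; they are not part of pdfboxing or PDFBox.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

// Hypothetical helper, not part of pdfboxing or PDFBox.
final class PageThumbnails {

    // Scales a rendered page so that its longer side is at most maxSide pixels,
    // preserving the aspect ratio. Pages that are already small are not enlarged.
    static BufferedImage scaleToFit(BufferedImage page, int maxSide) {
        if (maxSide <= 0) {
            throw new IllegalArgumentException("maxSide must be positive");
        }
        int longerSide = Math.max(page.getWidth(), page.getHeight());
        double scale = Math.min(1.0, (double) maxSide / longerSide);
        int width = Math.max(1, (int) Math.round(page.getWidth() * scale));
        int height = Math.max(1, (int) Math.round(page.getHeight() * scale));

        BufferedImage thumbnail = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = thumbnail.createGraphics();
        try {
            // Bilinear interpolation gives smoother results than the default when shrinking.
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                               RenderingHints.VALUE_INTERPOLATION_BILINEAR);
            g.drawImage(page, 0, 0, width, height, null);
        } finally {
            g.dispose();
        }
        return thumbnail;
    }

    public static void main(String[] args) {
        // Stand-in for a rendered page; in practice this would come from the PDF renderer.
        BufferedImage page = new BufferedImage(1240, 1754, BufferedImage.TYPE_INT_RGB);
        BufferedImage thumbnail = scaleToFit(page, 256);
        System.out.println(thumbnail.getWidth() + "x" + thumbnail.getHeight());
    }
}
```

For a multi-page document, the same helper would be called once per rendered page, and each result could be written out with `javax.imageio.ImageIO.write`.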
We have a use case where we want to extract all images from the entire document so we can then do ML on each image. Extracting the text is done separately. PDFBox looks like the right tool for it:
https://docs.aspose.com/pdf/java/extract-images-from-pdf-file/
A similar use case with the Node.js pdf-lib (the extract-images.zip example, which seems to work well): Hopding/pdf-lib#83 (comment)