Comments (3)
Hey @hderms, could you explain your use case a little further?
from docsplit.
I think I was a little unclear. If the primary goal is to split a PDF, PPT, or other document into a sequence of images via the Docsplit.extract_images method, it would be useful for that method to return the paths of the extracted images rather than the paths of the PDFs from @pdfs in ImageExtractor. As far as I know, the PDF is only an intermediate step, so the image paths are probably the more useful return value in most cases.
In my particular case, I have a Document model in Rails, and I'd like it to have many images (a has_many association). Receiving the image paths programmatically after extraction would help me set up that relation.
I have a patch that implements this functionality like so:
Docsplit.extract_images('/home/rvibe/Downloads/BadPresentations.ppt', :size => '1000x', :format => [:png, :jpg], :and_return => :images)
=> [["/home/rvibe/Lib/docsplit/BadPresentations_1.png",
     "/home/rvibe/Lib/docsplit/BadPresentations_2.png",
     "/home/rvibe/Lib/docsplit/BadPresentations_3.png",
     "/home/rvibe/Lib/docsplit/BadPresentations_4.png",
     "/home/rvibe/Lib/docsplit/BadPresentations_5.png",
     "/home/rvibe/Lib/docsplit/BadPresentations_6.png",
     "/home/rvibe/Lib/docsplit/BadPresentations_7.png"],
    ["/home/rvibe/Lib/docsplit/BadPresentations_1.jpg",
     "/home/rvibe/Lib/docsplit/BadPresentations_2.jpg",
     "/home/rvibe/Lib/docsplit/BadPresentations_3.jpg",
     "/home/rvibe/Lib/docsplit/BadPresentations_4.jpg",
     "/home/rvibe/Lib/docsplit/BadPresentations_5.jpg",
     "/home/rvibe/Lib/docsplit/BadPresentations_6.jpg",
     "/home/rvibe/Lib/docsplit/BadPresentations_7.jpg"]]
Without the :and_return => :images option, the default behavior is preserved:
Docsplit.extract_images('/home/rvibe/Downloads/BadPresentations.ppt', :size => '1000x', :format => [:png, :jpg])
=> ["/tmp/docsplit/BadPresentations.pdf"]
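For anyone who needs the image paths without the patch, a rough workaround is to glob the output directory after extraction. The sketch below is only an illustration: extracted_image_paths is a hypothetical helper, not part of Docsplit, and it assumes the images are named "<basename>_<page>.<format>" as in the output above. It returns one array of paths per format, ordered by page number.

```ruby
# Hypothetical helper, NOT part of Docsplit. Assumes extracted images are
# named "<basename>_<page>.<format>", e.g. "BadPresentations_3.png".
# Returns one array of paths per requested format, sorted by page number.
def extracted_image_paths(output_dir, basename, formats)
  Array(formats).map do |format|
    pattern = File.join(output_dir, "#{basename}_*.#{format}")
    Dir.glob(pattern).sort_by do |path|
      # Sort on the trailing page number so page 10 comes after page 2.
      File.basename(path, ".*")[/_(\d+)\z/, 1].to_i
    end
  end
end

# Possible usage after an extraction into a known directory (assumed paths):
#   Docsplit.extract_images(doc, :output => dir, :format => [:png, :jpg])
#   pngs, jpgs = extracted_image_paths(dir, "BadPresentations", [:png, :jpg])
```

Note that a basename containing glob metacharacters such as * or [ would need escaping before being passed to Dir.glob.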