sfneal / pdfconduit
Prepare documents for distribution
License: Apache License 2.0
Running test_flatten.py yields a number of UnclosedResource warnings. Using the pdfrw method instead of pypdf3 during Merge call in Flatten seemed to limit the number of warnings but not eliminate them. PyPDF3 appears to be the origin of the problem.
Improve test coverage to near 100% (not necessary)
Consider converting the sub-module convert in the module pdf.conduit to a new module of the pdf namespace package (pdf.convert). Currently the only scripts are img2pdf.py and pdf2img.py which were built as the back-end of the Flatten class in flatten.py.
Possible new scripts...
Add type hinting across modules.
Analyze the outputs from each test method and add assertions. The goal is to add more guardrails to prevent issues during the pypdf conversion.
The downstream issue in the pdfconduit API looks like it could be caused by a call to Info().metadata, as using that call in the test suite raises the same error seen in the API test suite.
Possibly caused by an issue with decrypting a PDF?
@Timer.decorator
def test_encrypt_128bit(self):
    encrypted = Encrypt(self.pdf_path,
                        self.user_pw,
                        self.owner_pw,
                        output=self.temp.name,
                        bit128=True,
                        suffix='128bit')
    security = self._getPdfSecurity(encrypted)
    self.assertPdfExists(encrypted)
    self.assertEncrypted(encrypted)
    self.assert128BitEncryption(security)
    self.assertSecurityValue(security, -1852)
    print(Info(encrypted.output, self.user_pw).metadata)  # No error
    print(Info(encrypted, self.user_pw).metadata)  # Error
    # print(Info(encrypted).resources())
    # print(Info(encrypted).security)
{'/Producer': 'pdfconduit', '/Creator': 'pdfconduit', '/Author': 'Stephen Neal'}
ETestEncrypt.test_encrypt_128bit_allow_commenting --> mms: 104.86
.TestEncrypt.test_encrypt_128bit_allow_printing --> mms: 108.78
.TestEncrypt.test_encrypt_128bit_allow_printing_and_commenting --> mms: 107.58
.TestEncrypt.test_encrypt_40bit --> mms: 99.45
.TestEncrypt.test_encrypt_40bit_allow_commenting --> mms: 98.56
.TestEncrypt.test_encrypt_40bit_allow_printing --> mms: 98.09
.TestEncrypt.test_encrypt_40bit_allow_printing_and_commenting --> mms: 103.74
.TestEncrypt.test_password_byte_string --> mms: 108.79
.
======================================================================
ERROR: test_encrypt_128bit (tests.test_conduit_encrypt.TestEncrypt.test_encrypt_128bit)
A nested function for timing other functions.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/Stephen/Scripts/pdfconduit/venv/python312/lib/python3.12/site-packages/looptools/timer.py", line 65, in function_timer
    value = func(*args, **kwargs)
            ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/Stephen/Scripts/pdfconduit/tests/test_conduit_encrypt.py", line 44, in test_encrypt_128bit
    print(Info(encrypted, self.user_pw).metadata) # Error
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/Stephen/Scripts/pdfconduit/pdfconduit/utils/info.py", line 7, in __init__
    self.pdf = self._reader(path, password, prompt)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/Stephen/Scripts/pdfconduit/pdfconduit/utils/info.py", line 12, in _reader
    pdf = PdfFileReader(path) if not isinstance(path, PdfFileReader) else path
          ^^^^^^^^^^^^^^^^^^^
  File "/Users/Stephen/Scripts/pdfconduit/venv/python312/lib/python3.12/site-packages/PyPDF3/pdf.py", line 1203, in __init__
    self.read(stream)
  File "/Users/Stephen/Scripts/pdfconduit/venv/python312/lib/python3.12/site-packages/PyPDF3/pdf.py", line 1818, in read
    stream.seek(-1, 2)
    ^^^^^^^^^^^
AttributeError: 'Encrypt' object has no attribute 'seek'
----------------------------------------------------------------------
Ran 9 tests in 0.992s
FAILED (errors=1)
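The traceback suggests the reader was handed the Encrypt object itself rather than a file path, which would explain why Info(encrypted.output, ...) works while Info(encrypted, ...) fails. A minimal sketch of normalizing the input before it reaches the reader follows. The helper name resolve_pdf_path and the FakeEncrypt stand-in are hypothetical; the only thing taken from the test above is that the Encrypt result exposes an output attribute holding a path. The case where a reader instance is passed is not covered here.

```python
import os
from typing import Any


def resolve_pdf_path(source: Any) -> str:
    """Return a filesystem path for `source` (hypothetical helper).

    Accepts a path-like value, or any object exposing an `output`
    attribute that holds a path (as `encrypted.output` does in the test).
    """
    if isinstance(source, (str, bytes, os.PathLike)):
        return os.fsdecode(source)
    output = getattr(source, "output", None)
    if isinstance(output, (str, bytes, os.PathLike)):
        return os.fsdecode(output)
    raise TypeError(f"Cannot derive a PDF path from {type(source).__name__}")


class FakeEncrypt:
    """Stand-in for the Encrypt result; only the `output` attribute matters here."""

    def __init__(self, output: str) -> None:
        self.output = output
```

With something like this in place, both call styles from the test would resolve to the same path before a reader is constructed.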
Create a class (possibly a global?) that is capable of handling and generating output paths for any file type. It seems as if similar code is used for almost all of the pdf.conduit modules so that can likely be replaced with a class or function call. Further, not every module has the ability to specify an output directory at this time.
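One possible shape for such a helper, sketched with pathlib. The function name build_output_path and all of its parameters are assumptions made for illustration, not existing pdfconduit API.

```python
from pathlib import Path
from typing import Optional, Union


def build_output_path(
    source: Union[str, Path],
    suffix: Optional[str] = None,
    ext: Optional[str] = None,
    directory: Optional[Union[str, Path]] = None,
    separator: str = "_",
) -> Path:
    """Derive an output path from a source file (hypothetical helper).

    suffix:    text appended to the file stem, e.g. "128bit"
    ext:       new extension; defaults to the source file's extension
    directory: output directory; defaults to the source file's directory
    """
    src = Path(source)
    stem = f"{src.stem}{separator}{suffix}" if suffix else src.stem
    extension = src.suffix if ext is None else "." + ext.lstrip(".")
    parent = src.parent if directory is None else Path(directory)
    return parent / f"{stem}{extension}"
```

For example, build_output_path("docs/report.pdf", suffix="128bit") gives docs/report_128bit.pdf, and passing directory="out" redirects the result without touching the file name logic.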
Hi,
I installed the package with pip install pdfconduit-gui, but I get an error when I run from pdfconduit import GUI in the terminal.
This is the error: from: can't read /var/mail/pdf.conduit
How can I fix this?
This project does not seem to support Chinese. Using Chinese watermarks will produce garbled characters.
Make each module of pdf.conduit callable as a script. Each module should handle command-line arguments from sys.argv (if given) or launch a GUI window if none are provided.
If no arguments are provided and the GUI module is not installed, the user should be prompted to install the GUI module or call 'help' to view commands.
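The dispatch described above could be sketched roughly as follows. The module name pdfconduit_gui, the program name, and the argument set are assumptions for illustration only; the real modules may need different options.

```python
import argparse
import importlib.util
from typing import List


def gui_available(module_name: str = "pdfconduit_gui") -> bool:
    """Check whether an optional GUI package is importable (module name is an assumption)."""
    return importlib.util.find_spec(module_name) is not None


def choose_mode(argv: List[str], has_gui: bool) -> str:
    """Pick 'cli' when arguments are given, else 'gui', else 'prompt' to install or show help."""
    if argv:
        return "cli"
    return "gui" if has_gui else "prompt"


def parse_args(argv: List[str]) -> argparse.Namespace:
    """Parse a hypothetical argument set for one module run as a script."""
    parser = argparse.ArgumentParser(prog="watermark", description="Illustrative CLI sketch")
    parser.add_argument("pdf", help="path to the input PDF")
    parser.add_argument("-o", "--output", help="output directory")
    return parser.parse_args(argv)
```

A module's entry point would then call choose_mode(sys.argv[1:], gui_available()) and branch on the result.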
Refactor classes with multiple pdf driver methods to have usePdfrw() & usePypdf() methods instead of __init__ params.
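A rough sketch of what the method-based driver selection might look like. Only the method names usePdfrw() and usePypdf() come from the issue; the mixin, the default driver, and the MergeSketch class are hypothetical stand-ins, and the handlers return strings instead of doing real PDF work.

```python
from typing import Iterable


class DriverSelectable:
    """Mixin sketch: choose the PDF backend by method call instead of an __init__ flag."""

    _driver: str = "pypdf"  # assumed default, not confirmed by the project

    def usePdfrw(self) -> "DriverSelectable":
        self._driver = "pdfrw"
        return self  # returning self allows chained calls

    def usePypdf(self) -> "DriverSelectable":
        self._driver = "pypdf"
        return self


class MergeSketch(DriverSelectable):
    """Illustrative stand-in, not the real pdfconduit Merge class."""

    def __init__(self, paths: Iterable[str]) -> None:
        self.paths = list(paths)

    def run(self) -> str:
        # Dispatch to the handler for the currently selected driver.
        handlers = {"pdfrw": self._merge_pdfrw, "pypdf": self._merge_pypdf}
        return handlers[self._driver]()

    def _merge_pdfrw(self) -> str:
        return f"pdfrw:{len(self.paths)}"

    def _merge_pypdf(self) -> str:
        return f"pypdf:{len(self.paths)}"
```

Usage would read as MergeSketch(paths).usePdfrw().run(), which keeps constructor signatures stable when drivers are added or removed.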
PdfDriver interface
Replace use of PyPDF3 dependencies with currently maintained pypdf dependencies.
From my initial experimentation it seemed the majority of PyPDF3 uses can be replaced with pypdf without too much difficulty. Prior to this undertaking a few prerequisite steps should be taken to solidify the test suite.
After the prerequisites have been completed, a module by module conversion from PyPDF3 to pypdf should occur.
Update to latest syntax and update Tab calls
The documentation is severely behind development and needs to be updated.
Currently the pdfconduit-gui module is only set up for HPA watermarking procedures. It would be nice if its configuration (fields, default images, etc.) could be customized upon install or upon call. That would make for a much more flexible package.
Possible config fields...
__all__
I noticed in your very impressive GUI code that the legacy setting of auto_size_text = True is still in the calls to FlexForm. The default is now true so you can remove those from your code (unless you called SetOption to override it).
It's fascinating to read code from other people that is calling my stuff. It's being used in ways I hadn't thought about before.
Note also that the latest code has a setting for button_element_size. Buttons now default to (10,1) and have their own default setting you can change. Previously the default button size was the default_element_size, which is normally 30,1 or 45,1. This isn't an appropriate setting for any button.
When adding a watermark to an 11x17 sized PDF the watermark is placed in a slightly different location on each page. This could be due to page scaling or could be an issue with the watermark algo.
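If page scaling is the cause, one way to check is to compute the placement purely from each page's own dimensions and compare across pages. The sketch below is not the project's watermark algorithm; fit_centered is a hypothetical helper that scales a watermark to fit a page while preserving aspect ratio and centers it, with all sizes in PDF points (letter is 612 x 792, 11x17 is 792 x 1224).

```python
from typing import Tuple


def fit_centered(
    page_w: float, page_h: float, mark_w: float, mark_h: float
) -> Tuple[float, float, float]:
    """Return (scale, x, y) that fits and centers a watermark on a page.

    The scale preserves the watermark's aspect ratio; x and y are the
    lower-left offsets of the scaled watermark in page coordinates.
    """
    if min(page_w, page_h, mark_w, mark_h) <= 0:
        raise ValueError("All dimensions must be positive")
    scale = min(page_w / mark_w, page_h / mark_h)
    x = (page_w - mark_w * scale) / 2
    y = (page_h - mark_h * scale) / 2
    return scale, x, y
```

Running this for every page of an 11x17 document and finding identical offsets would point away from scaling and toward the drawing code.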
Noted a comment that tabs look ugly on a Mac.
Would it be possible to get a screenshot?
Do you think this is a tkinter problem specifically or is there something I can do to improve it?
The layered/flattened parameter in the Watermark.draw() method causes all elements of the watermark to be pushed to the center when creating a layered watermark. This is likely due to how a ReportLab-based class handles the x, y, w or h values.
Add utils module test suite to the 2.x branch and then merge into master (effectively 3.x) to confirm functionality is unchanged.
Upgrade dependencies and run testing to confirm performance reliability.
Upgrade test logging with package version information (EnvInfo).
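A small sketch of how package version information could be collected for the test log using only the standard library. It does not reproduce EnvInfo; the function names and the "not installed" placeholder are assumptions.

```python
import platform
from importlib import metadata
from typing import Dict, Iterable


def package_versions(names: Iterable[str]) -> Dict[str, str]:
    """Map each distribution name to its installed version, or 'not installed'."""
    versions: Dict[str, str] = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


def env_header(names: Iterable[str]) -> str:
    """Build a short header with the Python version and package versions."""
    lines = [f"python {platform.python_version()}"]
    lines += [f"{name} {version}" for name, version in package_versions(names).items()]
    return "\n".join(lines)
```

Printing env_header(["pypdf", "pdfrw", "reportlab"]) at the start of a test run would record which dependency versions produced a given result.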
Possibly replace or improve the open_window function that reveals a file in Finder or Explorer with a function that opens a PDFviewer if the file is compatible.
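As a starting point, the platform-specific command for opening a file in its default viewer could be built like this. The helper viewer_command is hypothetical and only constructs the command; it does not check whether the file type is compatible with a PDF viewer.

```python
import sys
from typing import List, Optional


def viewer_command(path: str, platform: Optional[str] = None) -> List[str]:
    """Return the command that opens `path` with the OS default application.

    `platform` defaults to sys.platform and is exposed so the logic can be
    exercised without running on each operating system.
    """
    platform = sys.platform if platform is None else platform
    if platform.startswith("win"):
        # The empty string is the window title argument expected by `start`.
        return ["cmd", "/c", "start", "", path]
    if platform == "darwin":
        return ["open", path]
    return ["xdg-open", path]
```

The result can be passed to subprocess.run(). On Windows, os.startfile(path) is a simpler alternative to shelling out.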
Remove any temporary backwards compatibility added into the codebase while porting from PyPDF3 to pypdf