The pdfconduit from sfneal

[Port: PyPDF3 to pypdf] utils module

Refactor test suite to use parameter table based tests

refactor test suite to use parameter tables
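
A minimal sketch of what table-driven tests could look like using unittest's subTest from the standard library. The build_suffix helper and the rows in CASES are hypothetical stand-ins for illustration, not pdfconduit code.

```python
import unittest


def build_suffix(bits, allow_printing, allow_commenting):
    """Hypothetical helper: build an output suffix from encryption options."""
    parts = [f"{bits}bit"]
    if allow_printing:
        parts.append("print")
    if allow_commenting:
        parts.append("comment")
    return "_".join(parts)


# Each row is one case: (bits, allow_printing, allow_commenting, expected)
CASES = [
    (128, False, False, "128bit"),
    (128, True, False, "128bit_print"),
    (40, False, True, "40bit_comment"),
    (40, True, True, "40bit_print_comment"),
]


class TestSuffixTable(unittest.TestCase):
    def test_build_suffix(self):
        for bits, printing, commenting, expected in CASES:
            # subTest reports every failing row instead of stopping at the first one
            with self.subTest(bits=bits, printing=printing, commenting=commenting):
                self.assertEqual(build_suffix(bits, printing, commenting), expected)
```

Adding a new case then means adding one row to the table rather than copying a whole test method.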

Unclosed Resource Warnings

Running test_flatten.py yields a number of unclosed-resource warnings (ResourceWarning). Using the pdfrw method instead of PyPDF3 for the Merge call in Flatten reduced the number of warnings but did not eliminate them. PyPDF3 appears to be the origin of the problem.
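
One way to narrow this down might be to record the warnings per code path and see which one leaves a handle open. The sketch below only uses the standard library and assumes a Python implementation that finalizes dropped file objects when gc.collect() runs at the latest; the function names are illustrative and nothing here is PyPDF3 or pdfrw API.

```python
import gc
import os
import tempfile
import warnings


def read_leaky(path):
    """Opens the file and never closes it explicitly."""
    handle = open(path, "rb")
    return handle.read()


def read_closed(path):
    """Same read, but the context manager closes the handle."""
    with open(path, "rb") as handle:
        return handle.read()


def count_resource_warnings(func, path):
    """Run func(path) and count the ResourceWarnings it triggers."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always", ResourceWarning)
        func(path)
        gc.collect()  # make sure dropped file objects are finalized
    return sum(issubclass(w.category, ResourceWarning) for w in caught)


def demo():
    """Return (warnings from the leaky read, warnings from the closed read)."""
    fd, path = tempfile.mkstemp(suffix=".pdf")
    try:
        os.write(fd, b"%PDF-1.4\n")
        os.close(fd)
        return (
            count_resource_warnings(read_leaky, path),
            count_resource_warnings(read_closed, path),
        )
    finally:
        os.remove(path)
```

Wrapping individual calls from test_flatten.py in a counter like this could show whether the remaining warnings come from the reader, the merge step, or the test fixtures.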

[Enhancements] encrypt.py - add support for algos & permissions

add parameters for different algos
add parameters for different permissions
add tests
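
The permissions side of this could be modeled as bit flags. The sketch below assumes the user-access bit positions of the PDF standard security handler (bit 3 = print, bit 4 = modify, bit 5 = copy, bit 6 = annotate, bits 9 to 12 for the finer-grained permissions). The Permission class and decode_permissions function are hypothetical names, not part of pdfconduit, PyPDF3, or pypdf.

```python
from enum import IntFlag


class Permission(IntFlag):
    """User access permissions; PDF bit n (1-indexed) has value 1 << (n - 1)."""
    PRINT = 1 << 2                # bit 3
    MODIFY = 1 << 3               # bit 4
    COPY = 1 << 4                 # bit 5
    ANNOTATE = 1 << 5             # bit 6
    FILL_FORMS = 1 << 8           # bit 9
    EXTRACT_ACCESSIBLE = 1 << 9   # bit 10
    ASSEMBLE = 1 << 10            # bit 11
    PRINT_HIGH_QUALITY = 1 << 11  # bit 12


def decode_permissions(p_value):
    """Return the permissions granted by a signed 32-bit permission value."""
    unsigned = p_value & 0xFFFFFFFF  # reinterpret the signed value as 32 unsigned bits
    granted = Permission(0)
    for flag in Permission:
        if unsigned & flag.value:
            granted |= flag
    return granted
```

Under this mapping, and assuming the security value asserted in the encryption tests is the raw permission integer, decode_permissions(-1852) contains only PRINT and PRINT_HIGH_QUALITY.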

Add tests for portions of codebase without code coverage

Improve test coverage to near 100% (not necessary)

Code that is currently missing tests:

upscale pypdf3
slice tempdir (maybe can remove)
convert module temp dirs

New module of namespace package

Consider converting the convert sub-module of pdf.conduit into a new module of the pdf namespace package (pdf.convert). Currently the only scripts are img2pdf.py and pdf2img.py, which were built as the back end of the Flatten class in flatten.py.

Possible new scripts...

pdf2docx.py
docx2pdf.py
html2pdf.py
csv2pdf.py
excel2pdf.py

[Enhancements] add type hinting

Add type hinting across modules.

Improve dependency constraints in setup.py

add better dependency constraints to prevent installing invalid versions

Remove use of optional GUI dependencies

remove all use of gui deps from codebase
remove gui parameters

Add more assertions to test suite

Analyze the outputs from each test method and add assertions. The goal is to add more guardrails to prevent issues during the pypdf conversion.

[Port: PyPDF3 to pypdf] encrypt.py

Add comparison test files

Change or remove default watermark

change default watermarks to say 'pdfconduit' instead of 'HPA Design'

[Port: PyPDF3 to pypdf] conduit & modify modules

conduit
- lib
- watermark
- #74
- extract
modify
- canvas
- draw

Add allow/don't allow tests for printing & commenting

Debug issue with reading metadata from encrypted PDFs

The downstream issue in the pdfconduit api looks like it could be caused by a call to Info().metadata, since using that call in the test suite raises the same error seen in the api test suite.

Possibly caused by an issue with decrypting a PDF?

run pdfconduit api test suite and compare error messages
see if errored call can be removed from api?
fix issue throwing the error

Code in question

Test suite (test_conduit_encrypt.py)

@Timer.decorator
def test_encrypt_128bit(self):
    encrypted = Encrypt(self.pdf_path,
                        self.user_pw,
                        self.owner_pw,
                        output=self.temp.name,
                        bit128=True,
                        suffix='128bit')

    security = self._getPdfSecurity(encrypted)

    self.assertPdfExists(encrypted)
    self.assertEncrypted(encrypted)
    self.assert128BitEncryption(security)
    self.assertSecurityValue(security, -1852)

    print(Info(encrypted.output, self.user_pw).metadata)  # No error
    print(Info(encrypted, self.user_pw).metadata)  # Error
    # print(Info(encrypted).resources())
    # print(Info(encrypted).security)

Error output

{'/Producer': 'pdfconduit', '/Creator': 'pdfconduit', '/Author': 'Stephen Neal'}
ETestEncrypt.test_encrypt_128bit_allow_commenting   --> mms: 104.86
.TestEncrypt.test_encrypt_128bit_allow_printing     --> mms: 108.78
.TestEncrypt.test_encrypt_128bit_allow_printing_and_commenting --> mms: 107.58
.TestEncrypt.test_encrypt_40bit                     --> mms: 99.45
.TestEncrypt.test_encrypt_40bit_allow_commenting    --> mms: 98.56
.TestEncrypt.test_encrypt_40bit_allow_printing      --> mms: 98.09
.TestEncrypt.test_encrypt_40bit_allow_printing_and_commenting --> mms: 103.74
.TestEncrypt.test_password_byte_string              --> mms: 108.79
.
======================================================================
ERROR: test_encrypt_128bit (tests.test_conduit_encrypt.TestEncrypt.test_encrypt_128bit)
A nested function for timing other functions.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/Stephen/Scripts/pdfconduit/venv/python312/lib/python3.12/site-packages/looptools/timer.py", line 65, in function_timer
    value = func(*args, **kwargs)
            ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/Stephen/Scripts/pdfconduit/tests/test_conduit_encrypt.py", line 44, in test_encrypt_128bit
    print(Info(encrypted, self.user_pw).metadata)  # Error
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/Stephen/Scripts/pdfconduit/pdfconduit/utils/info.py", line 7, in __init__
    self.pdf = self._reader(path, password, prompt)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/Stephen/Scripts/pdfconduit/pdfconduit/utils/info.py", line 12, in _reader
    pdf = PdfFileReader(path) if not isinstance(path, PdfFileReader) else path
          ^^^^^^^^^^^^^^^^^^^
  File "/Users/Stephen/Scripts/pdfconduit/venv/python312/lib/python3.12/site-packages/PyPDF3/pdf.py", line 1203, in __init__
    self.read(stream)
  File "/Users/Stephen/Scripts/pdfconduit/venv/python312/lib/python3.12/site-packages/PyPDF3/pdf.py", line 1818, in read
    stream.seek(-1, 2)
    ^^^^^^^^^^^
AttributeError: 'Encrypt' object has no attribute 'seek'

----------------------------------------------------------------------
Ran 9 tests in 0.992s

FAILED (errors=1)
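
The traceback suggests that Info received the Encrypt object itself and that PdfFileReader then treated it as a stream and called seek on it. One possible guard, sketched here under assumptions, is to normalize whatever is passed in to a filesystem path before the reader is built. resolve_pdf_path is a hypothetical helper, and the idea that result objects expose their path as .output is taken only from the working call in the test above.

```python
import os


def resolve_pdf_path(source):
    """Return a filesystem path for a str, path-like, or result object.

    Assumption: result objects (such as Encrypt in the test above) expose
    the written file's path through an `output` attribute.
    """
    if isinstance(source, (str, os.PathLike)):
        return os.fspath(source)
    output = getattr(source, "output", None)
    if isinstance(output, (str, os.PathLike)):
        return os.fspath(output)
    # Fail early with a clear message instead of a late AttributeError on seek
    raise TypeError(f"cannot resolve a PDF path from {type(source).__name__}")
```

Calling a helper like this at the top of Info._reader would make both calls in the test behave the same way, although it would not explain whether the api test suite hits the same code path.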

Output path handling overhaul (all modules)

Create a class (possibly a global?) that can handle and generate output paths for any file type. Similar code appears in almost all of the pdf.conduit modules, so it can likely be replaced with a single class or function call. Further, not every module can currently specify an output directory.
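
A rough sketch of what a shared helper might look like, using pathlib. The function name, parameters, and the stem + separator + suffix naming scheme are assumptions for illustration, not the existing pdfconduit behavior.

```python
from pathlib import Path


def build_output_path(source, suffix, output_dir=None, ext=None, sep="_"):
    """Derive an output path from a source file.

    source:     input file path
    suffix:     text appended to the file stem, e.g. "128bit"
    output_dir: target directory; defaults to the source file's directory
    ext:        new extension such as ".png"; defaults to the source extension
    sep:        separator placed between the stem and the suffix
    """
    src = Path(source)
    directory = Path(output_dir) if output_dir is not None else src.parent
    extension = ext if ext is not None else src.suffix
    if extension and not extension.startswith("."):
        extension = "." + extension  # accept "png" as well as ".png"
    return directory / f"{src.stem}{sep}{suffix}{extension}"
```

With one function like this, every module would get an output directory option for free, and converters such as pdf2img could reuse it by passing a different ext.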

How to get rid of this information: "HPA Design, Inc. ARCHITECTS"

Error when trying to run

Hi,

I've installed the package with pip install pdfconduit-gui, but I get an error when I run from pdfconduit import GUI in the terminal.

This is the error: from: can't read /var/mail/pdf.conduit

How can I fix this?

This project does not seem to support Chinese?

This project does not seem to support Chinese. Using Chinese watermarks will produce garbled characters.

Create main() function for each module for sys.argv or GUI call

Make each module of pdf.conduit callable as a script. Each module should handle sys.argv arguments (if given) or launch a GUI window if none are provided.

If no args are provided and the GUI module is not installed, the user will be prompted to install the GUI module or call 'help' to view commands.
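
The dispatch described above could be sketched roughly as follows. The importable module name pdfconduit_gui, the argument names, and the (mode, payload) return value are all assumptions made for illustration.

```python
import argparse
import importlib.util


def gui_installed(module_name="pdfconduit_gui"):
    """Check whether the (assumed) GUI module can be imported."""
    return importlib.util.find_spec(module_name) is not None


def main(argv=None, gui_available=None):
    """Dispatch to the CLI, the GUI, or an install prompt.

    Returns a (mode, payload) tuple so the decision is easy to test.
    """
    parser = argparse.ArgumentParser(prog="encrypt", description="Encrypt a PDF")
    parser.add_argument("pdf", help="path to the input PDF")
    parser.add_argument("--user-pw", required=True)
    parser.add_argument("--owner-pw")

    if argv:  # arguments were given: behave as a command-line tool
        return "cli", parser.parse_args(argv)
    if gui_available is None:
        gui_available = gui_installed()
    if gui_available:  # no arguments, GUI present: open a window
        return "gui", None
    # no arguments and no GUI: point the user at the options
    return "prompt", "Install the GUI module or run with --help to view commands."
```

A module would then end with something like main(sys.argv[1:]) under an if __name__ == "__main__" guard and act on the returned mode.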

Finish CLI functionality

[Enhancements] refactor multi-driver classes to avoid `__init__` params for drivers

Refactor classes with multiple pdf driver methods to have usePdfrw() & usePypdf() methods instead of __init__ params.
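
A small sketch of the proposed interface. The Merge class here is a simplified stand-in that returns strings instead of writing PDFs, and the private method naming is an assumption.

```python
class Merge:
    """Hypothetical stand-in for a class that supports two PDF drivers."""

    def __init__(self, paths):
        self.paths = list(paths)
        self._driver = "pypdf"  # default driver, no __init__ parameter needed

    def usePypdf(self):
        self._driver = "pypdf"
        return self  # returning self allows chaining

    def usePdfrw(self):
        self._driver = "pdfrw"
        return self

    def merge(self):
        # Look up the driver-specific implementation by name
        return getattr(self, f"_merge_{self._driver}")()

    def _merge_pypdf(self):
        return f"pypdf merged {len(self.paths)} files"

    def _merge_pdfrw(self):
        return f"pdfrw merged {len(self.paths)} files"
```

Usage would read as Merge(paths).usePdfrw().merge(), which keeps the constructor signature stable if drivers are added or removed later.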

[3.x] PyPDF3 to pypdf refactor

Replace use of PyPDF3 dependencies with currently maintained pypdf dependencies.

From my initial experimentation it seemed the majority of PyPDF3 uses can be replaced with pypdf without too much difficulty. Prior to this undertaking a few prerequisite steps should be taken to solidify the test suite.

Test suite prerequisites

Codebase prerequisites

PyPDF3 to pypdf porting

After the prerequisites have been completed, a module by module conversion from PyPDF3 to pypdf should occur.

Enhancements

Final checks

Refactor tests to reduce code duplication

add custom assertions that can be called to replace copy/pasta assertions found in many test methods
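
As a sketch, shared assertions could live in a mixin that each test case inherits. assertPdfExists matches a name already used in the encrypt tests, but its body here is a guess, and assertPdfHeader is a hypothetical addition.

```python
import os
import tempfile
import unittest


class PdfAssertions:
    """Mixin with reusable checks; combine it with unittest.TestCase."""

    def assertPdfExists(self, path):
        self.assertTrue(os.path.isfile(path), f"expected a file at {path}")

    def assertPdfHeader(self, path):
        # A conforming PDF file begins with the bytes "%PDF-"
        with open(path, "rb") as handle:
            self.assertEqual(handle.read(5), b"%PDF-")


class TestOutputs(PdfAssertions, unittest.TestCase):
    def test_written_file(self):
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "out.pdf")
            with open(path, "wb") as handle:
                handle.write(b"%PDF-1.4\n")
            # Two one-line calls replace the repeated inline checks
            self.assertPdfExists(path)
            self.assertPdfHeader(path)
```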

[Port: PyPDF3 to pypdf] convert module

remove progress bars
flatten
img2pdf
pdf2img

Updated GUI module PySimpleGUI dependency

Update to latest syntax and update Tab calls

Updated Documentation

The documentation is severely behind development and needs to be updated.

Refactor GUI module to handle custom configurations

Currently the pdfconduit-gui module is only set up for HPA watermarking procedures. It would be nice if its configuration (fields, default images, etc.) could be customized upon install or upon call. That would make for a much more flexible package.

Possible config fields...

Header
Text fields
Fields type
Line 1
Line 2
Custom copyright

Add metadata to all pdfs by default

add pdfconduit metadata to all outputs
create reusable interface?
#52
add ability to specify default metadata via a config?
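
A minimal sketch of the merge step, assuming metadata is handled as a plain dict with PDF-style keys. The default values mirror the /Producer and /Creator entries printed in the encrypt test output; the function name and the rule that caller-supplied values override the defaults are assumptions.

```python
DEFAULT_METADATA = {"/Producer": "pdfconduit", "/Creator": "pdfconduit"}


def with_default_metadata(metadata=None, defaults=None):
    """Return a new dict with the defaults filled in under user-supplied values."""
    merged = dict(DEFAULT_METADATA if defaults is None else defaults)
    merged.update(metadata or {})  # explicit values win over defaults
    return merged
```

A config file could then simply supply a different defaults dict without touching the modules that write the output.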

Optimize imports

run optimize imports command on all directories
go through each file and cleanup imports
clean up __all__ in __init__.py files

auto_size_text no longer needed, generally speaking....

I noticed in your very impressive GUI code that the legacy setting of auto_size_text = True is still in the calls to FlexForm. The default is now true so you can remove those from your code (unless you called SetOption to override it).

It's fascinating to read code from other people that is calling my stuff. It's being used in ways I hadn't thought about before.

Note also that the latest code has a setting for button_element_size. Buttons now default to (10,1) and have their own default setting you can change. Previously the default button size was the default_element_size, which is normally 30,1 or 45,1. This isn't an appropriate setting for any button.

Remove mentions of "HPA Design"

remove any use of hpa design in metadata
replace with pdfconduit metadata
#59
add tests to confirm metadata was added correctly

Inconsistent watermark placement on large size PDF docs

When adding a watermark to an 11x17 sized PDF the watermark is placed in a slightly different location on each page. This could be due to page scaling or could be an issue with the watermark algo.

Possible to get screenshot from Mac of PySimpleGUI tabs?

Noted a comment that tabs look ugly on a Mac.

Would it be possible to get a screenshot?

Do you think this is a tkinter problem specifically or is there something I can do to improve it?

Cleanup samples.py

cleanup samples.py
move to samples directory
add new use cases if needed

Layered watermarks CanvasObjects are being improperly placed.

The layered/flattened parameter in the Watermark.draw() method causes the watermark to come out with all elements pushed to the center when creating a layered watermark. This is likely due to how a reportlab-based class handles x, y, w or h values.

[Port: PyPDF3 to pypdf] transform module

merge
rotate
slice
upscale

[2.x] utils module test suite

Add utils module test suite to the 2.x branch and then merge into master (effectively 3.x) to confirm functionality is unchanged.

test suites

post steps

#81

make sure tests pass
check utils test suite coverage

[Enhancements] merge.py - add support for appending & page position

add support and parameters for appending particular pdf pages at particular page positions
e.g. add page 7 from pdf2 after page 4 in pdf1
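
The index arithmetic for this can be sketched on plain lists, independent of any PDF library. insert_pages and its parameters are hypothetical; a real implementation would apply the same positions to the page objects of whichever driver is in use.

```python
def insert_pages(base, donor, donor_pages, after_page):
    """Return a new page list with selected donor pages inserted.

    base, donor: sequences of page objects (any type)
    donor_pages: 1-based page numbers to take from donor, in order
    after_page:  1-based page number in base to insert after (0 = prepend)
    """
    if not 0 <= after_page <= len(base):
        raise ValueError("after_page is outside the base document")
    if any(not 1 <= n <= len(donor) for n in donor_pages):
        raise ValueError("donor_pages contains a page outside the donor document")
    picked = [donor[n - 1] for n in donor_pages]  # convert to 0-based indexes
    return list(base[:after_page]) + picked + list(base[after_page:])
```

For the example in this issue, the call would be insert_pages(pdf1_pages, pdf2_pages, [7], 4).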
