
pdfconduit's People

Contributors

dependabot[bot], sfneal

Forkers

lxngoddess5321

pdfconduit's Issues

Unclosed Resource Warnings

Running test_flatten.py yields a number of unclosed resource warnings. Using the pdfrw method instead of PyPDF3 during the Merge call in Flatten reduced the number of warnings but did not eliminate them. PyPDF3 appears to be the origin of the problem.
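
One way to check whether a given code path leaks file handles is to record the warnings it emits. The sketch below assumes the warnings in question are Python's built-in ResourceWarning and that the tests run on CPython, where an unclosed file is reported as soon as its last reference disappears. The functions leaky_read and safe_read are hypothetical stand-ins, not pdfconduit code.

```python
import gc
import warnings


def count_resource_warnings(func):
    """Run func() and return how many ResourceWarnings it emitted."""
    with warnings.catch_warnings(record=True) as caught:
        # ResourceWarning is ignored by default, so enable it explicitly.
        warnings.simplefilter("always", ResourceWarning)
        func()
        gc.collect()  # finalize any objects that are still pending
    return sum(issubclass(w.category, ResourceWarning) for w in caught)


def leaky_read(path):
    """Hypothetical example: opens a file and never closes it."""
    handle = open(path, "rb")
    handle.read()


def safe_read(path):
    """Hypothetical example: the context manager closes the file."""
    with open(path, "rb") as handle:
        handle.read()
```

Wrapping a single library call in count_resource_warnings can help narrow down which call leaves the handle open.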

New module of namespace package

Consider converting the convert sub-module of pdf.conduit into a new module of the pdf namespace package (pdf.convert). Currently its only scripts are img2pdf.py and pdf2img.py, which were built as the back end of the Flatten class in flatten.py.

Possible new scripts...

  • pdf2docx.py
  • docx2pdf.py
  • html2pdf.py
  • csv2pdf.py
  • excel2pdf.py

[Enhancements] add type hinting

Add type hinting across modules.

  • watermark
  • encrypt
  • extract
  • flatten
  • img2pdf
  • pdf2img
  • canvas
  • draw
  • merge
  • rotate
  • slice
  • upscale
  • permissions
  • info
  • path
  • read
  • receipt
  • view
  • write
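
As a small illustration of what the hints could look like, here is a sketch of an annotated helper. The function normalize_password is hypothetical and not part of pdfconduit; it assumes passwords may arrive as either str or bytes, which the test name test_password_byte_string suggests but does not confirm.

```python
from typing import Union


def normalize_password(password: Union[str, bytes], encoding: str = "utf-8") -> str:
    """Return the password as str, decoding it first if it was given as bytes."""
    if isinstance(password, bytes):
        return password.decode(encoding)
    return password
```

With annotations like these in place, a static checker such as mypy can flag callers that pass an unsupported type.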

Add more assertions to test suite

Analyze the outputs from each test method and add assertions. The goal is to add more guardrails to prevent issues during the pypdf conversion.

  • encrypt
  • watermark
  • flatten
  • img2pdf
  • pdf2img
  • merge
  • rotate
  • slice
  • upscale
  • review file outputs
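
One cheap guardrail is to check that each output file exists, is non-empty, and begins with the PDF header bytes. The sketch below is an assumption about what such a check could look like; looks_like_pdf is a hypothetical helper, not an existing pdfconduit assertion.

```python
from pathlib import Path

# Every PDF file starts with this header, followed by a version number.
PDF_MAGIC = b"%PDF-"


def looks_like_pdf(path) -> bool:
    """Return True if path is a non-empty file that starts with the PDF header."""
    file = Path(path)
    if not file.is_file() or file.stat().st_size == 0:
        return False
    with file.open("rb") as handle:
        return handle.read(len(PDF_MAGIC)) == PDF_MAGIC
```

A test method could then call self.assertTrue(looks_like_pdf(output_path)) before running more specific checks.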

[Port: PyPDF3 to pypdf] encrypt.py

  • remove progress bar params & functionality
  • add pypdf dependencies
  • port encrypt to use pypdf instead of pypdf3
  • #75
  • ensure all tests pass
  • compare output files

Add comparison test files

  • generate expected output, save file and do assertion to confirm contents are the same
  • encrypt
  • watermark
  • flatten
  • img2pdf
  • pdf2img
  • merge
  • rotate
  • slice
  • upscale
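
A comparison against a saved expected file could be as simple as hashing both files. This is a minimal sketch with hypothetical helper names. It assumes the generated output is byte-for-byte reproducible, which may not hold if the PDF embeds timestamps or random identifiers; in that case a looser comparison (page count, extracted text) would be needed instead.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_match(expected, actual) -> bool:
    """Return True if both files have identical contents."""
    return file_digest(expected) == file_digest(actual)
```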

Debug issue with reading metadata from encrypted PDFs

The downstream issue in the pdfconduit API looks like it could be caused by a call to Info().metadata: making that call in the test suite raises the same error seen in the API test suite.

Possibly caused by an issue with decrypting a PDF?

  • run pdfconduit api test suite and compare error messages
  • see if errored call can be removed from api?
  • fix issue throwing the error

Code in question

Test suite (test_conduit_encrypt.py)

@Timer.decorator
def test_encrypt_128bit(self):
    encrypted = Encrypt(self.pdf_path,
                        self.user_pw,
                        self.owner_pw,
                        output=self.temp.name,
                        bit128=True,
                        suffix='128bit')

    security = self._getPdfSecurity(encrypted)

    self.assertPdfExists(encrypted)
    self.assertEncrypted(encrypted)
    self.assert128BitEncryption(security)
    self.assertSecurityValue(security, -1852)

    print(Info(encrypted.output, self.user_pw).metadata)  # No error
    print(Info(encrypted, self.user_pw).metadata)  # Error
    # print(Info(encrypted).resources())
    # print(Info(encrypted).security)

Error output

{'/Producer': 'pdfconduit', '/Creator': 'pdfconduit', '/Author': 'Stephen Neal'}
ETestEncrypt.test_encrypt_128bit_allow_commenting   --> mms: 104.86
.TestEncrypt.test_encrypt_128bit_allow_printing     --> mms: 108.78
.TestEncrypt.test_encrypt_128bit_allow_printing_and_commenting --> mms: 107.58
.TestEncrypt.test_encrypt_40bit                     --> mms: 99.45
.TestEncrypt.test_encrypt_40bit_allow_commenting    --> mms: 98.56
.TestEncrypt.test_encrypt_40bit_allow_printing      --> mms: 98.09
.TestEncrypt.test_encrypt_40bit_allow_printing_and_commenting --> mms: 103.74
.TestEncrypt.test_password_byte_string              --> mms: 108.79
.
======================================================================
ERROR: test_encrypt_128bit (tests.test_conduit_encrypt.TestEncrypt.test_encrypt_128bit)
A nested function for timing other functions.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/Stephen/Scripts/pdfconduit/venv/python312/lib/python3.12/site-packages/looptools/timer.py", line 65, in function_timer
    value = func(*args, **kwargs)
            ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/Stephen/Scripts/pdfconduit/tests/test_conduit_encrypt.py", line 44, in test_encrypt_128bit
    print(Info(encrypted, self.user_pw).metadata)  # Error
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/Stephen/Scripts/pdfconduit/pdfconduit/utils/info.py", line 7, in __init__
    self.pdf = self._reader(path, password, prompt)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/Stephen/Scripts/pdfconduit/pdfconduit/utils/info.py", line 12, in _reader
    pdf = PdfFileReader(path) if not isinstance(path, PdfFileReader) else path
          ^^^^^^^^^^^^^^^^^^^
  File "/Users/Stephen/Scripts/pdfconduit/venv/python312/lib/python3.12/site-packages/PyPDF3/pdf.py", line 1203, in __init__
    self.read(stream)
  File "/Users/Stephen/Scripts/pdfconduit/venv/python312/lib/python3.12/site-packages/PyPDF3/pdf.py", line 1818, in read
    stream.seek(-1, 2)
    ^^^^^^^^^^^
AttributeError: 'Encrypt' object has no attribute 'seek'

----------------------------------------------------------------------
Ran 9 tests in 0.992s

FAILED (errors=1)
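
The traceback suggests that Info received the Encrypt object itself rather than a path, so PdfFileReader treated it as a stream and called seek on it; the call that passes encrypted.output works. One possible fix, sketched below, is to normalize the argument before handing it to the reader. The helper resolve_pdf_source is hypothetical, and the only assumption taken from the test is that Encrypt exposes its result path as an output attribute.

```python
import os


def resolve_pdf_source(source):
    """Return something a PDF reader can open: a path string or a file-like object."""
    if isinstance(source, (str, os.PathLike)):
        return os.fspath(source)
    if hasattr(source, "seek"):
        # Already a stream, pass it through unchanged.
        return source
    # Assumption: wrapper objects such as Encrypt store their result in `.output`.
    output = getattr(source, "output", None)
    if output is not None:
        return os.fspath(output)
    raise TypeError(f"Cannot resolve a PDF path from {type(source).__name__}")
```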

Output path handling overhaul (all modules)

Create a class (possibly a global?) that is capable of handling and generating output paths for any file type. It seems as if similar code is used for almost all of the pdf.conduit modules so that can likely be replaced with a class or function call. Further, not every module has the ability to specify an output directory at this time.
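
As a starting point, the shared logic could live in a single function that every module calls. The sketch below is hypothetical: the name build_output_path, its parameters, and the underscore separator are assumptions for illustration, not the existing pdf.conduit behavior.

```python
from pathlib import Path
from typing import Optional


def build_output_path(source, suffix: str, output_dir: Optional[str] = None,
                      ext: Optional[str] = None, separator: str = "_") -> Path:
    """Build '<output_dir>/<source stem><separator><suffix><ext>'.

    Falls back to the source file's directory and extension when
    output_dir or ext are not given.
    """
    src = Path(source)
    directory = Path(output_dir) if output_dir else src.parent
    extension = ext if ext is not None else src.suffix
    if extension and not extension.startswith("."):
        extension = "." + extension
    return directory / f"{src.stem}{separator}{suffix}{extension}"
```

Because the extension is a parameter, the same helper would cover modules that emit images (pdf2img) as well as those that emit PDFs.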

Error when trying to run

Hi,

After running pip install pdfconduit-gui, I get an error when I try to run from pdfconduit import GUI in the terminal.

This is the error: from: can't read /var/mail/pdf.conduit

How can I fix this?

Create main() function for each module for sys.argv or GUI call

Make each module of pdf.conduit callable as a script. Each module should handle command-line arguments from sys.argv (if given) or launch a GUI window if none are provided.

If no arguments are provided and the GUI module is not installed, the user will be prompted to install the GUI module or call 'help' to view commands.
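
The decision described above can be sketched as a small function that each module's main() consults. The function name choose_mode and the import name pdfconduit_gui are assumptions made for illustration; the real GUI package may be importable under a different name.

```python
import importlib.util


def choose_mode(argv, gui_module: str = "pdfconduit_gui") -> str:
    """Decide how a module should run when invoked as a script.

    Returns "cli" when arguments were given, "gui" when none were given
    and the GUI module is importable, and "prompt" otherwise.
    """
    if argv:
        return "cli"
    if importlib.util.find_spec(gui_module) is not None:
        return "gui"
    return "prompt"
```

A module's main() would call choose_mode(sys.argv[1:]) and then parse the arguments, open the GUI window, or print the install/help prompt accordingly.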

Finish CLI functionality

  • add CLI functionality to all of the modules so that PDFs can be converted from the command line
  • watermark
  • encrypt
  • convert
    • img2pdf
    • pdf2img
  • modify
  • transform
    • merge
    • rotate
    • slice
    • upscale
  • utils
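
One way to structure such a CLI is a single entry point with one subcommand per module, built on the standard library's argparse. The sketch below covers only two subcommands, and every option name in it is hypothetical rather than an existing pdfconduit flag.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with one subcommand per module (two shown here)."""
    parser = argparse.ArgumentParser(prog="pdfconduit")
    commands = parser.add_subparsers(dest="command", required=True)

    rotate = commands.add_parser("rotate", help="rotate every page of a PDF")
    rotate.add_argument("pdf", help="path to the input PDF")
    rotate.add_argument("--degrees", type=int, default=90)

    merge = commands.add_parser("merge", help="merge several PDFs into one")
    merge.add_argument("pdfs", nargs="+", help="paths to the input PDFs")
    merge.add_argument("-o", "--output", help="path of the merged PDF")

    return parser
```

Each subcommand's parsed arguments would then be forwarded to the corresponding module class or function.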

[3.x] PyPDF3 to pypdf refactor

Replace use of PyPDF3 dependencies with currently maintained pypdf dependencies.

From my initial experimentation, it seems the majority of PyPDF3 uses can be replaced with pypdf without too much difficulty. Before this undertaking, a few prerequisite steps should be taken to solidify the test suite.

Test suite prerequisites

Codebase prerequisites

PyPDF3 to pypdf porting

After the prerequisites have been completed, a module by module conversion from PyPDF3 to pypdf should occur.

Enhancements

  • #75
  • #85
  • #88
  • [Enhancements] extract.py - build out test suite and fix as needed
  • #66
  • #100
  • #89
  • [Enhancements] refactor classes with tempdir params to use tempdir abstract class (similar to PdfDriver)
  • #90
  • update changelog workflow
  • outline new 'conduit' interface

Final checks

  • PyPDF3 removed from dependencies
  • deprecate PyPDF3 repo
  • all tests pass
  • decent code coverage
  • #54
  • refactor modules to use more consistent methods and structure
  • update documentation
  • move from setup.py to pyproject.toml?

Refactor GUI module to handle custom configurations

Currently the pdfconduit-gui module is only set up for HPA watermarking procedures. It would be nice if its configuration (fields, default images, etc.) could be customized upon install or upon call. That would make for a much more flexible package.

Possible config fields...

  • Header
  • Text fields
  • Fields type
  • Line 1
  • Line 2
  • Custom copyright

Optimize imports

  • run optimize imports command on all directories
  • go through each file and cleanup imports
  • clean up __all__ in the __init__.py files

auto_size_text no longer needed, generally speaking....

I noticed in your very impressive GUI code that the legacy setting of auto_size_text = True is still in the calls to FlexForm. The default is now true so you can remove those from your code (unless you called SetOption to override it).

It's fascinating to read code from other people that is calling my stuff. It's being used in ways I hadn't thought about before.

Note also that the latest code has a setting for button_element_size. Buttons now default to (10,1) and have their own default setting you can change. Previously the default button size was the default_element_size, which is normally (30,1) or (45,1). That isn't an appropriate setting for any button.

Cleanup samples.py

  • cleanup samples.py
  • move to samples directory
  • add new use cases if needed

Layered watermarks CanvasObjects are being improperly placed.

The layered/flattened parameter in the Watermark.draw() method causes all elements to be pushed to the center when creating a layered watermark. This is likely due to how a reportlab-based class handles its x, y, w or h values.

[2.x] utils module test suite

Add utils module test suite to the 2.x branch and then merge into master (effectively 3.x) to confirm functionality is unchanged.

test suites

  • utils info
  • utils path
  • utils read
  • utils receipt
  • utils view
  • utils write

post steps

Remove pypdf3 backwards compatibility

Remove any temporary backwards compatibility added into the codebase while porting from PyPDF3 to pypdf

  • remove backwards compatibility from info.py
  • remove pdf driver methods from transform module
    • merge
    • rotate
    • upscale
    • slice
  • utils module - mainly info.py
  • watermark - add.py
  • remove pdf driver test methods
  • delete test outputs
  • change method checks to 'starts with pypdf' to allow a little backwards compatibility
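
Read literally, a 'starts with pypdf' check accepts both the old and the new driver names, because 'pypdf3' also begins with 'pypdf'. The sketch below assumes the driver is selected by a string name; the function uses_pypdf is hypothetical.

```python
def uses_pypdf(method: str) -> bool:
    """Return True for 'pypdf' and, for backwards compatibility, 'pypdf3'."""
    return method.lower().startswith("pypdf")
```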
