I have managed to kludge it enough for my purposes. But my changes aren't the "right"


Having a bit of trouble getting PDFoutline example to run (pymupdf-utilities, closed)

pymupdf commented on August 11, 2024

Having a bit of trouble getting PDFoutline example to run

from pymupdf-utilities.

Comments (9)

JorjMcKie commented on August 11, 2024

Thanks for caring!
I will need to have a look too. I haven't tried that script or any other wx GUI for some time now.
Which wxPython version did you use?


rpbissonnette commented on August 11, 2024

Some version info:

wx.version() '4.1.1 gtk3 (phoenix) wxWidgets 3.1.5'
fitz.VersionBind '1.18.17'

I am using KDE neon 5.22, which is Ubuntu 20.04 LTS under the hood.

Comparing version numbers ought to have a nice canonical solution. I just don't know what it is.
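One common approach, offered only as a sketch: parse the leading dotted numbers into a tuple of ints and compare the tuples. The third-party `packaging` library's `packaging.version.Version` is the more complete answer (it also understands pre-release tags); the stdlib-only helper `version_tuple` below is hypothetical and ignores anything after the numeric part.

```python
import re
from typing import Tuple


def version_tuple(version: str) -> Tuple[int, ...]:
    """Return the leading dotted number of `version` as a tuple of ints.

    "1.18.17" -> (1, 18, 17); "4.1.1 gtk3 (phoenix) wxWidgets 3.1.5" -> (4, 1, 1).
    Hypothetical helper: text after the numeric part (e.g. "rc1") is ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", version.strip())
    if match is None:
        raise ValueError(f"no version number found in {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


# Tuples compare element by element with 18 > 9 as numbers, so
#   version_tuple("1.18.17") > version_tuple("1.9.0")   is True,
# whereas the plain string comparison "1.18.17" > "1.9.0" is False.
```

With `packaging` installed, `Version("1.18.17") > Version("1.9.0")` gives the same answer.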

Looking at more of your code, I see you have already fixed the time.clock() problem.
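For context: `time.clock()` was deprecated in Python 3.3 and removed in 3.8, so older scripts fail under Python 3.8.10. The usual replacements are `time.perf_counter()` for elapsed time and `time.process_time()` for CPU time. How PDFoutline.py actually does its timing isn't shown here, so the `timed` wrapper below is a hypothetical sketch of the elapsed-time variant.

```python
import time
from typing import Any, Callable, Tuple


def timed(func: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Call func(*args, **kwargs) and return (result, elapsed seconds)."""
    # perf_counter() replaces time.clock() for elapsed-time measurements;
    # only the difference between two readings is meaningful.
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start
```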

pixmap.SamplesRBG() no doubt has got something to do with the presence/absence of an alpha channel.
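If the pixmap was rendered with an alpha channel, its `samples` hold 4 bytes per pixel (RGBA), while `wx.Bitmap.FromBuffer(width, height, data)` expects plain RGB (`wx.Bitmap.FromBufferRGBA` is the RGBA variant). With current PyMuPDF the simplest route is probably to render without alpha, e.g. `page.get_pixmap(alpha=False)`. The sketch below shows the conversion by hand; `rgba_to_rgb` is a hypothetical helper, not part of PyMuPDF or wxPython.

```python
def rgba_to_rgb(samples: bytes) -> bytes:
    """Drop the alpha byte from tightly packed RGBA pixel data.

    Assumes 4 bytes per pixel (R, G, B, A) and no row padding, as in a
    PyMuPDF Pixmap with n == 4.
    """
    if len(samples) % 4:
        raise ValueError("RGBA data length must be a multiple of 4")
    rgb = bytearray(len(samples) // 4 * 3)
    # Copy every 4th byte, starting at R, G and B, into every 3rd output slot.
    rgb[0::3] = samples[0::4]
    rgb[1::3] = samples[1::4]
    rgb[2::3] = samples[2::4]
    return bytes(rgb)
```

For a pixmap `pix` with `pix.n == 4`, this would be used as `wx.Bitmap.FromBuffer(pix.width, pix.height, rgba_to_rgb(pix.samples))`.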

I haven't the faintest idea why the le_szr.RecalcSizes() call fails, but since my real target is Qt I didn't look further...

You have so many demos and examples, I can see keeping them all up-to-date is non-trivial.


JorjMcKie commented on August 11, 2024

Thank you. I have just uploaded a fixed version.
I still need to do a bit more reworking, though, because an exception occurs at the C++ level when one clicks on the row label. But the intended function (like duplicating a row) still works, which is weird.
Otherwise it should work.
At least it did when I tried it on a small file...

Should you ever have a working Qt version: please kindly publish it here via a PR.


rpbissonnette commented on August 11, 2024

It must work for you, and mostly it works for me.

But I still get the "no such attribute" error on these lines:

if spad.doc.needs_pass or spad.doc.openErrCode:

I am using

Python 3.8.10 (default, Sep 28 2021, 16:10:42)

and VS Code

Version: 1.60.2
Commit: 7f6ab5485bbc008386c4386d08766667e155244e
Date: 2021-09-22T12:01:43.795Z
Electron: 13.1.8
Chrome: 91.0.4472.164
Node.js: 14.16.0
V8: 9.1.269.39-electron.0
OS: Linux x64 5.11.0-37-generic

For now I just delete that condition, but there must be a real reason buried somewhere inside spad.doc. Can you think of anything I should check here?

Thanks.


JorjMcKie commented on August 11, 2024

Sorry, there were two places I forgot to change:
For a few versions now, there has been a new indicator that lets you check whether a PDF can be saved incrementally (i.e. back to the same file, with just the delta information appended).
More conditions than just a repaired or decrypted file prevent this.
Strictly speaking, a decrypted file could now even be handled, but never mind...

I have uploaded the script, changed once again.
Thanks for your patience!
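The failing condition can presumably be rebuilt around the newer check. A minimal sketch, assuming a PyMuPDF `Document` with the `needs_pass` property and the `can_save_incrementally()` method (both exist in current PyMuPDF); the helper name `needs_full_save` is hypothetical and not taken from PDFoutline.py.

```python
def needs_full_save(doc) -> bool:
    """Return True if changes to `doc` cannot simply be appended to its file.

    `doc` is assumed to be a PyMuPDF (fitz) Document; only its `needs_pass`
    property and `can_save_incrementally()` method are used.
    """
    # Same shape as the old `doc.needs_pass or doc.openErrCode` test, with the
    # removed openErrCode replaced by the newer incremental-save check.
    return bool(doc.needs_pass) or not doc.can_save_incrementally()


# Hypothetical use with PyMuPDF:
#   import fitz
#   doc = fitz.open("input.pdf")
#   if needs_full_save(doc):
#       doc.save("output.pdf")  # write a complete new file
#   else:
#       doc.save(doc.name, incremental=True, encryption=fitz.PDF_ENCRYPT_KEEP)
```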


rpbissonnette commented on August 11, 2024

Aha, I see that you had already fixed all the other examples from openErrCode to can_save_incrementally().
Perhaps PDFoutline is now all fixed too.
My use case is simple: I want to add metadata to random downloaded PDFs before entering them into Calibre.
(Calibre has a nice metadata editor, but it only modifies its own database.)
So I want to be able to copy and paste from the PDF text into the metadata fields. If it seems easy enough and worth your time, perhaps you could extend the right-side view? For me, copy and paste is more valuable than the pixmap image.

As an aside, I didn't understand how you do the TOC editing.
Eventually, as in one minute ago, I stumbled onto double-clicking on Height and then interactively moving the red indicator line. That's great.


JorjMcKie commented on August 11, 2024

As for the metadata editor:

there is a batch utility, csv2meta.py, which lets you do this fairly comfortably.
I don't understand what you mean by "copy/paste from the right side ...". It's probably worth looking at PySimpleGUI, which makes building GUIs super simple and can do that with tkinter as well. As a demo, there is a document browser in that same repo, which can display all types of supported documents (not just PDF), including zooming and such.
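To make the CSV route concrete: csv2meta.py's actual CSV layout isn't shown in this thread, so the following is only a sketch under an assumed layout (a `file` column plus one column per metadata key), using just the standard `csv` module. `parse_metadata_csv` and `EDITABLE_KEYS` are hypothetical names.

```python
import csv
import io
from typing import Dict

# Metadata keys that PyMuPDF's Document.set_metadata() accepts and that are
# worth editing ("format" and "encryption" are read-only).
EDITABLE_KEYS = ("title", "author", "subject", "keywords",
                 "creator", "producer", "creationDate", "modDate")


def parse_metadata_csv(text: str) -> Dict[str, Dict[str, str]]:
    """Map each file name to its non-empty metadata fields.

    Assumed layout: a header row with a "file" column followed by
    metadata-key columns; unknown columns are ignored.
    """
    result: Dict[str, Dict[str, str]] = {}
    for row in csv.DictReader(io.StringIO(text)):
        filename = row.pop("file")
        result[filename] = {k: v for k, v in row.items() if k in EDITABLE_KEYS and v}
    return result
```

Each resulting dict could then be applied per file with PyMuPDF, e.g. `doc.set_metadata({**doc.metadata, **fields})` followed by a save.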


rpbissonnette commented on August 11, 2024

Yes, csv2meta.py does a simple job, thanks to MuPDF and PyMuPDF doing the heavy lifting!
But - where does the csv data file come from? Perhaps copy and paste from an open Acrobat screen into an Excel spreadsheet?
Sorry my hand-waving about copy/paste didn't make sense.
Imagine that PDFoutline were modified something like this:

Replace

    # define the bitmap for the pdf image ...
    bmp = pdf_show(self, 1)
    self.PDFbild = wx.StaticBitmap(self, wx.ID_ANY, bmp, defPos, defSiz, wx.BORDER_NONE)
    # ... and add it to the vertical sizer

With something like this:

    # get the text of page 1 (usually that is enough for metadata)
    call(fitzcli.py gettext 'spad.file = infile')
    put the returned text into a wx.Panel instead of a wx.StaticBitmap
    replace self.PDFbild by a self.PDFtextBild and attach it to ri_szr

The expected result would be a "good enough" text representation in the right-hand panel, where a human can find metadata items and then paste them into the left-side "metagrid".

In bad cases, sure, it might be nice to see the pixel-accurate bitmap of self.PDFbild, and maybe to put the text window into the same forward/backward loop if page 1 alone isn't enough.

So PDFoutline would have three panes, or the right side would toggle between the bitmap and text displays...
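Rather than shelling out to `fitzcli.py gettext`, the page text could be fetched directly through the PyMuPDF API and shown in a read-only multi-line `wx.TextCtrl` (style `wx.TE_MULTILINE | wx.TE_READONLY`), from which text can be selected and copied. A minimal sketch of the text part, assuming `doc` is a PyMuPDF `Document`; `page_text` is a hypothetical helper, and `Page.get_text()` is the current name of the method older releases called `getText()`.

```python
def page_text(doc, pno: int = 0) -> str:
    """Return the plain text of page `pno` (0-based) of a PyMuPDF Document.

    `doc` is assumed to be a fitz.Document (supports len() and indexing).
    """
    if not 0 <= pno < len(doc):
        raise IndexError(f"page {pno} is outside 0..{len(doc) - 1}")
    return doc[pno].get_text()


# Hypothetical wiring into PDFoutline's right-hand pane (wxPython):
#   self.PDFtextBild = wx.TextCtrl(self, wx.ID_ANY, page_text(spad.doc),
#                                  defPos, defSiz,
#                                  wx.TE_MULTILINE | wx.TE_READONLY)
#   ri_szr.Add(self.PDFtextBild, 1, wx.EXPAND)
# and, when the user pages forward/backward:
#   self.PDFtextBild.SetValue(page_text(spad.doc, pno))
```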

Installing PySimpleGUI failed on my machine, so for now I wouldn't consider it a shortcut. Changing your wxPython script seems easier. (Fools rush in...)


JorjMcKie commented on August 11, 2024

> yes, csv2meta.py does a simple job - thanks to mupdf and pymupdf doing the heavy lifting! But - where does the csv data file come from? Perhaps copy and paste from an open Acrobat screen into an Excel spreadsheet?

I think we're talking at cross purposes:
If your PDF does have metadata, you can just as easily offload it to a CSV: there is another script that does the reverse. Or simply write a trivial script like this snippet:

import fitz  # PyMuPDF

doc = fitz.open("yourfile.pdf")
for k, v in doc.metadata.items():
    print(k, "=", v)

and copy/paste from that output. For a PDF from my collection of science magazines, it looks like this:

>>> for k,v in doc.metadata.items():
...     print(k, "=", v)
...
format = PDF 1.4
title = Die Theorie von allem
author = Spektrum der Wissenschaft
subject = Wie lassen sich Quanten- und Relativitätstheorie vereinen?
keywords = Fruktose, Bibel-Archäologie, Ebola
creator = PDFoutline.py
producer = PyMuPDF, PyPDF2
creationDate = D:20151222025718-04'30'
modDate = D:20160712110307-04'00'
trapped =
encryption = None
>>>
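Dumping existing metadata to a CSV can likewise be done with the standard `csv` module. A minimal sketch, assuming the input maps file names to `doc.metadata` dicts like the one printed above; the column set and `metadata_to_csv` are illustrative choices, not the other script mentioned in this comment.

```python
import csv
import io
from typing import Dict, Optional

# Columns written for each file; keys such as "format" or "encryption" in
# doc.metadata are dropped because they cannot be set.
COLUMNS = ["file", "title", "author", "subject", "keywords",
           "creator", "producer", "creationDate", "modDate"]


def metadata_to_csv(records: Dict[str, Dict[str, Optional[str]]]) -> str:
    """Render {file name: doc.metadata} as CSV text with one row per file."""
    buf = io.StringIO()
    # extrasaction="ignore" skips keys not listed in COLUMNS;
    # missing keys and None values are written as empty fields.
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, extrasaction="ignore")
    writer.writeheader()
    for filename, meta in records.items():
        writer.writerow({**meta, "file": filename})
    return buf.getvalue()
```

With PyMuPDF the input could be built as `{name: fitz.open(name).metadata for name in pdf_files}`, and the resulting text opened in any spreadsheet for editing.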
