Comments (9)
Thanks for caring!
I will need to have a look as well. I haven't tried that script, or any other wx GUI, for some time now.
Which wxPython version did you use?
from pymupdf-utilities.
Some version infos:
wx.version() '4.1.1 gtk3 (phoenix) wxWidgets 3.1.5'
fitz.VersionBind '1.18.17'
I am using KDE neon 5.22, which is Ubuntu 20.04 LTS under the hood.
Comparing version numbers ought to have a nice canonical solution. I just don't know what it is.
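One common approach, sketched here as an assumption rather than anything PyMuPDF or wxPython prescribe, is to turn the leading dotted number of a version string into a tuple of integers, because tuples compare element by element. The helper name `version_tuple` is hypothetical. For full PEP 440 semantics (pre- and post-releases) the third-party `packaging` library's `packaging.version.Version` is the more thorough option.

```python
import re


def version_tuple(version: str) -> tuple:
    """Turn a dotted version string such as '1.18.17' into (1, 18, 17).

    Only the leading numeric part is used, so a string like
    '4.1.1 gtk3 (phoenix) wxWidgets 3.1.5' (as returned by wx.version())
    becomes (4, 1, 1).
    """
    match = re.match(r"\d+(?:\.\d+)*", version.strip())
    if match is None:
        raise ValueError(f"no version number found in {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


# Tuples compare numerically per component: (1, 18, 17) > (1, 9, 0).
# Plain string comparison gets this wrong: '1.18.17' < '1.9.0'.
print(version_tuple("1.18.17") >= (1, 18, 0))
print(version_tuple("4.1.1 gtk3 (phoenix) wxWidgets 3.1.5"))
```

With this, a script can guard version-dependent calls with something like `if version_tuple(fitz.VersionBind) >= (1, 18, 0): ...`.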
Looking at more of your code, I see you have already fixed the time.clock() problem.
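For reference: `time.clock()` was deprecated in Python 3.3 and removed in 3.8, which is why older scripts now fail with an `AttributeError`. A minimal sketch of the usual replacement, timing an arbitrary stand-in computation:

```python
import time

# time.perf_counter() measures elapsed wall-clock time with high resolution;
# time.process_time() is the alternative if CPU time is wanted instead.
t0 = time.perf_counter()
total = sum(range(100_000))  # stand-in for the work being timed
elapsed = time.perf_counter() - t0

print(f"computed {total} in {elapsed:.4f} s")
```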
pixmap.SamplesRBG() no doubt has got something to do with the presence/absence of an alpha channel.
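The alpha question comes down to the sample layout: a PyMuPDF `Pixmap` with `pix.alpha` set stores four bytes per pixel (RGBA), otherwise three (RGB), while `wx.Bitmap.FromBuffer` expects RGB data and `wx.Bitmap.FromBufferRGBA` expects RGBA. Within PyMuPDF the simplest route is usually `fitz.Pixmap(pix, 0)`, which copies a pixmap without its alpha channel, or rendering with `alpha=False` in the first place. Purely to show what dropping alpha means at the byte level, here is a stdlib-only sketch; `rgba_to_rgb` is a hypothetical helper, not part of either library.

```python
def rgba_to_rgb(samples: bytes) -> bytes:
    """Strip the alpha byte from an interleaved RGBA buffer, giving RGB."""
    if len(samples) % 4:
        raise ValueError("RGBA buffer length must be a multiple of 4")
    pixels = len(samples) // 4
    out = bytearray(pixels * 3)
    # Copy the R, G and B components; every 4th input byte (alpha) is skipped.
    out[0::3] = samples[0::4]
    out[1::3] = samples[1::4]
    out[2::3] = samples[2::4]
    return bytes(out)


# Two pixels: opaque (1, 2, 3) and fully transparent (4, 5, 6).
print(rgba_to_rgb(bytes([1, 2, 3, 255, 4, 5, 6, 0])))
```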
I haven't the faintest idea why the le_szr.RecalcSizes() call fails, but since my real target is Qt I didn't look further ...
You have so many demos and examples, I can see keeping them all up-to-date is non-trivial.
Thank you. I have just uploaded a fixed version.
I need to do a bit more reworking, though, because an exception occurs at the C++ level when one clicks on a row label. Weirdly, the intended function (duplicating a row, for example) still works.
Otherwise it should work.
It did, at least, when I tried it on a small file ...
Should you ever have a working Qt version, please kindly publish it here via a PR.
It must work for you, and mostly it works for me.
But I still get the "no such attribute" error on these lines:
if spad.doc.needs_pass or spad.doc.openErrCode:
I am using
Python 3.8.10 (default, Sep 28 2021, 16:10:42)
and VS Code
Version: 1.60.2
Commit: 7f6ab5485bbc008386c4386d08766667e155244e
Date: 2021-09-22T12:01:43.795Z
Electron: 13.1.8
Chrome: 91.0.4472.164
Node.js: 14.16.0
V8: 9.1.269.39-electron.0
OS: Linux x64 5.11.0-37-generic
For now I have just deleted that condition, but surely there must be a real reason buried deep inside spad.doc. Can you think of anything I should check here?
Thanks.
Sorry, there were two places I forgot to change:
For a few versions now, there has been a new indicator that lets you check whether a PDF can be saved incrementally (i.e. back to the same file, with just the delta information appended).
There are more conditions preventing this than just a repaired or decrypted file.
Strictly speaking, a decrypted file could now even be handled, but never mind ...
I have uploaded the script, changed once again.
Thanks for your patience!
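For scripts that must run against both older and newer PyMuPDF releases, the attribute error can be avoided by probing for the new method first. A minimal sketch, assuming the semantics described above: `can_save_incrementally()` where it exists, else the older `openErrCode` attribute (non-zero meant the file had structure problems, which ruled out incremental saving). The helper `can_write_back` is hypothetical, and the stand-in objects only exercise the logic; they are not real documents.

```python
from types import SimpleNamespace


def can_write_back(doc) -> bool:
    """Return True if doc may be saved incrementally to its own file."""
    needs_pass = getattr(doc, "needs_pass", getattr(doc, "needsPass", False))
    if needs_pass:
        return False  # password-protected: treated as not writable, as in the original check
    check = getattr(doc, "can_save_incrementally", None)
    if callable(check):
        return bool(check())  # newer PyMuPDF: ask the document directly
    # Older PyMuPDF: a non-zero openErrCode ruled out incremental saving.
    return not getattr(doc, "openErrCode", 0)


# Stand-in objects for a quick check (not real fitz.Document instances):
new_style = SimpleNamespace(needs_pass=False, can_save_incrementally=lambda: True)
old_style = SimpleNamespace(needs_pass=False, openErrCode=1)
print(can_write_back(new_style), can_write_back(old_style))
```

With a real document the call would be `can_write_back(fitz.open("file.pdf"))`, followed, if it returns True, by `doc.save(doc.name, incremental=True, encryption=fitz.PDF_ENCRYPT_KEEP)`.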
Aha, I see that you had already switched all the other examples from openErrCode to can_save_incrementally().
Perhaps PDFoutline is now all fixed too.
My use case is simple, I want to add metadata to random downloaded pdfs before entering them into Calibre.
(Calibre has a nice metadata editor, but it only modifies its own database.)
So I want to be able to copy and paste from the PDF text into the metadata fields. If it seems easy enough and worth your time, perhaps you could extend the right-side view? For me, copy and paste is more valuable than the pixmap image.
As an aside, I didn't understand how you do the TOC editing.
Eventually (as in, one minute ago) I stumbled upon double-clicking on Height and then interactively moving the red indicator line. That's great.
As for a metadata editor:
- there is the batch utility csv2meta.py, which lets you do this fairly comfortably.
- I don't understand what you mean by "copy/paste from the right side ...". It is probably worthwhile looking at PySimpleGUI, which makes building simple GUIs very easy and can do so with tkinter as well. As a demo, there is a document browser in that same repo, which can display all supported document types (not just PDF), including zooming and such.
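The batch idea can be sketched in two steps: read one metadata record per file from a CSV, then merge it into each PDF. This is not csv2meta.py itself; its actual column layout is not shown here, so the layout below (a `file` column plus one column per metadata key) is a hypothetical one. `Document.set_metadata`, `Document.can_save_incrementally` and incremental `save` are PyMuPDF calls (older releases spell the first one `setMetadata`); `read_meta_csv` and `apply_meta` are illustrative names.

```python
import csv
import io

# A subset of the metadata keys PyMuPDF uses in Document.metadata.
META_KEYS = ("title", "author", "subject", "keywords", "creator", "producer")


def read_meta_csv(stream) -> dict:
    """Map each file name to a metadata dict, from a CSV with a header row.

    Empty cells are skipped, so existing values in the PDF are kept.
    """
    result = {}
    for row in csv.DictReader(stream):
        result[row["file"]] = {k: row[k] for k in META_KEYS if row.get(k)}
    return result


def apply_meta(filename: str, meta: dict) -> None:
    """Merge meta into the PDF's metadata and save in place (needs PyMuPDF)."""
    import fitz  # imported lazily so the CSV part runs without PyMuPDF

    doc = fitz.open(filename)
    try:
        if not doc.can_save_incrementally():
            raise RuntimeError(f"{filename} cannot be saved incrementally")
        merged = dict(doc.metadata)  # start from the current values
        merged.update(meta)
        doc.set_metadata(merged)
        doc.save(filename, incremental=True, encryption=fitz.PDF_ENCRYPT_KEEP)
    finally:
        doc.close()


sample = io.StringIO("file,title,author\nissue.pdf,Die Theorie von allem,\n")
print(read_meta_csv(sample))
```

A batch run would then be `for name, meta in read_meta_csv(open("meta.csv", newline="", encoding="utf-8")).items(): apply_meta(name, meta)`.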
Yes, csv2meta.py does a simple job, thanks to mupdf and pymupdf doing the heavy lifting!
But where does the CSV data file come from? Perhaps copy and paste from an open Acrobat screen into an Excel spreadsheet?
Sorry, my hand-waving about copy/paste didn't make sense.
Imagine that PDFoutline were modified something like this:
Replace
# define the bitmap for the pdf image ...
bmp = pdf_show(self, 1)
self.PDFbild = wx.StaticBitmap(self, wx.ID_ANY, bmp, defPos, defSiz, wx.BORDER_NONE)
# ... and add it to the vertical sizer
With something like this:
# call fitzcli.py gettext with spad.file as the input file, to get the text of page 1 (usually that is enough for metadata)
# put the returned text into a wx.Panel instead of a wx.StaticBitmap
# replace self.PDFbild with a self.PDFtextBild and attach it to ri_szr
The expected result would be a "good enough" text representation in the right-hand panel, from which a human can pick out the metadata items and paste them into the left-side "metagrid".
In bad cases it would surely be nice to see the pixel-accurate bitmap of self.PDFbild, and perhaps to put the text window into the same forward/backward paging loop if page 1 alone isn't enough.
So PDFoutline would have three panes, or the right side would toggle between the bitmap and text displays ...
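A sketch of the text-pane idea, using PyMuPDF's `Page.get_text("text")` (`getText` in older releases) directly instead of calling fitzcli.py as a subprocess. A read-only multi-line `wx.TextCtrl` (style `wx.TE_MULTILINE | wx.TE_READONLY`) would let the user select and copy, which a `wx.StaticBitmap` cannot. The helper `page_text`, the character limit and the `self.PDFtext` name are hypothetical; only the text part is runnable here, and the wx wiring is left as comments because it depends on PDFoutline's internals.

```python
from types import SimpleNamespace


def page_text(doc, pno: int = 0, max_chars: int = 5000) -> str:
    """Return the plain text of page pno, trimmed for display in a text pane.

    doc is expected to behave like a fitz.Document: len(doc) gives the page
    count and doc[pno].get_text("text") the page's plain text.
    """
    if not 0 <= pno < len(doc):
        return ""
    return doc[pno].get_text("text")[:max_chars]


# Hypothetical wiring into PDFoutline, in place of the StaticBitmap:
#   txt = page_text(spad.doc, 0)
#   self.PDFtext = wx.TextCtrl(self, wx.ID_ANY, txt,
#                              style=wx.TE_MULTILINE | wx.TE_READONLY)
#   ri_szr.Add(self.PDFtext, 1, wx.EXPAND)

# Stand-in for a one-page document (not a real fitz.Document):
fake_doc = [SimpleNamespace(get_text=lambda kind: "Die Theorie von allem\nSpektrum der Wissenschaft\n")]
print(page_text(fake_doc))
```

Paging forward and backward would then just call `page_text(spad.doc, pno)` with the current page number, mirroring the existing bitmap loop.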
Installing PySimpleGUI failed on my machine, so for now I don't see it as a shortcut. Changing your wxPython script seems easier. (Fools rush in ...)
yes, csv2meta.py does a simple job - thanks to mupdf and pymupdf doing the heavy lifting! But - where does the csv data file come from? Perhaps copy and paste from an open Acrobat screen into an Excel spreadsheet?
I think we're talking at cross purposes:
If your PDF does have metadata, you can just as easily offload it to a CSV file: there is another script that does this reverse thing. Or simply write a trivial script like this snippet:
import fitz  # PyMuPDF

doc = fitz.open("yourfile.pdf")
for k, v in doc.metadata.items():
    print(k, "=", v)
and copy/paste from that output. Doing this with a file from my collection of science magazines gives:
>>> for k, v in doc.metadata.items():
...     print(k, "=", v)
...
format = PDF 1.4
title = Die Theorie von allem
author = Spektrum der Wissenschaft
subject = Wie lassen sich Quanten- und Relativitätstheorie vereinen?
keywords = Fruktose, Bibel-Archäologie, Ebola
creator = PDFoutline.py
producer = PyMuPDF, PyPDF2
creationDate = D:20151222025718-04'30'
modDate = D:20160712110307-04'00'
trapped =
encryption = None
>>>
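Going the other way, from metadata to CSV, can be sketched with the standard `csv` module. This is not the script mentioned above; the column set is a hypothetical choice, and `write_meta_csv` is an illustrative name. The metadata dict has the shape shown in the output above (`doc.metadata`); keys outside the chosen columns, such as `format` or `encryption`, are simply ignored.

```python
import csv
import io

FIELDS = ["file", "title", "author", "subject", "keywords"]


def write_meta_csv(rows, stream) -> None:
    """Write one CSV line per PDF: its file name plus selected metadata.

    rows is an iterable of (filename, metadata_dict) pairs, e.g. built from
    fitz.open(name).metadata; keys outside FIELDS are ignored.
    """
    writer = csv.DictWriter(stream, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    for filename, meta in rows:
        writer.writerow({"file": filename, **meta})


meta = {"format": "PDF 1.4", "title": "Die Theorie von allem",
        "author": "Spektrum der Wissenschaft", "encryption": None}
out = io.StringIO()
write_meta_csv([("issue.pdf", meta)], out)
print(out.getvalue())
```

For a real file, open it with `newline=""` and `encoding="utf-8"` so the csv module controls line endings and non-ASCII titles survive.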