Comments (18)
You should be able to do this for the simple case with this code:
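    def delete_paragraph(paragraph):
        # Get the underlying XML element from the Paragraph proxy object.
        p = paragraph._element
        # Detach the element from its parent in the document tree.
        p.getparent().remove(p)
        # Clear the proxy's references so any later use fails fast.
        paragraph._p = paragraph._element = None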
Any subsequent access to the "deleted" paragraph object will raise AttributeError, so you should be careful not to keep the reference hanging around, including as a member of a stored value of Document.paragraphs.
The reason it's not in the library yet is because the general case is much trickier, in particular needing to detect and handle the variety of linked items that can be present in a paragraph; things like a picture, a hyperlink, or chart etc.
But if you know for sure none of those are present, these few lines should get the job done.
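One way to check the "none of those are present" condition before deleting is to look for relationship references inside the paragraph's XML. The sketch below is only an illustration and not part of python-docx: the helper names relationship_ids and is_simple_paragraph are made up here, and it assumes that pictures, hyperlinks, and charts all point at their targets through attributes in the officeDocument relationships namespace (r:id, r:embed, r:link). It will not notice other kinds of cross-references, such as footnotes or comments.

```python
# Namespace behind the r: prefix used by references such as r:id and r:embed.
R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"


def relationship_ids(element):
    """Return the set of relationship ids (like "rId7") referenced under `element`.

    Works on any ElementTree-style element (lxml or xml.etree), because it
    only relies on `iter()`, `attrib`, and Clark-notation attribute names.
    """
    prefix = "{%s}" % R_NS
    return {
        value
        for node in element.iter("*")  # the element itself and all descendants
        for name, value in node.attrib.items()
        if name.startswith(prefix)
    }


def is_simple_paragraph(paragraph):
    """True when the paragraph's XML holds no relationship references.

    `paragraph` is assumed to be a python-docx Paragraph; anything exposing
    the private `_element` attribute would work the same way.
    """
    return not relationship_ids(paragraph._element)
```

A caller could then run delete_paragraph(paragraph) only when is_simple_paragraph(paragraph) is true, and skip or specially handle the rest.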
from python-docx.
It depends a little on what you mean by link, but deleting is not so much a problem in practice as copying is.
If you have a hyperlink in a paragraph, for example, that hyperlink element in the XML contains a relationship reference (like "rId7") to a Relationship element in the .rels "file" associated with the part containing the paragraph (most commonly the document part). That Relationship element contains the URL of the hyperlink, and that's the extent of the relationship (a so-called "external" relationship). If you delete the paragraph but don't delete the Relationship element in the .rels collection, that Relationship element will hang around and be saved with the document. This actually shouldn't cause a problem, and I don't believe it by itself represents a file "corruption" that might give rise to a so-called "repair error" when opening the file.
If you have something "bigger", say an image embedded in the paragraph (a so-called inline shape), and you delete the paragraph without attending to the now-dangling relationship, then both the Relationship element in the .rels and the image part it refers to will be retained in the document. That bloats the file a little but, again, shouldn't cause a problem, though it may or may not give rise to a "repair error" on opening the document. You'd have to experiment, and behavior might vary by client; maybe Word doesn't complain but LibreOffice does, or vice versa.
So deleting a paragraph is worth trying if you don't mind a little wasted space.
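To see how much is left behind after such deletions, the dangling relationships can be listed by comparing the ids defined in the .rels XML with the ids still referenced in the part's XML. This is a rough sketch using only the standard library; the function name dangling_relationships and its kinds parameter are invented for illustration, and it assumes the two XML payloads have already been read out of the .docx package (typically word/document.xml and word/_rels/document.xml.rels).

```python
import xml.etree.ElementTree as ET

# Namespace of r:id / r:embed attributes inside a part such as document.xml.
R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
# Namespace of the <Relationship> elements inside a .rels file.
PKG_RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


def dangling_relationships(part_xml, rels_xml, kinds=("hyperlink", "image")):
    """Return sorted ids of relationships that `part_xml` no longer references.

    Only relationship types ending in one of `kinds` are considered, since
    relationships such as styles or settings are never referenced by id from
    the part's XML and would otherwise show up as false positives.
    """
    prefix = "{%s}" % R_NS
    # Every rId still mentioned anywhere in the part.
    used = {
        value
        for node in ET.fromstring(part_xml).iter()
        for name, value in node.attrib.items()
        if name.startswith(prefix)
    }
    dangling = []
    for rel in ET.fromstring(rels_xml).iter("{%s}Relationship" % PKG_RELS_NS):
        kind = rel.get("Type", "").rsplit("/", 1)[-1]
        if kind in kinds and rel.get("Id") not in used:
            dangling.append(rel.get("Id"))
    return sorted(dangling)
```

The function only reports; actually dropping the relationship and its target part is the trickier "general case" work described above.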
But if you copy a paragraph and don't re-establish the relationships (which may need to change "name", e.g. "rId7" -> "rId9") and also copy over target part(s) (e.g. the image in the example above) then that will definitely trigger a repair error on loading the document because Word can't find the image to render in that paragraph.
Would like to see this available for python-docx. It would be very useful in populating a document full of placeholders given that it would allow the placeholder paragraph to be deleted if the value to populate the placeholder is None.
That works! Thank you!!
Glad it worked out Jeff :)
Steve, thanks so much. I was having trouble after merging cells in a table which left extra empty paragraphs. Used your function and worked great, which let the cells shrink back by getting rid of empty space. Used it in a nested loop as follows:
delete_paragraph(table.rows[rx].cells[cx].paragraphs[-1])
thanks - wayne (retired HW designer, having fun with python while hopefully helping out the non-profit I volunteer for)
Hi @scanny
Why not implement the feature and close the issue?
> You should be able to do this for the simple case with this code:
>
>     def delete_paragraph(paragraph):
>         p = paragraph._element
>         p.getparent().remove(p)
>         p._p = p._element = None
>
> Any subsequent access to the "deleted" paragraph object will raise AttributeError, so you should be careful not to keep the reference hanging around, including as a member of a stored value of Document.paragraphs.
>
> The reason it's not in the library yet is because the general case is much trickier, in particular needing to detect and handle the variety of linked items that can be present in a paragraph; things like a picture, a hyperlink, or chart etc.
>
> But if you know for sure none of those are present, these few lines should get the job done.

What's the difference compared to this solution?
    def delete_element(el):
        el._element.getparent().remove(el._element)
Well, in fact, on review, there is an error in that code. The last line should be:
paragraph._p = paragraph._element = None
But as for the rest of it:
- delete_element and el are misleading name choices in my view. A Paragraph object is an element-proxy object which composes an element object; it is not itself an element. So in general we reserve the name element and its derivatives for the XML element objects themselves.
- The core code is essentially the first two lines combined into one, so that's a matter of taste; the operation is the same. I would personally probably choose something like yours in my own code, but for someone learning, sometimes breaking things down more step-by-step eases figuring out what the underlying process is, like first get the element from the proxy, then do this thing with the element, etc.
- The (previously incorrect) last line is setting the _p and _element attributes of the "host" Paragraph proxy object to None so the now-deleted (or actually only orphaned) element is not accidentally accessed in later code and also is freed up for garbage collection. Removing an element in lxml does not delete it, it only breaks its relationship with its parent. So the original Paragraph object could still make changes to it and the user might puzzle for quite a while to figure out why their code wasn't working but wasn't raising an error. So you can think of it as preventative medicine.
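The "orphaned, not deleted" behaviour is easy to see in isolation. The snippet below uses the standard library's xml.etree.ElementTree as a stand-in for lxml (an assumption made so it runs without extra packages; python-docx itself works on lxml elements), and the element names are made up:

```python
import xml.etree.ElementTree as ET

body = ET.fromstring("<body><p>first</p><p>second</p></body>")
first = body[0]

# Removing the element only detaches it from its parent ...
body.remove(first)

# ... the Python object is still alive and can be changed without any error,
first.text = "edited after removal"

# but the change never reaches the document tree.
remaining = ET.tostring(body, encoding="unicode")
print(remaining)  # <body><p>second</p></body>
```

Setting paragraph._p = paragraph._element = None turns this kind of silent no-op into an immediate AttributeError.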
Thanks for this @scanny
I suggest you edit the original, previously incorrect last line, because that is the answer you still link to from Stack Overflow.
> Steve, thanks so much. I was having trouble after merging cells in a table which left extra empty paragraphs. Used your function and worked great, which let the cells shrink back by getting rid of empty space. Used it in a nested loop as follows:
>
>     delete_paragraph(table.rows[rx].cells[cx].paragraphs[-1])
>
> thanks - wayne (retired HW designer, having fun with python while hopefully helping out the non-profit I volunteer for)

I have this same problem. However, when I use the delete_paragraph function with the corrected last line, the resulting document throws an error when opened that reads "Word found unreadable content in document_name.docx. Do you want to recover the contents of this document?" Clicking yes works to open the document, but I'm trying to figure out why deleting the paragraphs is causing this problem.
I think it might be related to the fact that this paragraph exists in a merged cell, but it sounds like @waynerth didn't experience this problem.
Any thoughts?
Thanks for your work on this @scanny!
@mrufsvold each cell must contain at least one block item, so a paragraph or a table. If you get rid of all the paragraphs, that leaves the cell in an invalid state. You might want to delete paragraphs[1:] or something like that, just be sure there's at least one left.
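A minimal sketch of that advice, reusing the delete_paragraph function from earlier in the thread (with the corrected last line). The helper name trim_cell_paragraphs and its keep parameter are made up for this example; cell is assumed to be a python-docx table cell, or anything else exposing a paragraphs list:

```python
def delete_paragraph(paragraph):
    """Detach the paragraph's XML element and disable the proxy object."""
    p = paragraph._element
    p.getparent().remove(p)
    paragraph._p = paragraph._element = None


def trim_cell_paragraphs(cell, keep=1):
    """Delete every paragraph in `cell` after the first `keep` ones.

    `keep` must be at least 1 so the cell is never left without a paragraph.
    """
    if keep < 1:
        raise ValueError("a cell must keep at least one paragraph")
    # Take a snapshot of the list first, then delete everything past `keep`.
    for paragraph in list(cell.paragraphs)[keep:]:
        delete_paragraph(paragraph)
```

In the merged-cell case this would be called once per cell, for example trim_cell_paragraphs(table.rows[rx].cells[cx]).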
@scanny That makes complete sense! Thanks for your quick reply. I'll give that a shot when I get back to that project!
It worked!
Glad you got it working @mrufsvold :)
> The reason it's not in the library yet is because the general case is much trickier, in particular needing to detect and handle the variety of linked items that can be present in a paragraph; things like a picture, a hyperlink, or chart etc.

@scanny Does that mean that if I delete a paragraph containing a link, my document will/might crash because the linked stuff is still kept/referenced somewhere else in the document ... or something alike?
I think deleting is working for me, at least in the tests I ran on many small, controlled documents.
Now, with a big document (where I do lots of things, not just delete paragraphs), I am getting errors when opening it.
Word offers to repair them and save the document, but I wonder whether I have any chance of finding the source of the error:
- Do you know of any way to make Word report where the "unreadable content" is? I tried opc-diag, but the output is so huge I can't really see anything in it (BTW, there are no diff colours, just a black-and-white interface: probably not designed for my Windows 7 machine?)
- Reading your last comment again, I wonder what exactly you mean by copying a paragraph. Could you post a simple code example? (Maybe I am doing it without realizing, since I reuse quite a few functions written by other people.)
Thanks @scanny
Wow, thank you. It works!!!