Firstly, great work on this project.
I believe I've found a bug in /docx/parts/document.py, in the "next_id" function.
Some of the documents I have been using python-docx with contain non-numeric IDs. For example, inserting a "print(id_str_lst)" at line 90 of that file gives me:
['4', '_x0000_t202', 'Text Box 5', '7', 'Text Box 9', '9', 'Text Box 7', '8', 'Text Box 11', '6', 'Text Box 6', '10', '0', '1', '3', 'Group 4', 'AutoShape 3', '5', '0', '12', '0', '26', '0', '25', '0', '2', '0', '13', '1', '14', '1', '15', '1', '16', '1', '39', '0', '40', '0', '35', '0', '21', '21', '22', '22', '20', '0', '18', '1']
As a result, I get a ValueError as soon as the second element of the list ('_x0000_t202') is passed to "int(id_str)".
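To make the failure easy to reproduce without a document, here is a minimal sketch that runs int() over the first few values from the list above. The helper name first_bad_id is my own and is not part of python-docx:

```python
# First few id values from the list printed above.
id_str_lst = ['4', '_x0000_t202', 'Text Box 5', '7']


def first_bad_id(id_strs):
    """Return the first id string that int() rejects, or None if all convert.

    Hypothetical helper for illustration only; not part of python-docx.
    """
    for id_str in id_strs:
        try:
            int(id_str)
        except ValueError:
            return id_str
    return None


bad = first_bad_id(id_str_lst)
print(bad)
```

The first value converts cleanly, and the second one is where the conversion stops.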
As a workaround, I modified the "next_id" function as follows, so that it checks that each id is numeric before adding it to the list of used IDs:
def next_id(self):
    """
    The next available positive integer id value in this document. Gaps
    in id sequence are filled. The id attribute value is unique in the
    document, without regard to the element type it appears on.
    """
    id_str_lst = self._element.xpath('//@id')
    used_ids = []
    for id_str in id_str_lst:
        # Skip non-numeric ids such as 'Text Box 5' or '_x0000_t202'.
        if id_str.isdigit():
            used_ids.append(int(id_str))
    for n in range(1, len(used_ids)+2):
        if n not in used_ids:
            return n
This appears to fix the problem for me.
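For anyone who wants to try the logic outside the library, the same idea can be sketched as a standalone function that takes the id strings directly. This is only a sketch under my own assumptions: the name next_free_id is made up, and it uses try/except around int() instead of isdigit(), which also skips the rare strings (such as '²') that pass isdigit() but that int() rejects. Note that int() is slightly more permissive in the other direction, since it accepts values like ' 7 ' or '+7'.

```python
def next_free_id(id_strs):
    """Return the smallest positive integer not used as a numeric id.

    Hypothetical standalone version of the workaround; not part of
    python-docx. Non-numeric ids are ignored.
    """
    used_ids = set()
    for id_str in id_strs:
        try:
            used_ids.add(int(id_str))
        except ValueError:
            # Non-numeric id such as 'Text Box 5'; it cannot collide
            # with an integer id, so skip it.
            continue
    # Among len(used_ids) distinct integers, at least one value in
    # 1..len(used_ids)+1 must be free.
    for n in range(1, len(used_ids) + 2):
        if n not in used_ids:
            return n


print(next_free_id(['4', '_x0000_t202', 'Text Box 5', '1', '2', '3']))
```

Using a set also makes the "n not in used_ids" check constant-time, which may matter for documents with many ids.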
This is the first time I've ever contributed to an open source project, so I am not certain how to go about officially submitting this 'fix' to the repository, and surely a better programmer than I will have a more efficient fix. :-)
Thanks again, and I hope this helps.