When trying to unpack some 'both' MOBI files generated by calibre 3.24.2, I got a cras

Crash unpacking calibre-generated 'both' MOBI files (kindleunpack issue, closed)

kevinhendricks commented on August 22, 2024

Crash unpacking calibre-generated 'both' MOBI files

dougmassay commented on August 22, 2024

The only way I can see for 'filename' to be unassigned at that point in mobi_k8proc.py (212), is for the preceding for-loop to be skipped entirely. The only way I can see that would happen is if fragcnt == 0.

Now to investigate why fragcnt might be zero when it never has been any other time.

Is this happening with any dual-mobi generated with calibre > 3.24.2? I'm looking for the easiest/quickest way to get a test mobi that exhibits the issue.

If I had a test file, the first thing I'd try is inserting:

filename = 'part%04d.xhtml' % cnt

immediately after line 182 in mobi_k8proc.py (cnt = 0) to see if that remedied the problem.

dougmassay commented on August 22, 2024

Oops. That will just make 'aidtext' the local variable being referenced before assignment. Need to find a way to assign it before the 'for i in range(fragcnt)' loop as well.
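The failure mode described here can be reproduced in isolation. The following is a minimal sketch, not the actual mobi_k8proc.py code: `walk_fragments` and `try_walk` are hypothetical names, and the loop body only stands in for the real fragment walk.

```python
def walk_fragments(fragcnt):
    """Hypothetical stand-in for the fragment walk; not the real KindleUnpack code."""
    cnt = 0
    for i in range(fragcnt):
        # Both names are bound only inside the loop body.
        filename = 'part%04d.xhtml' % cnt
        aidtext = 'aid%d' % i
    # With fragcnt == 0 the loop body never runs, so neither name is bound here.
    return filename, aidtext


def try_walk(fragcnt):
    """Run the walk and report the exception type instead of crashing."""
    try:
        return walk_fragments(fragcnt)
    except UnboundLocalError as exc:
        return 'crash: %s' % type(exc).__name__
```

Initializing only `filename` before the loop would move the same error to `aidtext`, which is why both names need a value before the loop starts.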

kevinhendricks commented on August 22, 2024

Then something is pretty broken in calibre's dual mobi generation. By definition an azw3 must have a skeleton table and table of fragments to insert. If you build the book as a single azw3 in calibre and unpack it, do you still get this error?

This is not optional information that is missing. So we will just crash someplace else due to missing information. I guess we could simply check for the existence of both the skeleton table and the fragments and do a formal abort with a message that the kf8/azw3 portion of the dual mobi is corrupt.

Is there some way we can get the epub source and then use calibre and kindlegen to create our own tests, to make sure a new or unknown segment has not been added?

kevinhendricks commented on August 22, 2024

I guess based on the line number of the error, it does have a skeleton table but one as Doug said with no fragments to insert into the existing skeleton. I guess we could just test for a skeleton with no fragments (fragcnt of 0) and either try to patch things up or abort at that point.

kevinhendricks commented on August 22, 2024

By any chance, does the original epub have an empty or nearly empty xhtml file someplace in the spine?

kevinhendricks commented on August 22, 2024

We should test for a fragcnt of 0 for any skeleton and then init the filename as Doug said, then hope the aidtext is not needed later and set it to the null string or some placeholder string. You could then test whether that "fixes" the issue.

pdurrant commented on August 22, 2024

I could email the problematic ePub to you. I hadn't thought it was just that ePub, although it might be. It was split from an Omnibus edition using the ePub Split plugin, and then tidied up a little in the editor.

But I don't seem to have your current email addresses - mine is the same as ever.

dougmassay commented on August 22, 2024

> We should test for a fragcnt of 0 for any skeleton and then init the filename as Doug said, then hope the aidtext is not needed later and set it to the null string or some placeholder string. You could then test whether that "fixes" the issue.

We should be able to init the aidtext variable before the frag-walk for-loop with the info we have (if it exists). That may not help if it's empty and needed elsewhere, but it would avoid it being referenced before it was assigned.

Something to the effect of:

aidtext = self.fragtbl[fragptr][1][12:-2]

wrapped in a try/except clause so that aidtext is guaranteed to be "something", might work (placed just before the "for i in range(fragcnt):").

dougmassay commented on August 22, 2024

Yes. The first xhtml file in the epub has a completely empty body. I believe this causes it to have no fragments.

The two quick changes I mentioned above allowed me to successfully extract the epub from the calibre-built dual-mobi. I didn't even try to catch any exceptions on that second line I added. I DID verify that I got the same error as Paul with the stock KindleUnpack.

I'm running version 3.38.1 of calibre, by the way. So the creation of the problematic dual mobi is still present in the latest calibre.

kevinhendricks commented on August 22, 2024

I can confirm this is an issue with the test case and it happens with the very first skeleton:

Here is the skeleton entry from the calibre generated azw3 file:

Skel Table:  7 entries
table: filenum, skeleton name, frag tbl record count, start position, length
[0, b'SKEL0000000000', 0, 0, 429]
[1, b'SKEL0000000001', 1, 429, 481]
[2, b'SKEL0000000002', 16, 1227, 581]
[3, b'SKEL0000000003', 21, 121143, 553]
[4, b'SKEL0000000004', 6, 280137, 553]
[5, b'SKEL0000000005', 24, 323294, 552]
[6, b'SKEL0000000006', 1, 509588, 284]

So for some reason there are no fragments to insert into the first skeleton which is the first thing in the spine. Now the question is why?
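The dump above can also be checked mechanically. This is a minimal sketch with the rows copied from the table; `skeletons_without_fragments` is a hypothetical helper name, not something taken from KindleUnpack.

```python
# Rows copied from the skeleton table dump above:
# (filenum, skeleton name, frag tbl record count, start position, length)
skel_table = [
    (0, b'SKEL0000000000', 0, 0, 429),
    (1, b'SKEL0000000001', 1, 429, 481),
    (2, b'SKEL0000000002', 16, 1227, 581),
    (3, b'SKEL0000000003', 21, 121143, 553),
    (4, b'SKEL0000000004', 6, 280137, 553),
    (5, b'SKEL0000000005', 24, 323294, 552),
    (6, b'SKEL0000000006', 1, 509588, 284),
]


def skeletons_without_fragments(table):
    """Return the filenums of skeletons whose fragment count is zero."""
    return [filenum for filenum, _name, fragcnt, _start, _length in table
            if fragcnt == 0]


empty = skeletons_without_fragments(skel_table)   # only skeleton 0 qualifies
total_frags = sum(row[2] for row in skel_table)   # fragments across all skeletons
```

The per-skeleton counts sum to 69, which is the size of the fragment table in this file, so the zero count on the first skeleton is not a parsing slip elsewhere in the table.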

If I look at the rawml for this test case I see the following:

<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
  <head>
    <title>MYFILLERHERE</title>
    <meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
  <link href="kindle:flow:0001?mime=text/css" rel="stylesheet" type="text/css"/>
<link href="kindle:flow:0002?mime=text/css" rel="stylesheet" type="text/css"/>
</head>
  <body class="calibre" aid="0"></body>
</html>

This would end up being an empty xhtml file if there are no fragments to be inserted.

So something is very strange here.

The second skeleton looks like the following:

<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
  <head>
    <title>MYFILLERHERE</title>
    <meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
  <link href="kindle:flow:0001?mime=text/css" rel="stylesheet" type="text/css"/>
<link href="kindle:flow:0002?mime=text/css" rel="stylesheet" type="text/css"/>
</head>
  <body class="calibre1" id="uKVuHBsnXbsjIId7yMuCqX2" type="bodymatter" aid="UGI0">
</body>
</html>

and the top part of the fragment table looks like:

Fragment Table: 69 entries
table: file position, link id text, file num, sequence number, start position, length
[894, b"P-//*[@aid='UGI0']", 1, 0, 0, 317]
[1774, b"P-//*[@aid='1T142']", 2, 1, 0, 7759]
[9533, b"P-//*[@aid='1T142']", 2, 2, 7759, 7689]
[17222, b"P-//*[@aid='1T142']", 2, 3, 15448, 8056]
[25278, b"P-//*[@aid='1T142']", 2, 4, 23504, 8128]
[33406, b"P-//*[@aid='1T142']", 2, 5, 31632, 7866]
[41272, b"P-//*[@aid='1T142']", 2, 6, 39498, 7992]

which confirms there is no fragment information to be inserted in the first skeleton.
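The same conclusion can be reached from the fragment rows alone. A minimal sketch, using only the seven rows shown above and the `[12:-2]` slice from the aidtext expression quoted earlier in this thread; `aid_from_linkid` is a hypothetical name.

```python
from collections import Counter

# Rows copied from the top of the fragment table above:
# (file position, link id text, file num, sequence number, start position, length)
frag_rows = [
    (894, b"P-//*[@aid='UGI0']", 1, 0, 0, 317),
    (1774, b"P-//*[@aid='1T142']", 2, 1, 0, 7759),
    (9533, b"P-//*[@aid='1T142']", 2, 2, 7759, 7689),
    (17222, b"P-//*[@aid='1T142']", 2, 3, 15448, 8056),
    (25278, b"P-//*[@aid='1T142']", 2, 4, 23504, 8128),
    (33406, b"P-//*[@aid='1T142']", 2, 5, 31632, 7866),
    (41272, b"P-//*[@aid='1T142']", 2, 6, 39498, 7992),
]


def aid_from_linkid(linkid):
    """Strip the leading "P-//*[@aid='" (12 bytes) and the trailing "']"."""
    return linkid[12:-2]


# Count fragments per file number; a Counter reports 0 for missing keys.
per_file = Counter(row[2] for row in frag_rows)
first_aid = aid_from_linkid(frag_rows[0][1])
```

File number 0 never appears in the third column, so `per_file[0]` is zero, while the first fragment's aid matches the `aid="UGI0"` attribute on the second skeleton's body tag.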

I am going to look at the epub itself to see what may be causing calibre to do something like this.

dougmassay commented on August 22, 2024

A patch in case people find it hard to read my mind. ;)

nofrags.patch.zip

kevinhendricks commented on August 22, 2024

Ah! We crossed over. So kindlegen removes or otherwise handles this case so that every skeleton has at least 1 fragment but calibre does not.

Would you please change your patch to set the initial aidtext value to the string "0" instead as this is more correct than using a future aidtext value (at least in this case).

That should at least work around calibre's difference from kindlegen.

kevinhendricks commented on August 22, 2024

The aidtext is actually both an internal id generated by kindlegen as well as an offset value. So setting a skeleton with no fragments to insert to have an aid value of "0" should be okay (I hope).
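A minimal sketch of the workaround under discussion, assuming the defaults are bound just before the fragment loop. This is not the contents of the attached patch: `walk_skeleton` and its parameters are hypothetical, and the real loop in mobi_k8proc.py does considerably more work per fragment.

```python
def walk_skeleton(fragcnt, fragtbl, fragptr, cnt=0):
    """Hypothetical sketch: return (filename, aidtext, next fragptr) for one skeleton."""
    # Defaults bound before the loop, so a skeleton with no fragments still has them.
    filename = 'part%04d.xhtml' % cnt
    aidtext = '0'
    for _ in range(fragcnt):
        linkid = fragtbl[fragptr][1]
        # Same slice as the aidtext expression quoted earlier in the thread.
        aidtext = linkid[12:-2].decode('ascii')
        fragptr += 1
    return filename, aidtext, fragptr


# One fragment row taken from the fragment table dump earlier in the thread.
fragtbl = [(894, b"P-//*[@aid='UGI0']", 1, 0, 0, 317)]

no_frags = walk_skeleton(0, fragtbl, 0)          # first skeleton: nothing to insert
one_frag = walk_skeleton(1, fragtbl, 0, cnt=1)   # second skeleton: one fragment
```

With a fragment count of 0 the defaults survive unchanged and the fragment pointer does not advance, so the following skeleton still starts at the right table row.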

dougmassay commented on August 22, 2024

> Would you please change your patch to set the initial aidtext value to the string "0" instead as this is more correct than using a future aidtext value (at least in this case).

Sure thing!

> So setting a skeleton with no fragments to insert to have an aid value of "0" should be okay (I hope).

I'd like to think we're only ever going to run into it on calibre generated dual-mobis made from epubs that have xhtml files with completely empty body sections. Should be pretty rare, I'd think.

Do you want to do some testing with the patch, or do you just want me to push?

nofrags.patch.zip

kevinhendricks commented on August 22, 2024

The "issue" also happens with straight calibre converted .azw3 files so it might be more prevalent than we think.

Yes please push your latest patch.

kevinhendricks commented on August 22, 2024

BTW, was there some reason for having a completely blank xhtml file at the beginning of the spine for this epub? Was it a leftover mistake or was there a "purpose" for it?

dougmassay commented on August 22, 2024

Yes, I just realized that the dual-mobi factor has nothing to do with this. Not so rare, maybe.

Change pushed.

kevinhendricks commented on August 22, 2024

Should we bump the version number?

dougmassay commented on August 22, 2024

Probably. We may even want to tag it so people can download it easier.

kevinhendricks commented on August 22, 2024

Also before closing this bug, we should try an epub with a cover page but with only 1 empty body xhtml file converted by calibre to see if calibre will generate a fragment table at all or even a skeleton table when converting to azw3. It may drop one or both and that would play havoc with KindleUnpack I think. Perhaps we should report this to Kovid, pointing out how kindlegen generates things from the same epub and showing him the difference in case he may care.

dougmassay commented on August 22, 2024

It just occurred to me that the empty body just may be an artifact of the EpubSplit plugin.

pdurrant can you verify whether or not the xhtml file with an empty body was in the original epub omnibus before splitting it?

kevinhendricks commented on August 22, 2024

Please go ahead and bump both the version number and tag it. We can then test the case of an epub only having empty body tags after the cover page and adjust kindleunpack to either work or abort in that case.

dougmassay commented on August 22, 2024

It seems that calibre fails to convert an epub with a cover-image page and a single xhtml file with an empty body. The error message is long but it clearly ends with:

File "site-packages/calibre/ebooks/mobi/writer8/skeleton.py", line 212, in __init__
File "site-packages/calibre/ebooks/mobi/writer8/skeleton.py", line 393, in set_internal_links
ValueError: Could not find chunk for aid: '0'

So we may not have to worry about calibre generating something that might break kindleunpack (at least in regard to this particular issue).

kevinhendricks commented on August 22, 2024

I just ran the same test and got the same thing! But that is surely a bug that Kovid will want to fix!

Either way, if you agree, I think we are safe to bump the version number, tag it, make a release, and then close this issue.

dougmassay commented on August 22, 2024

Agreed.

Do we have any other places the version is designated other than the comments at the head of lib/kindleunpack.py? It's been a while.

dougmassay commented on August 22, 2024

Never mind. I figured it out. Having a Senior moment apparently.

pdurrant commented on August 22, 2024

> BTW, was there some reason for having a completely blank xhtml file at the beginning of the spine for this epub? Was it a leftover mistake or was there a "purpose" for it?

Probably an artifact of being split out of an omnibus.

dougmassay commented on August 22, 2024

Thanks for the testcase, Paul! If you see no potential problems with the way this was resolved, I'll close the issue.

pdurrant commented on August 22, 2024

It sounded reasonable to me, although I haven't looked closely at the code. Feel free to close the issue.
