kevinhendricks / kindleunpack

python based software to unpack Amazon / Kindlegen generated ebooks

License: GNU General Public License v3.0

Python 98.85% HTML 1.15%

kindleunpack's Issues

python2and3 branch issue

Hey Kevin,

We still have an issue with the preference (ini) reading and writing when running the python2and3 branch code with a python 2.x interpreter. ConfigParser in python 2.x simply cannot be cajoled into writing non-ascii characters to an INI file. So the current python2and3 code fails (when using python 2) to write the ini file when any of the paths in the gui contain non-ascii characters.

The only way that I've found around this python 2.x limitation is by encoding the paths with the unicode_escape codec before RawConfigParser.write() and consequently decoding with the unicode_escape codec immediately following RawConfigParser.read (or readfp). This is what I've done in the master (python2-only) branch.

I'm open to suggestions on how to resolve this. I've struck out on a solution that would allow python2 and python3 to work using one shared ini file (that could have been created/modified with either interpreter). The only thing I can think of is If-Else-ing those path ini variables (unicode_escape for PY2 ... normal for PY3) when reading/saving the ini file. Which would mean either giving up on someone who might want to use both PY2 and PY3 with the same ini file; or it would mean writing/reading a different ini file for each python version. *.PY3.ini and *.PY2.ini. Neither option sounds very palatable to me.

The only other option I can think of would be to ditch ConfigParser entirely and save/restore preferences using JSON. I believe PY2's json lib can be convinced to write non-ascii characters.
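A minimal sketch of the JSON idea, assuming the preferences fit in a flat dict and that paths are held as unicode text. The names `save_prefs` and `load_prefs` are hypothetical and not taken from KindleUnpack. Using `io.open` with an explicit encoding keeps the file identical whether Python 2 or Python 3 wrote it.

```python
import io
import json


def save_prefs(path, prefs):
    """Write prefs as UTF-8 JSON, keeping non-ASCII characters readable."""
    text = json.dumps(prefs, ensure_ascii=False, indent=2, sort_keys=True)
    if isinstance(text, bytes):
        # Python 2 can return a byte string when every value is plain ASCII.
        text = text.decode("utf-8")
    with io.open(path, "w", encoding="utf-8") as f:
        f.write(text)


def load_prefs(path):
    """Read prefs back; text values come back as unicode on both versions."""
    with io.open(path, "r", encoding="utf-8") as f:
        return json.load(f)
```

With this approach there is a single preferences file and no per-interpreter escaping step, at the cost of dropping the INI format.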

Error on unpacking dictionary

Attempting to unpack GOTDict.mobi from http://keckrew.blogspot.co.uk/2013/06/game-of-thrones-kindle-dictionary.html yields:

Conversion Log 

Input Path = "/home/will/Downloads/GOTDict.mobi"
Output Path = "/home/will/local/KindleUnpack/got"
Epub Output Type Set To: ePub 2


Please Wait ...

Palm DB type: BOOKMOBI, 395 sections.
Unpacking a Mobipocket 6 book...
Processing Mobipocket 6 section of book...
Mobi Version: 6
Codec: windows-1252
Title: Game of Thrones Dictionary
Palmdoc compression
Unpacking images, resources, fonts, etc
Extracting image: image00267.jpeg from section 267
Extracting image: image00268.jpeg from section 268
Extracting image: image00269.jpeg from section 269
Extracting image: image00270.jpeg from section 270
Extracting image: image00271.jpeg from section 271
Extracting image: image00272.jpeg from section 272
Extracting image: image00273.jpeg from section 273
Extracting image: image00274.jpeg from section 274
Extracting image: image00275.jpeg from section 275
Extracting image: image00276.jpeg from section 276
Extracting image: image00277.jpeg from section 277
Extracting image: image00278.jpeg from section 278
Extracting image: image00279.jpeg from section 279
Extracting image: image00280.jpeg from section 280
Extracting image: image00281.jpeg from section 281
Extracting image: image00282.jpeg from section 282
Extracting image: image00283.jpeg from section 283
Extracting image: image00284.jpeg from section 284
Extracting image: image00285.jpeg from section 285
Extracting image: image00286.jpeg from section 286
Extracting image: image00287.jpeg from section 287
Extracting image: image00288.jpeg from section 288
Extracting image: image00289.jpeg from section 289
Extracting image: image00290.jpeg from section 290
Extracting image: image00291.jpeg from section 291
Extracting image: image00292.jpeg from section 292
Extracting image: image00293.jpeg from section 293
Extracting image: image00294.jpeg from section 294
Extracting image: image00295.jpeg from section 295
Extracting image: image00296.jpeg from section 296
Extracting image: image00297.jpeg from section 297
Extracting image: image00298.jpeg from section 298
Extracting image: image00299.jpeg from section 299
Extracting image: image00300.jpeg from section 300
Extracting image: image00301.jpeg from section 301
Extracting image: image00302.jpeg from section 302
Extracting image: image00303.jpeg from section 303
Extracting image: image00304.jpeg from section 304
Extracting image: image00305.jpeg from section 305
Extracting image: image00306.jpeg from section 306
Extracting image: image00307.jpeg from section 307
Extracting image: image00308.jpeg from section 308
Extracting image: image00309.jpeg from section 309
Extracting image: image00310.jpeg from section 310
Extracting image: image00311.jpeg from section 311
Extracting image: image00312.jpeg from section 312
Extracting image: image00313.jpeg from section 313
Extracting image: image00314.jpeg from section 314
Extracting image: image00315.jpeg from section 315
Extracting image: image00316.jpeg from section 316
Extracting image: image00317.jpeg from section 317
Extracting image: image00318.jpeg from section 318
Extracting image: image00319.jpeg from section 319
Extracting image: image00320.jpeg from section 320
Extracting image: image00321.jpeg from section 321
Extracting image: image00322.jpeg from section 322
Extracting image: image00323.jpeg from section 323
Extracting image: image00324.jpeg from section 324
Extracting image: image00325.jpeg from section 325
Extracting image: image00326.jpeg from section 326
Extracting image: image00327.jpeg from section 327
Extracting image: image00328.jpeg from section 328
Extracting image: image00329.jpeg from section 329
Extracting image: image00330.jpeg from section 330
Extracting image: image00331.jpeg from section 331
Extracting image: image00332.jpeg from section 332
Extracting image: image00333.jpeg from section 333
Extracting image: image00334.jpeg from section 334
Extracting image: image00335.jpeg from section 335
Extracting image: image00336.jpeg from section 336
Extracting image: image00337.jpeg from section 337
Extracting image: image00338.jpeg from section 338
Extracting image: image00339.jpeg from section 339
Extracting image: image00340.jpeg from section 340
Extracting image: image00341.jpeg from section 341
Extracting image: image00342.jpeg from section 342
Extracting image: image00343.jpeg from section 343
Extracting image: image00344.jpeg from section 344
Extracting image: image00345.jpeg from section 345
Extracting image: image00346.jpeg from section 346
Extracting image: image00347.jpeg from section 347
Extracting image: image00348.jpeg from section 348
Extracting image: image00349.jpeg from section 349
Extracting image: image00350.jpeg from section 350
Extracting image: image00351.jpeg from section 351
Extracting image: image00352.jpeg from section 352
Extracting image: image00353.jpeg from section 353
Extracting image: image00354.jpeg from section 354
Extracting image: image00355.jpeg from section 355
Extracting image: image00356.jpeg from section 356
Extracting image: image00357.jpeg from section 357
Extracting image: image00358.jpeg from section 358
Extracting image: image00359.jpeg from section 359
Extracting image: image00360.jpeg from section 360
Extracting image: image00361.jpeg from section 361
Extracting image: image00362.jpeg from section 362
Extracting image: image00363.jpeg from section 363
Extracting image: image00364.jpeg from section 364
Extracting image: image00365.jpeg from section 365
Extracting image: image00366.jpeg from section 366
Extracting image: image00367.jpeg from section 367
Extracting image: image00368.jpeg from section 368
Extracting image: image00369.jpeg from section 369
Extracting image: image00370.jpeg from section 370
Extracting image: image00371.jpeg from section 371
Extracting image: image00372.jpeg from section 372
Extracting image: image00373.jpeg from section 373
Extracting image: image00374.jpeg from section 374
Extracting image: image00375.jpeg from section 375
Extracting image: image00376.jpeg from section 376
Extracting image: image00377.jpeg from section 377
Extracting image: image00378.jpeg from section 378
Extracting image: image00379.jpeg from section 379
Extracting image: image00380.jpeg from section 380
Extracting image: image00381.jpeg from section 381
Extracting image: image00382.jpeg from section 382
Extracting image: image00383.jpeg from section 383
Extracting image: image00384.jpeg from section 384
Extracting image: image00385.jpeg from section 385
Extracting image: image00386.jpeg from section 386
Extracting image: image00387.jpeg from section 387
Extracting image: image00388.jpeg from section 388
Extracting image: image00389.jpeg from section 389
Extracting image: image00390.jpeg from section 390
Unpacking raw markup language
Write ncx
Info: Document contains orthographic index, handle as dictionary

Parsing metaOrthIndex
orthIndexCount is 2
Read dictionary index data
Error: 2
Traceback (most recent call last):
  File "KindleUnpack.pyw", line 370, in unpackEbook
    kindleunpack.unpackBook(infile, outdir, apnxfile, epubversion, use_hd, dodump=dump, dowriteraw=writeraw, dosplitcombos=splitcombos)
  File "/home/will/local/KindleUnpack/lib/kindleunpack.py", line 919, in unpackBook
    process_all_mobi_headers(files, apnxfile, sect, mhlst, K8Boundary, False, epubver, use_hd)
  File "/home/will/local/KindleUnpack/lib/kindleunpack.py", line 840, in process_all_mobi_headers
    processMobi7(mh, metadata, sect, files, rscnames)
  File "/home/will/local/KindleUnpack/lib/kindleunpack.py", line 623, in processMobi7
    positionMap = dictSupport(mh, sect).getPositionMap()
  File "lib/mobi_dict.py", line 218, in getPositionMap
    assert len(tagMap[0x02]) == 1
KeyError: 2



Error: Unpacking Failed

Extract metadata for mobi and AZW3?

Is it possible to extract metadata from mobi/azw3 (non-protected) files? It would be great to return this metadata (author/title/category) as an array.

AttributeError: 'str' object has no attribute 'decode'

Conversion Log

Input Path = "/Volumes/Ventoy/ebook/程序员必读之软件架构.azw3"
Output Path = "/Users/****/ebook/my-epub"
Epub Output Type Set To: ePub 3

Please Wait ...

Palm DB type: BOOKMOBI, 253 sections.
Unpacking a KF8 book...
Processing K8 section of book...
Mobi Version: 8
Codec: utf-8
Title: 程序员必读之软件架构
EXTH Title: 程序员必读之软件架构
No compression
Unpacking images, resources, fonts, etc
Extracting image: cover00202.jpeg from section 202
Extracting image: thumb00203.jpeg from section 203
Extracting image: image00204.jpeg from section 204
Extracting image: image00205.jpeg from section 205
Extracting image: image00206.jpeg from section 206
Extracting image: image00207.jpeg from section 207
Extracting image: image00208.jpeg from section 208
Extracting image: image00209.jpeg from section 209
Extracting image: image00210.jpeg from section 210
Extracting image: image00211.jpeg from section 211
Extracting image: image00212.jpeg from section 212
Extracting image: image00213.jpeg from section 213
Extracting image: image00214.jpeg from section 214
Extracting image: image00215.jpeg from section 215
Extracting image: image00216.jpeg from section 216
Extracting image: image00217.jpeg from section 217
Extracting image: image00218.jpeg from section 218
Extracting image: image00219.jpeg from section 219
Extracting image: image00220.jpeg from section 220
Extracting image: image00221.jpeg from section 221
Extracting image: image00222.jpeg from section 222
Extracting image: image00223.jpeg from section 223
Extracting image: image00224.jpeg from section 224
Extracting image: image00225.jpeg from section 225
Extracting image: image00226.jpeg from section 226
Extracting image: image00227.jpeg from section 227
Extracting image: image00228.jpeg from section 228
Extracting image: image00229.jpeg from section 229
Extracting image: image00230.jpeg from section 230
Extracting image: image00231.jpeg from section 231
Extracting image: image00232.jpeg from section 232
Extracting image: image00233.jpeg from section 233
Extracting image: image00234.jpeg from section 234
Extracting image: image00235.jpeg from section 235
Extracting image: image00236.jpeg from section 236
Extracting image: image00237.jpeg from section 237
Extracting image: image00238.jpeg from section 238
Extracting image: image00239.jpeg from section 239
Extracting image: image00240.jpeg from section 240
Extracting image: image00241.jpeg from section 241
Extracting image: image00242.jpeg from section 242
Extracting image: image00243.jpeg from section 243
Extracting image: image00244.jpeg from section 244
Extracting image: image00245.jpeg from section 245
Extracting image: image00246.jpeg from section 246
Extracting image: image00247.jpeg from section 247
Extracting image: image00248.jpeg from section 248
Unpacking raw markup language
Processing ncx / toc
Error: 'str' object has no attribute 'decode'
Traceback (most recent call last):
  File "/Users/zouwj/MyProjects/github/KindleUnpack/KindleUnpack.pyw", line 370, in unpackEbook
    kindleunpack.unpackBook(infile, outdir, apnxfile, epubversion, use_hd, dodump=dump, dowriteraw=writeraw, dosplitcombos=splitcombos)
  File "/Users/zouwj/MyProjects/github/KindleUnpack/lib/kindleunpack.py", line 947, in unpackBook
    process_all_mobi_headers(files, apnxfile, sect, mhlst, K8Boundary, False, epubver, use_hd)
  File "/Users/zouwj/MyProjects/github/KindleUnpack/lib/kindleunpack.py", line 864, in process_all_mobi_headers
    processMobi8(mh, metadata, sect, files, rscnames, pagemapproc, k8resc, obfuscate_data, apnxfile, epubver)
  File "/Users/zouwj/MyProjects/github/KindleUnpack/lib/kindleunpack.py", line 531, in processMobi8
    ncx_data = ncx.parseNCX()
  File "/Users/zouwj/MyProjects/github/KindleUnpack/lib/mobi_ncx.py", line 80, in parseNCX
    toctext = toctext.decode(self.mh.codec)
AttributeError: 'str' object has no attribute 'decode'

Error: Unpacking Failed

KindleUnpack_ReadMe.htm references incorrect Python version

The KindleUnpack_ReadMe.htm file has the following as the Python requirement:
The KindleUnpack program requires Python 2.6 or 2.7 to function properly.

Where the README.md has the following as the Python requirement:
The KindleUnpack program requires Python 2.7.X or Python 3.4 or later to function properly.

Convert `original-resolution` to `rendition:viewport`

Some fixed-layout KF8 books don't have the viewport meta tag within each page and, while there is viewport data in the file ("original-resolution" in EXTH), KindleUnpack only outputs

<meta name="original-resolution" content="1072x1448" />

It would be better if it could convert it into the rendition:viewport property:

<meta property="rendition:viewport">width=1072, height=1448</meta>

Indeed, it looks like it used to do that, but it was commented out for some reason:

KindleUnpack/lib/mobi_opf.py

Lines 675 to 686 in c8be31a

# according to epub3 spec about correspondence with Amazon
# if 'original-resolution' is provided it needs to be converted to
# meta viewport property tag stored in the <head></head> of **each**
# xhtml page - so this tag would need to be handled by editing each part
# before reaching this routine
# we need to add support for this to the k8html routine
# if 'original-resolution' in metadata.keys():
#     resolution = metadata['original-resolution'][0].lower()
#     width, height = resolution.split('x')
#     if width.isdigit() and int(width) > 0 and height.isdigit() and int(height) > 0:
#         viewport = 'width=%s, height=%s' % (width, height)
#         self.createMetaTag(self.exth_fixedlayout_metadata, 'rendition:viewport', viewport)

For more context, as well as link to an example file, see johnfactotum/foliate#940
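For illustration, the conversion requested here can be sketched as a standalone function. This is a rough sketch that follows the same checks as the commented-out lines; `viewport_from_resolution` and `viewport_meta` are hypothetical names and are not part of KindleUnpack.

```python
def viewport_from_resolution(resolution):
    """Turn 'WIDTHxHEIGHT' into 'width=W, height=H', or None if malformed."""
    parts = resolution.strip().lower().split("x")
    if len(parts) != 2:
        return None
    width, height = parts
    if width.isdigit() and int(width) > 0 and height.isdigit() and int(height) > 0:
        return "width=%s, height=%s" % (width, height)
    return None


def viewport_meta(resolution):
    """Build the rendition:viewport meta element, or None if unusable."""
    viewport = viewport_from_resolution(resolution)
    if viewport is None:
        return None
    return '<meta property="rendition:viewport">%s</meta>' % viewport
```

Called with `"1072x1448"`, `viewport_meta` produces the element shown above. Where the tag should be written (OPF metadata or each page's head) is a separate question that this sketch does not settle.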

Named character references are not allowed in ePub

Kindle seems to allow named character references, but the resulting XHTML is rejected in ePub. Could you please replace them with numeric character references?

To fix this, we recommend that XHTML files containing named references use decimal (or hexadecimal) notation instead:
  or  

In other cases it is recommended to use the literal character.
&times; is ×.
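One possible way to do the replacement is sketched below, assuming Python 3 (on Python 2 the table lives in `htmlentitydefs`). The function name `to_numeric_refs` is hypothetical. The five references predefined by XML are left alone because they are valid in XHTML as they are, and unknown names are left untouched rather than guessed.

```python
import re
from html.entities import name2codepoint

# References that XML itself defines; these are already legal in XHTML.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}
_NAMED_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def to_numeric_refs(xhtml):
    """Replace HTML named character references with decimal numeric ones."""
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # leave valid or unknown references as-is
        return "&#%d;" % name2codepoint[name]
    return _NAMED_REF.sub(replace, xhtml)
```

For example, `&nbsp;` becomes `&#160;` and `&times;` becomes `&#215;`, while `&amp;` is unchanged.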

Book title not escaped in NCX

The generated NCX file is broken if the title of the book contains &.
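A small sketch of the kind of escaping that would avoid this, using the standard library's `xml.sax.saxutils.escape`. The helper name `ncx_doc_title` is made up for illustration; how KindleUnpack actually assembles the NCX is not shown in this report.

```python
from xml.sax.saxutils import escape


def ncx_doc_title(title):
    """Return a docTitle element with &, < and > in the title escaped."""
    return "<docTitle><text>%s</text></docTitle>" % escape(title)
```

A title such as `Dungeons & Dragons` then comes out as `Dungeons &amp; Dragons`, which keeps the NCX well-formed.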

crash on some mobi generated by kindlegen

Warning: RESC section length(12384bytes) does not match its size(12385bytes).
Extracting image: image01042.jpeg from section 1042
File contains kindlegen build log, extracting as kindlegenbuild.log
Unpacking raw markup language
Traceback (most recent call last):
  File "/home/wjzhou/ws/kindle4rss/kindleunpack/kindleunpack.py", line 968, in <module>
    sys.exit(main())
  File "/home/wjzhou/ws/kindle4rss/kindleunpack/kindleunpack.py", line 957, in main
    unpackBook(infile, outdir, apnxfile, epubver, use_hd)
  File "/home/wjzhou/ws/kindle4rss/kindleunpack/kindleunpack.py", line 871, in unpackBook
    process_all_mobi_headers(files, apnxfile, sect, mhlst, K8Boundary, False, epubver, use_hd)
  File "/home/wjzhou/ws/kindle4rss/kindleunpack/kindleunpack.py", line 794, in process_all_mobi_headers
    processMobi7(mh, metadata, sect, files, imgnames)
  File "/home/wjzhou/ws/kindle4rss/kindleunpack/kindleunpack.py", line 572, in processMobi7
    ncx_data = ncx.parseNCX()
  File "/home/wjzhou/ws/kindle4rss/kindleunpack/mobi_ncx.py", line 36, in parseNCX
    outtbl, ctoc_text = self.mi.getIndexData(self.ncxidx, "NCX")
  File "/home/wjzhou/ws/kindle4rss/kindleunpack/mobi_index.py", line 29, in getIndexData
    ctocdict = self.readCTOC(cdata)
  File "/home/wjzhou/ws/kindle4rss/kindleunpack/mobi_index.py", line 120, in readCTOC
    pos, ilen = getVariableWidthValue(txtdata, offset)
  File "/home/wjzhou/ws/kindle4rss/kindleunpack/mobi_index.py", line 144, in getVariableWidthValue
    v = data[offset + consumed]
IndexError: string index out of range

err-mobi.zip

Crash unpacking calibre-generated 'both' MOBI files

When trying to unpack some 'both' MOBI files generated by calibre 3.24.2, I got a crash when it was in the middle of unpacking the KF8 section.

This might be because of an error in the calibre-generated book, but we still shouldn't crash. Here's a log of two runs, once with the book as generated by KindleGen, and once with the book as generated by calibre conversion (both from the calibre-generated epub).

./kindleunpack.py /Users/pdurrant/Desktop/Flying\ Colours\ -\ kindlegen\ both.mobi
KindleUnpack v0.81
Based on initial mobipocket version Copyright © 2009 Charles M. Hannum [email protected]
Extensive Extensions and Improvements Copyright © 2009-2018
by: P. Durrant, K. Hendricks, S. Siebert, fandrieu, DiapDealer, nickredding, tkeo.
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, version 3.
Unpacking Book...
Palm DB type: BOOKMOBI, 272 sections.
Unpacking a Combination M8/KF8 book...
Processing Mobipocket 6 section of book...
Mobi Version: 6
Codec: utf-8
Title: Flying Colours
Huffdic compression
Unpacking images, resources, fonts, etc
Extracting image: cover00120.jpeg from section 120
Extracting image: image00122.jpeg from section 122
File contains kindlegen build log, extracting as kindlegenbuild.log
Unpacking raw markup language
Write ncx
Find link anchors
Insert data into html
Insert hrefs into html
Remove empty anchors from html
Insert image references into html
Building an opf for mobi7/azw4.
Processing K8 section of book...
Mobi Version: 8
Codec: utf-8
Title: Flying Colours
Huffdic compression
Unpacking images, resources, fonts, etc
Unpacking raw markup language
Processing ncx / toc
Building an epub-like structure
Building proper xhtml for each file
Building a cover page.
Warning: cover_page.xhtml already exists.
Building an opf for mobi8 using epub version: 2
Write K8 ncx
Creating an epub-like file
Completed

./kindleunpack.py /Users/pdurrant/Desktop/Flying\ Colours\ -\ calibre\ both.mobi
KindleUnpack v0.81
Based on initial mobipocket version Copyright © 2009 Charles M. Hannum [email protected]
Extensive Extensions and Improvements Copyright © 2009-2018
by: P. Durrant, K. Hendricks, S. Siebert, fandrieu, DiapDealer, nickredding, tkeo.
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, version 3.
Unpacking Book...
Palm DB type: BOOKMOBI, 268 sections.
Unpacking a Combination M8/KF8 book...
Processing Mobipocket 6 section of book...
Mobi Version: 6
Codec: utf-8
Title: Flying Colours
EXTH Title: Flying Colours
Palmdoc compression
Unpacking images, resources, fonts, etc
Extracting image: cover00120.jpeg from section 120
Extracting image: image00121.jpeg from section 121
Unpacking raw markup language
Write ncx
Find link anchors
Insert data into html
Insert hrefs into html
Remove empty anchors from html
Insert image references into html
Building an opf for mobi7/azw4.
Processing K8 section of book...
Mobi Version: 8
Codec: utf-8
Title: Flying Colours
EXTH Title: Flying Colours
Palmdoc compression
Unpacking images, resources, fonts, etc
Unpacking raw markup language
Traceback (most recent call last):
  File "./kindleunpack.py", line 1019, in <module>
    sys.exit(main())
  File "./kindleunpack.py", line 1007, in main
    unpackBook(infile, outdir, apnxfile, epubver, use_hd)
  File "./kindleunpack.py", line 922, in unpackBook
    process_all_mobi_headers(files, apnxfile, sect, mhlst, K8Boundary, False, epubver, use_hd)
  File "./kindleunpack.py", line 839, in process_all_mobi_headers
    processMobi8(mh, metadata, sect, files, rscnames, pagemapproc, k8resc, obfuscate_data, apnxfile, epubver)
  File "./kindleunpack.py", line 478, in processMobi8
    k8proc.buildParts(rawML)
  File "/Users/pdurrant/Applications/KindleUnpack 64 v0.81.app/Contents/Resources/mobi_k8proc.py", line 212, in buildParts
    self.partinfo.append([skelnum, 'Text', filename, skelpos, baseptr, aidtext])
UnboundLocalError: local variable 'filename' referenced before assignment
Paul:Resources pdurrant$

As far as I can tell, this is a general problem, not specific to this book.

Unpacked files size doubled

So I have an azw3 comic book file, 31MB in size.
After unpacking, the output folder is 91MB, almost all of it images.
How did this happen?

Is there a way to pack it back? Thanks

Incorrect language codes

It seems that most of the region codes are wrong. E.g.

KindleUnpack/lib/mobi_utils.py

Line 36 in c8be31a

4 : {0 : 'zh' , 3 : 'zh-hk' , 2 : 'zh-cn' , 4 : 'zh-sg' , 1 : 'zh-tw'},

I think the keys should be multiples of 4? That is, 4, 8, 12, and 16 for zh-TW, zh-CN, zh-HK, zh-SG, respectively, not 1, 2, 3, 4.

The ones for Spanish seem to be the only ones that are correct:

KindleUnpack/lib/mobi_utils.py

Lines 97 to 99 in c8be31a

: {0 : 'es' , 4 : 'es' , 44 : 'es-ar' , 64 : 'es-bo' , 52 : 'es-cl' , 36 : 'es-co' , 20 : 'es-cr' , 28 : 'es-do' , 

: 'es-ec' , 68 : 'es-sv' , 16 : 'es-gt' , 72 : 'es-hn' , 8 : 'es-mx' , 76 : 'es-ni' , 24 : 'es-pa' , 

: 'es-py' , 40 : 'es-pe' , 80 : 'es-pr' , 56 : 'es-uy' , 32 : 'es-ve'},
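To illustrate the "multiples of 4" observation: in a Windows language identifier the primary language occupies the low 10 bits and the sub-language the next 6 bits. If the lookup key is simply the second byte of that identifier (an assumption about how the table is keyed, not something the snippets above confirm), the key equals the sub-language number times 4. The function names below are hypothetical.

```python
def split_langid(langid):
    """Split a Windows language identifier into (primary, sublanguage)."""
    primary = langid & 0x3FF          # low 10 bits
    sublang = (langid >> 10) & 0x3F   # next 6 bits
    return primary, sublang


def second_byte_key(langid):
    """Assumed table key: the second byte, i.e. sublanguage * 4 here."""
    return (langid >> 8) & 0xFF


# Well-known Windows identifiers used as sample data.
SAMPLES = {"zh-tw": 0x0404, "zh-cn": 0x0804, "zh-hk": 0x0C04,
           "zh-sg": 0x1004, "es-mx": 0x080A}

for tag, langid in sorted(SAMPLES.items()):
    print(tag, split_langid(langid), second_byte_key(langid))
```

Under that assumption the Chinese keys come out as 4, 8, 12 and 16, and `es-mx` comes out as 8, which matches the Spanish entry quoted above.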

Issues Unpacking .azw4 in Calibre

Hello,

I am having issues unpacking an .azw4 ebook.

I know the ebook was deDRM'd as the Calibre debug log stated it was successful. I am unable to unpack in Calibre, receiving almost the same memory message as the other user:

calibre, version 2.65.1
ERROR: KindleUnpack - The Plugin v0.81.2:

Traceback (most recent call last):
  File "calibre_plugins.kindleunpack_plugin.action", line 267, in unpack_ebook
  File "calibre_plugins.kindleunpack_plugin.mobi_stuff", line 123, in unpackMOBI
  File "calibre_plugins.kindleunpack_plugin.kindleunpack.kindleunpack", line 869, in unpackBook
  File "calibre_plugins.kindleunpack_plugin.kindleunpack.mobi_sectioner", line 50, in __init__
MemoryError
MemoryError

Following your advice on helping the other user, I downloaded Python 64 and the standalone Unpacker, but get this message:

Conversion Log

Input Path = "C:\Users\416\Documents\Calibre Library\Peter Raven\Biology, 10E, With Access Code For C (15)\Biology, 10E, With Access Code - Peter Raven.azw4"
Output Path = "C:\Users\416\Desktop\Calibre Decrypted\New folder"
WriteRawML = True
Epub Output Type Set To: Auto-detect
Use HD Images If Present = True

Please Wait ...

Error: Unpacking Failed

Here is the debug log from Calibre:

calibre Debug log
calibre 2.65.1 embedded-python: True is64bit: False
Windows-8-6.2.9200 Windows ('32bit', 'WindowsPE')
32bit process running on 64bit windows
('Windows', '8', '6.2.9200')
Python 2.7.9
Windows: ('8', '6.2.9200', '', 'Multiprocessor Free')
Successfully initialized third party plugins: DeDRM (6, 5, 1) && KindleUnpack - The Plugin (0, 81, 2)
Starting up...
Started up in 7.36 seconds with 3 books
stdout+stderr from file dialog helper: ['', '']
piped data from file dialog helper: ['\x02A\xce\x19\xa5\x8a\xa7^\x9b\x8d\x9f>tp\x14s\xb6a\xbb\x9e\xb5n\xa1Cz\xcb\xc8\xc3>\x8de\x8b', 'C:\Users\416\Documents\My Kindle Content\B00WN44PZM_EBOK.azw4']
DeDRM v6.5.1: Trying to decrypt B00WN44PZM_EBOK.azw4
Using Library AlfCrypto DLL/DYLIB/SO
MobiDeDrm v0.41.
Copyright © 2008-2012 The Dark Reverser et al.
MOBI header version 4, header length 248
Decrypting Mobipocket 4 ebook: Biology, 10E, With Access Code For Connect Plus
Found 4 keys to try after 1.0 seconds
Crypto Type is: 2
File is encoded with PID xCy04K2pVR.
Decrypting. Please wait . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . done
Decryption succeeded after 16.9 seconds
DeDRM v6.5.1: Finished after 17.6 seconds
Added Biology, 10E, With Access Code For Connect Plus to db in: 0.5
Added 1 books in 18.7 seconds

Any insight you could offer would be greatly appreciated.

Thank you,

nix416

AttributeError: 'array.array' object has no attribute 'tostring' proposed fix

Hello!

I was trying to convert a .mobi dictionary file intended for an Amazon Kindle, in order to make a dictionary file for my Kobo Nia. I tried to use this utility but was receiving the following error:

Conversion Log 

Input Path = "D:\Programs\Dict\bulgarian_dictionary.mobi"
Output Path = "D:\Programs\Dict"
Epub Output Type Set To: Auto-detect


Please Wait ...

Palm DB type: BOOKMOBI, 4064 sections.
Unpacking a Mobipocket 7 book...
Processing Mobipocket 7 section of book...
Mobi Version: 7
Codec: utf-8
Title: Bulgarian Dictionary
Palmdoc compression
Unpacking images, resources, fonts, etc
File contains kindlegen source archive, extracting as kindlegensrc.zip
File contains kindlegen build log, extracting as kindlegenbuild.log
Unpacking raw markup language
Write ncx
Info: Document contains orthographic index, handle as dictionary

Parsing metaInflIndexData

Parsing metaOrthIndex
orthIndexCount is 17
orth entry uses ordt2 lookup table of type  0
Read dictionary index data
Error: 'array.array' object has no attribute 'tostring'
Traceback (most recent call last):
  File "D:\Programs\Dict\KindleUnpack-083\KindleUnpack.pyw", line 370, in unpackEbook
    kindleunpack.unpackBook(infile, outdir, apnxfile, epubversion, use_hd, dodump=dump, dowriteraw=writeraw, dosplitcombos=splitcombos)
  File "D:\Programs\Dict\KindleUnpack-083\lib\kindleunpack.py", line 932, in unpackBook
    process_all_mobi_headers(files, apnxfile, sect, mhlst, K8Boundary, False, epubver, use_hd)
  File "D:\Programs\Dict\KindleUnpack-083\lib\kindleunpack.py", line 853, in process_all_mobi_headers
    processMobi7(mh, metadata, sect, files, rscnames)
  File "D:\Programs\Dict\KindleUnpack-083\lib\kindleunpack.py", line 630, in processMobi7
    positionMap = dictSupport(mh, sect).getPositionMap()
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Programs\Dict\KindleUnpack-083\lib\mobi_dict.py", line 205, in getPositionMap
    inflectionGroups = self.getInflectionGroups(text, inflectionControlByteCount, inflectionTagTable,
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Programs\Dict\KindleUnpack-083\lib\mobi_dict.py", line 288, in getInflectionGroups
    inflection = self.applyInflectionRule(mainEntry, data, offset+1, offset+1+textLength)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Programs\Dict\KindleUnpack-083\lib\mobi_dict.py", line 377, in applyInflectionRule
    return utf8_str(byteArray.tostring())
                    ^^^^^^^^^^^^^^^^^^
AttributeError: 'array.array' object has no attribute 'tostring'

I traced the error to the applyInflectionRule function in mobi_dict.py and replaced the .tostring() call with .tobytes(); after that the conversion ran successfully.
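For context, `array.array.tostring()` was a deprecated alias of `tobytes()` and was removed in Python 3.9, which is why the call fails on 3.11. A small compatibility sketch follows; `array_to_bytes` is a hypothetical helper name, not something in KindleUnpack.

```python
import array


def array_to_bytes(arr):
    """Return the raw bytes of an array.array on old and new Pythons."""
    if hasattr(arr, "tobytes"):
        return arr.tobytes()   # Python 3.2 and later
    return arr.tostring()      # Python 2 fallback


data = array.array("B", [104, 105])
print(array_to_bytes(data))
```

Both methods return the same byte content, so the swap should not change behaviour on interpreters where both exist.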

I'm not a very gifted programmer, so I don't understand the full repercussions of what I did, but maybe you can judge if it would be something you'd like to implement.

mobi dictionary used:

bulgarian_dictionary.zip

Running Python 3.11.1

Thanks for the hard work!

Some Inflection data lost in the html file produced from kindle dictionary by KindleUnpack

Hi,

I unpacked the Oxford Dictionary of English Kindle dictionary with KindleUnpack (latest version) and found that there was no inflection data in idx:infl for the word "registered".

However, when I looked up "registered" with this dictionary in my kindle, the word "register" was matched.

More information for your reference: https://www.mobileread.com/forums/showthread.php?t=346226

Re-packing to .mobi

Once a .mobi has been unpacked and edited, is it possible to pack it up again into .mobi format?

Thanks

Fix simple typo: specifc -> specific

There is a small typo in lib/mobi_opf.py.
Should read specific rather than specifc.

Script throws AttributeError when parsing NCX navList

Not sure if you're even interested in coming back to this... but just for the record, I'm getting an AttributeError thrown when it tries to parse my NCX file for my fixed-layout book. I have a pageList in there acting as a sub-index for images/illustrations (which I got out of this part of the specification; Kindle Previewer 3 recognizes it). It looks something like this:

<!-- on same level as navMap toc -->
<navList>
  <navLabel><text>Images</text></navLabel>
  <navTarget id="i1">
    <navLabel><text>image caption</text></navLabel>
    <content src="pg1.html" />
  </navTarget>
  <navTarget id="i2">
    <navLabel><text>image caption</text></navLabel>
    <content src="pg3.html" />
  </navTarget>
  <navTarget id="i3">
    <navLabel><text>image caption</text></navLabel>
    <content src="pg7.html" />
  </navTarget>
  <!-- so on and so forth -->
</navList>

If this section is in the NCX file, the script throws the following error:

Traceback (most recent call last):
  File "KindleUnpack/lib/kindleunpack.py", line 1020, in <module>
    sys.exit(main())
  File "KindleUnpack/lib/kindleunpack.py", line 1008, in main
    unpackBook(infile, outdir, apnxfile, epubver, use_hd)
  File "KindleUnpack/lib/kindleunpack.py", line 923, in unpackBook
    process_all_mobi_headers(files, apnxfile, sect, mhlst, K8Boundary, False, epubver, use_hd)
  File "KindleUnpack/lib/kindleunpack.py", line 840, in process_all_mobi_headers
    processMobi8(mh, metadata, sect, files, rscnames, pagemapproc, k8resc, obfuscate_data, apnxfile, epubver)
  File "KindleUnpack/lib/kindleunpack.py", line 530, in processMobi8
    [junk1, junk2, junk3, fid, junk4, off] = ncxmap['pos_fid'].split(':')
AttributeError: 'NoneType' object has no attribute 'split'

Here's how I'm running the script:

python KindleUnpack/lib/kindleunpack.py -s fixed.mobi tmp

And here's the full verbose console output on the off chance it's of use to you:

KindleUnpack v0.82
   Based on initial mobipocket version Copyright © 2009 Charles M. Hannum <[email protected]>
   Extensive Extensions and Improvements Copyright © 2009-2014 
       by:  P. Durrant, K. Hendricks, S. Siebert, fandrieu, DiapDealer, nickredding, tkeo.
   This program is free software: you can redistribute it and/or modify
   it under the terms of the GNU General Public License as published by
   the Free Software Foundation, version 3.
Unpacking Book...
Palm DB type: BOOKMOBI, 143 sections.
Unpacking a Combination M8/KF8 book...
First Image, last Image 42 72
Processing Mobipocket 6 section of book...
Mobi Version: 6
Codec: utf-8
Title: Malsagulo's Book
Huffdic compression
Unpacking images, resources, fonts, etc
Extracting image: image00042.jpeg from section 42
Extracting image: image00043.jpeg from section 43
Extracting image: image00044.jpeg from section 44
Extracting image: image00045.jpeg from section 45
Extracting image: image00046.jpeg from section 46
Extracting image: image00047.jpeg from section 47
Extracting image: image00048.jpeg from section 48
Extracting image: image00049.jpeg from section 49
Extracting image: image00050.jpeg from section 50
Extracting image: image00051.jpeg from section 51
Extracting image: image00052.jpeg from section 52
Extracting image: image00053.jpeg from section 53
Extracting image: image00054.jpeg from section 54
Extracting image: image00055.jpeg from section 55
Extracting image: image00056.jpeg from section 56
Extracting image: image00057.jpeg from section 57
Extracting image: image00058.jpeg from section 58
Extracting image: image00059.jpeg from section 59
Extracting image: image00060.jpeg from section 60
Extracting image: image00061.jpeg from section 61
Extracting image: image00062.jpeg from section 62
Extracting image: image00063.jpeg from section 63
Extracting image: image00064.jpeg from section 64
Extracting image: image00065.jpeg from section 65
Extracting image: image00066.jpeg from section 66
Extracting image: image00067.jpeg from section 67
Extracting image: image00068.jpeg from section 68
Extracting image: cover00069.jpeg from section 69
Extracting image: image00071.jpeg from section 71
Extracting Page Map Information
File contains kindlegen source archive, extracting as kindlegensrc.zip
File contains kindlegen build log, extracting as kindlegenbuild.log
Unpacking raw markup language
Write ncx
Find link anchors
Insert data into html
Insert hrefs into html
Remove empty anchors from html
Insert image references into html
Building an opf for mobi7/azw4.
Processing K8 section of book...
Mobi Version: 8
Codec: utf-8
Title: Malsagulo's Book
Huffdic compression
Unpacking images, resources, fonts, etc
Extracting Page Map Information
Unpacking raw markup language
Processing ncx / toc
Traceback (most recent call last):
  File "KindleUnpack/lib/kindleunpack.py", line 1020, in <module>
    sys.exit(main())
  File "KindleUnpack/lib/kindleunpack.py", line 1008, in main
    unpackBook(infile, outdir, apnxfile, epubver, use_hd)
  File "KindleUnpack/lib/kindleunpack.py", line 923, in unpackBook
    process_all_mobi_headers(files, apnxfile, sect, mhlst, K8Boundary, False, epubver, use_hd)
  File "KindleUnpack/lib/kindleunpack.py", line 840, in process_all_mobi_headers
    processMobi8(mh, metadata, sect, files, rscnames, pagemapproc, k8resc, obfuscate_data, apnxfile, epubver)
  File "KindleUnpack/lib/kindleunpack.py", line 530, in processMobi8
    [junk1, junk2, junk3, fid, junk4, off] = ncxmap['pos_fid'].split(':')
AttributeError: 'NoneType' object has no attribute 'split'

Otherwise, thanks for the useful tool. It's been a big help -- and still will be, so long as I remember to comment out that part of the NCX. :)
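
For what it's worth, here is a rough sketch of the kind of guard that would avoid the crash. It assumes entries coming from a navList can have pos_fid set to None, and that a valid pos_fid has six colon-separated fields, as the unpacking on line 530 implies. The function name parse_pos_fid and the sample value are hypothetical, not taken from the KindleUnpack source.

```python
def parse_pos_fid(ncxmap):
    """Return (fid, off) from an NCX entry, or None if it has no usable pos_fid.

    Hypothetical helper: mirrors the six-field split in the traceback but
    skips entries whose 'pos_fid' is missing or None instead of raising.
    """
    pos_fid = ncxmap.get('pos_fid')
    if pos_fid is None:
        return None
    fields = pos_fid.split(':')
    if len(fields) != 6:
        return None
    # Fields 3 and 5 correspond to fid and off in the original unpacking.
    return fields[3], fields[5]


if __name__ == '__main__':
    # Made-up sample values, only to show both code paths.
    print(parse_pos_fid({'pos_fid': 'a:b:c:0001:d:0000000010'}))
    print(parse_pos_fid({'pos_fid': None}))
```

Whether skipping such entries is the right behaviour, or whether they should be resolved some other way, is a call for the maintainers.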

Have you considered packaging the Kindle ebook parser as a pip package?

I searched PyPI with pip search kindle and pip search mobi, but had no luck.

Then I found this project. Could the Kindle ebook format parser be separated out of this project? Many people could benefit from it.

WDYT?

encoding issue with source code under Microsoft Windows and default codepage

Error when running under Python 2.7 under windows with command line to get usage/help output:

C:\code\py\KindleUnpack>py -2 lib\kindleunpack.py
KindleUnpack v0.83
Traceback (most recent call last):
  File "lib\kindleunpack.py", line 1029, in <module>
    sys.exit(main())
  File "lib\kindleunpack.py", line 964, in main
    print("   Based on initial mobipocket version Copyright -¬ 2009 Charles M. Hannum <[email protected]>")
  File "C:\python2764\lib\codecs.py", line 369, in write
    data, consumed = self.encode(object, self.errors)
  File "C:\python2764\lib\encodings\cp437.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\xa9' in position 49: character maps to <undefined>

Luckily not an issue under Python 3 :-)

I can think of two options to resolve:

1. Remove non-ASCII characters, e.g. a common convention is to use (c) as an alternative.
2. Document in the readme that the default code page 437 cannot/should not be used, e.g. something like:

If you would prefer a command-line interface, simply look inside KindleUnpack's "lib" folder for the KindleUnpack.py python program and its support modules. You should then be able to run KindleUnpack.py by the following command:

python kindleunpack.py [-r -s -d -h -i] [-p APNX_FILE] INPUT_FILE OUTPUT_FOLDER

NOTE Under Microsoft Windows, first issue:

chcp 1252

Any preferences? I'm happy to post a PR.
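
A third possibility, sketched below, is to make the banner text safe for whatever encoding the console uses before printing it. This is only an illustration: console_safe is a hypothetical helper, not something in KindleUnpack, and it assumes the caller knows the console encoding (for example from sys.stdout.encoding).

```python
def console_safe(text, encoding):
    """Return text that can always be encoded with the given console encoding.

    Hypothetical helper: swaps the copyright sign for the ASCII "(c)"
    convention, then replaces anything else the codec cannot represent
    with "?".
    """
    encoding = encoding or 'ascii'
    text = text.replace(u'\xa9', u'(c)')
    return text.encode(encoding, 'replace').decode(encoding)


if __name__ == '__main__':
    banner = u'Copyright \xa9 2009'
    # cp437 is the code page from the traceback above.
    print(console_safe(banner, 'cp437'))
```

This keeps the © in the source and only degrades the output on consoles that cannot display it.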

Unpack azw4 is grayed out

So it doesn't do anything. I also installed the latest from Apprentice Alf.

NOTE: with calibre on my old MacBook Pro, azw4 is unpacked without any problem. But it's not working with the new install on my new MacBook Pro. That's why I installed KindleUnpack.

UnicodeEncodeError

KindleUnpack v0.80
Traceback (most recent call last):
  File "kindleunpack.py", line 1020, in <module>
    sys.exit(main())
  File "kindleunpack.py", line 953, in main
    print("   Based on initial mobipocket version Copyright 漏 2009 Charles M. Hannum <[email protected]>")
UnicodeEncodeError: 'gbk' codec can't encode character u'\xa9' in position 49: illegal multibyte sequence

When I switch to Python 3.4, a different error occurs:

Traceback (most recent call last):
  File "K:/Program/Development/pyDev/online/KindleUnpack/lib/kindleunpack.py", line 183, in <module>
    from .unpack_structure import fileNames
  File "K:/Program/Development/pyDev/online/KindleUnpack/lib\unpack_structure.py", line 163
    nzinfo.external_attr = 0o600 << 16L # make this a normal file
                                      ^
SyntaxError: invalid syntax
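
The SyntaxError comes from the Python 2 long-integer suffix in 16L, which Python 3 does not accept. A small sketch of the same assignment without the suffix follows. The file name 'mimetype' is only a placeholder, and this is not necessarily how the project fixed it.

```python
import zipfile

# Placeholder archive member name, used only for illustration.
nzinfo = zipfile.ZipInfo('mimetype')

# Plain integers are enough here: Python 3 ints are arbitrary precision,
# and Python 2 promotes to long automatically, so no "L" suffix is needed.
nzinfo.external_attr = 0o600 << 16  # make this a normal file

# The permission bits live in the upper 16 bits of external_attr.
print(oct((nzinfo.external_attr >> 16) & 0o777))
```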

By the way, how can I convert the whole project to an .exe? I've tried py2exe and pyinstaller, but neither could convert it correctly.
The following command didn't work for me:
pyinstaller -p lib -F lib\kindleunpack.py --hidden-import compatibility_utils --hidden-import unipath --hidden-import unpack_structure --hidden-import mobi_utils

When I execute kindleunpack.exe, this error occurs:

ImportError: No module named mobi_utils
[9208] Failed to execute script kindleunpack

Unable to unpack azw file with .opf content

Hello!
I'm running:
py -3 lib/kindleunpack.py -r "Learning XML_ Creating Self-Describing Data_nodrm.azw" output_dir
but it just fails:

KindleUnpack v0.82
   Based on initial mobipocket version Copyright © 2009 Charles M. Hannum <[email protected]>
   Extensive Extensions and Improvements Copyright © 2009-2014
       by:  P. Durrant, K. Hendricks, S. Siebert, fandrieu, DiapDealer, nickredding, tkeo.
   This program is free software: you can redistribute it and/or modify
   it under the terms of the GNU General Public License as published by
   the Free Software Foundation, version 3.
Unpacking Book...
Traceback (most recent call last):
  File "lib/kindleunpack.py", line 1020, in <module>
    sys.exit(main())
  File "lib/kindleunpack.py", line 1008, in main
    unpackBook(infile, outdir, apnxfile, epubver, use_hd)
  File "lib/kindleunpack.py", line 873, in unpackBook
    raise unpackException('Invalid file format')
__main__.unpackException: Invalid file format

Here is the file.
I don't know how it was packed, but I think the contents are in OEB format. Inside I found a mention of an *.opf file.

imghdr used in cover will be removed in python 3.13

https://github.com/kevinhendricks/KindleUnpack/blob/master/lib/mobi_cover.py#L11

From https://docs.python.org/3/library/imghdr.html

Deprecated since version 3.11, will be removed in version 3.13: The imghdr module is deprecated (see PEP 594 for details and alternatives).
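
One possible replacement, sketched under the assumption that mobi_cover.py only needs to tell a few common image formats apart: check the leading magic bytes directly. The function name sniff_image_type is hypothetical, and the set of formats is a guess at what matters for covers.

```python
def sniff_image_type(data):
    """Guess an image type from the first bytes of the data, or return None.

    Hypothetical stand-in for imghdr.what(None, data), covering only
    JPEG, PNG, GIF and BMP.
    """
    if data[:3] == b'\xff\xd8\xff':
        return 'jpeg'
    if data[:8] == b'\x89PNG\r\n\x1a\n':
        return 'png'
    if data[:6] in (b'GIF87a', b'GIF89a'):
        return 'gif'
    if data[:2] == b'BM':
        return 'bmp'
    return None


if __name__ == '__main__':
    print(sniff_image_type(b'\x89PNG\r\n\x1a\n' + b'\x00' * 8))
```

The return values follow the names imghdr uses for these formats, so callers that compare against 'jpeg' or 'png' should not need to change.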

how can i get the cover from a mobi file

Incorrect version displayed on screen in DumpMobiHeader_v023.py

In DumpMobiHeader_v023.py, the output is "DumpMobiHeader v022" rather than "DumpMobiHeader v023".

Line 629: print("DumpMobiHeader v022")

Memory Error

I am using v0.80 in Calibre and it appears that KU cannot handle large files. I have a 196 MB azw4 and I get the following error:

except error: Processing ncx / toc

chcp 65001 > nul
python kindleunpack.py --epub_version=3 E:\Downloads\files.azw3 E:\Downloads\tmp

KindleUnpack v0.80
Based on initial mobipocket version Copyright © 2009 Charles M. Hannum [email protected]
Extensive Extensions and Improvements Copyright © 2009-2014
by: P. Durrant, K. Hendricks, S. Siebert, fandrieu, DiapDealer, nickredding, tkeo.
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, version 3.
Unpacking Book...
Palm DB type: BOOKMOBI, 13727 sections.
Warning: Bad key, size, value combination detected in EXTH 406 16 0000000000000000
Unpacking a KF8 book...
Processing K8 section of book...
Mobi Version: 8
Codec: utf-8
Title: **古典文学荟萃(全36册)
EXTH Title: **古典文学荟萃【全36册】（精选先秦以来的优秀文学作品，涵盖诗、词、曲、小说、戏剧、随笔等体裁，权威版本，名家译注，文学爱好者案头必备丛书）
Palmdoc compression
Unpacking images, resources, fonts, etc
Extracting image: image11510.jpeg from section 11510
Extracting image: image11511.jpeg from section 11511
Extracting image: image11512.jpeg from section 11512
......
Extracting image: image13504.gif from section 13504
Extracting image: image13505.gif from section 13505
Extracting image: image13507.jpeg from section 13507
Warning: Section 13509 does not contain a recognised resource
Warning: Section 13510 does not contain a recognised resource
Warning: Section 13511 does not contain a recognised resource
......
Warning: Section 13710 does not contain a recognised resource
Warning: Section 13711 does not contain a recognised resource
Warning: Section 13712 does not contain a recognised resource
Unpacking raw markup language
Processing ncx / toc
Traceback (most recent call last):
  File "kindleunpack.py", line 1016, in <module>
    sys.exit(main())
  File "kindleunpack.py", line 1004, in main
    unpackBook(infile, outdir, apnxfile, epubver, use_hd)
  File "kindleunpack.py", line 919, in unpackBook
    process_all_mobi_headers(files, apnxfile, sect, mhlst, K8Boundary, False, epubver, use_hd)
  File "kindleunpack.py", line 836, in process_all_mobi_headers
    processMobi8(mh, metadata, sect, files, rscnames, pagemapproc, k8resc, obfuscate_data, apnxfile, epubver)
  File "kindleunpack.py", line 522, in processMobi8
    ncx_data = ncx.parseNCX()
  File ".\mobi_ncx.py", line 45, in parseNCX
    outtbl, ctoc_text = self.mi.getIndexData(self.ncxidx, "NCX")
  File ".\mobi_index.py", line 38, in getIndexData
    ctocdict = self.readCTOC(cdata)
  File ".\mobi_index.py", line 131, in readCTOC
    pos, ilen = getVariableWidthValue(txtdata, offset)
  File ".\mobi_index.py", line 157, in getVariableWidthValue
    if ord(v) & 0x80:
TypeError: ord() expected a character, but string of length 0 found
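
The TypeError suggests the decoder ran past the end of the CTOC data. Below is a sketch of a bounds-checked decoder, assuming the usual forward-encoded scheme in which each byte carries 7 bits and the high bit marks the last byte. The name read_varwidth is hypothetical and this is not the actual mobi_index.py code. It works on Python 3 bytes only.

```python
def read_varwidth(data, offset):
    """Decode a forward-encoded variable-width integer from bytes.

    Hypothetical sketch. Returns (consumed, value) and raises ValueError
    with a clear message if the data ends before the terminating byte
    (the one with the high bit set) is seen.
    """
    value = 0
    consumed = 0
    while True:
        pos = offset + consumed
        if pos >= len(data):
            raise ValueError(
                'truncated variable-width value at offset %d' % offset)
        byte = data[pos]  # indexing bytes yields an int on Python 3
        consumed += 1
        value = (value << 7) | (byte & 0x7F)
        if byte & 0x80:
            return consumed, value


if __name__ == '__main__':
    print(read_varwidth(b'\x01\x80', 0))
```

A caller such as readCTOC could catch the ValueError and stop reading, rather than failing with an opaque TypeError.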

	# according to epub3 spec about correspondence with Amazon
	# if 'original-resolution' is provided it needs to be converted to
	# meta viewport property tag stored in the <head></head> of each
	# xhtml page - so this tag would need to be handled by editing each part
	# before reaching this routine
	# we need to add support for this to the k8html routine
	# if 'original-resolution' in metadata.keys():
	# resolution = metadata['original-resolution'][0].lower()
	# width, height = resolution.split('x')
	# if width.isdigit() and int(width) > 0 and height.isdigit() and int(height) > 0:
	# viewport = 'width=%s, height=%s' % (width, height)
	# self.createMetaTag(self.exth_fixedlayout_metadata, 'rendition:viewport', viewport)
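
The commented-out logic above can be sketched as a standalone function. The name resolution_to_viewport is hypothetical, and the sketch only builds the viewport string. It does not insert the meta tag into each xhtml page, which the comment says still needs support in the k8html routine.

```python
def resolution_to_viewport(resolution):
    """Convert an 'original-resolution' value such as '1072x1448' into a
    viewport content string, or return None if the value is malformed.

    Hypothetical helper based on the commented-out code.
    """
    width, _, height = resolution.lower().partition('x')
    if (width.isdigit() and int(width) > 0
            and height.isdigit() and int(height) > 0):
        return 'width=%s, height=%s' % (width, height)
    return None


if __name__ == '__main__':
    print(resolution_to_viewport('1072X1448'))
```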

	10 : {0 : 'es' , 4 : 'es' , 44 : 'es-ar' , 64 : 'es-bo' , 52 : 'es-cl' , 36 : 'es-co' , 20 : 'es-cr' , 28 : 'es-do' ,
	48 : 'es-ec' , 68 : 'es-sv' , 16 : 'es-gt' , 72 : 'es-hn' , 8 : 'es-mx' , 76 : 'es-ni' , 24 : 'es-pa' ,
	60 : 'es-py' , 40 : 'es-pe' , 80 : 'es-pr' , 56 : 'es-uy' , 32 : 'es-ve'},