kevinhendricks / kindleunpack
python based software to unpack Amazon / Kindlegen generated ebooks
License: GNU General Public License v3.0
Hey Kevin,
We still have an issue with the preference (ini) reading and writing when running the python2and3 branch code with a python 2.x interpreter. ConfigParser in python 2.x simply cannot be cajoled into writing non-ascii characters to an INI file. So the current python2and3 code fails (when using python 2) to write the ini file when any of the paths in the gui contain non-ascii characters.
The only way that I've found around this python 2.x limitation is by encoding the paths with the unicode_escape codec before RawConfigParser.write() and consequently decoding with the unicode_escape codec immediately following RawConfigParser.read (or readfp). This is what I've done in the master (python2-only) branch.
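A minimal sketch of that escape-then-write, read-then-unescape round trip is shown below. It is not the code from the master branch: the section name, the key name and the helper names are all hypothetical, and the in-memory ini handling is written for Python 3 only (Python 2 would use readfp and a byte-oriented buffer).

```python
import io

try:
    from configparser import RawConfigParser  # Python 3
except ImportError:
    from ConfigParser import RawConfigParser  # Python 2


def escape_path(path):
    """Turn a unicode path into ASCII-only text using backslash escapes."""
    return path.encode('unicode_escape').decode('ascii')


def unescape_path(text):
    """Undo escape_path() on a value read back from the ini file."""
    return text.encode('ascii').decode('unicode_escape')


def roundtrip_through_ini(path):
    """Write one escaped path to an in-memory ini file and read it back."""
    writer = RawConfigParser()
    writer.add_section('Defaults')  # hypothetical section name
    writer.set('Defaults', 'mobipath', escape_path(path))  # hypothetical key
    buf = io.StringIO()
    writer.write(buf)

    buf.seek(0)
    reader = RawConfigParser()
    reader.read_file(buf)  # Python 3 spelling; Python 2 has readfp()
    return unescape_path(reader.get('Defaults', 'mobipath'))
```

Because the escaped value is plain ASCII, the ini file itself never contains a character that the Python 2 writer would choke on.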
I'm open to suggestions on how to resolve this. I've struck out on a solution that would allow python2 and python3 to work using one shared ini file (that could have been created/modified with either interpreter). The only thing I can think of is If-Else-ing those path ini variables (unicode_escape for PY2 ... normal for PY3) when reading/saving the ini file. Which would mean either giving up on someone who might want to use both PY2 and PY3 with the same ini file; or it would mean writing/reading a different ini file for each python version. *.PY3.ini and *.PY2.ini. Neither option sounds very palatable to me.
The only other option I can think of would be to ditch ConfigParser entirely and save/restore preferences using JSON. I believe PY2's json lib can be convinced to write non-ascii characters.
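For what it's worth, here is a rough sketch of the JSON route. With ensure_ascii=True (the json default) every non-ascii character is written as a \uXXXX escape, so the file on disk is pure ASCII and should be readable by either interpreter. The function names and preference keys are made up for illustration, not taken from KindleUnpack.

```python
import io
import json


def prefs_to_text(prefs):
    """Serialize a prefs dict to ASCII-only JSON text."""
    text = json.dumps(prefs, ensure_ascii=True, indent=2, sort_keys=True)
    if isinstance(text, bytes):  # Python 2 returns a byte string here
        text = text.decode('ascii')
    return text


def prefs_from_text(text):
    """Parse JSON text back into a prefs dict with unicode values."""
    return json.loads(text)


def save_prefs(filename, prefs):
    """Write prefs to filename as ASCII-only JSON."""
    with io.open(filename, 'w', encoding='ascii') as handle:
        handle.write(prefs_to_text(prefs))


def load_prefs(filename):
    """Read prefs previously written by save_prefs()."""
    with io.open(filename, 'r', encoding='ascii') as handle:
        return prefs_from_text(handle.read())
```

The trade-off is that an existing ini file would need a one-time migration, which this sketch does not attempt.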
Attempting to unpack GOTDict.mobi from http://keckrew.blogspot.co.uk/2013/06/game-of-thrones-kindle-dictionary.html yields:
Conversion Log
Input Path = "/home/will/Downloads/GOTDict.mobi"
Output Path = "/home/will/local/KindleUnpack/got"
Epub Output Type Set To: ePub 2
Please Wait ...
Palm DB type: BOOKMOBI, 395 sections.
Unpacking a Mobipocket 6 book...
Processing Mobipocket 6 section of book...
Mobi Version: 6
Codec: windows-1252
Title: Game of Thrones Dictionary
Palmdoc compression
Unpacking images, resources, fonts, etc
Extracting image: image00267.jpeg from section 267
Extracting image: image00268.jpeg from section 268
Extracting image: image00269.jpeg from section 269
Extracting image: image00270.jpeg from section 270
Extracting image: image00271.jpeg from section 271
Extracting image: image00272.jpeg from section 272
Extracting image: image00273.jpeg from section 273
Extracting image: image00274.jpeg from section 274
Extracting image: image00275.jpeg from section 275
Extracting image: image00276.jpeg from section 276
Extracting image: image00277.jpeg from section 277
Extracting image: image00278.jpeg from section 278
Extracting image: image00279.jpeg from section 279
Extracting image: image00280.jpeg from section 280
Extracting image: image00281.jpeg from section 281
Extracting image: image00282.jpeg from section 282
Extracting image: image00283.jpeg from section 283
Extracting image: image00284.jpeg from section 284
Extracting image: image00285.jpeg from section 285
Extracting image: image00286.jpeg from section 286
Extracting image: image00287.jpeg from section 287
Extracting image: image00288.jpeg from section 288
Extracting image: image00289.jpeg from section 289
Extracting image: image00290.jpeg from section 290
Extracting image: image00291.jpeg from section 291
Extracting image: image00292.jpeg from section 292
Extracting image: image00293.jpeg from section 293
Extracting image: image00294.jpeg from section 294
Extracting image: image00295.jpeg from section 295
Extracting image: image00296.jpeg from section 296
Extracting image: image00297.jpeg from section 297
Extracting image: image00298.jpeg from section 298
Extracting image: image00299.jpeg from section 299
Extracting image: image00300.jpeg from section 300
Extracting image: image00301.jpeg from section 301
Extracting image: image00302.jpeg from section 302
Extracting image: image00303.jpeg from section 303
Extracting image: image00304.jpeg from section 304
Extracting image: image00305.jpeg from section 305
Extracting image: image00306.jpeg from section 306
Extracting image: image00307.jpeg from section 307
Extracting image: image00308.jpeg from section 308
Extracting image: image00309.jpeg from section 309
Extracting image: image00310.jpeg from section 310
Extracting image: image00311.jpeg from section 311
Extracting image: image00312.jpeg from section 312
Extracting image: image00313.jpeg from section 313
Extracting image: image00314.jpeg from section 314
Extracting image: image00315.jpeg from section 315
Extracting image: image00316.jpeg from section 316
Extracting image: image00317.jpeg from section 317
Extracting image: image00318.jpeg from section 318
Extracting image: image00319.jpeg from section 319
Extracting image: image00320.jpeg from section 320
Extracting image: image00321.jpeg from section 321
Extracting image: image00322.jpeg from section 322
Extracting image: image00323.jpeg from section 323
Extracting image: image00324.jpeg from section 324
Extracting image: image00325.jpeg from section 325
Extracting image: image00326.jpeg from section 326
Extracting image: image00327.jpeg from section 327
Extracting image: image00328.jpeg from section 328
Extracting image: image00329.jpeg from section 329
Extracting image: image00330.jpeg from section 330
Extracting image: image00331.jpeg from section 331
Extracting image: image00332.jpeg from section 332
Extracting image: image00333.jpeg from section 333
Extracting image: image00334.jpeg from section 334
Extracting image: image00335.jpeg from section 335
Extracting image: image00336.jpeg from section 336
Extracting image: image00337.jpeg from section 337
Extracting image: image00338.jpeg from section 338
Extracting image: image00339.jpeg from section 339
Extracting image: image00340.jpeg from section 340
Extracting image: image00341.jpeg from section 341
Extracting image: image00342.jpeg from section 342
Extracting image: image00343.jpeg from section 343
Extracting image: image00344.jpeg from section 344
Extracting image: image00345.jpeg from section 345
Extracting image: image00346.jpeg from section 346
Extracting image: image00347.jpeg from section 347
Extracting image: image00348.jpeg from section 348
Extracting image: image00349.jpeg from section 349
Extracting image: image00350.jpeg from section 350
Extracting image: image00351.jpeg from section 351
Extracting image: image00352.jpeg from section 352
Extracting image: image00353.jpeg from section 353
Extracting image: image00354.jpeg from section 354
Extracting image: image00355.jpeg from section 355
Extracting image: image00356.jpeg from section 356
Extracting image: image00357.jpeg from section 357
Extracting image: image00358.jpeg from section 358
Extracting image: image00359.jpeg from section 359
Extracting image: image00360.jpeg from section 360
Extracting image: image00361.jpeg from section 361
Extracting image: image00362.jpeg from section 362
Extracting image: image00363.jpeg from section 363
Extracting image: image00364.jpeg from section 364
Extracting image: image00365.jpeg from section 365
Extracting image: image00366.jpeg from section 366
Extracting image: image00367.jpeg from section 367
Extracting image: image00368.jpeg from section 368
Extracting image: image00369.jpeg from section 369
Extracting image: image00370.jpeg from section 370
Extracting image: image00371.jpeg from section 371
Extracting image: image00372.jpeg from section 372
Extracting image: image00373.jpeg from section 373
Extracting image: image00374.jpeg from section 374
Extracting image: image00375.jpeg from section 375
Extracting image: image00376.jpeg from section 376
Extracting image: image00377.jpeg from section 377
Extracting image: image00378.jpeg from section 378
Extracting image: image00379.jpeg from section 379
Extracting image: image00380.jpeg from section 380
Extracting image: image00381.jpeg from section 381
Extracting image: image00382.jpeg from section 382
Extracting image: image00383.jpeg from section 383
Extracting image: image00384.jpeg from section 384
Extracting image: image00385.jpeg from section 385
Extracting image: image00386.jpeg from section 386
Extracting image: image00387.jpeg from section 387
Extracting image: image00388.jpeg from section 388
Extracting image: image00389.jpeg from section 389
Extracting image: image00390.jpeg from section 390
Unpacking raw markup language
Write ncx
Info: Document contains orthographic index, handle as dictionary
Parsing metaOrthIndex
orthIndexCount is 2
Read dictionary index data
Error: 2
Traceback (most recent call last):
  File "KindleUnpack.pyw", line 370, in unpackEbook
    kindleunpack.unpackBook(infile, outdir, apnxfile, epubversion, use_hd, dodump=dump, dowriteraw=writeraw, dosplitcombos=splitcombos)
  File "/home/will/local/KindleUnpack/lib/kindleunpack.py", line 919, in unpackBook
    process_all_mobi_headers(files, apnxfile, sect, mhlst, K8Boundary, False, epubver, use_hd)
  File "/home/will/local/KindleUnpack/lib/kindleunpack.py", line 840, in process_all_mobi_headers
    processMobi7(mh, metadata, sect, files, rscnames)
  File "/home/will/local/KindleUnpack/lib/kindleunpack.py", line 623, in processMobi7
    positionMap = dictSupport(mh, sect).getPositionMap()
  File "lib/mobi_dict.py", line 218, in getPositionMap
    assert len(tagMap[0x02]) == 1
KeyError: 2
Error: Unpacking Failed
Is it possible to extract metadata from mobi/azw3 (non-protected) files? It would be great to have this metadata (author/title/category) returned as an array.
Conversion Log
Input Path = "/Volumes/Ventoy/ebook/程序员必读之软件架构.azw3"
Output Path = "/Users/****/ebook/my-epub"
Epub Output Type Set To: ePub 3
Please Wait ...
Palm DB type: BOOKMOBI, 253 sections.
Unpacking a KF8 book...
Processing K8 section of book...
Mobi Version: 8
Codec: utf-8
Title: 程序员必读之软件架构
EXTH Title: 程序员必读之软件架构
No compression
Unpacking images, resources, fonts, etc
Extracting image: cover00202.jpeg from section 202
Extracting image: thumb00203.jpeg from section 203
Extracting image: image00204.jpeg from section 204
Extracting image: image00205.jpeg from section 205
Extracting image: image00206.jpeg from section 206
Extracting image: image00207.jpeg from section 207
Extracting image: image00208.jpeg from section 208
Extracting image: image00209.jpeg from section 209
Extracting image: image00210.jpeg from section 210
Extracting image: image00211.jpeg from section 211
Extracting image: image00212.jpeg from section 212
Extracting image: image00213.jpeg from section 213
Extracting image: image00214.jpeg from section 214
Extracting image: image00215.jpeg from section 215
Extracting image: image00216.jpeg from section 216
Extracting image: image00217.jpeg from section 217
Extracting image: image00218.jpeg from section 218
Extracting image: image00219.jpeg from section 219
Extracting image: image00220.jpeg from section 220
Extracting image: image00221.jpeg from section 221
Extracting image: image00222.jpeg from section 222
Extracting image: image00223.jpeg from section 223
Extracting image: image00224.jpeg from section 224
Extracting image: image00225.jpeg from section 225
Extracting image: image00226.jpeg from section 226
Extracting image: image00227.jpeg from section 227
Extracting image: image00228.jpeg from section 228
Extracting image: image00229.jpeg from section 229
Extracting image: image00230.jpeg from section 230
Extracting image: image00231.jpeg from section 231
Extracting image: image00232.jpeg from section 232
Extracting image: image00233.jpeg from section 233
Extracting image: image00234.jpeg from section 234
Extracting image: image00235.jpeg from section 235
Extracting image: image00236.jpeg from section 236
Extracting image: image00237.jpeg from section 237
Extracting image: image00238.jpeg from section 238
Extracting image: image00239.jpeg from section 239
Extracting image: image00240.jpeg from section 240
Extracting image: image00241.jpeg from section 241
Extracting image: image00242.jpeg from section 242
Extracting image: image00243.jpeg from section 243
Extracting image: image00244.jpeg from section 244
Extracting image: image00245.jpeg from section 245
Extracting image: image00246.jpeg from section 246
Extracting image: image00247.jpeg from section 247
Extracting image: image00248.jpeg from section 248
Unpacking raw markup language
Processing ncx / toc
Error: 'str' object has no attribute 'decode'
Traceback (most recent call last):
  File "/Users/zouwj/MyProjects/github/KindleUnpack/KindleUnpack.pyw", line 370, in unpackEbook
    kindleunpack.unpackBook(infile, outdir, apnxfile, epubversion, use_hd, dodump=dump, dowriteraw=writeraw, dosplitcombos=splitcombos)
  File "/Users/zouwj/MyProjects/github/KindleUnpack/lib/kindleunpack.py", line 947, in unpackBook
    process_all_mobi_headers(files, apnxfile, sect, mhlst, K8Boundary, False, epubver, use_hd)
  File "/Users/zouwj/MyProjects/github/KindleUnpack/lib/kindleunpack.py", line 864, in process_all_mobi_headers
    processMobi8(mh, metadata, sect, files, rscnames, pagemapproc, k8resc, obfuscate_data, apnxfile, epubver)
  File "/Users/zouwj/MyProjects/github/KindleUnpack/lib/kindleunpack.py", line 531, in processMobi8
    ncx_data = ncx.parseNCX()
  File "/Users/zouwj/MyProjects/github/KindleUnpack/lib/mobi_ncx.py", line 80, in parseNCX
    toctext = toctext.decode(self.mh.codec)
AttributeError: 'str' object has no attribute 'decode'
Error: Unpacking Failed
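This AttributeError is what Python 3 raises when .decode() is called on a value that is already text (str) rather than bytes. One defensive pattern, sketched here with a hypothetical helper name and not taken from the KindleUnpack source, is to decode only when the value is still bytes:

```python
def to_text(value, codec='utf-8'):
    """Return text, decoding only when value is still a byte string."""
    if isinstance(value, bytes):
        # errors='replace' keeps a damaged TOC entry from aborting the unpack
        return value.decode(codec, errors='replace')
    return value
```

Whether a guard like this belongs in parseNCX, or whether the value should never have been text at that point, is a question for the maintainers.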
The KindleUnpack_ReadMe.htm file has the following as the Python requirement:
The KindleUnpack program requires Python 2.6 or 2.7 to function properly.
Where the README.md has the following as the Python requirement:
The KindleUnpack program requires Python 2.7.X or Python 3.4 or later to function properly.
Some fixed-layout KF8 books don't have the viewport meta tag within each page and, while there is viewport data in the file ("original-resolution" in EXTH), KindleUnpack only outputs
<meta name="original-resolution" content="1072x1448" />
It would be better if it could convert it into the rendition:viewport property:
<meta property="rendition:viewport">width=1072, height=1448</meta>
Indeed, it looks like it used to do that, but it was commented out for some reason:
Lines 675 to 686 in c8be31a
For more context, as well as link to an example file, see johnfactotum/foliate#940
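The requested conversion is small enough to sketch. The function below is a hypothetical helper, not the commented-out KindleUnpack code; it assumes the EXTH value always has the WIDTHxHEIGHT shape seen above and returns None otherwise.

```python
import re


def viewport_meta(original_resolution):
    """Build a rendition:viewport meta element from a value like '1072x1448'."""
    match = re.match(r'^\s*(\d+)\s*[xX]\s*(\d+)\s*$', original_resolution)
    if match is None:
        return None  # unexpected format: leave the decision to the caller
    width, height = match.groups()
    return ('<meta property="rendition:viewport">'
            'width={0}, height={1}</meta>'.format(width, height))
```

For example, viewport_meta('1072x1448') produces the rendition:viewport line quoted in the request.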
Kindle seems to allow named character references such as &nbsp;, but they are rejected in ePub XHTML. Could you please replace them with numeric character references?
To fix this, we recommend that an XHTML file that contains &nbsp; use decimal (or hexadecimal) notation instead:
&#160; or &#xA0;
In other cases it is recommended to use the character itself:
&times; becomes ×.
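A possible way to do the replacement is sketched below. It relies on the standard library's name2codepoint table and leaves the five XML predefined entities alone, since those are valid in XHTML. The function name is hypothetical and this is not how KindleUnpack currently handles entities.

```python
import re

try:
    from html.entities import name2codepoint  # Python 3
except ImportError:
    from htmlentitydefs import name2codepoint  # Python 2

# These are predefined in XML, so they can stay as named references.
XML_PREDEFINED = {'amp', 'lt', 'gt', 'quot', 'apos'}
_ENTITY = re.compile(r'&([A-Za-z][A-Za-z0-9]*);')


def named_to_numeric(markup):
    """Replace HTML named character references with decimal numeric ones."""
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # keep XML entities and unknown names as-is
        return '&#{0};'.format(name2codepoint[name])
    return _ENTITY.sub(replace, markup)
```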
The generated NCX file is broken if the title of the book contains &.
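The usual remedy is to XML-escape the title before it is written into the NCX. A minimal sketch using the standard library follows; the helper name and the exact element layout are assumptions for illustration, not the project's NCX writer.

```python
from xml.sax.saxutils import escape


def ncx_doc_title(title):
    """Return a docTitle element with &, < and > in the title escaped."""
    return '<docTitle><text>{0}</text></docTitle>'.format(escape(title))
```

Note that a title which has already been escaped would be escaped twice by this helper, so it should be applied exactly once to the raw title.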
Warning: RESC section length(12384bytes) does not match its size(12385bytes).
Extracting image: image01042.jpeg from section 1042
File contains kindlegen build log, extracting as kindlegenbuild.log
Unpacking raw markup language
Traceback (most recent call last):
  File "/home/wjzhou/ws/kindle4rss/kindleunpack/kindleunpack.py", line 968, in <module>
    sys.exit(main())
  File "/home/wjzhou/ws/kindle4rss/kindleunpack/kindleunpack.py", line 957, in main
    unpackBook(infile, outdir, apnxfile, epubver, use_hd)
  File "/home/wjzhou/ws/kindle4rss/kindleunpack/kindleunpack.py", line 871, in unpackBook
    process_all_mobi_headers(files, apnxfile, sect, mhlst, K8Boundary, False, epubver, use_hd)
  File "/home/wjzhou/ws/kindle4rss/kindleunpack/kindleunpack.py", line 794, in process_all_mobi_headers
    processMobi7(mh, metadata, sect, files, imgnames)
  File "/home/wjzhou/ws/kindle4rss/kindleunpack/kindleunpack.py", line 572, in processMobi7
    ncx_data = ncx.parseNCX()
  File "/home/wjzhou/ws/kindle4rss/kindleunpack/mobi_ncx.py", line 36, in parseNCX
    outtbl, ctoc_text = self.mi.getIndexData(self.ncxidx, "NCX")
  File "/home/wjzhou/ws/kindle4rss/kindleunpack/mobi_index.py", line 29, in getIndexData
    ctocdict = self.readCTOC(cdata)
  File "/home/wjzhou/ws/kindle4rss/kindleunpack/mobi_index.py", line 120, in readCTOC
    pos, ilen = getVariableWidthValue(txtdata, offset)
  File "/home/wjzhou/ws/kindle4rss/kindleunpack/mobi_index.py", line 144, in getVariableWidthValue
    v = data[offset + consumed]
IndexError: string index out of range
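The IndexError comes from reading past the end of the CTOC data while decoding a variable-width value. The sketch below shows a bounds-checked decoder that fails with a clearer error instead. It is not the project's getVariableWidthValue: the function name is hypothetical, and the encoding (7 payload bits per byte, high bit set on the final byte) is an assumption that the log above does not confirm.

```python
def get_variable_width_value(data, offset):
    """Decode one variable-width integer; return (bytes_consumed, value)."""
    data = bytearray(data)  # indexing yields ints on Python 2 and 3
    value = 0
    consumed = 0
    while True:
        position = offset + consumed
        if position >= len(data):
            # Truncated or malformed input: report it instead of IndexError
            raise ValueError(
                'variable-width value at offset %d runs past end of data' % offset)
        byte = data[position]
        consumed += 1
        value = (value << 7) | (byte & 0x7F)
        if byte & 0x80:  # assumed terminator: high bit marks the last byte
            return consumed, value
```

A caller such as readCTOC could catch the ValueError and skip the damaged entry rather than abort the whole unpack, though whether that is the right recovery is a design decision for the maintainers.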
When trying to unpack some 'both' MOBI files generated by calibre 3.24.2, I got a crash when it was in the middle of unpacking the KF8 section.
This might be because of an error in the calibre-generated book, but we still shouldn't crash. Here's a log of two runs, once with the book as generated by KindleGen, and once with the book as generated by calibre conversion (both from the calibre-generated epub).
./kindleunpack.py /Users/pdurrant/Desktop/Flying\ Colours\ -\ kindlegen\ both.mobi
KindleUnpack v0.81
Based on initial mobipocket version Copyright © 2009 Charles M. Hannum [email protected]
Extensive Extensions and Improvements Copyright © 2009-2018
by: P. Durrant, K. Hendricks, S. Siebert, fandrieu, DiapDealer, nickredding, tkeo.
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, version 3.
Unpacking Book...
Palm DB type: BOOKMOBI, 272 sections.
Unpacking a Combination M8/KF8 book...
Processing Mobipocket 6 section of book...
Mobi Version: 6
Codec: utf-8
Title: Flying Colours
Huffdic compression
Unpacking images, resources, fonts, etc
Extracting image: cover00120.jpeg from section 120
Extracting image: image00122.jpeg from section 122
File contains kindlegen build log, extracting as kindlegenbuild.log
Unpacking raw markup language
Write ncx
Find link anchors
Insert data into html
Insert hrefs into html
Remove empty anchors from html
Insert image references into html
Building an opf for mobi7/azw4.
Processing K8 section of book...
Mobi Version: 8
Codec: utf-8
Title: Flying Colours
Huffdic compression
Unpacking images, resources, fonts, etc
Unpacking raw markup language
Processing ncx / toc
Building an epub-like structure
Building proper xhtml for each file
Building a cover page.
Warning: cover_page.xhtml already exists.
Building an opf for mobi8 using epub version: 2
Write K8 ncx
Creating an epub-like file
Completed
./kindleunpack.py /Users/pdurrant/Desktop/Flying\ Colours\ -\ calibre\ both.mobi
KindleUnpack v0.81
Based on initial mobipocket version Copyright © 2009 Charles M. Hannum [email protected]
Extensive Extensions and Improvements Copyright © 2009-2018
by: P. Durrant, K. Hendricks, S. Siebert, fandrieu, DiapDealer, nickredding, tkeo.
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, version 3.
Unpacking Book...
Palm DB type: BOOKMOBI, 268 sections.
Unpacking a Combination M8/KF8 book...
Processing Mobipocket 6 section of book...
Mobi Version: 6
Codec: utf-8
Title: Flying Colours
EXTH Title: Flying Colours
Palmdoc compression
Unpacking images, resources, fonts, etc
Extracting image: cover00120.jpeg from section 120
Extracting image: image00121.jpeg from section 121
Unpacking raw markup language
Write ncx
Find link anchors
Insert data into html
Insert hrefs into html
Remove empty anchors from html
Insert image references into html
Building an opf for mobi7/azw4.
Processing K8 section of book...
Mobi Version: 8
Codec: utf-8
Title: Flying Colours
EXTH Title: Flying Colours
Palmdoc compression
Unpacking images, resources, fonts, etc
Unpacking raw markup language
Traceback (most recent call last):
  File "./kindleunpack.py", line 1019, in <module>
    sys.exit(main())
  File "./kindleunpack.py", line 1007, in main
    unpackBook(infile, outdir, apnxfile, epubver, use_hd)
  File "./kindleunpack.py", line 922, in unpackBook
    process_all_mobi_headers(files, apnxfile, sect, mhlst, K8Boundary, False, epubver, use_hd)
  File "./kindleunpack.py", line 839, in process_all_mobi_headers
    processMobi8(mh, metadata, sect, files, rscnames, pagemapproc, k8resc, obfuscate_data, apnxfile, epubver)
  File "./kindleunpack.py", line 478, in processMobi8
    k8proc.buildParts(rawML)
  File "/Users/pdurrant/Applications/KindleUnpack 64 v0.81.app/Contents/Resources/mobi_k8proc.py", line 212, in buildParts
    self.partinfo.append([skelnum, 'Text', filename, skelpos, baseptr, aidtext])
UnboundLocalError: local variable 'filename' referenced before assignment
Paul:Resources pdurrant$
As far as I can tell, this is a general problem, not specific to this book.
So I have an azw3 comic book file, size 31MB.
After unpacking, the output folder is 91MB, almost all of it images.
How did this happen?
Is there a way to pack it back? Thanks
It seems that most of the region codes are wrong. E.g.
KindleUnpack/lib/mobi_utils.py
Line 36 in c8be31a
I think the keys should be multiples of 4? That is, 4, 8, 12, and 16 for zh-TW, zh-CN, zh-HK, zh-SG, respectively, not 1, 2, 3, 4.
The ones for Spanish seem to be the only ones that are correct:
KindleUnpack/lib/mobi_utils.py
Lines 97 to 99 in c8be31a
Hello,
I am having issues unpacking an .azw4 ebook.
I know the ebook was deDRM'd as the Calibre debug log stated it was successful. I am unable to unpack in Calibre, receiving almost the same memory message as the other user:
calibre, version 2.65.1
ERROR: KindleUnpack - The Plugin v0.81.2:
Traceback (most recent call last):
  File "calibre_plugins.kindleunpack_plugin.action", line 267, in unpack_ebook
  File "calibre_plugins.kindleunpack_plugin.mobi_stuff", line 123, in unpackMOBI
  File "calibre_plugins.kindleunpack_plugin.kindleunpack.kindleunpack", line 869, in unpackBook
  File "calibre_plugins.kindleunpack_plugin.kindleunpack.mobi_sectioner", line 50, in __init__
MemoryError
Following your advice on helping the other user, I downloaded Python 64 and the standalone Unpacker, but get this message:
Conversion Log
Input Path = "C:\Users\416\Documents\Calibre Library\Peter Raven\Biology, 10E, With Access Code For C (15)\Biology, 10E, With Access Code - Peter Raven.azw4"
Output Path = "C:\Users\416\Desktop\Calibre Decrypted\New folder"
WriteRawML = True
Epub Output Type Set To: Auto-detect
Use HD Images If Present = True
Please Wait ...
Error: Unpacking Failed
Here is the debug log from Calibre:
calibre Debug log
calibre 2.65.1 embedded-python: True is64bit: False
Windows-8-6.2.9200 Windows ('32bit', 'WindowsPE')
32bit process running on 64bit windows
('Windows', '8', '6.2.9200')
Python 2.7.9
Windows: ('8', '6.2.9200', '', 'Multiprocessor Free')
Successfully initialized third party plugins: DeDRM (6, 5, 1) && KindleUnpack - The Plugin (0, 81, 2)
Starting up...
Started up in 7.36 seconds with 3 books
stdout+stderr from file dialog helper: ['', '']
piped data from file dialog helper: ['\x02A\xce\x19\xa5\x8a\xa7^\x9b\x8d\x9f>tp\x14s\xb6a\xbb\x9e\xb5n\xa1Cz\xcb\xc8\xc3>\x8de\x8b', 'C:\Users\416\Documents\My Kindle Content\B00WN44PZM_EBOK.azw4']
DeDRM v6.5.1: Trying to decrypt B00WN44PZM_EBOK.azw4
Using Library AlfCrypto DLL/DYLIB/SO
MobiDeDrm v0.41.
Copyright © 2008-2012 The Dark Reverser et al.
MOBI header version 4, header length 248
Decrypting Mobipocket 4 ebook: Biology, 10E, With Access Code For Connect Plus
Found 4 keys to try after 1.0 seconds
Crypto Type is: 2
File is encoded with PID xCy04K2pVR.
Decrypting. Please wait . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . done
Decryption succeeded after 16.9 seconds
DeDRM v6.5.1: Finished after 17.6 seconds
Added Biology, 10E, With Access Code For Connect Plus to db in: 0.5
Added 1 books in 18.7 seconds
Any insight you could offer would be greatly appreciated.
Thank you,
nix416
Hello!
I was trying to convert a .mobi dictionary file intended for an Amazon Kindle, in order to make a dictionary file for my Kobo Nia. I tried to use this utility but was receiving the following error:
Conversion Log
Input Path = "D:\Programs\Dict\bulgarian_dictionary.mobi"
Output Path = "D:\Programs\Dict"
Epub Output Type Set To: Auto-detect
Please Wait ...
Palm DB type: BOOKMOBI, 4064 sections.
Unpacking a Mobipocket 7 book...
Processing Mobipocket 7 section of book...
Mobi Version: 7
Codec: utf-8
Title: Bulgarian Dictionary
Palmdoc compression
Unpacking images, resources, fonts, etc
File contains kindlegen source archive, extracting as kindlegensrc.zip
File contains kindlegen build log, extracting as kindlegenbuild.log
Unpacking raw markup language
Write ncx
Info: Document contains orthographic index, handle as dictionary
Parsing metaInflIndexData
Parsing metaOrthIndex
orthIndexCount is 17
orth entry uses ordt2 lookup table of type 0
Read dictionary index data
Error: 'array.array' object has no attribute 'tostring'
Traceback (most recent call last):
  File "D:\Programs\Dict\KindleUnpack-083\KindleUnpack.pyw", line 370, in unpackEbook
    kindleunpack.unpackBook(infile, outdir, apnxfile, epubversion, use_hd, dodump=dump, dowriteraw=writeraw, dosplitcombos=splitcombos)
  File "D:\Programs\Dict\KindleUnpack-083\lib\kindleunpack.py", line 932, in unpackBook
    process_all_mobi_headers(files, apnxfile, sect, mhlst, K8Boundary, False, epubver, use_hd)
  File "D:\Programs\Dict\KindleUnpack-083\lib\kindleunpack.py", line 853, in process_all_mobi_headers
    processMobi7(mh, metadata, sect, files, rscnames)
  File "D:\Programs\Dict\KindleUnpack-083\lib\kindleunpack.py", line 630, in processMobi7
    positionMap = dictSupport(mh, sect).getPositionMap()
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Programs\Dict\KindleUnpack-083\lib\mobi_dict.py", line 205, in getPositionMap
    inflectionGroups = self.getInflectionGroups(text, inflectionControlByteCount, inflectionTagTable,
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Programs\Dict\KindleUnpack-083\lib\mobi_dict.py", line 288, in getInflectionGroups
    inflection = self.applyInflectionRule(mainEntry, data, offset+1, offset+1+textLength)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Programs\Dict\KindleUnpack-083\lib\mobi_dict.py", line 377, in applyInflectionRule
    return utf8_str(byteArray.tostring())
                    ^^^^^^^^^^^^^^^^^^
AttributeError: 'array.array' object has no attribute 'tostring'
I traced the error down to the applyInflectionRule function in mobi_dict.py and replaced the .tostring() call with .tobytes(), after which the conversion ran successfully.
I'm not a very gifted programmer, so I don't understand the full repercussions of what I did, but maybe you can judge if it would be something you'd like to implement.
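For background: array.array.tobytes() was added in Python 3.2 as the new name for tostring(), and the old tostring() alias was removed in Python 3.9, which is why the call fails on 3.11. If the code still has to run on Python 2.7 as well, a small compatibility helper along these lines could be used; the helper name is hypothetical and not part of KindleUnpack.

```python
import array


def array_to_bytes(arr):
    """Return the raw bytes of an array.array on both Python 2 and 3."""
    if hasattr(arr, 'tobytes'):  # Python 3.2 and later
        return arr.tobytes()
    return arr.tostring()  # Python 2.7 fallback


# Example: the same call works regardless of interpreter version.
data = array_to_bytes(array.array('B', [104, 105]))
```

Both methods return the same byte content, so the change should not alter the result of the conversion.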
mobi dictionary used:
Running Python 3.11.1
Thanks for the hard work!
Hi,
I unpacked the Oxford Dictionary of English Kindle dictionary with KindleUnpack (latest version) and found that there was no inflection data in idx:infl for the word "registered".
However, when I looked up "registered" with this dictionary on my Kindle, the word "register" was matched.
There is more information for your reference: https://www.mobileread.com/forums/showthread.php?t=346226
Once a .mobi has been unpacked and edited, is it possible to pack it up again into .mobi format?
Thanks
There is a small typo in lib/mobi_opf.py.
Should read specific rather than specifc.
Not sure if you're even interested in coming back to this... but just for the record, I'm getting an AttributeError thrown when it tries to parse my NCX file for my fixed-layout book. I have a pageList in there acting as a sub-index for images/illustrations (which I got out of this part of the specification; Kindle Previewer 3 recognizes it). It looks something like this:
<!-- on same level as navMap toc -->
<navList>
  <navLabel><text>Images</text></navLabel>
  <navTarget id="i1">
    <navLabel><text>image caption</text></navLabel>
    <content src="pg1.html" />
  </navTarget>
  <navTarget id="i2">
    <navLabel><text>image caption</text></navLabel>
    <content src="pg3.html" />
  </navTarget>
  <navTarget id="i3">
    <navLabel><text>image caption</text></navLabel>
    <content src="pg7.html" />
  </navTarget>
  <!-- so on and so forth -->
</navList>
If this section is in the NCX file, the script throws the following error:
Traceback (most recent call last):
  File "KindleUnpack/lib/kindleunpack.py", line 1020, in <module>
    sys.exit(main())
  File "KindleUnpack/lib/kindleunpack.py", line 1008, in main
    unpackBook(infile, outdir, apnxfile, epubver, use_hd)
  File "KindleUnpack/lib/kindleunpack.py", line 923, in unpackBook
    process_all_mobi_headers(files, apnxfile, sect, mhlst, K8Boundary, False, epubver, use_hd)
  File "KindleUnpack/lib/kindleunpack.py", line 840, in process_all_mobi_headers
    processMobi8(mh, metadata, sect, files, rscnames, pagemapproc, k8resc, obfuscate_data, apnxfile, epubver)
  File "KindleUnpack/lib/kindleunpack.py", line 530, in processMobi8
    [junk1, junk2, junk3, fid, junk4, off] = ncxmap['pos_fid'].split(':')
AttributeError: 'NoneType' object has no attribute 'split'
Here's how I'm running the script:
python KindleUnpack/lib/kindleunpack.py -s fixed.mobi tmp
And here's the full verbose console output on the off chance it's of use to you:
KindleUnpack v0.82
Based on initial mobipocket version Copyright © 2009 Charles M. Hannum <[email protected]>
Extensive Extensions and Improvements Copyright © 2009-2014
by: P. Durrant, K. Hendricks, S. Siebert, fandrieu, DiapDealer, nickredding, tkeo.
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, version 3.
Unpacking Book...
Palm DB type: BOOKMOBI, 143 sections.
Unpacking a Combination M8/KF8 book...
First Image, last Image 42 72
Processing Mobipocket 6 section of book...
Mobi Version: 6
Codec: utf-8
Title: Malsagulo's Book
Huffdic compression
Unpacking images, resources, fonts, etc
Extracting image: image00042.jpeg from section 42
Extracting image: image00043.jpeg from section 43
Extracting image: image00044.jpeg from section 44
Extracting image: image00045.jpeg from section 45
Extracting image: image00046.jpeg from section 46
Extracting image: image00047.jpeg from section 47
Extracting image: image00048.jpeg from section 48
Extracting image: image00049.jpeg from section 49
Extracting image: image00050.jpeg from section 50
Extracting image: image00051.jpeg from section 51
Extracting image: image00052.jpeg from section 52
Extracting image: image00053.jpeg from section 53
Extracting image: image00054.jpeg from section 54
Extracting image: image00055.jpeg from section 55
Extracting image: image00056.jpeg from section 56
Extracting image: image00057.jpeg from section 57
Extracting image: image00058.jpeg from section 58
Extracting image: image00059.jpeg from section 59
Extracting image: image00060.jpeg from section 60
Extracting image: image00061.jpeg from section 61
Extracting image: image00062.jpeg from section 62
Extracting image: image00063.jpeg from section 63
Extracting image: image00064.jpeg from section 64
Extracting image: image00065.jpeg from section 65
Extracting image: image00066.jpeg from section 66
Extracting image: image00067.jpeg from section 67
Extracting image: image00068.jpeg from section 68
Extracting image: cover00069.jpeg from section 69
Extracting image: image00071.jpeg from section 71
Extracting Page Map Information
File contains kindlegen source archive, extracting as kindlegensrc.zip
File contains kindlegen build log, extracting as kindlegenbuild.log
Unpacking raw markup language
Write ncx
Find link anchors
Insert data into html
Insert hrefs into html
Remove empty anchors from html
Insert image references into html
Building an opf for mobi7/azw4.
Processing K8 section of book...
Mobi Version: 8
Codec: utf-8
Title: Malsagulo's Book
Huffdic compression
Unpacking images, resources, fonts, etc
Extracting Page Map Information
Unpacking raw markup language
Processing ncx / toc
Traceback (most recent call last):
File "KindleUnpack/lib/kindleunpack.py", line 1020, in <module>
sys.exit(main())
File "KindleUnpack/lib/kindleunpack.py", line 1008, in main
unpackBook(infile, outdir, apnxfile, epubver, use_hd)
File "KindleUnpack/lib/kindleunpack.py", line 923, in unpackBook
process_all_mobi_headers(files, apnxfile, sect, mhlst, K8Boundary, False, epubver, use_hd)
File "KindleUnpack/lib/kindleunpack.py", line 840, in process_all_mobi_headers
processMobi8(mh, metadata, sect, files, rscnames, pagemapproc, k8resc, obfuscate_data, apnxfile, epubver)
File "KindleUnpack/lib/kindleunpack.py", line 530, in processMobi8
[junk1, junk2, junk3, fid, junk4, off] = ncxmap['pos_fid'].split(':')
AttributeError: 'NoneType' object has no attribute 'split'
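The failing line in the traceback unpacks six colon-separated fields from ncxmap['pos_fid'], which is None for this book. A minimal sketch of a guard follows. The helper name parse_pos_fid is hypothetical and not part of KindleUnpack, the sample string is only an illustrative value, and how the caller should treat a missing entry is an assumption.

```python
def parse_pos_fid(pos_fid):
    """Return (fid, off) from a six-field 'pos_fid' string, or None if unusable.

    Hypothetical helper: the six-field layout is taken from the unpacking
    statement shown in the traceback, nothing more.
    """
    if pos_fid is None:
        # The NCX entry carries no position; let the caller skip it
        # instead of crashing on None.split(':').
        return None
    parts = pos_fid.split(':')
    if len(parts) != 6:
        # Unexpected layout; treat it like a missing entry.
        return None
    _, _, _, fid, _, off = parts
    return fid, off


# Illustrative value only (assumed shape of a pos_fid string).
print(parse_pos_fid('kindle:pos:fid:0001:off:0000000000'))
print(parse_pos_fid(None))
```

A caller could then skip any NCX entry for which the helper returns None rather than aborting the whole unpack.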
Otherwise, thanks for the useful tool. It's been a big help -- and still will be, so long as I remember to comment out that part of the NCX. :)
I searched PyPI with pip search kindle and pip search mobi, but had no luck.
Then I found this project. Could the Kindle ebook format parser be split out into a separate package? Many people could benefit from it.
WDYT?
Error when running under Python 2.7 on Windows from the command line to get the usage/help output:
C:\code\py\KindleUnpack>py -2 lib\kindleunpack.py
KindleUnpack v0.83
Traceback (most recent call last):
File "lib\kindleunpack.py", line 1029, in <module>
sys.exit(main())
File "lib\kindleunpack.py", line 964, in main
print(" Based on initial mobipocket version Copyright -¬ 2009 Charles M. Hannum <[email protected]>")
File "C:\python2764\lib\codecs.py", line 369, in write
data, consumed = self.encode(object, self.errors)
File "C:\python2764\lib\encodings\cp437.py", line 12, in encode
return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\xa9' in position 49: character maps to <undefined>
Luckily not an issue under Python 3 :-)
I can think of two options to resolve:
1. Use (c) as an alternative to the copyright symbol.
2. Add a note about the console code page to the command-line instructions, for example:
If you would prefer a command-line interface, simply look inside KindleUnpack's "lib" folder for the KindleUnpack.py python program and its support modules. You should then be able to run KindleUnpack.py by the following command:
python kindleunpack.py [-r -s -d -h -i] [-p APNX_FILE] INPUT_FILE OUTPUT_FOLDER
NOTE Under Microsoft Windows, first issue:
chcp 1252
Any preferences? I'm happy to post a PR.
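For comparison, here is a minimal sketch of a third approach: keep the copyright symbol but replace characters the console encoding cannot represent, so printing the banner never raises. The helper name safe_write is hypothetical and not part of KindleUnpack; the cp437 stream below only simulates the Windows console from the traceback.

```python
import io


def safe_write(stream, text):
    """Write text to stream, replacing characters its encoding cannot represent."""
    # Fall back to ASCII when the stream does not report an encoding.
    encoding = getattr(stream, 'encoding', None) or 'ascii'
    # Round-trip through the target encoding; unencodable characters become '?'.
    safe = text.encode(encoding, 'replace').decode(encoding)
    stream.write(safe + u'\n')


# Simulate a cp437 console (assumption: stands in for the real Windows console).
raw = io.BytesIO()
console = io.TextIOWrapper(raw, encoding='cp437', newline='\n')
safe_write(console, u'Copyright \xa9 2009')
console.flush()
print(raw.getvalue())
```

The trade-off is that the symbol degrades to a question mark on consoles that lack it, whereas writing (c) in the banner text gives the same output everywhere.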
So it doesn't do anything. I also installed Apprentice Alf's latest.
NOTE: In calibre on my old MacBook Pro, azw4 is unpacked without any problem, but it's not working with a new install on my new MacBook Pro. That's why I installed KindleUnpack.
KindleUnpack v0.80
Traceback (most recent call last):
File "kindleunpack.py", line 1020, in <module>
sys.exit(main())
File "kindleunpack.py", line 953, in main
print(" Based on initial mobipocket version Copyright 漏 2009 Charles M. Hannum <[email protected]>")
UnicodeEncodeError: 'gbk' codec can't encode character u'\xa9' in position 49: illegal multibyte sequence
When I switch to Python 3.4, another error occurs:
Traceback (most recent call last):
File "K:/Program/Development/pyDev/online/KindleUnpack/lib/kindleunpack.py", line 183, in <module>
from .unpack_structure import fileNames
File "K:/Program/Development/pyDev/online/KindleUnpack/lib\unpack_structure.py", line 163
nzinfo.external_attr = 0o600 << 16L # make this a normal file
^
SyntaxError: invalid syntax
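The syntax error comes from the 16L literal: the L suffix marks a Python 2 long integer and is not valid in Python 3, where plain integers have arbitrary precision. A minimal sketch of the same assignment written so that it parses under both interpreters follows. The function name make_file_entry is hypothetical; only the external_attr expression mirrors the line in the traceback.

```python
import zipfile


def make_file_entry(name):
    """Build a ZipInfo whose Unix permission bits mark a normal file (0600)."""
    info = zipfile.ZipInfo(name)
    # Plain 16 instead of 16L: valid in Python 2.6+ and Python 3.
    # The upper 16 bits of external_attr hold the Unix mode.
    info.external_attr = 0o600 << 16
    return info


entry = make_file_entry('mimetype')
print(oct(entry.external_attr >> 16))
```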
By the way, how can I convert the whole project to an .exe? I've tried py2exe and PyInstaller, but neither converted it correctly.
The following command didn't work for me:
pyinstaller -p lib -F lib\kindleunpack.py --hidden-import compatibility_utils --hidden-import unipath --hidden-import unpack_structure --hidden-import mobi_utils
When I execute kindleunpack.exe, this error occurs:
ImportError: No module named mobi_utils
[9208] Failed to execute script kindleunpack
Hello!
I'm running:
py -3 lib/kindleunpack.py -r "Learning XML_ Creating Self-Describing Data_nodrm.azw" output_dir
but it just fails:
KindleUnpack v0.82
Based on initial mobipocket version Copyright © 2009 Charles M. Hannum <[email protected]>
Extensive Extensions and Improvements Copyright © 2009-2014
by: P. Durrant, K. Hendricks, S. Siebert, fandrieu, DiapDealer, nickredding, tkeo.
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, version 3.
Unpacking Book...
Traceback (most recent call last):
File "lib/kindleunpack.py", line 1020, in <module>
sys.exit(main())
File "lib/kindleunpack.py", line 1008, in main
unpackBook(infile, outdir, apnxfile, epubver, use_hd)
File "lib/kindleunpack.py", line 873, in unpackBook
raise unpackException('Invalid file format')
__main__.unpackException: Invalid file format
Here is the file.
I don't know how it was packed, but I think the contents are in OEB format. Inside I found a mention of an *.opf file.
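One quick way to see why a file is rejected is to look at its Palm database header. A minimal sketch follows, assuming the standard Palm database layout in which the 4-byte type and 4-byte creator fields start at offset 60 (0x3C). The set of identifiers treated as valid here is an assumption and may not match what KindleUnpack actually checks, and both function names are hypothetical.

```python
def palmdb_ident(data):
    """Return the 8-byte type/creator field of a Palm database header."""
    return data[0x3C:0x3C + 8]


def looks_like_mobi(data):
    """Rough check: does the header carry a Mobipocket or PalmDoc identifier?"""
    # Assumed set of accepted identifiers; 'BOOKMOBI' also appears as the
    # 'Palm DB type' in the successful conversion logs.
    return palmdb_ident(data) in (b'BOOKMOBI', b'TEXtREAd')


# Synthetic headers for illustration; a real check would read the first
# 68 bytes of the ebook file in binary mode.
mobi_like = b'\x00' * 60 + b'BOOKMOBI' + b'\x00' * 10
zip_like = b'PK\x03\x04' + b'\x00' * 100
print(looks_like_mobi(mobi_like), looks_like_mobi(zip_like))
```

If the identifier is something else, the file is probably a different container that merely carries an .azw extension.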
https://github.com/kevinhendricks/KindleUnpack/blob/master/lib/mobi_cover.py#L11
From https://docs.python.org/3/library/imghdr.html
Deprecated since version 3.11, will be removed in version 3.13: The imghdr module is deprecated (see PEP 594 for details and alternatives).
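Since imghdr is being removed, one option is a small signature check that does not depend on it. A minimal sketch follows, covering only a few common formats. The function name sniff_image_type is hypothetical, and which formats mobi_cover.py actually needs to recognise is an assumption.

```python
def sniff_image_type(data):
    """Guess an image type from its leading bytes, or return None if unknown."""
    if data[:3] == b'\xff\xd8\xff':
        return 'jpeg'
    if data[:8] == b'\x89PNG\r\n\x1a\n':
        return 'png'
    if data[:6] in (b'GIF87a', b'GIF89a'):
        return 'gif'
    if data[:2] == b'BM':
        return 'bmp'
    return None


print(sniff_image_type(b'\xff\xd8\xff\xe0' + b'\x00' * 16))
print(sniff_image_type(b'GIF89a' + b'\x00' * 16))
```

The returned names follow the strings imghdr.what() uses for these formats, so a call site that compares against 'jpeg' or 'gif' would not need to change.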
In DumpMobiHeader_v023.py, the output is "DumpMobiHeader v022" rather than "DumpMobiHeader v023".
Line 629: print("DumpMobiHeader v022")
chcp 65001 > nul
python kindleunpack.py --epub_version=3 E:\Downloads\files.azw3 E:\Downloads\tmp
KindleUnpack v0.80
Based on initial mobipocket version Copyright © 2009 Charles M. Hannum [email protected]
Extensive Extensions and Improvements Copyright © 2009-2014
by: P. Durrant, K. Hendricks, S. Siebert, fandrieu, DiapDealer, nickredding, tkeo.
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, version 3.
Unpacking Book...
Palm DB type: BOOKMOBI, 13727 sections.
Warning: Bad key, size, value combination detected in EXTH 406 16 0000000000000000
Unpacking a KF8 book...
Processing K8 section of book...
Mobi Version: 8
Codec: utf-8
Title: **古典文学荟萃(全36册)
EXTH Title: **古典文学荟萃【全36册】(精选先秦以来的优秀文学作品,涵盖诗、词、曲、小说、戏剧、随笔等体裁,权威版本,名家译注,文学爱好者案头必备丛书)
Palmdoc compression
Unpacking images, resources, fonts, etc
Extracting image: image11510.jpeg from section 11510
Extracting image: image11511.jpeg from section 11511
Extracting image: image11512.jpeg from section 11512
......
Extracting image: image13504.gif from section 13504
Extracting image: image13505.gif from section 13505
Extracting image: image13507.jpeg from section 13507
Warning: Section 13509 does not contain a recognised resource
Warning: Section 13510 does not contain a recognised resource
Warning: Section 13511 does not contain a recognised resource
......
Warning: Section 13710 does not contain a recognised resource
Warning: Section 13711 does not contain a recognised resource
Warning: Section 13712 does not contain a recognised resource
Unpacking raw markup language
Processing ncx / toc
Traceback (most recent call last):
File "kindleunpack.py", line 1016, in <module>
sys.exit(main())
File "kindleunpack.py", line 1004, in main
unpackBook(infile, outdir, apnxfile, epubver, use_hd)
File "kindleunpack.py", line 919, in unpackBook
process_all_mobi_headers(files, apnxfile, sect, mhlst, K8Boundary, False, epubver, use_hd)
File "kindleunpack.py", line 836, in process_all_mobi_headers
processMobi8(mh, metadata, sect, files, rscnames, pagemapproc, k8resc, obfuscate_data, apnxfile, epubver)
File "kindleunpack.py", line 522, in processMobi8
ncx_data = ncx.parseNCX()
File ".\mobi_ncx.py", line 45, in parseNCX
outtbl, ctoc_text = self.mi.getIndexData(self.ncxidx, "NCX")
File ".\mobi_index.py", line 38, in getIndexData
ctocdict = self.readCTOC(cdata)
File ".\mobi_index.py", line 131, in readCTOC
pos, ilen = getVariableWidthValue(txtdata, offset)
File ".\mobi_index.py", line 157, in getVariableWidthValue
if ord(v) & 0x80:
TypeError: ord() expected a character, but string of length 0 found
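The final TypeError means the decoder sliced past the end of the CTOC data and passed an empty string to ord(). A minimal sketch of a bounds-checked decoder follows. The encoding (7 payload bits per byte, with the high bit set on the last byte) is inferred from the ord(v) & 0x80 test in the traceback, the (consumed, value) return order is an assumption, and the real mobi_index.py may differ.

```python
def get_variable_width_value(data, offset):
    """Decode a variable-width integer starting at offset.

    Returns (bytes_consumed, value). Raises ValueError instead of the
    opaque TypeError when the data ends before the terminating byte.
    """
    value = 0
    consumed = 0
    while True:
        pos = offset + consumed
        if pos >= len(data):
            raise ValueError(
                'variable-width value starting at offset %d runs past end of data'
                % offset)
        # Slicing keeps this working for both Python 2 str and Python 3 bytes.
        byte = ord(data[pos:pos + 1])
        consumed += 1
        value = (value << 7) | (byte & 0x7F)
        if byte & 0x80:
            # High bit set: this was the last byte of the value.
            return consumed, value


print(get_variable_width_value(b'\x01\x80', 0))
```

A clearer exception does not repair the book, but it lets the caller report which offset was truncated, or skip the damaged table of contents entirely.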