mfrasca / txt2epub

create an epub file from a set of text files

Python 93.12% CSS 3.65% HTML 3.23%

txt2epub's Introduction

yet another txt2epub converter

txt2epub creates an epub file from a bunch of text files.

Usage

Install: python setup.py install --user
Execute: txt2epub --keep-line-breaks output.epub input.txt

For more options, please see txt2epub -h.

Dependencies

python.
jinja2: for rendering output from epub template.

why another converter?

the reason for wanting a converter? my favourite newspaper comes in pdf format. it is very readable on a fast, bright, large and heavy screen such as an iPad, but it is close to unreadable on a small e-ink based eReader.

so I started looking at what one needs to produce an epub file, and it turned out that it's not so obvious or simple that one would want to do it by hand, but also not that difficult if you have a computer.

I found a dead and not-too-readable google-code program with this name, so I thought I would give it a try here.

later on I found some more libraries, but I was far enough in the process that I thought I would finish it anyway!

the way it works

read all files from the input directory
produce the mimetype
- this is hard coded
produce the META-INF/container.xml
- it always points at content/00_content.opf
produce the content/content.opf
- metadata: metadata options you pass on the command line (for example --title and --creator) go here
- manifest: the files in the directory
- spine: again the files in the directory
- guide: empty (I would not know what to put here)
convert the documents to valid html
zip everything in the correct order
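
the packaging steps above can be sketched as follows. this is a minimal sketch, not the code in txt2epublib: the function name write_epub, the files dictionary and the default opf_path are assumptions made for illustration (the list above mentions both content/00_content.opf and content/content.opf, so the path is left as a parameter). what is not an assumption is the container layout: the mimetype entry has to be the first one in the archive and stored uncompressed, and the container file lives in a directory named META-INF.

```python
import zipfile

CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="%s" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def write_epub(destination, files, opf_path="content/content.opf"):
    """Zip an epub. `files` maps archive paths (opf, html, ...) to their content.

    `destination` is a file name or a writable binary file object.
    """
    with zipfile.ZipFile(destination, "w") as epub:
        # the mimetype entry goes first and must not be compressed
        epub.writestr("mimetype", "application/epub+zip",
                      compress_type=zipfile.ZIP_STORED)
        # container.xml tells the reader where the package (opf) file is
        epub.writestr("META-INF/container.xml", CONTAINER_XML % opf_path,
                      compress_type=zipfile.ZIP_DEFLATED)
        # everything else, in lexicographic order
        for name in sorted(files):
            epub.writestr(name, files[name],
                          compress_type=zipfile.ZIP_DEFLATED)
```

the opf and html content would be rendered elsewhere (the README says jinja2 templates are used for that); this sketch only covers the "zip everything in the correct order" step.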

some examples

the script works on all simple input and attempts to do something reasonable with more complex cases. the maintainers of this program will be glad to consider your remarks and include your patches!

and by the way, you very probably always want to use the two options --creator and --title, as they will make your document easier to identify in most ebook readers.

here we have a look at some examples.

just text files

put all files in one directory,
make sure that each chapter of your work is in its own file,
you do not need any fancy formatting,
paragraphs are separated by empty lines (you may specify --keep-line-breaks),
you are satisfied with the lexicographic ordering of the chapters,

execute: txt2epub output.epub *.txt
result: your epub will have an index naming the input files
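
to make the paragraph rule concrete, here is a small sketch of how text split on empty lines could become html. the function name paragraphs_to_html is hypothetical, and rendering kept line breaks as br elements is an assumption about what --keep-line-breaks does, not something taken from the program.

```python
from html import escape


def paragraphs_to_html(text, keep_line_breaks=False):
    """Split plain text on empty lines and wrap each paragraph in <p>."""
    # assumption: kept line breaks become <br/>, otherwise lines are joined
    joiner = "<br/>\n" if keep_line_breaks else " "
    parts = []
    for block in text.split("\n\n"):
        block = block.strip()
        if not block:
            continue  # skip runs of more than one empty line
        lines = [escape(line) for line in block.splitlines()]
        parts.append("<p>" + joiner.join(lines) + "</p>")
    return "\n".join(parts)
```

without the option, the lines of a paragraph are joined with a space, so hard-wrapped text reflows on the reader.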

rst files

same as for text files, with some words of warning:

not all rst is supported; if the script crashes on your input, please open an issue here!
files must have extension .rst, sorry,
you can have a single rst file; the index is then based on its headings.

execute: txt2epub output.epub *.rst

txt2epub's People

Forkers

hupili aydinsakar abbypan lirazsiri majro rotios liyufanzz akipham15 kindle15 causemx

txt2epub's Issues

when doing rst files, the table of contents is not populated

when you convert rst files (possibly just one), the input has a structure with chapters, paragraphs, etc.
this structure should be reflected in the toc.ncx file.

the content element has a src attribute that can be a file.html#name target.
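
a possible shape for this, as a sketch only: given a list of headings already extracted from the rst (how to extract them is left out), build the navMap part of toc.ncx with one navPoint per heading. build_nav_map and the (title, src) pairs are hypothetical names, and the sketch is flat, with no nesting of sections inside chapters.

```python
import xml.etree.ElementTree as ET


def build_nav_map(headings):
    """headings: list of (title, src) pairs, src being like 'book.html#name'."""
    nav_map = ET.Element("navMap")
    for order, (title, src) in enumerate(headings, start=1):
        point = ET.SubElement(nav_map, "navPoint",
                              id="navpoint-%d" % order, playOrder=str(order))
        label = ET.SubElement(point, "navLabel")
        ET.SubElement(label, "text").text = title
        # src may carry a fragment, so several entries can point into one file
        ET.SubElement(point, "content", src=src)
    return nav_map
```

ET.tostring(build_nav_map(...), encoding="unicode") gives the fragment to place inside the ncx element.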

create a graphical interface

what about a small graphical interface that invokes the main function?

enable images

I made some attempts at including png files and it seems quite simple...
but I need to remove some hard coded stuff.

can only use basic ASCII in file names

currently, this program composes a book with text or rst files, each of the files becomes a chapter in the book.

problem is: the name of the chapter is assumed equal to the name of the file, and I can't have accented vowels in these names.

it would be nice if either:

we could use a simple table of contents that associates chapter names to file names; an extra option would tell the script where the association file is.
we could use national characters in file names.

probably better to implement both enhancements.
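
for the first enhancement, a sketch of what the association file could look like and how it could be read. everything here is hypothetical: the "file name = chapter title" line format, the function names, and the fallback to the file name when no title is listed.

```python
import os


def parse_chapter_titles(lines):
    """Parse 'file name = chapter title' lines; '#' starts a comment line."""
    titles = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        filename, sep, title = line.partition("=")
        if not sep:
            continue  # no '=' on this line, ignore it
        titles[filename.strip()] = title.strip()
    return titles


def chapter_title(path, titles):
    """Look up the title for a source file, falling back to its base name."""
    base = os.path.basename(path)
    return titles.get(base, os.path.splitext(base)[0])
```

the file names stay plain ASCII while the titles can hold any characters, provided the association file is read as UTF-8.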

handle .rst files

it should not be difficult to also handle .rst files.

embedding fonts

according to the wikipedia page, fonts can be included!
will it work on my icarus reader?
and what is the file format?

use docopt to enhance arg parse and use pattern presentation

use docopt. We can simply write typical use patterns in the code. This gives more useful information for first timers.

redirected from #13

add cover page

something like this, following the hints at
https://www.safaribooksonline.com/blog/2009/11/20/best-practices-in-epub-cover-images/

but this does not add the necessary metadata. without that, the cover is not shown.
just too lazy to complete it now.

diff --git a/python/scripts/txt2epub b/python/scripts/txt2epub
index 9b70352..aac4c37 100755
--- a/python/scripts/txt2epub
+++ b/python/scripts/txt2epub
@@ -31,6 +31,7 @@ if __name__ == '__main__':
     parser.add_argument('--keep-line-breaks', action='store_true')
     parser.add_argument('--nokeep-line-breaks', action='store_false', dest='keep_line_breaks')
     parser.add_argument('--type')
+    parser.add_argument('--cover-page')
     parser.add_argument('--title')
     parser.add_argument('--author')
     parser.add_argument('--creator')
diff --git a/python/txt2epublib/__init__.py b/python/txt2epublib/__init__.py
index b462f70..bbaa283 100644
--- a/python/txt2epublib/__init__.py
+++ b/python/txt2epublib/__init__.py
@@ -116,6 +116,27 @@ def main(destination, sources, **options):
         template = env.get_template("item.html")
         split_on = '\n\n'
     included = []
+
+    if options.get('cover_page'):
+        options['cover_page_basename'] = os.path.basename(options['cover_page'])
+        out = open(tempdir + "/content/cover.html", "w")
+        out.write("""<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
+<html xmlns="http://www.w3.org/1999/xhtml">
+  <head>
+    <title>Cover</title>
+    <style type="text/css"> img { max-width: 100%%; } </style>
+  </head>
+  <body>
+    <div id="cover-image">
+        <img src="%(cover_page_basename)s" alt="Title of this Thing"/>
+    </div>
+  </body>
+</html>""" % options)
+        out.close()
+        copyfile(options['cover_page'], tempdir + "/content/" + options['cover_page_basename'])
+        included.append(options['cover_page_basename'])
+        included.append('cover.html')
+
     for item in sources:
         if item['type'] in ["png", "jpg"]:
             copyfile(item['orig'], tempdir + "/content/" + item['full'])
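
as for the metadata the patch leaves out: a common EPUB 2 convention is a meta element named "cover" in the opf metadata whose content is the id of the image's manifest item, plus a guide reference to the cover page. the sketch below only builds those three fragments as strings. it assumes that this convention is what the issue means; the function name, the default item id and the returned dictionary are hypothetical, and cover.html is the file name used in the patch.

```python
import os
from xml.sax.saxutils import quoteattr

# only the two image types the patch context mentions
MEDIA_TYPES = {".png": "image/png", ".jpg": "image/jpeg"}


def cover_opf_fragments(image_name, item_id="cover-image"):
    """Return the opf lines that declare `image_name` as the cover image."""
    extension = os.path.splitext(image_name)[1].lower()
    media_type = MEDIA_TYPES[extension]
    return {
        # goes inside <metadata>: content is the manifest id of the image
        "metadata": '<meta name="cover" content=%s/>' % quoteattr(item_id),
        # goes inside <manifest>
        "manifest": '<item id=%s href=%s media-type=%s/>' % (
            quoteattr(item_id), quoteattr(image_name), quoteattr(media_type)),
        # goes inside <guide>
        "guide": '<reference type="cover" title="Cover" href="cover.html"/>',
    }
```

with these in the opf template, the options['cover_page_basename'] value from the patch would be what gets passed as image_name.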

offer to join forces with similar project

I've opened an issue on a project that has the same name as this one, inviting the author to have a look here and see if we want to coordinate efforts.

there's no installation procedure

an installation script or something is needed.

add configuration files for user preferences

in the discussion on issue #13 the need for a configuration file came to light.
it could live in /etc/txt2epub.conf and ~/.txt2epub.conf
on MSWindows systems this would probably not work and I am not so sure about MacOSX.
in @hupili's words: »Daily users like you should have already read the docs so they can configure the defaults they want. Other casual users could get a sensible default that works well on a trial doc.«
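
a sketch of how the two files could be combined, using configparser from the standard library. the ini format, the [txt2epub] section name and both function names are assumptions; the issue only proposes the file locations.

```python
import configparser
import os

CONFIG_PATHS = ["/etc/txt2epub.conf", os.path.expanduser("~/.txt2epub.conf")]


def load_defaults(paths=CONFIG_PATHS, section="txt2epub"):
    """Read the config files in order; later files override earlier ones."""
    parser = configparser.ConfigParser()
    parser.read(paths, encoding="utf-8")  # files that do not exist are skipped
    if not parser.has_section(section):
        return {}
    return dict(parser[section])


def merge_options(defaults, cli_options):
    """Command line values win over configured defaults, unless they are None."""
    merged = dict(defaults)
    merged.update({k: v for k, v in cli_options.items() if v is not None})
    return merged
```

because missing files are skipped, the same code runs unchanged on systems where /etc does not exist; only the list of paths would need a platform-specific entry.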

argument against: why not simply convert the file to rst? would it be so much more work?

handle FF character

some text documents use the FF (form feed) control character.
it would be nice if txt2epub acknowledged that.
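
one way to acknowledge it, as a sketch: split the text on the FF character before doing anything else, and let the caller decide whether each chunk becomes a chapter or just gets a page break. the function name is hypothetical and the issue does not say which behaviour is wanted.

```python
def split_on_form_feed(text):
    """Return the non-empty chunks of `text` separated by FF (U+000C)."""
    return [chunk.strip("\n") for chunk in text.split("\f") if chunk.strip()]
```

a file without any FF character comes back as a single chunk, so existing input is not affected.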