
txt2epub's Introduction

yet another txt2epub converter


txt2epub creates an epub file from a bunch of text files.

Usage

  • Install: python setup.py install --user
  • Execute: txt2epub --keep-line-breaks output.epub input.txt

For more options, please see txt2epub -h.

Dependencies

  • python.
  • jinja2: for rendering output from epub template.

why another converter?

the reason for wanting a converter? my favourite newspaper comes in pdf format. it reads very well on a fast, bright, large and heavy screen such as an iPad, but it is close to unreadable on a small e-ink based eReader.

so I started looking at what one would need to produce an epub file. it turned out to be neither obvious nor simple enough to do by hand, but also not that difficult if you have a computer.

I found a dead and not-too-readable google-code program with this name, so I thought I would give it a try here.

later on I found some more libraries, but I was far enough in the process that I thought I would finish it anyway!

the way it works

  • read all files from the input directory
  • produce the mimetype
    • this is hard coded
  • produce the META-INF/container.xml
    • it always points at content/00_content.opf
  • produce the content/content.opf
    • metadata: if you pass --options, they go here
    • manifest: the files in the directory
    • spine: again the files in the directory
    • guide: empty (I would not know what to put here)
  • convert the documents to valid html
  • zip everything in the correct order
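
the last step, zipping "in the correct order", can be sketched as below. this is only an illustration, not the code of txt2epub itself: the function name `write_epub` and its `files` argument are made up for the example. what it shows is the epub container rule that the mimetype entry comes first in the archive and is stored without compression.

```python
import zipfile


def write_epub(destination, files):
    """Write an epub container (hypothetical helper, for illustration only).

    `files` is a list of (archive_name, data) pairs, already in the order
    in which they should follow the mimetype entry.
    """
    with zipfile.ZipFile(destination, "w") as epub:
        # the mimetype entry must be the first one and must not be compressed
        epub.writestr("mimetype", "application/epub+zip",
                      compress_type=zipfile.ZIP_STORED)
        # everything else may be compressed
        for name, data in files:
            epub.writestr(name, data, compress_type=zipfile.ZIP_DEFLATED)
```

`destination` can be a file name or an open binary file object, as accepted by `zipfile.ZipFile`.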

some examples

the script works on all simple input and attempts to do something reasonable with more complex cases. the maintainers of this program will be glad to consider your remarks and include your patches!

and by the way, you very probably always want to use the two options --creator and --title, as they will make your document easier to identify in most ebook readers.

here we have a look at some examples.

just text files

  • put all files in one directory,
  • make sure that each chapter of your work is in its own file,
  • you do not need any fancy formatting,
  • paragraphs are separated by empty lines (you may specify --keep-line-breaks),
  • you are satisfied with the lexicographic ordering of the chapters,
  • execute: txt2epub output.epub *.txt
  • result: your epub will have an index naming the input files

rst files

same as for text files, with some words of warning:

  • not all rst is supported; if the script crashes on your input, please open an issue here!
  • files must have extension .rst, sorry,
  • you can have a single rst file, the index is based on its headings.
  • execute: txt2epub output.epub *.rst

txt2epub's People

Contributors

hupili, mfrasca

txt2epub's Issues

enable images

I made some attempts with including png files and it seems quite simple...
but I need to remove some hard coded stuff.

can only use basic ASCII in file names

currently, this program composes a book with text or rst files, each of the files becomes a chapter in the book.

problem is: the name of the chapter is assumed equal to the name of the file, and I can't have accented vowels in these names.

it would be nice if either:

  1. we could use a simple table of contents that associates chapter names to file names; an extra option would tell the script where the association file is.
  2. we could use national characters in file names.

probably better to implement both enhancements.
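
a rough sketch of the first enhancement, just to make the idea concrete. the file format (one `file name = chapter title` pair per line) and both function names are assumptions made up for this example, nothing of this exists in the script.

```python
def parse_toc(text):
    """Map file names to chapter titles.

    Assumed (hypothetical) format: one `file name = chapter title` pair
    per line; blank lines and lines starting with '#' are skipped.
    """
    titles = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, title = line.partition("=")
        if not sep:
            raise ValueError("missing '=' in line: %r" % line)
        titles[name.strip()] = title.strip()
    return titles


def chapter_title(filename, titles):
    # fall back to the file name, which is the current behaviour
    return titles.get(filename, filename)
```

with this, the file names can stay plain ASCII while the chapter titles carry the accented characters.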

embedding fonts

according to the wikipedia page, fonts can be included!
will it work on my icarus reader?
and what is the file format?

add cover page

something like this, following the hints at
https://www.safaribooksonline.com/blog/2009/11/20/best-practices-in-epub-cover-images/

but this does not add the necessary metadata. without that, the cover is not shown.
just too lazy to complete it now.

diff --git a/python/scripts/txt2epub b/python/scripts/txt2epub
index 9b70352..aac4c37 100755
--- a/python/scripts/txt2epub
+++ b/python/scripts/txt2epub
@@ -31,6 +31,7 @@ if __name__ == '__main__':
     parser.add_argument('--keep-line-breaks', action='store_true')
     parser.add_argument('--nokeep-line-breaks', action='store_false', dest='keep_line_breaks')
     parser.add_argument('--type')
+    parser.add_argument('--cover-page')
     parser.add_argument('--title')
     parser.add_argument('--author')
     parser.add_argument('--creator')
diff --git a/python/txt2epublib/__init__.py b/python/txt2epublib/__init__.py
index b462f70..bbaa283 100644
--- a/python/txt2epublib/__init__.py
+++ b/python/txt2epublib/__init__.py
@@ -116,6 +116,27 @@ def main(destination, sources, **options):
         template = env.get_template("item.html")
         split_on = '\n\n'
     included = []
+
+    if options.get('cover_page'):
+        options['cover_page_basename'] = os.path.basename(options['cover_page'])
+        out = open(tempdir + "/content/cover.html", "w")
+        out.write("""<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
+<html xmlns="http://www.w3.org/1999/xhtml">
+  <head>
+    <title>Cover</title>
+    <style type="text/css"> img { max-width: 100%%; } </style>
+  </head>
+  <body>
+    <div id="cover-image">
+        <img src="%(cover_page_basename)s" alt="Title of this Thing"/>
+    </div>
+  </body>
+</html>""" % options)
+        out.close()
+        copyfile(options['cover_page'], tempdir + "/content/" + options['cover_page_basename'])
+        included.append(options['cover_page_basename'])
+        included.append('cover.html')
+
     for item in sources:
         if item['type'] in ["png", "jpg"]:
             copyfile(item['orig'], tempdir + "/content/" + item['full'])

add configuration files for user preferences

in the discussion on issue #13 the need came to light to have a configuration file.
it could be named /etc/txt2epub.conf and ~/.txt2epub.conf
on MSWindows systems this would probably not work and I am not so sure about MacOSX.
in @hupili's words: »Daily users like you should have already read the docs so they can configure the defaults they want. Other casual users could get a sensible default that works well on a trial doc.«
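
a minimal sketch of how the two files could be read with the standard `configparser` module (Python 3). the section name `txt2epub` and the function `load_defaults` are assumptions for the sake of the example; the two paths are the ones proposed above.

```python
import configparser
import os

# candidate locations named in this issue; later files override earlier ones
CONFIG_FILES = ["/etc/txt2epub.conf", os.path.expanduser("~/.txt2epub.conf")]


def load_defaults(paths=CONFIG_FILES, section="txt2epub"):
    """Return the options found in the configuration files as a dict.

    The section name is a hypothetical choice, not something the
    script defines.
    """
    parser = configparser.ConfigParser()
    # files that do not exist are silently skipped by read()
    parser.read(paths, encoding="utf-8")
    if not parser.has_section(section):
        return {}
    return dict(parser[section])
```

the returned dict could then serve as defaults, with anything given on the command line taking precedence.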

keep line breaks

maybe useful to provide an option for keeping line breaks.
currently paragraphs are joined together unless separated by an empty line.
in some cases it is better to keep line breaks.
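
the difference between the two behaviours, as a small sketch. the function `split_paragraphs` is made up for this example, and treating every line as its own paragraph is only one possible reading of "keep line breaks".

```python
def split_paragraphs(text, keep_line_breaks=False):
    """Split plain text into paragraphs (illustrative helper).

    By default paragraphs are separated by empty lines and the lines
    inside a paragraph are joined. With keep_line_breaks, every
    non-empty line becomes its own paragraph (an assumption about what
    the option should do).
    """
    split_on = "\n" if keep_line_breaks else "\n\n"
    chunks = text.split(split_on)
    # normalise whitespace inside each chunk and drop the empty ones
    return [" ".join(chunk.split()) for chunk in chunks if chunk.strip()]
```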

add some examples

the README could use some cleaning up, and it really needs an "examples" section!!!

Add examples of Rst and markdown

I see in one post that this tool supports Rst and markdown. It would be good to add some examples of them, or at least mention the ability in the README, so that new users can see it at first glance.

redirected from #13

needed options

two options are really necessary, otherwise the file produced is not valid or is not produced at all.
--date
--identifier

for date, use today
for identifier, use uuid
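
the two defaults could be filled in like this. it is a sketch: `fill_required` is a made-up name, and the `urn:uuid:` form of the identifier is one reasonable choice, not a decision taken in this issue.

```python
import datetime
import uuid


def fill_required(options):
    """Return a copy of options with date and identifier filled in."""
    filled = dict(options)
    if not filled.get("date"):
        # today, in ISO format (YYYY-MM-DD)
        filled["date"] = datetime.date.today().isoformat()
    if not filled.get("identifier"):
        # a random uuid, written as urn:uuid:...
        filled["identifier"] = uuid.uuid4().urn
    return filled
```

values given by the user are left untouched, only missing or empty ones are replaced.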

alert user about missing named arguments

some arguments are optional in the sense that the software will produce output also without them, but you do not want a document to appear as "None" and to have a blank front page...
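
one way to do the alerting, sketched under the assumption that the options arrive as a dict and that --title and --creator are the ones worth warning about (the README recommends both). the names `RECOMMENDED`, `missing_recommended` and `warn_missing` are invented for this example.

```python
import sys

# options the README recommends to always give (assumed list)
RECOMMENDED = ("title", "creator")


def missing_recommended(options, names=RECOMMENDED):
    """Return the recommended option names that are unset or empty."""
    return [name for name in names if not options.get(name)]


def warn_missing(options, stream=sys.stderr):
    """Print one warning per missing recommended option."""
    for name in missing_recommended(options):
        stream.write("warning: --%s not given\n" % name)
```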

italics, bold, curly quotes in plain text files?

not so sure about this, but do we want to handle italics and such in plain text files?

argument in favour: imagine I have a longish text file and I want to produce an epub. with a couple of simple edits I can have bold and italics, if we implement some wiki markup.

argument against: why not simply convert the file to rst? would it be so much more work?

handle FF character

some text documents use the FF (form feed) control character.
it would be nice if txt2epub acknowledged that.
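
what "acknowledging" means is still open. one possibility, sketched here as an assumption, is to treat each form feed as a break and split the text there; `split_on_form_feed` is a hypothetical helper.

```python
def split_on_form_feed(text):
    """Split text at form feed characters, dropping empty pieces."""
    return [piece.strip("\n") for piece in text.split("\f") if piece.strip()]
```

each piece could then become a chapter or a page break, whichever turns out to be more useful.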
