
booky's Introduction

booky

This script creates bookmarks in a PDF from a simple text file. The tool pdftk can already do this, and in fact this script uses pdftk internally. But pdftk requires a format that is too tedious to write by hand, so I wrote this script to let you enter bookmark data in a simpler format.

Dependencies

  • bash
  • python3
  • pdftk
  • dirname
  • basename
  • GNU sed (OSX users take note, you may have BSD sed. Install gsed instead)

Bookmark format

  • Every level starts with a { and ends with a }, each on a separate line.

  • Each bookmark is a title and a page number separated by a comma.

  • Both the title and the page number should be on the same line.

  • All of these are equivalent (i.e. the script ignores whitespace around the title and the page number):

    title1, 1
    title1,             1
          title1     ,         1
    

    Example

    {
      Title1, 1
      Title2, 2
      {
        Subtitle1, 3
        Subtitle2, 4
        {
          SubSubtitle1, 5
          ...
        }
      }
    }
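As a rough illustration of how this format maps onto pdftk's bookmark records, here is a minimal Python sketch. The function name `to_pdftk` is hypothetical and this is not necessarily how booky.py is written; it only assumes the rules listed above (braces change the level, the last comma separates title and page).

```python
def to_pdftk(lines):
    """Convert booky-format lines into pdftk bookmark records (sketch)."""
    level = 0
    records = []
    for raw in lines:
        line = raw.strip()
        if line == "{":
            level += 1          # entering a deeper bookmark level
        elif line == "}":
            level -= 1          # leaving the current level
        elif line:
            # Split on the last comma so that titles may contain commas.
            title, _, page = line.rpartition(",")
            records += [
                "BookmarkBegin",
                f"BookmarkTitle: {title.strip()}",
                f"BookmarkLevel: {level}",
                f"BookmarkPageNumber: {page.strip()}",
            ]
    return records


sample = """
{
  Title1, 1
  {
    Subtitle1, 3
  }
}
"""
print("\n".join(to_pdftk(sample.splitlines())))
```

Each bookmark line becomes four pdftk lines, with BookmarkLevel equal to the number of braces currently open.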
    

How To Use it?

  • First clone this repository and change into its directory. Run this in a terminal:

    git clone https://github.com/SiddharthPant/booky.git
    cd booky
    
  • Now copy your pdf file to this directory

  • Create a new text file and write your bookmarks in the given format

  • Now your directory should contain 4 files: booky.sh, booky.py, your_pdf_file.pdf, your_text_file.txt

  • Run the following command in the terminal:

    ./booky.sh your_pdf_file.pdf your_text_file.txt
    

If you add the booky directory to the environment PATH like:

export PATH=/path_to_the_booky:$PATH

then it can run from any directory:

  booky.sh your_pdf_file.pdf your_text_file.txt

This creates a new PDF file, your_pdf_file_new.pdf, with your bookmarks.

This works on POSIX systems. If you are on a Windows machine instead, first install python3 and pdftk, then use the booky.py file in the repo to convert bkmrks.txt to the pdftk-compatible format:

python3 booky.py < bkmrks.txt > output.txt

Then use pdftk's dump_data operation to generate a dumped data file:

pdftk C:\Users\Sid\Desktop\doc.pdf dump_data output C:\Users\Sid\Desktop\doc_data.txt

Remove the previous bookmarks from that file and insert the content of output.txt in their place with a simple copy and paste. Then import the data back:

pdftk C:\Users\Sid\Desktop\doc.pdf update_info C:\Users\Sid\Desktop\doc_data.txt output C:\Users\Sid\Desktop\updated.pdf

If this does not update your bookmarks, check that your pdftk version is greater than 1.45.
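The manual copy-and-paste step could also be scripted. The following is a minimal Python sketch, not part of booky: the names `merge_bookmarks` and `merge_files` are hypothetical, both files are assumed to be UTF-8 text, and new records are placed right after the NumberOfPages line, which is where the shell script shown in the issues below inserts them with sed. Unlike that sed command, it only drops lines that start with "Bookmark".

```python
from pathlib import Path


def merge_bookmarks(dump_lines, bookmark_lines):
    """Return dump_lines with old bookmark records replaced by bookmark_lines."""
    merged = []
    for line in dump_lines:
        # Drop every existing bookmark record (BookmarkBegin, BookmarkTitle, ...).
        if line.startswith("Bookmark"):
            continue
        merged.append(line)
        # Insert the new records directly after the NumberOfPages line.
        if line.startswith("NumberOfPages"):
            merged.extend(bookmark_lines)
    return merged


def merge_files(dump_path, bookmarks_path):
    """Rewrite the dumped data file in place (assumes UTF-8 text files)."""
    dump = Path(dump_path).read_text(encoding="utf-8").splitlines()
    new = Path(bookmarks_path).read_text(encoding="utf-8").splitlines()
    merged = merge_bookmarks(dump, new)
    Path(dump_path).write_text("\n".join(merged) + "\n", encoding="utf-8")


# Small in-memory demonstration with made-up data.
dump = ["InfoBegin", "NumberOfPages: 10",
        "BookmarkBegin", "BookmarkTitle: Old", "BookmarkLevel: 1",
        "BookmarkPageNumber: 2", "PageMediaBegin"]
new = ["BookmarkBegin", "BookmarkTitle: New", "BookmarkLevel: 1",
       "BookmarkPageNumber: 3"]
print("\n".join(merge_bookmarks(dump, new)))
```

With the file names used above, `merge_files("doc_data.txt", "output.txt")` would update doc_data.txt, after which the update_info command can be run as shown.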

booky's People

Contributors

jez, kant, neariot, oseenix, siddharthpant


booky's Issues

Combine to single file for linux.

A much better option is to combine the Python file into the shell script, so that a single file can be put on PATH:

#!/bin/bash

# Change to the directory of the pdf file (quoted so paths with spaces work)
cd "$(dirname "$1")" || exit 1
pdf=$(basename "$1")
pdf_data="${pdf%.*}""_data.txt"
EXTRACT_FILE=booky_bookmarks_extract
bkFile="$2"


if [[ "$OSTYPE" == "darwin"* ]]; then
    SED=gsed
else
    SED=sed
fi

echo "Converting $bkFile to pdftk compatible format"
python3 -c '
import sys

level = 0
startChar = "{"
endChar = "}"
for line in sys.stdin:
	line = line.strip()
	if line == startChar:
		level = level + 1
	elif line == endChar:
		level = level - 1
	elif line:
		commaIndex = line.rfind(",")
		title = line[:commaIndex]
		pageNo = line[commaIndex + 1:].strip()
		print("BookmarkBegin")
		print("BookmarkTitle:", title.strip())
		print("BookmarkLevel:", level)
		print("BookmarkPageNumber:", pageNo.strip())' < "$bkFile" > "$EXTRACT_FILE"

echo "Dumping pdf meta data..."
pdftk "$pdf" dump_data_utf8 output "$pdf_data"

echo "Clear dumped data of any previous bookmarks"
$SED -i '/Bookmark/d' "$pdf_data"

echo "Inserting your bookmarks in the data"
$SED -i "/NumberOfPages/r $EXTRACT_FILE" "$pdf_data"

echo "Creating new pdf with your bookmarks..."
pdftk "$pdf" update_info_utf8 "$pdf_data" output "${pdf%.*}""_new.pdf"

echo "Deleting leftovers"
rm "$EXTRACT_FILE" "$pdf_data"

keeps failing

Hi, I hope you can help me work out what I am doing wrong.
macOS version: 10.15.6
pdftk version: pdftk_server-2.02-mac_osx-10.11-setup

The book is called book.pdf.
The text file containing the TOC is TOC.txt.
I executed the script in the terminal.

In Terminal it says:

(base) XXX@XXXs-MacBook-Air booky % ./booky.sh book.pdf TOC.txt
Converting TOC.txt to pdftk compatible format
Dumping pdf meta data...
Clear dumped data of any previous bookmarks
sed: 1: "book_data.txt": undefined label 'ook_data.txt'
Inserting your bookmarks in the data
sed: 1: "book_data.txt": undefined label 'ook_data.txt'
Creating new pdf with your bookmarks...
Deleting leftovers

In the pdf file, no new bookmarks were created.
I checked that I have { } around the bookmarks.
I checked that I have ", " (comma, space) between each bookmark and its page number.

Does it handle - in a bookmark name?

suggest to add the offset

I think your solution is a great way to create bookmarks automatically. I have observed that many PDFs show page numbers in their table of contents that do not agree with the actual PDF page numbers, so one has to calculate each page number manually from the contents. This process is error-prone. I suggest adding an offset to the page marking, for example:

{
Title1, 1
Title2, 2
offset, 5
{
Subtitle1, 3
Subtitle2, 4
{
SubSubtitle1, 5
...
}
}
}

From the point where the offset keyword is defined, the offset is automatically added to each page number. With this solution, one only needs to copy the page numbers from the table of contents.
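One way this suggestion might work is as a preprocessing step that runs before booky sees the file. The sketch below is only an illustration of the proposal, not an existing booky feature: the function name `apply_offset` is hypothetical, and it assumes that an `offset, N` line applies to every bookmark after it, that a later offset line replaces the earlier one, and that page numbers are plain non-negative integers.

```python
def apply_offset(lines, keyword="offset"):
    """Add the most recent 'offset, N' value to every following page number."""
    offset = 0
    shifted = []
    for raw in lines:
        line = raw.strip()
        title, comma, page = line.rpartition(",")
        # Braces, blank lines and anything without a numeric page pass through.
        if not comma or not page.strip().isdigit():
            shifted.append(raw)
            continue
        if title.strip().lower() == keyword:
            offset = int(page)   # remember the offset, emit nothing
            continue
        indent = raw[: len(raw) - len(raw.lstrip())]
        shifted.append(f"{indent}{title.strip()}, {int(page) + offset}")
    return shifted


toc = ["{", "Title1, 1", "Title2, 2", "offset, 5",
       "{", "Subtitle1, 3", "Subtitle2, 4", "}", "}"]
print("\n".join(apply_offset(toc)))
```

In this example Title1 and Title2 keep pages 1 and 2, while Subtitle1 and Subtitle2 become pages 8 and 9. The output is ordinary booky format, so it can be passed to the script unchanged.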

quick way/tips to prepare TOC text file into booky format?

Hello: Do you have any regex suggestions or tips to quickly prepare the TOC text file into booky required format?
I am very new to regex, so any help would be super.

I was trying to find a tool, where I could create a template for one chapter of the TOC and then apply this format template to all other chapters. Kinda like excel's "paste special" feature.

For example:
1 Insert { at the beginning of each TOC block, and } at the end of each TOC block
2 replace TOCitem leading dots (.......67) with booky required format of /67
e.g. TOCitem ........67 ==> TOCitem/67
3 replace TOCitem (space space67) or (space space,67) or (space space space67) with booky required format of /67
4 Automate indentation of all child TOC items

Or maybe there is a repository of regex samples that apply to TOC manipulation.
I use the Sublime Text editor and could not find any specific snippets for TOC text file manipulation.

Thank you
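For items 2 and 3, a regular expression can do most of the work. The following Python sketch is a starting point rather than a complete answer: the pattern `TOC_LINE` and the function `toc_to_booky` are hypothetical names, the output uses the `title, page` form described in the README (not `/67`), and inserting braces or indentation (items 1 and 4) is not attempted because it depends on how each TOC marks its chapters.

```python
import re

# A title, then either a run of two or more dots, or two or more spaces
# (optionally followed by a comma), then the page number at the end of the line.
TOC_LINE = re.compile(
    r"^(?P<title>.*?\S)(?:\s*\.{2,}\s*|\s{2,},?\s*)(?P<page>\d+)\s*$"
)


def toc_to_booky(line):
    """Rewrite one TOC line as 'title, page'; leave other lines unchanged."""
    stripped = line.strip()
    match = TOC_LINE.match(stripped)
    if not match:
        return stripped
    return f"{match.group('title')}, {match.group('page')}"


for raw in ["TOCitem ........67", "TOCitem  67", "TOCitem  ,67", "TOCitem   67"]:
    print(toc_to_booky(raw))
```

All four sample lines come out as `TOCitem, 67`. Lines that already use a comma, or that have only a single space before the number (such as "Chapter 1"), are left alone so that chapter numbers are not mistaken for page numbers.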
