siddharthpant / booky Goto Github PK

A simple script for creating PDF bookmarks

Python 34.50% Shell 65.50%

booky's Introduction

booky

This script creates PDF bookmarks from a simple text file. The tool pdftk can already do this; in fact, the script uses pdftk internally. But pdftk requires a format that is tedious to write by hand, so I wrote this script to let you enter bookmark data in a simpler format.

Dependencies

bash
python3
pdftk
dirname
basename
GNU sed (macOS users take note: you may have BSD sed, so install gsed instead)
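Before running the script, it can help to check that these tools are actually on your PATH. The Python sketch below is a hypothetical helper, not part of booky; following the note above, it assumes macOS users have GNU sed installed under the name gsed.

```python
import shutil
import sys
from typing import Callable, Iterable, List, Optional


def required_tools(platform: str) -> List[str]:
    """Tools from the dependency list; on macOS, GNU sed is assumed to be 'gsed'."""
    sed = "gsed" if platform == "darwin" else "sed"
    return ["bash", "python3", "pdftk", "dirname", "basename", sed]


def missing_tools(tools: Iterable[str],
                  which: Callable[[str], Optional[str]] = shutil.which) -> List[str]:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(required_tools(sys.platform))
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All dependencies found")
```

The `which` parameter is injectable only so the lookup can be replaced in tests; by default it uses `shutil.which`.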

Bookmark format

Every level starts with a { on its own line and ends with a } on its own line.
Each bookmark consists of a title and a page number separated by a comma.
The title and page number must be on the same line.

All of these are equivalent (i.e. the script ignores extra whitespace):

title1, 1
title1,             1
      title1     ,         1
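The whitespace rule can be illustrated with a small Python sketch. It assumes, as in the Python converter quoted in the "Combine to single file for linux" issue, that a line is split at its last comma; the name parse_bookmark_line and the error raised for a missing comma are hypothetical additions, not booky behavior.

```python
from typing import Tuple


def parse_bookmark_line(line: str) -> Tuple[str, int]:
    """Split 'title, page' at the LAST comma, ignoring surrounding whitespace."""
    line = line.strip()
    comma = line.rfind(",")
    if comma == -1:
        # Hypothetical check: the original converter does not validate this
        raise ValueError(f"no comma in bookmark line: {line!r}")
    title = line[:comma].strip()
    page = int(line[comma + 1:].strip())
    return title, page


for text in ["title1, 1", "title1,             1", "      title1     ,         1"]:
    print(parse_bookmark_line(text))  # each prints ('title1', 1)
```

Because only the last comma is special, a title that itself contains commas or hyphens, such as "Pre-processing, part 2, 7", is still split correctly.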

Example

{
  Title1, 1
  Title2, 2
  {
    Subtitle1, 3
    Subtitle2, 4
    {
      SubSubtitle1, 5
      ...
    }
  }
}
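A rough Python sketch of how the nesting braces become pdftk bookmark levels: each { raises the level by one (so the outermost block is level 1), each } lowers it, and every other non-empty line becomes one pdftk bookmark record. It mirrors the converter quoted in the "Combine to single file for linux" issue; the function name to_pdftk is hypothetical.

```python
from typing import Iterable, List


def to_pdftk(lines: Iterable[str]) -> List[str]:
    """Convert booky-format lines into pdftk bookmark records."""
    level = 0
    out: List[str] = []
    for raw in lines:
        line = raw.strip()
        if line == "{":
            level += 1          # entering a deeper nesting level
        elif line == "}":
            level -= 1          # leaving the current level
        elif line:
            comma = line.rfind(",")  # split at the last comma
            out += [
                "BookmarkBegin",
                f"BookmarkTitle: {line[:comma].strip()}",
                f"BookmarkLevel: {level}",
                f"BookmarkPageNumber: {line[comma + 1:].strip()}",
            ]
    return out


example = """{
  Title1, 1
  Title2, 2
  {
    Subtitle1, 3
  }
}"""
print("\n".join(to_pdftk(example.splitlines())))
```

Note that the ... in the example above is only a placeholder: since every line other than { and } is treated as a bookmark, a literal ... line would produce a bogus entry.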

How To Use it?

First, clone this repository and change into its directory by running the following in a terminal:
```
git clone https://github.com/SiddharthPant/booky.git
cd booky
```
Now copy your PDF file into this directory.
Create a new text file and write your bookmarks in the format described above.
Your directory should now contain four files: booky.sh, booky.py, your_pdf_file.pdf and your_text_file.txt.

Then run the following command in the terminal:

./booky.sh your_pdf_file.pdf your_text_file.txt

If you add the booky directory to your PATH environment variable, like this:

export PATH=/path_to_the_booky:$PATH

then you can run it from any directory:

  booky.sh your_pdf_file.pdf your_text_file.txt

This creates a new PDF file, your_pdf_file_new.pdf, containing your bookmarks.
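The output naming and the two pdftk calls can be sketched in Python. The operations dump_data_utf8 and update_info_utf8 are the ones used by the single-file script quoted in the issues below; whether booky.sh itself uses exactly these variants is an assumption here. pdftk_commands is a hypothetical helper that only builds argument lists, so nothing is executed.

```python
from pathlib import Path
from typing import List, Tuple


def pdftk_commands(pdf: str) -> Tuple[List[str], List[str], str]:
    """Build the dump and update commands for a PDF, plus the output file name."""
    path = Path(pdf)
    data = str(path.with_name(path.stem + "_data.txt"))    # temporary metadata dump
    new_pdf = str(path.with_name(path.stem + "_new.pdf"))  # final output
    dump = ["pdftk", pdf, "dump_data_utf8", "output", data]
    update = ["pdftk", pdf, "update_info_utf8", data, "output", new_pdf]
    return dump, update, new_pdf


dump, update, new_pdf = pdftk_commands("your_pdf_file.pdf")
print(new_pdf)  # your_pdf_file_new.pdf
```

Each list can be passed to `subprocess.run(cmd, check=True)`. Like the shell expansion `${pdf%.*}`, `Path.stem` strips only the last extension, so book.v2.pdf becomes book.v2_new.pdf.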

This works on POSIX systems. If you are on a Windows machine instead, first install python3 and pdftk, then use just the booky.py file from the repo to convert your bookmarks file (bkmrks.txt here) to the pdftk-compatible format:

python3 booky.py < bkmrks.txt > output.txt

Next, use pdftk's dump_data operation to export the PDF's metadata to a text file:

pdftk C:\Users\Sid\Desktop\doc.pdf dump_data output C:\Users\Sid\Desktop\doc_data.txt

Remove any existing bookmark entries from that file and paste the contents of output.txt in their place. Then import the edited data back into the PDF:
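The copy-and-paste step can also be scripted. The Python sketch below is a stand-in for the two `sed -i` calls in the single-file script quoted in the issues below, so it needs no sed on Windows. It assumes the dump file contains a NumberOfPages line ahead of the bookmark records and that both files are readable as UTF-8; merge_bookmarks and merge_files are hypothetical names.

```python
from typing import List


def merge_bookmarks(dump_text: str, bookmarks_text: str) -> str:
    """Drop old bookmark lines from pdftk dump data and insert new ones."""
    new_lines = bookmarks_text.splitlines()
    out: List[str] = []
    for line in dump_text.splitlines():
        if "Bookmark" in line:
            continue                 # same rule as sed '/Bookmark/d'
        out.append(line)
        if "NumberOfPages" in line:
            out.extend(new_lines)    # same rule as sed '/NumberOfPages/r file'
    return "\n".join(out) + "\n"


def merge_files(data_path: str, bookmarks_path: str) -> None:
    """Rewrite data_path in place, e.g. doc_data.txt with output.txt."""
    with open(data_path, encoding="utf-8") as f:
        dump_text = f.read()
    with open(bookmarks_path, encoding="utf-8") as f:
        bookmarks_text = f.read()
    with open(data_path, "w", encoding="utf-8") as f:
        f.write(merge_bookmarks(dump_text, bookmarks_text))
```

For example, `merge_files(r"C:\Users\Sid\Desktop\doc_data.txt", "output.txt")` would prepare the file for the update_info command.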

pdftk C:\Users\Sid\Desktop\doc.pdf update_info C:\Users\Sid\Desktop\doc_data.txt output C:\Users\Sid\Desktop\updated.pdf

If this does not update your bookmarks, check that your pdftk version is greater than 1.45.
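To check the version from a script, one option is to parse the output of `pdftk --version`. The Python sketch below assumes the first dotted number in that output is the pdftk version (as in a string like "pdftk 2.02 ..."); parse_version and is_newer_than are hypothetical helpers. Version parts are compared as integers, so 2.02 is read as (2, 2).

```python
import re
from typing import Optional, Tuple


def parse_version(text: str) -> Optional[Tuple[int, ...]]:
    """Return the first dotted version number found in text, or None."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group().split("."))


def is_newer_than(text: str, minimum: Tuple[int, ...]) -> bool:
    """True if the version found in text is strictly greater than minimum."""
    version = parse_version(text)
    return version is not None and version > minimum
```

Usage, assuming pdftk is installed and Python 3.7 or later: `out = subprocess.run(["pdftk", "--version"], capture_output=True, text=True).stdout`, then `is_newer_than(out, (1, 45))`.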

booky's People

mrqianjinsi neariot vimkim jez kant djndl1 oseenix valrcs stephenmjm bangnguyendev sarming nealseah fizzym mafsi afirooz niyaz-ahmad xiaolaba geckoblu-forks adityasz adolfgatonegro

booky's Issues

Combine to single file for linux.

A better option is to combine the Python file into the shell script, so that this single file can be put on the PATH:

```
#!/bin/bash

# Resolve the bookmarks file to an absolute path before changing directory,
# so that a relative path given as $2 still works after the cd below
bkFile="$(cd "$(dirname "$2")" && pwd)/$(basename "$2")"

# Change to the directory of the pdf file
cd "$(dirname "$1")" || exit 1
pdf=$(basename "$1")
pdf_data="${pdf%.*}_data.txt"
EXTRACT_FILE=booky_bookmarks_extract

# BSD sed on macOS treats the argument after -i as a backup suffix,
# so use GNU sed (installed as gsed) there
if [[ "$OSTYPE" == "darwin"* ]]; then
    SED=gsed
else
    SED=sed
fi

echo "Converting $bkFile to pdftk compatible format"
python3 -c '
import sys

level = 0
startChar = "{"
endChar = "}"
for line in sys.stdin:
    line = line.strip()
    if line == startChar:
        level = level + 1
    elif line == endChar:
        level = level - 1
    elif line:
        # Split on the last comma, so titles may themselves contain commas
        commaIndex = line.rfind(",")
        title = line[:commaIndex]
        pageNo = line[commaIndex + 1:].strip()
        print("BookmarkBegin")
        print("BookmarkTitle:", title.strip())
        print("BookmarkLevel:", level)
        print("BookmarkPageNumber:", pageNo)' < "$bkFile" > "$EXTRACT_FILE"

echo "Dumping pdf meta data..."
pdftk "$pdf" dump_data_utf8 output "$pdf_data"

echo "Clear dumped data of any previous bookmarks"
$SED -i '/Bookmark/d' "$pdf_data"

echo "Inserting your bookmarks in the data"
$SED -i "/NumberOfPages/r $EXTRACT_FILE" "$pdf_data"

echo "Creating new pdf with your bookmarks..."
pdftk "$pdf" update_info_utf8 "$pdf_data" output "${pdf%.*}_new.pdf"

echo "Deleting leftovers"
rm "$EXTRACT_FILE" "$pdf_data"
```

keeps failing

Hi, I hope you can help me work out what I am doing wrong.
macOS version: 10.15.6
pdftk version: pdftk_server-2.02-mac_osx-10.11-setup

The book is called book.pdf, and the text file containing the TOC is TOC.txt. I ran the script in Terminal, and it printed:

```
(base) XXX@XXXs-MacBook-Air booky % ./booky.sh book.pdf TOC.txt
Converting TOC.txt to pdftk compatible format
Dumping pdf meta data...
Clear dumped data of any previous bookmarks
sed: 1: "book_data.txt": undefined label 'ook_data.txt'
Inserting your bookmarks in the data
sed: 1: "book_data.txt": undefined label 'ook_data.txt'
Creating new pdf with your bookmarks...
Deleting leftovers
```

No new bookmarks were created in the PDF file.
I checked that I have { } around the bookmarks.
I checked that I have ", " (a comma and a space) between each bookmark title and its page number.

Does it handle - in a bookmark name?

suggest to add the offset

I think your solution is a great way to create bookmarks automatically. I have observed that in many PDFs the page numbers printed in the table of contents do not match the actual PDF page numbers, so one has to calculate each PDF page number manually from the contents. This process is error-prone. I suggest adding an offset to the page numbering, for example:

{
Title1, 1
Title2, 2
offset, 5
{
Subtitle1, 3
Subtitle2, 4
{
SubSubtitle1, 5
...
}
}
}

From the point where the offset keyword appears, the offset is automatically added to every subsequent page number. With this approach, one only needs to copy the page numbers from the table of contents.
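booky does not implement this: as written, the line offset, 5 would simply become a bookmark titled "offset", because every line other than { and } is treated as a bookmark. A minimal Python sketch of the proposed behavior follows. The function name convert_with_offset is hypothetical, and it assumes the keyword is exactly offset, that a later offset line replaces the earlier value rather than adding to it, and that the offset applies at every nesting level.

```python
from typing import List, Tuple


def convert_with_offset(lines: List[str]) -> List[Tuple[int, str, int]]:
    """Return (level, title, pdf_page) triples; 'offset, N' shifts later pages."""
    level, offset = 0, 0
    result: List[Tuple[int, str, int]] = []
    for raw in lines:
        line = raw.strip()
        if line == "{":
            level += 1
        elif line == "}":
            level -= 1
        elif line:
            comma = line.rfind(",")
            title, page = line[:comma].strip(), int(line[comma + 1:])
            if title == "offset":
                offset = page          # applies to every later bookmark
            else:
                result.append((level, title, page + offset))
    return result
```

With the example above, Title1 and Title2 keep pages 1 and 2, while Subtitle1, Subtitle2 and SubSubtitle1 land on PDF pages 8, 9 and 10. The trade-off of this design is that a real chapter titled "offset" could no longer be bookmarked.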

quick way/tips to prepare TOC text file into booky format?

Hello, do you have any regex suggestions or tips for quickly converting a TOC text file into the format booky requires?
I am very new to regex, so any help would be super.

I was trying to find a tool where I could create a template for one chapter of the TOC and then apply that template to all other chapters, kind of like Excel's "paste special" feature.

For example:
1. Insert { at the beginning of each TOC block and } at the end of each TOC block
2. Replace the dot leaders before a TOC item's page number (.......67) with the format booky requires, ", 67"
   e.g. TOCitem ........67 ==> TOCitem, 67
3. Replace runs of spaces before the page number (TOCitem  67, TOCitem  ,67 or TOCitem   67) with the same ", 67" format
4. Automatically indent all child TOC items
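Steps 2 and 3, plus a flat version of step 1, can be sketched with one regular expression in Python. The pattern and the helper names to_booky_line and to_booky_block are hypothetical, not part of booky. The sketch assumes each TOC line ends with its page number; it does not attempt step 4, because nesting cannot be inferred reliably from flat text, and it drops a dot that ends a title (e.g. "Misc. ....9" becomes "Misc, 9").

```python
import re
from typing import List

# title (ending in a non-space), then dots/spaces and/or a comma, then the page
TOC_LINE = re.compile(r"^\s*(?P<title>.*?\S)(?:[\s.]+,?|,)\s*(?P<page>\d+)\s*$")


def to_booky_line(line: str) -> str:
    """'Intro ......67', 'Intro   67' or 'Intro  ,67' -> 'Intro, 67'."""
    match = TOC_LINE.match(line)
    if match is None:
        return line.strip()   # no trailing page number: leave the line alone
    return f"{match.group('title')}, {match.group('page')}"


def to_booky_block(lines: List[str]) -> List[str]:
    """Wrap the converted, non-empty lines in a single top-level { } block."""
    body = ["  " + to_booky_line(line) for line in lines if line.strip()]
    return ["{"] + body + ["}"]
```

The same idea can be adapted to an editor's regex find-and-replace, although the syntax for named groups varies between regex engines.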

Or maybe there is a repository of regex samples for TOC manipulation?
I use the Sublime Text editor and could not find any snippets specific to TOC text file manipulation.

Thank you


Feature request: add function to reformat exported pdftk bookmarks to booky format
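A rough Python sketch of this reverse conversion, assuming the input is the text written by pdftk's dump_data_utf8 operation (as used in the single-file script quoted above) and that each BookmarkBegin record carries BookmarkTitle, BookmarkLevel and BookmarkPageNumber lines. pdftk_to_booky is a hypothetical name; it emits the nested { } layout from the Example section, indenting two spaces per level.

```python
from typing import Dict, List, Union


def pdftk_to_booky(dump_text: str) -> str:
    """Rebuild booky's nested { } format from pdftk bookmark records."""
    records: List[Dict[str, Union[str, int]]] = []
    for line in dump_text.splitlines():
        key, _, value = line.partition(":")   # titles may contain further colons
        key, value = key.strip(), value.strip()
        if key == "BookmarkBegin":
            records.append({})
        elif records and key == "BookmarkTitle":
            records[-1]["title"] = value
        elif records and key == "BookmarkLevel":
            records[-1]["level"] = int(value)
        elif records and key == "BookmarkPageNumber":
            records[-1]["page"] = int(value)

    out: List[str] = []
    depth = 0
    for rec in records:
        level = int(rec["level"])
        while depth < level:                  # open blocks down to this level
            out.append("  " * depth + "{")
            depth += 1
        while depth > level:                  # close blocks back up to it
            depth -= 1
            out.append("  " * depth + "}")
        out.append("  " * depth + f"{rec['title']}, {rec['page']}")
    while depth > 0:                          # close whatever is still open
        depth -= 1
        out.append("  " * depth + "}")
    return "\n".join(out)
```

Lines outside bookmark records, such as NumberOfPages, are ignored.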
