Some years ago I read a quote by Abraham Lincoln about how a government should be. He said:
Government of the people, by the people, for the people, shall not perish from the Earth.
Actually, I first read it years ago at university, in a book I was using to study Unix. Unfortunately, I can't remember the book's name.
Fortunately, someone wrote about it in a Persian article here.
That article gives the source of the quote as the book ABC of SCO Unix.
Now that I had the book's name, the next step was to get the book.
So how do I find an ebook version?
A general search engine like DuckDuckGo did not return any results.
Searching libgen.is was also without success.
The next try was the Internet Archive.
OK, I found it there. But on the Internet Archive we can only preview and borrow the book.
By installing the Firefox extension Internet Archive Downloader, we can grab the PDF file.
$ ls -ltrh
-rw-r--r-- 1 esmaeel esmaeel 175M Feb 28 01:05 abc.pdf
To search the book, we need to convert each page to plain text.
Debian has the tesseract program for this task.
Install tesseract
$ sudo apt install tesseract-ocr
Convert a picture to text (tesseract adds the .txt extension to the output name itself, so this writes out.txt)
tesseract eng.png out
It is simple: just run the script with bash
bash run.sh
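The contents of run.sh are not shown here. As a rough idea of what such a script could look like, the sketch below loops over page images and calls tesseract on each one. The function name ocr_pages and the directory layout (images in out/*.jpg, text in out/text/) are assumptions chosen to match the paths used later in this post, not the actual script.

```shell
#!/usr/bin/env bash
# Hypothetical run.sh: OCR every page image in a directory with tesseract.
# Assumed layout (not confirmed by the post): images in out/*.jpg,
# text written to out/text/NNN.txt.

ocr_pages() {
    local img_dir="$1" txt_dir="$2" img base
    mkdir -p "$txt_dir"
    for img in "$img_dir"/*.jpg; do
        [ -e "$img" ] || continue        # skip when the glob matched nothing
        base="$(basename "$img" .jpg)"
        # tesseract appends ".txt" to the output base name itself
        tesseract "$img" "$txt_dir/$base" -l eng
    done
}

# Example call: ocr_pages out out/text
```

With a layout like this, page 087.jpg would end up as out/text/087.txt, which is the kind of file name grep reports further down.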
After the work is finished, we have a good book to read.
Search for a word in all files with grep
I want to search for the word "lincoln" in the whole book.
grep -ri "lincoln" out/text -A20 -B 20 -h > ~/lincoln.txt
Explanation
grep for the word lincoln in all files inside the out/text directory, searching case-insensitively (-i) and recursively (-r).
Show 20 lines after and before each match (-A20 -B20).
And remove the file name from the output (-h).
Lincoln said government should be “of the people, by the people, for the people.” Thompson and Ritchie designed UNIX to be “of the files, by the files, for the files.”
To find which file contains the quote, use grep -l
$ grep -ri "lincoln" . -l
./087.txt
The quote appears in the file 087.txt.
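To see what each grep flag does without the whole book, here is a small self-contained demo on a throwaway directory. The file names and contents are made up for illustration, and the context is cut down to one line (-A1 -B1) to keep it short.

```shell
# Build a tiny fake "book": two page files in a temporary directory.
demo_dir="$(mktemp -d)"
printf 'line one\nAbraham Lincoln said\nline three\n' > "$demo_dir/001.txt"
printf 'nothing here\n' > "$demo_dir/002.txt"

# -r: recurse into the directory, -i: ignore case,
# -h: hide file names, -A1/-B1: one line of context after/before the match
context="$(grep -r -i -h -A1 -B1 "lincoln" "$demo_dir")"
printf '%s\n' "$context"

# -l: print only the names of the files that contain a match
matches="$(grep -r -i -l "lincoln" "$demo_dir")"
printf '%s\n' "$matches"
```

The first grep prints the matching line with its neighbours, and the second prints only the path of 001.txt.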
With that, the initial question is well answered.
Now I want to share the book with my friends, but the PDF file is huge.
$ ls -ltrh
-rw-r--r-- 1 esmaeel esmaeel 175M Feb 28 01:05 abc.pdf
Extract all images from the PDF
mkdir images
pdfimages -all abc.pdf images/
Reducing the size of each page image reduces the total size of the PDF.
Use the convert command and set the reduction factor to 40%.
By trial and error, I found that a 30% factor is good enough for this ebook.
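The exact convert command is not shown here, so the following is only a sketch of how the shrinking step might look. It assumes the factor is passed to ImageMagick's -resize option (it could just as well have been -quality), that pdfimages produced .jpg files in images/, and that the shrunken pages go to out/. The function name shrink_pages is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: shrink every extracted page image by a percentage.
# Assumptions: source images are .jpg, and "factor" means -resize N%.

shrink_pages() {
    local src_dir="$1" dst_dir="$2" factor="${3:-30}" img
    mkdir -p "$dst_dir"
    for img in "$src_dir"/*.jpg; do
        [ -e "$img" ] || continue        # skip when the glob matched nothing
        # -resize N% scales width and height to N percent of the original
        convert "$img" -resize "${factor}%" "$dst_dir/$(basename "$img")"
    done
}

# Example call: shrink_pages images out 30
```

Trying a few factors on a single page first and checking that the text is still readable is a quick way to settle on a value before processing the whole book.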
Append the images to create a new PDF book.
convert out/*.jpg abc_of_unix.pdf