epub-full-text-search's Introduction

EPUB-Search

Search engine for digital publications based on EPUB 3

Welcome! EPUB-Search makes your digital publications searchable.

Use cases:

  • A server-side microservice that lets browser-based “cloud” readers search within EPUBs
  • Searching EPUBs that live online
  • Searching within your local EPUB collection

Online Demo

Demo

Features included:

  • Full-text search (get all query matches for one EPUB document or for a whole EPUB collection)
  • Autocomplete
  • Written entirely in JavaScript
  • Hits include CFI references
  • Results returned in JSON format
  • Pre-indexing
  • Indexing on-the-fly

Installation

For CLI use

$ [sudo] npm install epub-full-text-search -g

For library use

$ npm install epub-full-text-search --save

Running as a Service

CLI

$ epub-search 

Welcome to Epub search service

Usage: epub-search [action] [options]

Actions:
        start           Start the service
        stop            Stop the service
        logs            Show logs
        writeToIndex    Epub-book(s) which should be written to index.(Hint: the epub content have to be unzipped)

Options:
        -p      Path to epub folder which contains epub-book(s).

Start Service

$ [sudo] epub-search start

Modus operandi

EPUB-Search provides two modes of operation:

  • Indexing on-the-fly. The ebook is indexed in the background when it is opened. This mode assumes that the EPUB 3 book is available remotely. The generated search index is deleted when the ebook is closed.

  • Pre-indexing. All ebooks on the local machine can be indexed, and the generated search index persists across all reading sessions. This makes it possible to search for terms across all indexed ebooks.

Indexing On-the-fly

Indexing
http://localhost:8085/addToIndex?url=${epub}/&uuid=${uuid}
Search
http://localhost:8085/search?q=${term}&uuid=${uuid}
Delete index
http://localhost:8085/deleteFromIndex?&uuid=${uuid}
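
The three on-the-fly requests above can be sketched as small URL builders. This is only an illustration: the helper names (buildUrl, addToIndexUrl, searchUrl, deleteFromIndexUrl) are hypothetical and not part of epub-full-text-search, and the host and port are assumed to be the localhost:8085 shown in the URLs above.

```javascript
// Assumption: the service listens on localhost:8085, as in the URLs above.
const BASE_URL = 'http://localhost:8085';

// Hypothetical helper: combine an endpoint path with URL-encoded query parameters.
function buildUrl(path, params) {
  const url = new URL(path, BASE_URL);
  url.search = new URLSearchParams(params).toString();
  return url.toString();
}

// Index a remotely available EPUB under a session identifier.
const addToIndexUrl = (epubUrl, uuid) => buildUrl('/addToIndex', { url: epubUrl, uuid });

// Search within the EPUB that was indexed for this identifier.
const searchUrl = (term, uuid) => buildUrl('/search', { q: term, uuid });

// Drop the temporary index again (emitted as "?uuid=..." without the leading "&").
const deleteFromIndexUrl = (uuid) => buildUrl('/deleteFromIndex', { uuid });

console.log(addToIndexUrl('http://example.org/books/moby-dick/', 'abc-123'));
console.log(searchUrl('whale', 'abc-123'));
console.log(deleteFromIndexUrl('abc-123'));
```

Building the query string with URLSearchParams keeps the EPUB URL and the search term safely encoded, which matters because the url parameter itself contains ":" and "/" characters.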

Pre-Indexing

Indexing

Let's start by indexing some EPUBs:

$ epub-search writeToIndex -p <path>

Search

Search for a term:

http://localhost:8085/search?q=${term}&t=${EPUB-title}

Suggestions for Autocomplete

http://localhost:8085/matcher?beginsWith=beginning-of-the-text-to-match

Examples:

Indexing On-the-fly

TODO

Pre-indexing

First, install epub-search globally:

$ [sudo] npm install epub-full-text-search -g

Start service:

$ [sudo] epub-search start

Add sample epubs to index:

$ epub-search writeToIndex -p {prefix}/node_modules/epub-full-text-search/node_modules/epub3-samples

Searches should now return some hits.

To send requests, you can use curl, for example $ curl -XGET "http://localhost:8085/search?q=math", or open the URLs in a browser.

Search within the whole ebook-collection:

http://localhost:8085/search?q=math

Set the book-title filter t="..." to search within one specific ebook only:

http://localhost:8085/search?q=epub&t=Accessible+EPUB+3

Or we can get some suggestions for an autocomplete:

http://localhost:8085/matcher?beginsWith=epu
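
The same requests can also be issued from JavaScript. The following is a minimal sketch, assuming Node.js 18 or later (for the global fetch) and a service running on localhost:8085; the helper names titleSearchUrl, matcherUrl and fetchJson are hypothetical and not part of the package.

```javascript
// Assumption: the service listens on localhost:8085, as in the URLs above.
const SERVICE_URL = 'http://localhost:8085';

// Build a search URL; the optional title restricts the search to one ebook.
function titleSearchUrl(term, title) {
  const params = new URLSearchParams({ q: term });
  if (title) params.set('t', title); // spaces are encoded as "+"
  return `${SERVICE_URL}/search?${params}`;
}

// Build an autocomplete URL for the beginning of a term.
function matcherUrl(prefix) {
  return `${SERVICE_URL}/matcher?${new URLSearchParams({ beginsWith: prefix })}`;
}

// Send a GET request and parse the JSON response (needs a running service).
async function fetchJson(url) {
  const response = await fetch(url);
  if (!response.ok) throw new Error(`Request failed with status ${response.status}`);
  return response.json();
}

console.log(titleSearchUrl('math'));
console.log(titleSearchUrl('epub', 'Accessible EPUB 3'));
console.log(matcherUrl('epu'));
```

With the service running, fetchJson(titleSearchUrl('epub', 'Accessible EPUB 3')).then(console.log) would print whatever JSON the service returns; the structure of the hits is not documented here, so the sketch does not assume it.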

For library use

TODO

Hit data format

TODO

Local testing

Install all dependencies: npm install.

Start the demo with npm run start. This should run an Express server on your local machine.

Navigate to http://localhost:8085/ to see the demo.

Note: Pre-indexing starts automatically, and it takes a few seconds until search over the pre-indexed books is available.

Technical Details

EPUB-Search uses search-index to index book content.

Contributing

Very welcome ... :-)

epub-full-text-search's People

Contributors

andrewlinfoot, larsvoigt

epub-full-text-search's Issues

Add feature

  • api function to index an epub folder/file and add it to our existing index instance

Installation problem

Hello, I am quite new to this. I am facing some problems while installing the project. I tried installing it with the command 'npm install epub-full-text-search --save'. After many tiresome attempts, the command finally runs without any errors, but with some warnings, like:
C:\Users\11050\Documents>npm install epub-full-text-search --save
npm WARN saveError ENOENT: no such file or directory, open 'C:\Users\11050\package.json'
npm WARN enoent ENOENT: no such file or directory, open 'C:\Users\11050\package.json'
npm WARN 11050 No description
npm WARN 11050 No repository field.
npm WARN 11050 No README data
npm WARN 11050 No license field.
npm WARN 11050 Invalid dependency: jasmine-core undefined
npm WARN 11050 Invalid dependency: jquery undefined
npm WARN 11050 Invalid dependency: jquery-easing undefined
I thought a folder containing the code would appear in my Documents directory, but there was no such thing. Then I downloaded the project and, from my git bash, ran 'npm install' to install the dependencies. It showed a message like 'finished generating code' (with two warnings).

npm WARN [email protected] requires a peer of webpack@1 || 2 || ^2.1.0-beta || ^2.2.0-rc but none was installed.
npm WARN The package brfs is included as both a dev and production dependency.

Then I tried to run epub-search start, and it could not recognize 'epub-search' as a command. So I am back to square one. Where do I proceed from here? Is my way of installing even correct?

speed up autostart

if it is necessary to rebuild the index,
because e.g. new epub-content is added

Get excerpt surround search term

I am working on integrating this into an app and I want to display an excerpt of the content surrounding the search term. How would I implement something like this?

AWS lambda and levelDB

Hello,

I want to use this code in an AWS Lambda function, but the problem is that LevelDB doesn't store the database in the folder. It can store the following files:

000003.log size: 8830932
CURRENT size: 16
IndexControllerDB.json size: 118
LOCK size: 0
LOG size: 57
MANIFEST-000002 size: 50

But it can't save the .ldb file. Is there any way to print the full debugging stack to see what is going wrong?

Thanks

integration of full-text-search into readium cloud reader

@larsvoigt
Dear Lars,
I have tried to integrate full-text-search into my Readium cloud reader by following the instructions you shared in readium/readium-js#17:
"epub-full-text-search usage
If you want see how this feature can be implemented then check out branch.
To get it run call:
Prerequisites are same like the orginal readium repro
npm run build
npm run dist:cloudReaderWithFullTextSearch"
But I cannot integrate it into my cloud reader, which works locally on my computer. When I use "npm run dist:cloudReaderWithFullTextSearch", my build-output folder is empty. Also, the dist folder does not contain all the required folders and files.
Could you please let me know whether I need to take specific points into account?
Thanks in advance,
Hajar

How can I incorporate epub search engine in my readium epub reader project ?

@larsvoigt I wanted to use epub-search-engine in my Readium project. I installed the project in my node_modules directory.
For library use, the first command was: import epubsearch from epub-full-text-search.
However, npm does not seem to recognize import and generated an error. I used the Babel transpiler, and the import command changed to:
"use strict";

var _epubFullTextSearch = require("epub-full-text-search");

var _epubFullTextSearch2 = _interopRequireDefault(_epubFullTextSearch);

function _interopRequireDefault(obj) { return obj && obj.__esModule ? obj : { default: obj }; }

However, Readium also does not recognize require().

Is there any way around this problem? Any suggestion will be greatly appreciated.

epub:type="annotation"

For now subelements of elements with

<... epub:type="annotation" />

should be ignored in the indexing process,
because it is not really clear whether the EPUB reader client will show or hide the annotation element.

Re-indexing

  • option force re-indexing
  • api function to start re-indexing

Test cases

  • single cfis
  • single cfis for docs with mathml included
  • phrase search

Missing log file

From @rudra07130713:

The issue is not completely resolved for me. When I run
C:\Users\11050>epub-search start
the following message appears:

exec path: C:\Users\11050\AppData\Roaming\npm\node_modules\epub-full-text-search\dist\bin

Starting EPUB search ...
fs.js:651
return binding.open(pathModule._makeLong(path), stringToFlags(flags), mode);
^

Error: ENOENT: no such file or directory, open 'C:\Users\11050.epub-full-text-search\out.log'
at Object.fs.openSync (fs.js:651:18)
at Object.forever.startDaemon (C:\Users\11050\AppData\Roaming\npm\node_modules\epub-full-text-search\node_modules\forever\lib\forever.js:460:14)
at Object.<anonymous> (C:\Users\11050\AppData\Roaming\npm\node_modules\epub-full-text-search\dist\bin\epub-search:52:35)
at Module._compile (module.js:569:30)
at Object.Module._extensions..js (module.js:580:10)
at Module.load (module.js:503:32)
at tryModuleLoad (module.js:466:12)
at Function.Module._load (module.js:458:3)
at Function.Module.runMain (module.js:605:10)
at startup (bootstrap_node.js:158:16)
What am I doing wrong? I was supposed to add sample EPUB files after this command, right?

issues in updated version

Hi @larsvoigt,
1- Where are the indexed files saved?
In the previous version I could see that the index files had the name "IndexControllerDB". But after updating to the new version I cannot find such a file. Could you please let me know how you save the indexed files?
2- How does it look up a book (by its title or file name)?
Another issue is that the new version looks for the title of the book and not the name of the book folder. I have integrated epub-search in Readium, and with the previous version it worked without such issues!

Thanks in advance,
Hajar

search a term in specific epub book

Hi @larsvoigt,

1- When I use "http://localhost:8085/search?q=hat&t=moby_dick" or "http://localhost:8085/search?q=epub&t=moby+dick", it does not show any results. But when I just use "moby" without "dick", it returns results!
Could you please let me know your thoughts?

2- The other issue is that when I search for a term such as "hat", it also returns "that", which contains "hat". I tried "http://localhost:8085/search?q= hat&t=moby" (a space before the term "hat"). It did not work with the space. I tried to fix it, but the issue is still not solved. Could you please let me know where I need to work in order to fix it?

Thanks in advance,
Hajar

Cannot search japanese content

Hi,
It works OK with English or non-Unicode characters, but when I try to search Japanese content, it can't return any results.

Can anyone help me find a solution for this?

Chaining steps to show the example as whole

Scripting:

  • npm install epub-full-text-search -g ->
  • epub-search start ->
  • epub-search writeToIndex -p {prefix}/node_modules/epub-full-text-search/node_modules/epub3-samples ->
  • open browser ->
  • http://localhost:8085/search?q=epub&t=Accessible+EPUB+3

Build error

An error occurred:
[email protected] build-bin C:\Users\11050\AppData\Roaming\npm\node_modules\epub-full-text-search

babel ./bin/epub-search --presets babel-preset-es2015 --out-file ./dist/bin/epub-search && ./node_modules/.bin/babel ./bin/search-engine-CLI --presets babel-preset-es2015 --out-file ./dist/bin/search-engine-CLI

'.' is not recognized as an internal or external command,

Teaser?

Add a short teaser to every cfi hit?

Type ahead search for all PDFS

Could you consider adding the following features?

  1. Search for content in all PDFs or EPUBs
  2. Typeahead search, where the window displays an option to look up the document instead of opening a new URL
  3. An index processing percentage would be helpful

Incorrect cfi?

<p><span class="exercisenumber"><em>1</em></span> 
Prüfen Sie ehrlich und selbstkritisch Ihre Stärken und Schwächen in den Lern- und
Arbeitstechniken.
</p>

Looking for keyword: Lern. What is the correct cfi inline path?

,/3:87,/3:91
or
,/2:87,/2:91
