lodestone's Introduction

Lodestone - Personal Document Search & Archive

NOTE: Lodestone is a Work-in-Progress and is not production ready.

Lodestone is designed to be the modern and digital equivalent of a home filing cabinet. If you've gone searching for something similar in the past, you might be familiar with terms like Electronic Document Management System (EDMS), Document Management System (DMS) or Personal Archival.

Lodestone is designed around a handful of core features:

  • Full text document search - It doesn't matter what format your document is in; we should be able to parse it (using OCR) and let you search its text.
  • Rich tagging - Unlike a physical file cabinet where a document can only exist in one place, digital documents support tags, allowing you to create a flexible organizational structure that works for you.
  • Automated - Document collection & OCR processing should be automatic. Just saving a file to your network drive should be enough to start document processing.
  • Non-destructive - When Lodestone processes a document, the original file will be left untouched, exactly where you left it.
  • Web Accessible - Lodestone is designed to run on a trusted home server and be accessible 24x7.
  • Filesystem/Cloud Sync - Optionally synchronize your tagged documents via a cloud storage provider of your choice (Dropbox, GDrive, etc) or access via a FUSE filesystem mount.

Screenshot

More screenshots available in the docs/screenshots directory.

Installation

Lodestone is made up of a handful of open-source components, so it's easiest to deploy using Docker/Docker Compose:

docker-compose up

# then open the following url in your browser

http://localhost/

Place your documents in the /data/storage/documents directory, and the Filesystem Collector should automatically start processing them.

If you would like some test documents to play with safely, you can take a look at the LodestoneHQ/lodestone-test-docs repository.

Configuration

Lodestone follows a Convention over Configuration design, which means that it works out of the box with sane defaults, but you can customize them to match your needs.

Most of the configuration files are stored in the webapp image (source code here), and requested by various components when they start up.

  • filetypes.json (backend/data/filetypes.json) contains lists of includes and excludes that are used by the processor container to decide which files to process and load into the database.

  • tags.json (backend/data/tags.json) contains a nested structure of labels that can be used to group tags and search for your documents in the Lodestone web UI.

  • mapping.json (backend/data/mappings.json) is used to ensure that the elasticsearch container has a consistent data storage structure.

To override these files, just set up a Docker volume binding to the specified file in the /lodestone/data/ directory in the webapp container.
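
As a minimal sketch of such an override, the snippet below writes a Compose override file that bind-mounts a local filetypes.json over the copy inside the container. The service name `webapp`, the `version` value, and the local `./config/filetypes.json` path are assumptions for illustration; check your own docker-compose.yml for the actual service name and file format version.

```shell
#!/bin/sh
# Sketch: override filetypes.json through a Docker Compose override file.
# ASSUMPTIONS (not taken from the Lodestone repo):
#   - the service is named "webapp" in your docker-compose.yml
#   - your customized copy lives at ./config/filetypes.json
#   - "version" below must match the version declared in your docker-compose.yml
cat > docker-compose.override.yml <<'EOF'
version: "3"
services:
  webapp:
    volumes:
      # host file : file inside the webapp container (mounted read-only)
      - ./config/filetypes.json:/lodestone/data/filetypes.json:ro
EOF

# Compose picks up docker-compose.override.yml automatically on the next start:
#   docker-compose up
```

Keeping the binding in a separate override file leaves the project's docker-compose.yml untouched, so later updates to it are less likely to conflict with your changes.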

Considerations

Lodestone is a very opinionated solution for personal document management. As such, there are a couple of things you should know before even considering it.

  • Currently there's no user management. Lodestone is designed to run at home, on your trusted network. This may be reconsidered at a future date.

  • Limited support for file types

    • doc, docx, xls, xlsx, ppt, pptx - Microsoft Office Documents

    • pages, numbers, key - Apple iWork Documents

    • pdf

    • rtf

    • jpg, jpeg, png, tiff, tif

      If you think there are additional document types that may be useful to support, please open an issue.

What about..

As mentioned above, Lodestone isn't some magical new technology. EDMS and DMS systems have been around for a long time, but unfortunately they all seem to miss one or more features that I think are required for a modern filing cabinet.

Here's some of my research, but you should take a look at them yourselves.

Name Docker/Linux Web UI Modern UI Tagging Non-destructive OCR Watch Folder Email Import
MayanEDMS
Paperless

If the processor doesn't pick up your files, you may have to fake an update to them to change the timestamp. This is temporary and will be resolved in a future release. You can use the command below to update the timestamp and trigger the processor:

find . -exec touch {} \;
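
If you would rather not touch every file and directory under the current path, a narrower variant is sketched below. It only updates regular files with a few of the extensions listed under Considerations. The helper name `touch_documents` and the particular extensions chosen are illustrative assumptions, not part of Lodestone; `-iname` is supported by GNU and BSD find but is not strictly POSIX.

```shell
#!/bin/sh
# Sketch: refresh timestamps only on document-like files.
# touch_documents is a hypothetical helper, not part of Lodestone.
touch_documents() {
  dir="${1:-.}"
  # -type f skips directories; -iname matches extensions case-insensitively.
  # Extend the list with any other supported extensions you use.
  find "$dir" -type f \( \
      -iname '*.pdf' -o -iname '*.rtf' -o -iname '*.docx' -o \
      -iname '*.jpg' -o -iname '*.png' \
    \) -exec touch {} +
}

# Usage (run from your documents directory):
#   touch_documents .
```

Using `-exec touch {} +` batches many files into each `touch` invocation, which is noticeably faster than `\;` on large document folders.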

Components

Name                | Software            | Version              | Docker Image
Elasticsearch       | Elasticsearch       | v7.2.1               | lodestonehq/lodestone-elasticsearch
Document Processor  | Go                  |                      | lodestonehq/lodestone-document-processor
Thumbnail Processor | Go                  |                      | lodestonehq/lodestone-thumbnail-processor
Web / Api           | Angular / ExpressJS | v11.x / v4.16        | lodestonehq/lodestone-ui
Storage             | minio               | 2019 (S3 compatible) | analogj/lodestone:storage
Queue               | RabbitMQ            |                      | lodestonehq/lodestone-rabbitmq
OCR                 | Tika                |                      | lodestonehq/lodestone-tika

Future Development

Please see our Issues system for a list of items that have been reported. All issues for the project are contained in this repo. Issues are labeled by area affected, status, and other labels as appropriate. Below are some examples of filtering issues by label:

Please feel free to create an issue if you have an idea for a new feature, find a bug, or have a question.

lodestone's People

Contributors

adam-stanek, analogj, dependabot[bot], dskaggs, mhaluska

lodestone's Issues

Support for XML-files

I have some XML files that I want to organize in a DMS. I don't need the XML data parsed; I just need a place to save the files and index their names for fast file-name search.

Thanks! Good Luck!

-Viktor

reverse proxy fscrawler missing port

Hi,

Thanks for sharing this project. I have been looking for something like this.

When I spin up on Ubuntu 18.04 I see constant

reverse-proxy_1  | time="2019-11-02T02:33:48Z" level=error msg="port is missing" providerName=docker container=fscrawler-lodestone-d8525a5f5cdc3439c626c7757935b2f7b1aadff8c2a05cd5f04a07353031f627

And can't search for anything in the /web UI. The status dashboard suggests it has seen the documents I added to the watchfolder.

I tried removing --api.insecure=true for Traefik in the docker-compose.yml but it didn't solve anything.

Please let me know if I can assist with further logs.

Thanks.

Basic installation how to

Hi Jason,
I like your project and am trying to install it on my MacBook Pro.
I understand that I need to use Docker in the project folder and run: docker-compose up
It creates the containers and runs fine, but I don't see any files. I downloaded your sample folder AnalogJ/lodestone-test-docs and put everything in the ./docs folder, but no files show up when I open localhost:4000
Another thing: when I try to search, I type my search word, but nothing happens when I press Enter.
Do you have more information on this? I would like to try out the project fully.
Have a nice day.
Best Regards
Robert

Additional repositories

lodestone-sync

  • ability to sync files to dropbox/gdrive/fuse filesystem.

lodestone-watchdog

  • run a watchdog command on a cron, verifying that all files in the FS are present in ES.

Storage publisher disconnects from Rabbitmq after computer put to sleep

storage_1        | 2020/01/11 15:53:32 error: Exception (504) Reason: "channel/connection is not open"
storage_1        | 2020/01/11 15:53:32 event: "/data/documents/filetypes/sample.txt": CHMOD
storage_1        | 2020/01/11 15:53:32 Ignoring event:  "/data/documents/filetypes/sample.txt": CHMOD
storage_1        | 2020/01/11 15:53:32 event: "/data/documents/filetypes/sample.rtf": CREATE
storage_1        | 2020/01/11 15:53:32 error: Exception (504) Reason: "channel/connection is not open"
storage_1        | 2020/01/11 15:53:32 event: "/data/documents/filetypes/sample.rtf": CHMOD
storage_1        | Publishing event..

Add tabular view option to dashboard search results

On the main dashboard, it would be nice to have the option to view the search results in a tabular format instead of the default card view. This would make it easier to deal with large numbers of results as our document libraries grow.

We can add a control to the result header to switch between tabular and card view.

Add a status page

Status page should include

  • the status of the Elastic Search Watch folder collector.
  • status of thumbnail generator
  • status of email collector
  • status of elasticsearch api.
