
Background

Vortext is a system that allows you to upload PDF documents and annotate them with various extractions. In essence, it is a very simple web-based document management system.

It relies on Spá for its client-side functionality.

Vortext is under heavy development. The idea is to ultimately pair the management of extractions from documents with customizable machine learning predictors. This way we hope to ease the burden of extracting data from literature, a task that is common in the biomedical sciences and in law. At this point, however, Vortext does not make any predictions. See the Vortext Demo repository for a system that does. If you are interested in helping us with these ideas, drop us a line at vortext.systems.

Technical overview

Server side

The server side is written in Clojure and uses PostgreSQL as the database. If you are new to Clojure, the code might look unfamiliar. Clojure is a wonderful language, though, and if you are interested in learning more we recommend the following resources:

We use Luminus as a basis for many parts, so we recommend reading their documentation as well.

Client side

See the Spá repository for an overview of the technologies used.

Development prerequisites

Mac OS X

To develop the server you need Leiningen, which can be installed with Homebrew. We require at least JDK 1.8 (Java 8) and Leiningen 2.4.

brew update # make sure you have recent versions
brew install leiningen # install via Homebrew
git clone <this repo>
cd <your folder>
lein deps # retrieve project dependencies
git submodule update --init --recursive

# Compile the PDF.js files
cd resources/public/scripts/spa/pdfjs
brew install node # install nodejs via Homebrew
npm install
node make generic

To prevent some bugs and ensure future compatibility, we convert the PDF documents to PDF/A-2 (PDF archive) before storing them. We use Ghostscript for this conversion. If you have not yet installed Ghostscript, run brew install ghostscript.
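As a rough illustration of the conversion step, the sketch below prints a Ghostscript command line that requests PDF/A-2 output. The flag selection and the file names (paper.pdf, paper-pdfa.pdf) are assumptions for illustration, not necessarily what Vortext runs internally. Ghostscript's documentation also describes a PDFA_def.ps prologue with an ICC profile for strict conformance, which is left out here.

```shell
#!/bin/sh
# Sketch: print a Ghostscript command that would convert a PDF to PDF/A-2.
# The flag set is an assumption, not taken from the Vortext source.
# Note: file names are inserted unquoted, so avoid names containing spaces.
pdfa_command() {
  input=$1
  output=$2
  printf 'gs -dPDFA=2 -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -sColorConversionStrategy=RGB -dPDFACompatibilityPolicy=1 -sOutputFile=%s %s\n' "$output" "$input"
}

# Print the command for a hypothetical input file.
pdfa_command paper.pdf paper-pdfa.pdf
```

Piping the printed line into sh runs the conversion once Ghostscript is installed, for example pdfa_command paper.pdf paper-pdfa.pdf | sh.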

Database

We’re using PostgreSQL as the database. The database settings can be configured with the environment variables specified by environ in project.clj. The default database is spa, with user spa and password develop. You will need to change these credentials in production.

CREATE DATABASE spa;
CREATE USER spa WITH PASSWORD 'develop';
GRANT ALL PRIVILEGES ON DATABASE spa TO spa;

To populate the database tables, run lein migrate. If you’re running OS X and are looking for an easy way to run PostgreSQL, we recommend Postgres.app.
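The setup statements above can also be applied from the shell. The following is a minimal sketch that emits the same SQL so it can be piped into psql. The superuser name postgres in the usage comment is an assumption about your local installation.

```shell
#!/bin/sh
# Sketch: emit the development setup SQL from this section.
setup_sql() {
  cat <<'SQL'
CREATE DATABASE spa;
CREATE USER spa WITH PASSWORD 'develop';
GRANT ALL PRIVILEGES ON DATABASE spa TO spa;
SQL
}

# Assumed usage, with a local superuser named "postgres":
#   setup_sql | psql -U postgres
#   lein migrate
setup_sql
```

Keeping the statements in a function makes it easy to review them before they reach the database.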

Run

To run the server use

lein trampoline run start # will run the server
DEV=true lein trampoline run start # will run in development mode

It will run on port 8080 by default.
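To check from a script that the server has come up, something like the sketch below can poll the default port. The helper names server_url and wait_for_server are hypothetical, and the check only confirms that some HTTP response arrives, not that the application is healthy.

```shell
#!/bin/sh
# Sketch: build the local server URL (port 8080 unless another is given).
server_url() {
  printf 'http://localhost:%s/\n' "${1:-8080}"
}

# Poll a URL once per second until it answers or the attempts run out.
# Succeeds on any HTTP response, including error status codes.
wait_for_server() {
  url=$1
  attempts=${2:-30}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if curl -s -o /dev/null "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# Assumed usage, after starting the server in another terminal:
#   wait_for_server "$(server_url)" && echo "server is up"
server_url
```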

To deploy

The easiest way to deploy Vortext is to create an uberjar and deploy that. Run lein uberjar to create a stand-alone version that you can call with java -jar vortext.jar start. This jar can then be run as a service with a service manager of your choice, such as upstart or systemd.
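For systemd, a unit file along the following lines could wrap the jar. This is only a sketch: the install path /opt/vortext/vortext.jar, the vortext service user, the java location, and the postgresql.service unit name are all assumptions that will differ between systems.

```shell
#!/bin/sh
# Sketch: print a systemd unit for the uberjar.
# The jar path, user name and unit names below are hypothetical.
vortext_unit() {
  jar=${1:-/opt/vortext/vortext.jar}
  cat <<EOF
[Unit]
Description=Vortext server
After=network.target postgresql.service

[Service]
User=vortext
ExecStart=/usr/bin/java -jar $jar start
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}

# Assumed usage:
#   vortext_unit | sudo tee /etc/systemd/system/vortext.service
#   sudo systemctl daemon-reload
#   sudo systemctl enable --now vortext
vortext_unit
```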

It is also recommended to minify the assets in production. We use RequireJS r.js for this.

To install r.js run npm install -g requirejs. Run the following before building the uberjar.

cd resources
r.js -o build.js

By default the production jar will serve the assets from the build folder; in development it will serve them from public. To prevent the production jar from serving the build folder (because you haven’t minified the assets), run the server with DEV=1 java -jar vortext.jar start. This is NOT recommended.
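The two build steps can be chained so the jar is never built from unminified assets. Below is a minimal sketch of such a wrapper. The helper names run_in and build_release and the DRY_RUN switch are hypothetical. By default it only prints the steps; set DRY_RUN=0 to execute them from the repository root.

```shell
#!/bin/sh
# Sketch: minify the assets, then build the uberjar.
# With DRY_RUN=1 (the default here) the steps are printed, not executed.
run_in() {
  dir=$1
  shift
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '+ (cd %s && %s)\n' "$dir" "$*"
  else
    (cd "$dir" && "$@")
  fi
}

# Stops at the first failing step, so a failed minification blocks the jar build.
build_release() {
  run_in resources r.js -o build.js &&
  run_in . lein uberjar
}

build_release
```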

Future work

See ideas or the other issues.

Contributing

Currently this is a research object. The API and organizational structure are subject to change. Comments and suggestions are much appreciated. For code contributions: fork, branch, and send a pull request.

License

Vortext is open source, and licensed under GPLv3. See LICENSE.md for more information.
