LinkedPipes ETL

LinkedPipes ETL is an RDF based, lightweight ETL tool.

Requirements

Installation

So far, you need to compile LP-ETL on your own:

Linux

$ git clone https://github.com/linkedpipes/etl.git
$ cd etl
$ mvn install
$ cp configuration.properties.sample deploy/configuration.properties
$ vi deploy/configuration.properties

Windows

We recommend using Cygwin and proceeding as with Linux.

Configuration

Now edit the configuration file, mainly adding paths to working, storage, log and library directories. Especially:

Running LinkedPipes ETL

To run LP-ETL, you need to run the four components it consists of: executor, executor-monitor, storage and frontend. For debugging purposes, it is useful to store the console logs.

Linux

$ cd deploy
$ ./executor.sh >> executor.log &
$ ./executor-monitor.sh >> executor-monitor.log &
$ ./storage.sh >> storage.log &
$ ./frontend.sh >> frontend.log &

Windows

We recommend using Cygwin and proceeding as with Linux. Otherwise, in the deploy folder, run

  • executor.bat
  • executor-monitor.bat
  • storage.bat
  • frontend.bat

Unless configured otherwise, LinkedPipes ETL should now run on http://localhost:8080.
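
To check that the frontend actually came up, a small polling helper can be useful. This is only a sketch: it assumes curl is installed, and the function name wait_for_http and its retry defaults are illustrative, not part of LP-ETL.

```shell
#!/bin/bash
# Poll a URL until it answers with a successful HTTP status.
# Usage: wait_for_http URL [ATTEMPTS] [DELAY_SECONDS]
# Returns 0 as soon as one request succeeds, 1 if all attempts fail.
wait_for_http() {
    local url=$1
    local attempts=${2:-30}
    local delay=${3:-2}
    local i=0
    while [ "$i" -lt "$attempts" ]; do
        # --fail makes curl return non-zero on HTTP errors (4xx/5xx)
        if curl --silent --fail --output /dev/null --max-time 5 "$url"; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}
```

For example, `wait_for_http http://localhost:8080 && echo "LP-ETL is up"` waits for about a minute with the defaults above.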

Plugins - Components

The components are in the jars directory. A detailed description of how to create your own is coming soon.

Known issues

  • On some Linux systems, Node.js is run with the nodejs command instead of node. In that case, you need to change the command in the deploy/frontend.sh script.
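
To find out which of the two command names is present on a given system, a check along these lines may help. The helper name find_node_bin is hypothetical and is not something shipped with LP-ETL.

```shell
#!/bin/bash
# Print the first command name from the arguments that exists on PATH.
# With no arguments, try "node" first and then "nodejs".
find_node_bin() {
    if [ "$#" -eq 0 ]; then
        set -- node nodejs
    fi
    local candidate
    for candidate in "$@"; do
        if command -v "$candidate" >/dev/null 2>&1; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    # None of the candidates was found
    return 1
}
```

Running `find_node_bin` prints the name to use in deploy/frontend.sh, or returns a non-zero status if neither command is installed.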

Update notes

Update note 3: When upgrading from develop prior to 2017-02-14, you need to delete {deploy}/jars and {deploy}/osgi.

Update note 2: When upgrading from master prior to 2016-11-04, you need to move your pipelines folder from e.g. /data/lp/etl/pipelines to /data/lp/etl/storage/pipelines, update the configuration.properties file and possibly the update/restart scripts, as there is a new component, storage.
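
The folder move from update note 2 could be scripted roughly as follows. This is a sketch that assumes the example layout above, with everything under one base directory; the function name migrate_pipelines is made up for illustration. Stop the components before running it.

```shell
#!/bin/bash
# Move BASE/pipelines to BASE/storage/pipelines.
# Does nothing if there is no old folder or the target already exists.
migrate_pipelines() {
    local base=$1
    if [ -d "$base/pipelines" ] && [ ! -e "$base/storage/pipelines" ]; then
        mkdir -p "$base/storage"
        mv "$base/pipelines" "$base/storage/pipelines"
    fi
}
```

With the example paths this would be called as `migrate_pipelines /data/lp/etl`. The configuration file still has to be updated by hand.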

Update note: When upgrading from master prior to 2016-04-07, you need to delete your old execution data (e.g. in /data/lp/etl/working/data).

Update script

Since we are still in the rapid development phase, we update our instance often. This is the update script that we use, and you can reuse it if you wish. The script sets the path to Java 8 and kills the running components (yeah, it is dirty). It assumes the repo is cloned in /opt/lp/etl, and it stores the console logs in /data/lp/etl.

#!/bin/bash
echo Killing Executor
kill `ps ax | grep /executor.jar | grep -v grep | awk '{print $1}'`
echo Killing Executor-monitor
kill `ps ax | grep /executor-monitor.jar | grep -v grep | awk '{print $1}'`
echo Killing Frontend
kill `ps ax | grep node | grep -v grep | awk '{print $1}'`
echo Killing Storage
kill `ps ax | grep /storage.jar | grep -v grep | awk '{print $1}'`
cd /opt/lp/etl
echo Git Pull
git pull
echo Mvn install
mvn clean install
cd deploy
echo Running executor
./executor.sh >> /data/lp/etl/executor.log &
echo Running executor-monitor
./executor-monitor.sh >> /data/lp/etl/executor-monitor.log &
echo Running storage
./storage.sh >> /data/lp/etl/storage.log &
echo Running frontend
./frontend.sh >> /data/lp/etl/frontend.log &
echo Disowning
# Disown all background jobs; plain disown only affects the most recent one
disown -a
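
A less dirty alternative to killing by process name is to record each PID when a component is started and stop it by PID later. The sketch below is not what the project uses; start_component and stop_component are hypothetical helpers. It also only signals the process that was started directly, so if a launcher script starts Java as a child process without exec, that child may keep running.

```shell
#!/bin/bash
# Start a command in the background, append its output (stdout and stderr)
# to RUN_DIR/NAME.log, and store its PID in RUN_DIR/NAME.pid.
# Usage: start_component NAME RUN_DIR COMMAND [ARGS...]
start_component() {
    local name=$1
    local run_dir=$2
    shift 2
    "$@" >> "$run_dir/$name.log" 2>&1 &
    echo $! > "$run_dir/$name.pid"
}

# Stop the process recorded in RUN_DIR/NAME.pid, if it is still running,
# and remove the pid file.
# Usage: stop_component NAME RUN_DIR
stop_component() {
    local name=$1
    local run_dir=$2
    local pid_file="$run_dir/$name.pid"
    [ -f "$pid_file" ] || return 0
    local pid
    pid=$(cat "$pid_file")
    # kill -0 only tests whether the process exists
    if kill -0 "$pid" 2>/dev/null; then
        kill "$pid"
    fi
    rm -f "$pid_file"
}
```

For example, `start_component executor /data/lp/etl ./executor.sh` followed later by `stop_component executor /data/lp/etl`.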
