etl's People

Contributors

cowclaw, jakubklimek, jindrichmynarz, jlleitschuh, mnmercer, nvdk, skodapetr

etl's Issues

RDF based pipeline edit

Add a text edit mode for pipelines. In this mode the user can edit the pipeline definition directly as text.
There should also be a way to validate the changes the user has made.
This enables copy & paste of DPU configurations.

"Debug to" functionality

Ability to run a pipeline only to a selected DPU instead of the whole pipeline. Useful for debugging.
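
A rough sketch of how the set of DPUs for a "debug to" run could be selected, assuming the pipeline is available as a list of directed edges {source, target}. The class and method names are hypothetical and not part of the existing code base. It walks the edges backwards from the selected DPU and keeps everything that DPU depends on.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not an existing LinkedPipes class.
final class DebugToSelection {

    // Returns the target DPU and every DPU it transitively depends on.
    static Set<String> componentsToRun(List<String[]> edges, String target) {
        // Index the edges by target so we can walk them backwards.
        Map<String, List<String>> predecessors = new HashMap<>();
        for (String[] edge : edges) {
            predecessors.computeIfAbsent(edge[1], k -> new ArrayList<>()).add(edge[0]);
        }
        Set<String> selected = new LinkedHashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.add(target);
        while (!pending.isEmpty()) {
            String current = pending.poll();
            // Expand each DPU only once, even if it is reachable by several paths.
            if (selected.add(current)) {
                pending.addAll(predecessors.getOrDefault(current, Collections.emptyList()));
            }
        }
        return selected;
    }
}
```

DPUs that are not in the returned set (for example siblings of the selected DPU) would simply be skipped by the executor.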

Cannot run pipelines on Linux

The special option storage.components.path.prefix = file:/ does not work on Linux now.

Reason:
 Can't load component: http://obeu.vse.cz:9080/resources/pipelines/created-1455211750950/components/4581eaa0-744f-4c1b-8855-00daf014349b 
Exception:
 com.linkedpipes.executor.module.boundary.ModuleFacade$ModuleException: Can't load bundle from given location!
    at com.linkedpipes.executor.module.boundary.impl.ModuleFacadeImpl.getComponent(ModuleFacadeImpl.java:208)
    at com.linkedpipes.executor.execution.contoller.PipelineExecutor.loadComponents(PipelineExecutor.java:409)
    at com.linkedpipes.executor.execution.contoller.PipelineExecutor.innerExecute(PipelineExecutor.java:303)
    at com.linkedpipes.executor.execution.contoller.PipelineExecutor.execute(PipelineExecutor.java:178)
    at com.linkedpipes.executor.execution.boundary.impl.ExecutorImpl.lambda$execute$16(ExecutorImpl.java:47)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.osgi.framework.BundleException: Unable to cache bundle: file://opt/etl/deploy/components/e-httpGetFile/e-httpGetFile-0.0.0.jar
    at org.apache.felix.framework.Felix.installBundle(Felix.java:2969)
    at org.apache.felix.framework.BundleContextImpl.installBundle(BundleContextImpl.java:167)
    at org.apache.felix.framework.BundleContextImpl.installBundle(BundleContextImpl.java:140)
    at com.linkedpipes.executor.module.boundary.impl.ModuleFacadeImpl.getComponent(ModuleFacadeImpl.java:206)
    ... 7 more
Caused by: java.net.UnknownHostException: opt
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:184)
    at java.net.Socket.connect(Socket.java:589)
    at java.net.Socket.connect(Socket.java:538)
    at sun.net.ftp.impl.FtpClient.doConnect(FtpClient.java:957)
    at sun.net.ftp.impl.FtpClient.tryConnect(FtpClient.java:917)
    at sun.net.ftp.impl.FtpClient.connect(FtpClient.java:1012)
    at sun.net.ftp.impl.FtpClient.connect(FtpClient.java:998)
    at sun.net.www.protocol.ftp.FtpURLConnection.connect(FtpURLConnection.java:294)
    at sun.net.www.protocol.ftp.FtpURLConnection.getInputStream(FtpURLConnection.java:393)
    at org.apache.felix.framework.util.SecureAction.getURLConnectionInputStream(SecureAction.java:525)
    at org.apache.felix.framework.cache.JarRevision.initialize(JarRevision.java:166)
    at org.apache.felix.framework.cache.JarRevision.<init>(JarRevision.java:77)
    at org.apache.felix.framework.cache.BundleArchive.createRevisionFromLocation(BundleArchive.java:878)
    at org.apache.felix.framework.cache.BundleArchive.reviseInternal(BundleArchive.java:550)
    at org.apache.felix.framework.cache.BundleArchive.<init>(BundleArchive.java:153)
    at org.apache.felix.framework.cache.BundleCache.create(BundleCache.java:277)
    at org.apache.felix.framework.Felix.installBundle(Felix.java:2965)
    ... 10 more
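
In the trace above the bundle location is file://opt/etl/..., and the UnknownHostException for "opt" suggests that the first path segment was parsed as a host name. That would be consistent with the prefix file:/ being concatenated with an absolute Linux path that already starts with a slash, although this is an assumption and not confirmed against the code. A small sketch of how java.net.URI reads the different forms (the class name and the sample path x.jar are made up for illustration):

```java
import java.net.URI;
import java.nio.file.Paths;

// Hypothetical demo class, not part of the executor.
final class FileUriSketch {

    // Returns the host component that java.net.URI sees in a location string,
    // or null when the location has no host.
    static String hostOf(String location) {
        return URI.create(location).getHost();
    }

    // Builds a file URI from an absolute path without manual string
    // concatenation, so the number of slashes is always correct.
    static String toFileUri(String absolutePath) {
        return Paths.get(absolutePath).toUri().toString();
    }
}
```

With two slashes, hostOf("file://opt/etl/x.jar") returns "opt"; with one or three slashes there is no host and the whole string after the scheme is treated as a path. Building the location via Paths.get(...).toUri() avoids the platform-specific prefix altogether.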

t-unpack bzip2 decompression mangles data

When I decompress the English DBpedia labels bzip2 dump from http://downloads.dbpedia.org/2015-04/core-i18n/en/labels_en.ttl.bz2 with t-unpack, the result contains mangled characters, such as the following:

<http://dbpedia.org/resource/Contempor√¢neos> <http://www.w3.org/2000/01/rdf-schema#label> "Contempor√¢neo�0ë6ug©ÌréHÅ�ö…ñï8fw4‚7⁄ŒóΩ�¶}°O‘Öxå ?Ã9Œ~À”÷æ∏®^Óù·Å
ÜÙÖzÚ+4m�fı¿Rø≠0µ�äX�gÑ�,Ô��∂S
4ÀÈ:‰yıúV§°�í6;?äœpŸKäÌ{�n¬�·� ;)”f�¥HN¯∑«d-�e™Õn1äNËà¯F¨∂óQ”x�'Jëå…�V�√‡‡˝‚ÓEH  ÍĬ¶�‰3V¥-ò�Û˜Z"qa˙∏�
y£É��öSk�˚2 2ê¡�Ê_™�¸V™DÌ�x᩵wÅ�‹ˇ€~Æßá�O�dùDâ6ö÷#�æ�G�˚√]b!ÜíªüÅ�}gIfiE´Êä»b'[’4ä⁄Öõƒl�3?�ïüI�V¬#8Óvø·|Ä¡π„öã �Æ X™»ß˙ÿ
˘ç���03—g��>"∏i��I˘”x´}8�ªgπn]4’0�?0;«`�ßG≈

This makes the file syntactically invalid.

However, when I decompress the same file with the bzip2 shell tool (version 1.0.6), the result is OK.

Frontend: Labels usage

Currently some labels are selected in Frontend.Backend and some in Frontend.Client. We should unify this and move it to the client side.

Support for easy DPU regression testing

It would be nice to have an easy way to define a new regression test for an existing Component.
Use case: Tabular is a widely used Component, and it is also used in some important pipelines. When I make a change to Tabular, I do not know whether I have broken some pipelines.
There should be an easy way (button + form) to add a particular Tabular use case to the regression test storage. Only the input files, output files and configuration of the DPU would be saved.
Then, during a Component update, those tests can be executed to make sure that the Component has maintained backward compatibility.

Add a cleaner for stuck pipelines

Sometimes, pipelines can get stuck for various reasons. This should be detected and dealt with, ideally by stopping the pipeline.

Fault tolerance in Sesame data unit

If we add an unreliable repository (Virtuoso, remote Sesame server), we should add fault tolerance for the Sesame data unit (ActionExecutor in dataunit-sesame).

Check for cycles in pipeline

Check for cycles in the pipeline, including run-after edges, and display a proper error message when trying to save.
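
One possible way to do the check is Kahn's algorithm, treating data edges and run-after edges alike as directed edges {source, target}. This is only a sketch; the class name and the edge representation are assumptions, not the project's model.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not an existing LinkedPipes class.
final class PipelineCycleCheck {

    // Returns true when the directed graph given by the edges contains a cycle.
    static boolean hasCycle(List<String[]> edges) {
        Map<String, Integer> inDegree = new HashMap<>();
        Map<String, List<String>> successors = new HashMap<>();
        for (String[] edge : edges) {
            inDegree.putIfAbsent(edge[0], 0);
            inDegree.merge(edge[1], 1, Integer::sum);
            successors.computeIfAbsent(edge[0], k -> new ArrayList<>()).add(edge[1]);
        }
        // Start with the components that have no incoming edge.
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> entry : inDegree.entrySet()) {
            if (entry.getValue() == 0) {
                ready.add(entry.getKey());
            }
        }
        int visited = 0;
        while (!ready.isEmpty()) {
            String node = ready.poll();
            visited++;
            for (String next : successors.getOrDefault(node, Collections.emptyList())) {
                // A successor becomes ready once all its predecessors were visited.
                if (inDegree.merge(next, -1, Integer::sum) == 0) {
                    ready.add(next);
                }
            }
        }
        // Components on a cycle never reach in-degree zero, so they stay unvisited.
        return visited != inDegree.size();
    }
}
```

The save handler could call hasCycle on the combined edge list and refuse to save with an error message when it returns true.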

Cross platform deploy scripts

Because you are using JavaScript anyway, you could use Gulp to write your deploy script in a way that is cross-platform, and not maintain different scripts for different platforms. This can be especially useful as others adopt the project and need to configure their deployment, hook into the config, etc.

An example gulpfile (which is doing many things, not deployment though, so just FYI).

Recommend next DPU when creating pipeline

  1. Click on an output data port of a DPU
  2. Drag the edge into empty space
  3. Click/touch.
  • Display a list of DPUs; when one is selected, insert it. When there is only one possible edge, connect it.
  • Filter offered DPUs only to those actually compatible
  • Sort offered DPUs according to their probability of being connected to this DPU
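
The filtering and sorting in the last two points could look roughly like this. It is a sketch under assumptions: compatibility is reduced to "the candidate's input type equals the output port's type", and the probability is taken as a given number. Where it would come from (for example, statistics over existing pipelines) is not decided here. All names are hypothetical.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not an existing LinkedPipes class.
final class DpuRecommender {

    // A DPU that could be offered, with the data type its input port accepts
    // and an assumed probability of being connected to the current DPU.
    static final class Candidate {
        final String label;
        final String inputType;
        final double probability;

        Candidate(String label, String inputType, double probability) {
            this.label = label;
            this.inputType = inputType;
            this.probability = probability;
        }
    }

    // Keeps only the compatible candidates and orders them from the most
    // probable to the least probable.
    static List<String> recommend(String outputType, List<Candidate> candidates) {
        return candidates.stream()
                .filter(c -> c.inputType.equals(outputType))
                .sorted(Comparator.comparingDouble((Candidate c) -> c.probability).reversed())
                .map(c -> c.label)
                .collect(Collectors.toList());
    }
}
```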

Support control-click to open in a new tab

To improve usability control-click (either Ctrl or Cmd) should be supported on the navigation elements to open them in new tabs. Currently, control-click opens in the same tab. This would allow, for example, to navigate to a pipeline, while editing another one.

Use nconf

The idiomatic way to do configuration in Node is via nconf: https://www.npmjs.com/package/nconf

While it might not seem necessary now, it will really help as you move forward, especially for allowing easy customisation of the configuration object (via local files for local dev, env vars for production, etc.)

Relatedly, instead of passing config around the app by importing the module, it could be useful to set the config instance on the app object (at least when you need to access the config from request/response handlers). Example: https://github.com/okfn/spend-publishing-dashboard/blob/master/app/bootstrap.js#L17

Execution messages in wrong order

We sort execution messages by their time in milliseconds. However, in some cases two messages can be emitted at the same time, and they can then end up in the wrong order.

As a solution we should use an artificial index.
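
A minimal sketch of such an index, assuming one shared counter per execution. The class and field names are hypothetical. The timestamp is kept for display, while sorting would use the counter value.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch, not the project's message API.
final class MessageSequence {

    // A message that carries both its wall-clock time and an artificial index.
    static final class Message {
        final long order;
        final long createdMillis;
        final String text;

        Message(long order, long createdMillis, String text) {
            this.order = order;
            this.createdMillis = createdMillis;
            this.text = text;
        }
    }

    // Monotonic counter shared by all emitters of one execution; AtomicLong
    // keeps the values unique even when several threads emit messages.
    private final AtomicLong nextOrder = new AtomicLong();

    Message create(long createdMillis, String text) {
        return new Message(nextOrder.getAndIncrement(), createdMillis, text);
    }
}
```

Two messages created in the same millisecond still get different order values, so sorting by order is unambiguous.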

Add download pipeline button

  1. Pipeline designer
  2. Pipeline list: add menu as in Execution for small devices and add download button to the list and to the menu

Use relative paths in debug

Debug and data units currently use absolute paths. This can be a problem if the execution directory is moved, as that breaks all references.
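
A sketch of the idea using java.nio.file: store each reference relative to the execution directory and resolve it against the current location of that directory when reading. The class name and the directory layout in the usage note are made up for illustration.

```java
import java.nio.file.Path;

// Hypothetical helper, not an existing LinkedPipes class.
final class RelativeDebugPaths {

    // Converts an absolute path inside the execution directory to a relative
    // one, using forward slashes so the stored value is the same on all platforms.
    static String toRelative(Path executionDir, Path absolute) {
        return executionDir.relativize(absolute).toString().replace('\\', '/');
    }

    // Resolves a stored relative reference against the current execution directory.
    static Path resolve(Path executionDir, String relative) {
        return executionDir.resolve(relative).normalize();
    }
}
```

For example, a file stored as working/dataunit-0/data.ttl can still be found after the execution directory has been moved, because only the directory passed to resolve changes.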

Executor-Monitor: ExternalProcess

The so-called external process is used by the execution monitor to support cooperation with third-party tools. This functionality is not finished, but the API already exists. We should either finish it or remove the API and the respective code.

Progress report configuration

There should be a way to configure the progress report from the Frontend.

The user should be able to set whether progress (performance) is shown and to select the size of the report step.

Issues with first run on Mac OS X

  • shell scripts do not have the correct mode:
$ ./executor.sh >> executor.log &
fish: The file './executor.sh' is not executable by this user
  • shell scripts are missing shebang
~/d/m/l/e/deploy (master) $ chmod 0777 *.sh
~/d/m/l/e/deploy (master) $ ./executor.sh >> executor.log &
Failed to execute process './executor.sh'. Reason:
exec: Exec format error
The file './executor.sh' is marked as an executable but could not be run by the operating system.

RW: http://stackoverflow.com/questions/10376206/what-is-the-preferred-bash-shebang

#!/usr/bin/env bash worked for me

  • I had to create the directories manually, e.g. I got this error:
Error: ENOENT: no such file or directory, scandir '/data/lp/etl/components'
  • Application is not running
  • Executor is running
  • Executor monitor failed with message
[main] ERROR c.l.c.c.c.b.AbstractConfiguration - Missing configuration property: 'external.fuseki.path'
  • Frontend started with message "Backend is not running", all I see is navbar and spinner.

"Debug from" functionality

  • Ability to rerun only selected DPU
  • Ability to select from which output data units the pipeline should run, based on a selected execution.

List of DPUs should have an order

Currently, it seems that there is no order in the list of available DPUs. For example, see this:

[Screenshot of the DPU list, 2016-01-26 11:37:50]

The list should be sorted in some order, e.g., alphabetically, and DPUs could be partitioned by their type (extractors, transformers, loaders).
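
One way the suggested ordering could be expressed, as a sketch: group by type first (extractors, transformers, loaders), then sort alphabetically by label, ignoring case. The Dpu class and its fields are hypothetical stand-ins for whatever the frontend or backend actually uses.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch, not the project's data model.
final class DpuOrdering {

    // Declaration order defines the order of the groups in the list.
    enum Type { EXTRACTOR, TRANSFORMER, LOADER }

    static final class Dpu {
        final Type type;
        final String label;

        Dpu(Type type, String label) {
            this.type = type;
            this.label = label;
        }
    }

    // Returns a sorted copy: by type first, then alphabetically by label.
    static List<Dpu> sorted(List<Dpu> dpus) {
        List<Dpu> copy = new ArrayList<>(dpus);
        copy.sort(Comparator
                .comparing((Dpu d) -> d.type)
                .thenComparing(d -> d.label, String.CASE_INSENSITIVE_ORDER));
        return copy;
    }
}
```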

Add support for localization

Currently there is no localization support, although the RDF format in use is more than suitable for multiple languages.

DPU authors should get built-in, automatic support for multiple languages for events/exceptions.

The event/exception should be provided with a string, which is translated (based on localization files) into the selected languages or all languages, without needing assistance from the DPU's author.

Use something like: http://www.slf4j.org/localization.html ?

Avoid sync options for IO

I see that in various places you use the sync versions of Node's IO methods. Example:

This is generally considered bad practice and, especially if these functions are called during the app's life cycle (and not just as part of a one-off bootstrap when loading), could lead to significant performance issues due to how IO works in Node. Embrace Promises.

Cannot run executor on Windows

ERROR: Error reloading cached bundle, removing it: .\.felix\bundle1 (java.io.FileNotFoundException: .\.felix\bundle1\bundle.location (The system cannot find the file specified))
java.io.FileNotFoundException: .\.felix\bundle1\bundle.location (The system cannot find the file specified)
        at java.io.FileInputStream.open0(Native Method)
        at java.io.FileInputStream.open(Unknown Source)
        at java.io.FileInputStream.<init>(Unknown Source)
        at org.apache.felix.framework.util.SecureAction.getFileInputStream(SecureAction.java:453)
        at org.apache.felix.framework.cache.BundleArchive.readLocation(BundleArchive.java:1107)
        at org.apache.felix.framework.cache.BundleArchive.readBundleInfo(BundleArchive.java:973)
        at org.apache.felix.framework.cache.BundleArchive.<init>(BundleArchive.java:182)
        at org.apache.felix.framework.cache.BundleCache.getArchives(BundleCache.java:247)
        at org.apache.felix.framework.Felix.init(Felix.java:754)
        at org.apache.felix.framework.Felix.init(Felix.java:624)
        at org.apache.felix.framework.Felix.start(Felix.java:960)
        at com.linkedpipes.executor.module.boundary.impl.ModuleFacadeImpl.start(ModuleFacadeImpl.java:117)
        at com.linkedpipes.executor.module.boundary.impl.ModuleFacadeImpl.onApplicationEvent(ModuleFacadeImpl.java:167)
        at org.springframework.context.event.SimpleApplicationEventMulticaster.invokeListener(SimpleApplicationEventMulticaster.java:163)
        at org.springframework.context.event.SimpleApplicationEventMulticaster.multicastEvent(SimpleApplicationEventMulticaster.java:136)
        at org.springframework.context.support.AbstractApplicationContext.publishEvent(AbstractApplicationContext.java:380)
        at org.springframework.context.support.AbstractApplicationContext.publishEvent(AbstractApplicationContext.java:334)
        at org.springframework.context.support.AbstractApplicationContext.start(AbstractApplicationContext.java:1271)
        at com.linkedpipes.executor.Executor.main(Executor.java:41)

Cannot deselect DPU by clicking on canvas

A selected DPU cannot be deselected by clicking on canvas.

Also, when no DPU is selected, I cannot open the add new DPU dialog.

This happened once before in Chrome and disappeared after restarting Chrome... weird.
