linkedpipes / etl
LinkedPipes ETL is an RDF based, lightweight ETL tool
Home Page: https://etl.linkedpipes.com
License: Other
We need to develop and demonstrate pipeline nesting, e.g. for "graph copy" or "publication pipeline".
I should be able to parametrize the sub-pipeline as a DPU somehow
Add a pipeline text edit mode. In this mode the user can edit the pipeline definition directly as text.
There should also be a way to validate the changes made by the user.
This enables copy & paste of DPU configurations.
Add "sample" configuration file to each DPU and add a button that will put it in the dialog.
Ability to run a pipeline only to a selected DPU instead of the whole pipeline. Useful for debugging.
On pipeline edit (canvas), there should be an arrow for "add prerequisite", same as for connecting ports.
The special option storage.components.path.prefix = file:/ does not work on Linux now.
Reason:
Can't load component: http://obeu.vse.cz:9080/resources/pipelines/created-1455211750950/components/4581eaa0-744f-4c1b-8855-00daf014349b
Exception:
com.linkedpipes.executor.module.boundary.ModuleFacade$ModuleException: Can't load bundle from given location!
at com.linkedpipes.executor.module.boundary.impl.ModuleFacadeImpl.getComponent(ModuleFacadeImpl.java:208)
at com.linkedpipes.executor.execution.contoller.PipelineExecutor.loadComponents(PipelineExecutor.java:409)
at com.linkedpipes.executor.execution.contoller.PipelineExecutor.innerExecute(PipelineExecutor.java:303)
at com.linkedpipes.executor.execution.contoller.PipelineExecutor.execute(PipelineExecutor.java:178)
at com.linkedpipes.executor.execution.boundary.impl.ExecutorImpl.lambda$execute$16(ExecutorImpl.java:47)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.osgi.framework.BundleException: Unable to cache bundle: file://opt/etl/deploy/components/e-httpGetFile/e-httpGetFile-0.0.0.jar
at org.apache.felix.framework.Felix.installBundle(Felix.java:2969)
at org.apache.felix.framework.BundleContextImpl.installBundle(BundleContextImpl.java:167)
at org.apache.felix.framework.BundleContextImpl.installBundle(BundleContextImpl.java:140)
at com.linkedpipes.executor.module.boundary.impl.ModuleFacadeImpl.getComponent(ModuleFacadeImpl.java:206)
... 7 more
Caused by: java.net.UnknownHostException: opt
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:184)
at java.net.Socket.connect(Socket.java:589)
at java.net.Socket.connect(Socket.java:538)
at sun.net.ftp.impl.FtpClient.doConnect(FtpClient.java:957)
at sun.net.ftp.impl.FtpClient.tryConnect(FtpClient.java:917)
at sun.net.ftp.impl.FtpClient.connect(FtpClient.java:1012)
at sun.net.ftp.impl.FtpClient.connect(FtpClient.java:998)
at sun.net.www.protocol.ftp.FtpURLConnection.connect(FtpURLConnection.java:294)
at sun.net.www.protocol.ftp.FtpURLConnection.getInputStream(FtpURLConnection.java:393)
at org.apache.felix.framework.util.SecureAction.getURLConnectionInputStream(SecureAction.java:525)
at org.apache.felix.framework.cache.JarRevision.initialize(JarRevision.java:166)
at org.apache.felix.framework.cache.JarRevision.<init>(JarRevision.java:77)
at org.apache.felix.framework.cache.BundleArchive.createRevisionFromLocation(BundleArchive.java:878)
at org.apache.felix.framework.cache.BundleArchive.reviseInternal(BundleArchive.java:550)
at org.apache.felix.framework.cache.BundleArchive.<init>(BundleArchive.java:153)
at org.apache.felix.framework.cache.BundleCache.create(BundleCache.java:277)
at org.apache.felix.framework.Felix.installBundle(Felix.java:2965)
... 10 more
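The trace above is consistent with the bundle location ending up as file://opt/... (two slashes), where the first path segment is read as a host name. Below is a minimal sketch in Node, not the Java code path from the trace; the three-slash URL is an assumed corrected form, not something taken from the report.

```javascript
// Illustration only: how a standards-based URL parser reads the two forms.
// The first URL is the bundle location from the stack trace; the second is
// an assumed corrected form with an empty host.
const broken = new URL('file://opt/etl/deploy/components/e-httpGetFile/e-httpGetFile-0.0.0.jar');
const fixed = new URL('file:///opt/etl/deploy/components/e-httpGetFile/e-httpGetFile-0.0.0.jar');

console.log(broken.hostname); // "opt": the first path segment is taken as the host
console.log(broken.pathname); // "/etl/deploy/components/e-httpGetFile/e-httpGetFile-0.0.0.jar"
console.log(fixed.hostname);  // "": no host, so the whole path is local
console.log(fixed.pathname);  // "/opt/etl/deploy/components/e-httpGetFile/e-httpGetFile-0.0.0.jar"
```

Java's URL handling is a different implementation, but "UnknownHostException: opt" suggests it reads the location the same way.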
When opening a pipeline, show the top left corner.
This is for the case where someone has moved the DPUs away from the starting position.
When I decompress English DBpedia labels bzip2 dump from http://downloads.dbpedia.org/2015-04/core-i18n/en/labels_en.ttl.bz2 with the t-unpack, the result contains mangled characters, such as the following:
<http://dbpedia.org/resource/Contempor√¢neos> <http://www.w3.org/2000/01/rdf-schema#label> "Contempor√¢neo�0ë6ug©ÌréHÅ�ö…ñï8fw4‚7⁄ŒóΩ�¶}°O‘Öxå ?Ã9Œ~À”÷æ∏®^Óù·Å
ÜÙÖzÚ+4m�fı¿Rø≠0µ�äX�gÑ�,Ô��∂S
4ÀÈ:‰yıúV§°�í6;?äœpŸKäÌ{�n¬�·� ;)”f�¥HN¯∑«d-�e™Õn1äNËà¯F¨∂óQ”x�'Jëå…�V�√‡‡˝‚ÓEH ÍĬ¶�‰3V¥-ò�Û˜Z"qa˙∏�
y£É��öSk�˚2 2ê¡�Ê_™�¸V™DÌ�x᩵wÅ�‹ˇ€~Æßá�O�dùDâ6ö÷#�æ�G�˚√]b!ÜíªüÅ�}gIfiE´Êä»b'[’4ä⁄Öõƒl�3?�ïüI�V¬#8Óvø·|Ä¡π„öã �Æ X™»ß˙ÿ
˘ç���03—g��>"∏i��I˘”x´}8�ªgπn]4’0�?0;«`�ßG≈
This makes the file syntactically invalid.
However, when I decompress the same file with the bzip2 (version 1.0.6) shell tool, the result is OK.
etl/frontend/modules/pipelines.js
Line 76 in 34ce822
Put 'use strict'; at the top of files to catch errors like this.
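The referenced line is not shown here, so the exact error is unknown. One class of mistake that strict mode does catch is assignment to an undeclared variable. A minimal sketch, with a hypothetical function and variable name:

```javascript
// Hypothetical example, not code from pipelines.js.
function assignUndeclared() {
  'use strict';
  try {
    // In strict mode, assigning to a name that was never declared throws
    // instead of silently creating a global variable.
    undeclaredPipelineList = [];
    return 'no error';
  } catch (err) {
    return err.name;
  }
}

const strictResult = assignUndeclared();
console.log(strictResult); // "ReferenceError"
```

The directive can go at the top of a file or, as here, at the top of a single function.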
Now some labels are selected in Frontend.Backend and some in Frontend.Client - we should unify this and shift it to the Client side.
There can be changes in a configuration class over time. We need to be able to provide the DPU author with a simple way to update configurations.
It would be nice to be able to easily define a new regression test for an existing Component.
Use-case: Tabular is a widely used Component and it's also used in some important pipelines. When I make a change to Tabular, I do not know whether I have broken some pipelines.
There should be an easy way (button + form) to add a certain Tabular use-case to the regression test storage. Only the input and output files and the configuration of the DPU would be saved.
Then during a Component update, those tests can be executed to make sure that the Component has maintained backward compatibility.
Sometimes, pipelines can get stuck for various reasons. This should be detected and dealt with, ideally by stopping the pipeline.
Cancel in the bottom left corner and OK in the bottom right is weird. It should be the other way around I guess...
On import, rewrite the DPU template IRI so that it becomes dereferenceable again.
Date fields in Dataset metadata and Distribution metadata DPUs should have a date picker. Otherwise, users need to guess in what format to provide the date. This is relevant to the fields "Issued" and "Modified".
If we add an unreliable repository (Virtuoso, remote Sesame server), we should add fault tolerance for the Sesame data unit (ActionExecutor in dataunit-sesame).
Check for cycles in pipeline, including run after edges, and display proper error message when trying to save.
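A minimal sketch of such a check, assuming a hypothetical model in which components are identified by ids and every connection (data edge or "run after" edge) is a { source, target } pair. This is a plain depth-first search, not the project's actual implementation.

```javascript
// Hypothetical data model: componentIds is an array of ids, connections is an
// array of { source, target } pairs covering both data and "run after" edges.
function hasCycle(componentIds, connections) {
  const successors = new Map(componentIds.map((id) => [id, []]));
  for (const { source, target } of connections) {
    if (!successors.has(source)) successors.set(source, []);
    successors.get(source).push(target);
  }

  const state = new Map(); // id -> 'visiting' | 'done'
  const visit = (id) => {
    if (state.get(id) === 'done') return false;
    if (state.get(id) === 'visiting') return true; // back edge: cycle found
    state.set(id, 'visiting');
    const cyclic = (successors.get(id) || []).some((next) => visit(next));
    state.set(id, 'done');
    return cyclic;
  };

  return [...successors.keys()].some((id) => visit(id));
}

const chain = [{ source: 'a', target: 'b' }, { source: 'b', target: 'c' }];
console.log(hasCycle(['a', 'b', 'c'], chain)); // false
console.log(hasCycle(['a', 'b', 'c'], [...chain, { source: 'c', target: 'a' }])); // true
```

The save action could call this and show an error message instead of saving when it returns true.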
Because you are using Javascript anyway, you could use Gulp to write your deploy script in a way that is cross-platform, and not maintain different scripts for different platforms. This can especially be useful as others adopt and need to configure their deployment, hook into the config, etc.
An example gulpfile (which is doing many things, not deployment though, so just FYI).
To improve usability control-click (either Ctrl or Cmd) should be supported on the navigation elements to open them in new tabs. Currently, control-click opens in the same tab. This would allow, for example, to navigate to a pipeline, while editing another one.
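A minimal sketch of the check a click handler would need, assuming navigation is handled in JavaScript rather than by plain links. The helper name is hypothetical.

```javascript
// Hypothetical helper: decides whether a click asks for a new tab.
// ctrlKey covers Ctrl (Windows/Linux), metaKey covers Cmd (macOS).
function wantsNewTab(event) {
  return Boolean(event.ctrlKey || event.metaKey);
}

console.log(wantsNewTab({ ctrlKey: true, metaKey: false }));  // true
console.log(wantsNewTab({ ctrlKey: false, metaKey: false })); // false
```

If the navigation elements were ordinary anchor elements with an href, the browser would likely handle this without any custom code.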
The idiomatic way to do configuration in Node is via nconf: https://www.npmjs.com/package/nconf
While it might not seem necessary now, it will really help as you move forward, especially for allowing easy customisation of the configuration object (via local files for local dev, env vars for production, etc.)
etl/frontend/modules/configuration.js
Line 11 in 34ce822
Related: instead of passing the config around the app by importing the module, it could be useful to set the config instance on the app object (at least when you need to access the config from request/response handlers). Example: https://github.com/okfn/spend-publishing-dashboard/blob/master/app/bootstrap.js#L17
We use time with milliseconds to sort execution messages; however, in some cases two messages can be emitted at the same time and thus end up in the wrong order.
As a solution we should use an artificial index.
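A minimal sketch of the idea, with hypothetical names: every message gets a monotonically increasing index when it is emitted, and sorting uses that index instead of the timestamp.

```javascript
// Hypothetical message log, not the executor's actual message model.
function createMessageLog() {
  let nextIndex = 0;
  const messages = [];
  return {
    emit(text, time) {
      // The index records emission order even when timestamps are equal.
      const message = { index: nextIndex++, time, text };
      messages.push(message);
      return message;
    },
    sorted() {
      return [...messages].sort((a, b) => a.index - b.index);
    },
  };
}

const log = createMessageLog();
log.emit('component started', 1000);
log.emit('component finished', 1000); // same millisecond as the previous one
console.log(log.sorted().map((m) => m.text));
// [ 'component started', 'component finished' ]
```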
For demo purposes it could be useful to have a switch in the configuration to disable pipeline delete, copy, create, save and execute.
Debug and data units now use absolute paths. This can be a problem if the execution directory is moved, as that breaks all references.
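One possible approach is to store paths relative to the execution directory and resolve them on load. A minimal sketch, with hypothetical helper names and made-up directories:

```javascript
const path = require('path');

// Hypothetical helpers; path.posix is used so the result does not depend on
// the platform the sketch runs on.
function toStoredPath(executionDir, absolutePath) {
  return path.posix.relative(executionDir, absolutePath);
}

function toAbsolutePath(executionDir, storedPath) {
  return path.posix.join(executionDir, storedPath);
}

// Made-up directories for illustration.
const stored = toStoredPath('/data/executions/001', '/data/executions/001/debug/unit-0');
console.log(stored); // "debug/unit-0"
console.log(toAbsolutePath('/moved/executions/001', stored)); // "/moved/executions/001/debug/unit-0"
```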
The so-called external process is used by the execution monitor to support cooperation with third-party tools. This functionality is not finished, but the API already exists - we should finish it or remove the API and the respective code.
There should be a way to configure the progress report from the Frontend.
The user should be able to set whether to show progress (performance) and to select the size of the report step.
The type of the data unit represented by a port is not apparent. It could be represented by an icon in a slightly enlarged data port circle.
E.g. https://design.google.com/icons/#ic_attach_file (files), https://design.google.com/icons/#ic_polymer (1 graph), https://design.google.com/icons/#ic_list (multigraph)
Make status codes consistent with HTTP status code classes
https://github.com/linkedpipes/etl/blob/master/commons-entities/src/main/java/com/linkedpipes/commons/entities/executor/ExecutionStatusCode.java
Pipeline progress could be shown graphically, in the pipeline editor.
Relates to #36
Instead of the pipeline name, the executions view displays http://obeu.vse.cz:9080/resources/executions/
(tested on ...).
$ ./executor.sh >> executor.log &
fish: The file './executor.sh' is not executable by this user
~/d/m/l/e/deploy (master) $ chmod 0777 *.sh
~/d/m/l/e/deploy (master) $ ./executor.sh >> executor.log &
Failed to execute process './executor.sh'. Reason:
exec: Exec format error
The file './executor.sh' is marked as an executable but could not be run by the operating system.
RW: http://stackoverflow.com/questions/10376206/what-is-the-preferred-bash-shebang
#!/usr/bin/env bash
worked for me
Error: ENOENT: no such file or directory, scandir '/data/lp/etl/components'
[main] ERROR c.l.c.c.c.b.AbstractConfiguration - Missing configuration property: 'external.fuseki.path'
You already create an Express instance for the application. You create another Express instance for the router.
The idiomatic way would be to create a Router instance for the Router(s).
etl/frontend/routes/resources.js
Line 284 in 34ce822
Can simply be like:
The status of backend(s) could be shown, along with disabling/enabling actions affected.
The checkbox for spatial coverage in the Dataset metadata DPU is labelled as "Use temporal coverage". This is probably an error caused by copy-pasting the label of temporal coverage. The label should be "Use spatial coverage".
XSLT transformer has a color of an extractor.
On a pipeline, I could disable DPUs. The pipeline would behave as if the DPUs were not there.
This is currently not supported, would be nice to have, but requires substantial frontend work.
The type attribute of the input elements for entering password should be set to password. Currently, this is not the case for the Files to SCP DPU.
This could be done via the materialdesign search bar with a slightly more than keyword search intelligence (e.g. tags, etc.)
Currently there is no support, although the used RDF format is more than suitable for utilization of multiple languages.
The author of a DPU should have built-in automatic support for multiple languages for events/exceptions.
The event/exception should be provided with a string, which is translated (based on localization files) to the selected/all languages, without the need for the DPU author's assistance.
Use something like: http://www.slf4j.org/localization.html ?
I see that in various places you use the sync versions of Node's IO methods. Example:
etl/frontend/modules/templates.js
Line 31 in 34ce822
This is generally considered bad practice, and, esp. if these functions need to be called during the app's life cycle (and not just as part of a one-off bootstrap when loading) could lead to significant performance issues due to how IO works in Node. Embrace Promises
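A minimal sketch of the Promise-based alternative, using a hypothetical loader rather than the code from templates.js:

```javascript
const fs = require('fs');

// Hypothetical loader: reads and parses a JSON file without blocking the
// event loop while waiting on disk.
async function readJsonFile(filePath) {
  const text = await fs.promises.readFile(filePath, 'utf8');
  return JSON.parse(text);
}
```

A caller would use it as readJsonFile(somePath).then(handleTemplate).catch(handleError), where somePath, handleTemplate and handleError are placeholders. Sync calls remain reasonable in a one-off bootstrap step, as noted above.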
ERROR: Error reloading cached bundle, removing it: .\.felix\bundle1 (java.io.FileNotFoundException: .\.felix\bundle1\bundle.location (The system cannot find the file specified))
java.io.FileNotFoundException: .\.felix\bundle1\bundle.location (The system cannot find the file specified)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(Unknown Source)
at java.io.FileInputStream.<init>(Unknown Source)
at org.apache.felix.framework.util.SecureAction.getFileInputStream(SecureAction.java:453)
at org.apache.felix.framework.cache.BundleArchive.readLocation(BundleArchive.java:1107)
at org.apache.felix.framework.cache.BundleArchive.readBundleInfo(BundleArchive.java:973)
at org.apache.felix.framework.cache.BundleArchive.<init>(BundleArchive.java:182)
at org.apache.felix.framework.cache.BundleCache.getArchives(BundleCache.java:247)
at org.apache.felix.framework.Felix.init(Felix.java:754)
at org.apache.felix.framework.Felix.init(Felix.java:624)
at org.apache.felix.framework.Felix.start(Felix.java:960)
at com.linkedpipes.executor.module.boundary.impl.ModuleFacadeImpl.start(ModuleFacadeImpl.java:117)
at com.linkedpipes.executor.module.boundary.impl.ModuleFacadeImpl.onApplicationEvent(ModuleFacadeImpl.java:167)
at org.springframework.context.event.SimpleApplicationEventMulticaster.invokeListener(SimpleApplicationEventMulticaster.java:163)
at org.springframework.context.event.SimpleApplicationEventMulticaster.multicastEvent(SimpleApplicationEventMulticaster.java:136)
at org.springframework.context.support.AbstractApplicationContext.publishEvent(AbstractApplicationContext.java:380)
at org.springframework.context.support.AbstractApplicationContext.publishEvent(AbstractApplicationContext.java:334)
at org.springframework.context.support.AbstractApplicationContext.start(AbstractApplicationContext.java:1271)
at com.linkedpipes.executor.Executor.main(Executor.java:41)
A selected DPU cannot be deselected by clicking on canvas.
Also, when no DPU is selected, I cannot open the add new DPU dialog.
This happened once before using Chrome and disappeared after restart of chrome... weird.