Metanome Algorithm Repository

This repository contains several data profiling algorithms for the Metanome platform. The algorithms were implemented by students of the information systems group at the Hasso-Plattner-Institut (HPI) as part of the Metanome project. All algorithms in the repository can be executed with the Metanome platform. Data profiling algorithms that were also developed in the Metanome project but are not (yet) compatible with the platform (for instance, because they are distributed data profiling algorithms) are kept in separate repositories.

Installation

Before building the algorithms, the following prerequisites need to be installed:

  • Java JDK 1.8 or later
  • Maven 3.1.0
  • Git
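
As a quick sanity check before building, the prerequisites above can be probed from a shell. The following is a minimal sketch, not part of the repository: the helper name missing_tools is hypothetical, and it only checks that a command is on the PATH, not which version is installed.

```shell
#!/bin/sh
# Print the name of every listed tool that cannot be found on the PATH.
# Hypothetical helper for illustration; it does not verify versions.
missing_tools() {
  for tool in "$@"; do
    # command -v exits non-zero when the name cannot be resolved
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}
```

Calling missing_tools java mvn git prints nothing when all three commands are available. The versions still need to be checked by hand, for example with java -version and mvn -v.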

Because all profiling algorithms use the Metanome platform as a dependency, Metanome itself needs to be installed in the local Maven repository first. To do so, visit the Metanome GitHub page, check out the sources, and build them with the following command:

.../metanome$ mvn install

Then, all algorithms can be built with this command:

.../metanome-algorithms$ MAVEN_OPTS="-Xmx1g -Xms20m -Xss10m" mvn clean install

Alternatively, you can open the algorithms project in your IDE of choice, specify -Xmx1g -Xms20m -Xss10m as build parameters, and run it as mvn clean install.

The build creates one "fatjar" for each algorithm in the repository. After the build has succeeded, run either the collect.bat (Windows) or the collect.sh (Linux) script to copy all built algorithm jars into one folder named "COLLECTION". You can then choose the algorithms you need and copy them into a Metanome deployment.
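
To illustrate the idea behind the collection step, here is a minimal sketch of a function that gathers jars into one folder. It is not the repository's collect.sh, which may list each jar explicitly. The function name collect_jars and the assumption that every module writes its jar to <module>/target/ are for illustration only.

```shell
#!/bin/sh
# Copy every jar found in a module's target/ directory into one folder.
# Assumption: each algorithm module writes its jar to <module>/target/.
collect_jars() {
  src_root=$1
  dest=$2
  mkdir -p "$dest"
  for jar in "$src_root"/*/target/*.jar; do
    # Skip the unexpanded glob pattern when no jar matches
    [ -e "$jar" ] || continue
    cp "$jar" "$dest"/
  done
}
```

For example, collect_jars . COLLECTION would copy all matching jars below the current directory into COLLECTION. The glob is deliberately broad, so it also picks up any additional jars that a build leaves in target/.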

Headless deployment

To run the Metanome algorithms without a full Metanome deployment, consider the Metanome-cli project. This project extends the Metanome framework with a command line interface, so you can configure and execute the jars from a shell. If you need to integrate Metanome algorithms into your own projects, the Metanome-cli implementation can serve as a reference for how to do so.

Adding new algorithms

All algorithms in this repository are continuously maintained and upgraded to newer versions with every release of the Metanome framework. To add a new algorithm to the repository, follow these steps:

  1. Copy the algorithm maven project into a subdirectory of the algorithms repository.

  2. Use the following pattern for the naming of your algorithm artifact:

      <groupId>de.metanome.algorithms.[algorithm-name-lowercase]</groupId>
      <artifactId>[algorithm-name]</artifactId>
      <packaging>jar</packaging>
      <name>[algorithm-name]</name>
    
  3. Set the parent pom to the root pom using the root's current version:

      <parent>
        <groupId>de.metanome.algorithms</groupId>
        <artifactId>algorithms</artifactId>
        <version>1.1-SNAPSHOT</version>
        <relativePath>../pom.xml</relativePath>
      </parent>
    
  4. Add the algorithm project as a module to the root pom of the repository.

  5. Remove the version tags of your project and of all dependencies on Metanome subprojects; these versions are inherited from the root pom.

  6. Remove unnecessary repository information, e.g., all repositories that are defined in root/parent should not be duplicated.

  7. Add a copy command for the jar file of the new algorithm to the collect.bat and collect.sh scripts.
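
After working through the steps above, it can help to confirm that step 4 took effect. The following minimal sketch checks whether a root pom contains a Maven <module> entry for a given directory name. The function name module_registered and the module name MyAlgorithm are hypothetical, and the check assumes the entry is written on one line without extra whitespace.

```shell
#!/bin/sh
# Succeed if the given root pom lists the module directory.
# Assumption: the <module> entry sits on one line without inner whitespace.
module_registered() {
  root_pom=$1
  module=$2
  # -F treats the pattern as a fixed string, -q suppresses output
  grep -qF "<module>$module</module>" "$root_pom"
}
```

For example, module_registered pom.xml MyAlgorithm exits with status 0 if the root pom contains <module>MyAlgorithm</module>, and with a non-zero status otherwise. This is a plain text match rather than an XML parse, so it is only a rough check.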
