Git Product home page Git Product logo

spuristo's Introduction

SPURISTO (Tracker in Esperanto)

Build Status

Inspired by

This project is inspired by Komadu project

Tracking what?

Who, When and how data was created, with as many context information as needed. And this is currently the kind of (simple for now) visualization Spuristo can provide:

Data Lineage

How to run the project

Start two Docker containers: RabbitMQ and ArangoDB

docker run -d --hostname localhost --name rabbit -p 8880:15672 -p 5672:5672 -p 5671:5671 -p 4369:4369 -p 25672:25672 -e RABBITMQ_DEFAULT_USER=admin -e RABBITMQ_DEFAULT_PASS=admin rabbitmq:3-management
docker run -e ARANGO_ROOT_PASSWORD=root -p 8529:8529 -d -v /Users/dmartin/arangodata:/var/lib/arangodb3 arangodb

Then start the (Spring Boot) application:

mvn spring-boot:run

You are ready to go.

You can publish data in RabbitMQ queues and look at the relations between entities with this URL:

http://localhost:6789/path?key=YourEntityKeyHere

What data can be?

3 types of data for now:

  • Entity
  • Activity
  • Agent

They all extend in a way or another a common type, called Thing (for now, not found something more clear to describe this ... thing :) ).

And one 'edge':

  • Relation

JSON format of these "Things"

An Entity:

  {
    "entity": {
      "key": "entity1",
      "clazz": "entity",
      "attributes": [
        {
          "name": "role",
          "value": {
            "abc": "ABC",
            "def": "DEF",
            "ghi": "GHI",
            "klm": "KLM"
          }
        },
        {
          "name": "label",
          "value": {
            "_default": "attribute1"
          }
        },
        {
          "name": "location",
          "value": {
            "_default": "attribute2"
          }
        },
        {
          "name": "value",
          "value": {
            "_default": "attribute3"
          }
        }
      ]
    }
  }

An Agent:

  {
    "agent": {
      "key": "agent1",
      "clazz": "agent",
      "attributes": [
        {
          "name": "role",
          "value": {
            "abc": "ABC",
            "def": "DEF",
            "ghi": "GHI",
            "klm": "KLM"
          }
        },
        {
          "name": "label",
          "value": {
            "_default": "attribute1"
          }
        },
        {
          "name": "other",
          "value": {
            "_default": "attribute2"
          }
        }
      ]
    }
  }

An Activity:

  {
    "activity": {
      "key": "activity1",
      "clazz": "activity",
      "attributes": [
        {
          "name": "role",
          "value": {
            "abc": "ABC",
            "def": "DEF",
            "ghi": "GHI",
            "klm": "KLM"
          }
        },
        {
          "name": "label",
          "value": {
            "_default": "attribute1"
          }
        },
        {
          "name": "other",
          "value": {
            "_default": "attribute2"
          }
        }
      ]
    }
  }

They are just the same, except a property (clazz, which is set automatically when the Java object is created since it's based on the class used).

A Relation:

  {
    "relation": {
      "key": null,
      "clazz": "relation",
      "attributes": null,
      "from": {
        "entity": {
          "key": "entity1",
          "clazz": "entity",
          "attributes": [
            {
              "name": "role",
              "value": {
                "abc": "ABC",
                "def": "DEF",
                "ghi": "GHI",
                "klm": "KLM"
              }
            },
            {
              "name": "label",
              "value": {
                "_default": "attribute1"
              }
            },
            {
              "name": "location",
              "value": {
                "_default": "attribute2"
              }
            },
            {
              "name": "value",
              "value": {
                "_default": "attribute3"
              }
            }
          ]
        }
      },
      "to": {
        "agent": {
          "key": "agent1",
          "clazz": "agent",
          "attributes": [
            {
              "name": "role",
              "value": {
                "abc": "ABC",
                "def": "DEF",
                "ghi": "GHI",
                "klm": "KLM"
              }
            },
            {
              "name": "label",
              "value": {
                "_default": "attribute1"
              }
            },
            {
              "name": "other",
              "value": {
                "_default": "attribute2"
              }
            }
          ]
        }
      },
      "type": "wasGeneratedBy"
    }
  }

A Relation contains full Thing objects in its "from" and "to" properties. You can create all 3 objects (Relation, From Thing and To Thing) when sending only the Relation. I'm sure it's not efficient the way it is currently coded, but as it's not my top priority right now, it doesn't matter, especially if transparently told ;-) A relation can also have its own set of attributes, why not after all?

For all the objects, the "key" MUST be unique in the category of each object (Two entities CAN'T have the same 'thing1' key, but an Activity and an Agent CAN have a key equals to "thing1").

What's useful for a data lineage service [WIP right under]

Some use cases:

Describe a model training

  • identify the data sets used

    • main data set and derived data sets (training, test...) At least a macro description : id of the data set (file path, other ?) and a description At most, the exact set described as a collection of entities
  • identify the set of parameters used An entity describing the set, with the use of parameters could be enough

  • identify the algorithm used should be an attribute of the model ?

  • identify the actions (data cleansing, fixing...), the intermediates entities and when the actions occur Can this be described as a plan?

  • identify the agent running the action

  • identify the output (model version)

Describe a prediction, on behalf of a user, based on a specific version of a model and a set of data

  • identify the action and the time of the action
  • identity who is the user and the role he is playing in the action
  • identify the model version used and the generic model it is linked to
  • identify the data provided in order to have a prediction
  • identify the result

Expected methods:

  • being able to create an entity with the minimum set of optional attributes

  • being able to add relations between entities (derived, revision)

  • being able to create an activity with the minimum set of optional attributes

  • being able to add relations between activities (derived, revision)

  • being able to create an agent with the minimum set of optional attributes

  • being able to add relations between entities (derived, revision)

spuristo's People

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.