
carnival's Introduction

License: GPL v3

Carnival

Carnival is an open source JVM data unification framework that allows for a large variety of extract, transform, and load (ETL), integration, and analysis tasks related to relational data and property graphs. Some key functionality includes a graph model specification, the aggregation of data from disparate sources into a unified property graph, and tools to reason over and interact with graph data using bounded operations.


Overview

Carnival has three principal components: a graph modeling architecture, a caching facility for aggregating data from disparate data sources, and a framework for implementing graph algorithms. The graph modeling architecture is a layer over Java enumerations and Tinkerpop that allows a graph to be modeled and then consumed by Tinkerpop traversal steps. The caching facility supports the aggregation and caching of data from relational database and JSON API sources. The graph algorithm framework provides a structured way to define and execute algorithms that operate over the property graph.

Packages

Core Packages

carnival-core: Basic Carnival framework. Implements the basic Carnival framework classes (vines, the Carnival modeling framework, the Carnival graph algorithm framework, etc.). Defines the core Carnival graph model, which covers key Carnival concepts such as processes, databases, and namespaces.
carnival-vine: Framework for data adaptors, called vines, which facilitate loading and aggregation of source data. Implements data caching facilities.
carnival-graph: Framework for defining Carnival graph schemas (vertex and edge definitions). Contains the basic vertex, edge, and property classes.
carnival-util: Standalone package that contains utility and helper classes, such as data tables, reports, and SQL utilities, which are primarily used for dealing with relational data.
carnival-gradle: Gradle plugin for building Carnival applications and libraries.

Contribution Guide

Carnival is an open source project and welcomes contributions! Please see the Contribution Guide for ways to contribute.

carnival's People

Contributors

augustearth, cstoeckert, greenguy33, hjwilli, th5


carnival's Issues

Remove ConstrainedPropertyDefTrait

The mechanics of ConstrainedPropertyDefTrait are causing problems. Adding ConstrainedPropertyDefTrait to a PropertyDefTrait at runtime using the 'as' keyword is super-neat, but the coerced result is a new object rather than the original instance. See implementing traits at runtime. Keeping the original PropertyDefTrait object, which will normally be an Enum, is desirable.

Having both PropertyDefTrait and ConstrainedPropertyDefTrait seemed cleaner, but since it is causing issues and is not really required, we will move the functionality of ConstrainedPropertyDefTrait into PropertyDefTrait and call it a day.
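The identity problem can be illustrated in plain Java. This is a minimal sketch, assuming that a runtime-coerced trait behaves like a forwarding proxy: `java.lang.reflect.Proxy` stands in for Groovy's coercion here, and `PropertyDef`, `PX`, and `ProxyIdentityDemo` are hypothetical names, not Carnival classes. The wrapped object answers calls correctly but is no longer the enum constant, so `==`, `instanceof Enum`, and enum collections all stop recognizing it.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

// Hypothetical stand-in for PropertyDefTrait.
interface PropertyDef {
    String label();
}

// Property definitions are normally enum constants.
enum PX implements PropertyDef {
    NAME;

    public String label() { return name().toLowerCase(); }
}

class ProxyIdentityDemo {
    // Wrap a PropertyDef in a dynamic proxy that forwards every call to the original.
    // Like adding a trait at runtime (assumed analogy), this yields a new object,
    // not the enum constant itself.
    static PropertyDef wrap(PropertyDef original) {
        InvocationHandler forward = (proxy, method, args) -> method.invoke(original, args);
        return (PropertyDef) Proxy.newProxyInstance(
                PropertyDef.class.getClassLoader(),
                new Class<?>[] { PropertyDef.class },
                forward);
    }
}
```

Folding the constraint functionality into PropertyDefTrait itself, as proposed, avoids the wrapper entirely, so the enum constant remains the object that callers hold.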

Finalize and publish v1

Clean up and finalize what is currently in the main branch as v1. Existing applications should be compatible with v1.

We will be doing a major refactor/reorg for v2.

Experiment: CachingJsonVine<T>

When interacting with an API that returns JSON, it's not desirable to require shoehorning the data into a DataTable. It might be possible to have a trait that leverages Java Generics and Jackson to cache JSON-serialized objects.
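A minimal sketch of the idea, assuming one cache file per cache key. The class is generic in the cached type `T`, and the serializer is passed in as two functions so the example stays dependency-free; with Jackson these would typically be lambdas wrapping `ObjectMapper.writeValueAsString` and `ObjectMapper.readValue(String, Class<T>)`, since both throw checked exceptions. `JsonFileCache` and its `get` method are hypothetical names, not Carnival API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical generic cache: stores one serialized object per cache key as a file.
class JsonFileCache<T> {
    private final Path dir;
    private final Function<T, String> toJson;   // e.g. a lambda around ObjectMapper.writeValueAsString
    private final Function<String, T> fromJson; // e.g. a lambda around ObjectMapper.readValue

    JsonFileCache(Path dir, Function<T, String> toJson, Function<String, T> fromJson) {
        this.dir = dir;
        this.toJson = toJson;
        this.fromJson = fromJson;
    }

    // Return the cached object for key if its file exists; otherwise fetch, write, and return it.
    T get(String key, Supplier<T> fetch) {
        Path file = dir.resolve(key + ".json");
        try {
            if (Files.exists(file)) {
                return fromJson.apply(Files.readString(file));
            }
            T value = fetch.get();
            Files.writeString(file, toJson.apply(value));
            return value;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Deleting a key's file invalidates that entry, which matches how vine caches are refreshed today.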

Remove localBuildAndPublish

localBuildAndPublish was implemented because the default publishToMavenLocal task was not re-publishing if the version was already present in the local repo, which was an issue during a dev cycle. @hjwilli suggested that -SNAPSHOT versions will be overwritten.

  • test that -SNAPSHOT works as hoped/expected
  • remove localBuildAndPublish

Standardize Reaper similar to JsonVine

The JsonVine model has proven to be a good one. Standardize Reaper methods so they follow the JsonVine pattern.

  • similar builder methods
  • de-dup reaper runs using reaper args, similar to vine methods

The argument hash will probably be stored in the vertex that is created when a reaper runs.
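One way to get an argument hash that is stable across calls is to sort the argument keys before hashing. This is a minimal sketch under that assumption; `ArgumentHash` is a hypothetical helper, and it relies on each value's `toString()` being deterministic (a real implementation might serialize the arguments to JSON instead).

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: a SHA-256 hash of reaper arguments that ignores map insertion order.
final class ArgumentHash {
    private ArgumentHash() {}

    static String of(Map<String, ?> args) {
        // Sorting keys makes {a=1, b=2} and {b=2, a=1} produce the same canonical string.
        String canonical = new TreeMap<String, Object>(args).toString();
        try {
            MessageDigest sha = MessageDigest.getInstance("SHA-256");
            byte[] digest = sha.digest(canonical.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b)); // two lowercase hex digits per byte
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 should be available on every Java platform", e);
        }
    }
}
```

The resulting 64-character string could be stored as a property on the reaper process vertex and compared on later runs.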

Remove vineold and graphold

The core functionality has been replaced by vine and graph, but not all of the functionality has been migrated. Tag the repository before removing the classes, so they can be easily found in the future.

Remove secondaryIdFieldMap

Related to #20.

We can remove secondaryIdFieldMap prior to removing KeyType. Probably best to factor out into a distinct issue.

APOC

APOC is not working. Since the upgrade of Groovy, Tinkerpop, and Neo4j libraries, I cannot find an APOC library that will work. Tested the following versions:

  • apoc-3.4.0.8-all
  • apoc-3.5.0.15-all
  • apoc-4.0.0.18-all
  • apoc-4.1.0.2-all

APOC has been carved out of the Carnival Dockerfile.

Find an APOC library that works or verify that one does not exist.

Graph model harmonization

Harmonize VertexDefTrait and EdgeDefTrait.

// having to call instance() is potentially non-intuitive

Pmbb.VX.PATIENT.instance().withProperties(Core.PX.NAME, 'Hayden').vertex(graph, g)

// doesn't matter
Pmbb.VX.PATIENT.withProperties(Core.PX.NAME, 'Hayden').create(graph, g)

// existing 0 or 1
Pmbb.VX.PATIENT.withProperties(Core.PX.NAME, 'Hayden').getOrCreate(graph, g)

// existing 1
Pmbb.VX.PATIENT.withProperties(Core.PX.NAME, 'Hayden').get(graph, g)

// maybe add find() that will return all pre-existing
// exists() returns true/false
// just have get() return the GraphTraversal tryNext() thing???
// get returns vertex or null?
// get returns an optional like thing?


Pmbb.EX.PARTICIPATED_IN_ENCOUNTER.relate(pV, encV, g)
Pmbb.EX.PARTICIPATED_IN_ENCOUNTER.instance().withProperties(Core.PX.THING, 'ada').from(pV).to(encV).create(graph, g)

Pmbb.EX.PARTICIPATED_IN_ENCOUNTER.withProperties(Core.PX.THING, 'ada').from(pV).to(encV).create(graph, g)
Pmbb.EX.PARTICIPATED_IN_ENCOUNTER.withProperties(Core.PX.THING, 'ada').from(pV).to(encV).getOrCreate(graph, g)
Pmbb.EX.PARTICIPATED_IN_ENCOUNTER.withProperties(Core.PX.THING, 'ada').from(pV).to(encV).get(graph, g)

Pmbb.EX.PARTICIPATED_IN_ENCOUNTER.withProperties(Core.PX.THING, 'ada').create(pV, encV, graph, g)


Core.PX.NAME.of(pV).isPresent()
Core.PX.NAME.valueOf(pV)
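The get/getOrCreate semantics under discussion can be pinned down with a small Java sketch. It takes the "get returns an optional like thing" option from the comments, treats more than one match as an error, and uses an in-memory list of vertices in place of a real Tinkerpop graph. `Vertex`, `VertexQuery`, and all method names are hypothetical, not Carnival or Tinkerpop API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

// Hypothetical in-memory stand-in for a graph vertex: a label plus properties.
class Vertex {
    final String label;
    final Map<String, Object> props;

    Vertex(String label, Map<String, Object> props) {
        this.label = label;
        this.props = new HashMap<>(props);
    }
}

// Hypothetical builder mirroring VX.PATIENT.withProperties(...).find/get/getOrCreate/create.
class VertexQuery {
    private final String label;
    private final Map<String, Object> props = new HashMap<>();

    VertexQuery(String label) { this.label = label; }

    VertexQuery withProperty(String key, Object value) {
        props.put(key, value);
        return this;
    }

    // All pre-existing matches (the proposed find()).
    List<Vertex> find(List<Vertex> graph) {
        return graph.stream()
                .filter(v -> v.label.equals(label))
                .filter(v -> v.props.entrySet().containsAll(props.entrySet()))
                .collect(Collectors.toList());
    }

    boolean exists(List<Vertex> graph) { return !find(graph).isEmpty(); }

    // Zero or one match; more than one is treated as a modeling error.
    Optional<Vertex> get(List<Vertex> graph) {
        List<Vertex> found = find(graph);
        if (found.size() > 1) {
            throw new IllegalStateException("expected at most one " + label + ", found " + found.size());
        }
        return found.stream().findFirst();
    }

    // Always adds a new vertex.
    Vertex create(List<Vertex> graph) {
        Vertex v = new Vertex(label, props);
        graph.add(v);
        return v;
    }

    // Existing 0 or 1: reuse the match if there is one, otherwise create it.
    Vertex getOrCreate(List<Vertex> graph) {
        return get(graph).orElseGet(() -> create(graph));
    }
}
```

Returning `Optional` forces callers to handle the "not found" case explicitly, which is roughly what Tinkerpop's `tryNext()` does for traversals.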

MethodsHolder look up class hierarchy for method classes

See carnival-core/src/test/groovy/carnival/core/util/MethodsHolderSpec.groovy. Without this change, we cannot extend GraphMethods classes to add new methods or functionality, which could be useful.

For example, there could be an abstract GraphMethods class that contains a set of standardized methods to interact with a resource and also has an abstract method getServerUrl() that points to that resource. Concrete sub-classes could get the server URL from a configuration utility without having to re-implement the GraphMethod classes.

It would be nice to be able to inherit from GraphMethods classes.
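A minimal sketch of the lookup, assuming method classes are nested classes found by simple name; how MethodsHolder actually locates them may differ. The key point is walking `getSuperclass()` so that a concrete subclass inherits the method classes of its abstract parent. `MethodClassLookup`, `ResourceGraphMethods`, `FetchThing`, and the URL are hypothetical, modeled on the getServerUrl() example.

```java
import java.util.Optional;

// Hypothetical lookup: search the holder class first, then each superclass.
final class MethodClassLookup {
    private MethodClassLookup() {}

    static Optional<Class<?>> find(Class<?> holder, String simpleName) {
        for (Class<?> c = holder; c != null && c != Object.class; c = c.getSuperclass()) {
            for (Class<?> nested : c.getDeclaredClasses()) {
                if (nested.getSimpleName().equals(simpleName)) {
                    return Optional.of(nested);
                }
            }
        }
        return Optional.empty();
    }
}

// Abstract base: standardized method classes plus an abstract server URL.
abstract class ResourceGraphMethods {
    abstract String getServerUrl();

    class FetchThing {
        String url() { return getServerUrl() + "/thing"; }
    }
}

// Concrete subclass supplies only configuration; FetchThing is inherited, not re-implemented.
class ConfiguredGraphMethods extends ResourceGraphMethods {
    String getServerUrl() { return "https://example.org/api"; }
}
```

Without the superclass walk, `ConfiguredGraphMethods.class.getDeclaredClasses()` is empty and the lookup would find nothing.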

Migrate from GPars to Java concurrency

GPars is no longer being maintained and has some very old dependencies. Java concurrency has come a long way. Our uses of GPars could be migrated to core Java (fairly) easily.
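As an illustration, a typical GPars pattern such as `GParsPool.withPool(n) { items.collectParallel { ... } }` maps onto a fixed-size `ExecutorService`. This is a minimal sketch using only `java.util.concurrent`; `ParallelCollect` is a hypothetical helper name, and which GPars calls Carnival actually uses is not assumed here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical replacement for a collectParallel-style call.
final class ParallelCollect {
    private ParallelCollect() {}

    static <T, R> List<R> collectParallel(List<T> items, int threads, Function<T, R> fn)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<R>> tasks = new ArrayList<>();
            for (T item : items) {
                tasks.add(() -> fn.apply(item));
            }
            // invokeAll blocks until every task completes; futures are returned in input order.
            List<R> results = new ArrayList<>();
            for (Future<R> f : pool.invokeAll(tasks)) {
                results.add(f.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

Parallel streams are an alternative, but they run on the shared common ForkJoinPool; an explicit executor keeps the pool size under the caller's control, closer to `withPool(n)`.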

TinkerpopExtensions

Add extensions to Tinkerpop classes GraphTraversal and Graph to work with VertexDef, EdgeDef, and PropertyDef.

Configuration enhancements

For applications that have their own configuration machinery, it would be nice not to require a separate configuration file for Carnival configuration. Figure out how to provide the config data to Carnival directly so no separate file is required.

Scope carnival configuration within a top level element, which would enable integration of a carnival configuration in an existing application.yaml file that includes other application configuration.
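A minimal sketch of both ideas together, assuming the application has already parsed its own configuration (for example application.yaml) into nested maps and hands that data to Carnival directly. Carnival reads only the top-level `carnival` element and ignores everything else. `CarnivalConfig`, `fromApplicationConfig`, and the `home` key are hypothetical.

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical: Carnival settings scoped under a top-level "carnival" element
// of an application's existing configuration, so no separate file is needed.
final class CarnivalConfig {
    static final String ROOT = "carnival";
    private final Map<String, Object> values;

    private CarnivalConfig(Map<String, Object> values) { this.values = values; }

    @SuppressWarnings("unchecked")
    static CarnivalConfig fromApplicationConfig(Map<String, Object> appConfig) {
        Object scoped = appConfig.get(ROOT);
        if (!(scoped instanceof Map)) {
            return new CarnivalConfig(Collections.emptyMap()); // no Carnival section present
        }
        return new CarnivalConfig((Map<String, Object>) scoped);
    }

    Object get(String key) { return values.get(key); }
}
```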

Publish packages to an OSS repository

Publishing to GitHub Packages is useful, but users need to set up GitHub authentication even if they are just pulling from a public repo. Also, because public packages on GitHub cannot be deleted or updated, we cannot easily maintain SNAPSHOT versions.

Investigate alternative OSS package repos that would more easily support these features.

Central Repository:
https://central.sonatype.org/
https://central.sonatype.org/pages/ossrh-guide.html
https://central.sonatype.org/pages/requirements.html

GraphMethod make result optional

The method:
Map execute(...)

is a pain. The result generally does not mean much and is just boilerplate. Add a setResult() method that can be used inside execute() to set a result. Change signature to void execute(...).
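A minimal sketch of the proposed shape, assuming a base class with a `call()` entry point and a `Map` of arguments (the real parameter list is elided as `...` above). `execute()` returns nothing, and only methods that have something to report call `setResult()`. `GraphMethod`'s members, `TouchGraph`, and `CountArgs` are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical base class: execute() is void; a result is optional.
abstract class GraphMethod {
    private Map<String, Object> result = Collections.emptyMap();

    protected abstract void execute(Map<String, ?> args);

    protected void setResult(Map<String, ?> result) {
        this.result = new HashMap<>(result);
    }

    // Runs the method and returns whatever result was set, or an empty map if none.
    Map<String, Object> call(Map<String, ?> args) {
        result = Collections.emptyMap(); // do not leak a result from a previous call
        execute(args);
        return Collections.unmodifiableMap(result);
    }
}

// Most methods need no result boilerplate at all.
class TouchGraph extends GraphMethod {
    protected void execute(Map<String, ?> args) { /* mutate the graph here */ }
}

// A method that does have something to report opts in.
class CountArgs extends GraphMethod {
    protected void execute(Map<String, ?> args) {
        setResult(Map.of("count", args.size()));
    }
}
```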

Standardize the carnival.home, CARNIVAL_HOME thing

I think it's mostly there. The Groovy code should look in only one place for the home setting. Why not just pick carnival.home? If it's an app, the plugin will set this value automatically. If it's framework development, build.gradle can set carnival.home from the environment.

Vine v2

Migrate v1 Vine/Method to the v2 JsonVine style. The two will be merged. The class JsonVine will be renamed to Vine as the code at that level is generic. At the VineMethod level, there will be JsonVineMethod and TabularVineMethod. This will be a multi-step migration. Should probably be a milestone rather than a single issue, but here we are.

DynamicVertexDef has to go or be re-thought

See VertexDef, where one is created dynamically with no propertyDefs or anything. There would have to be a repository of created DynamicVertexDefs so they can be looked up to keep track of any properties they have. Maybe just get rid of the concept altogether.
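If the concept is kept, the repository mentioned above could be as small as a concurrent map from label to definition. This is a minimal sketch under that assumption; `DynamicVertexDefRegistry`, its nested `DynamicVertexDef`, and `lookupOrCreate` are hypothetical and much simpler than Carnival's VertexDef.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry: one DynamicVertexDef per label, so properties recorded on a
// dynamically created definition can be found again later.
final class DynamicVertexDefRegistry {
    static final class DynamicVertexDef {
        final String label;
        private final Set<String> propertyNames = ConcurrentHashMap.newKeySet();

        DynamicVertexDef(String label) { this.label = label; }

        void addProperty(String name) { propertyNames.add(name); }

        Set<String> properties() { return Collections.unmodifiableSet(propertyNames); }
    }

    private final Map<String, DynamicVertexDef> defs = new ConcurrentHashMap<>();

    // Returns the existing definition for the label, creating it atomically on first use.
    DynamicVertexDef lookupOrCreate(String label) {
        return defs.computeIfAbsent(label, DynamicVertexDef::new);
    }
}
```

The extra bookkeeping this needs is part of the argument for removing the concept altogether.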

Windows Registry error message during compile

Every time I compile any module on Windows, I see an error like this printed to STDOUT:

Could not open/create prefs root node Software\JavaSoft\Prefs at root 0x80000002. Windows RegCreateKeyEx(...) returned error code 5.

The performance of the application does not seem to be affected by this message in any way.

Remove references to PMBB

Remove references to PMBB wherever possible/practical.

  • org.pmbb to org.carnival
  • references in tests

Reaper V2

Migrate Reaper, so it follows the Vine pattern.

Basic Call

myReaper.method('doSomething').arguments(Map).call()

Vertices representing all executed reaper method processes

myReaper.method('doSomething').processes()

Vertices representing all reaper method processes given arguments

myReaper.method('doSomething').arguments(Map).processes()

Vertices representing all executed reaper method processes

myReaper.method('doSomething').processes().vertices()

Outputs to any reaper process regardless of arguments

myReaper.method('doSomething').outputs()

Outputs to reaper processes with given arguments

myReaper.method('doSomething').arguments(Map).outputs()

Reaper process vertices unique by reaper method and arguments

When a reaper method is called, the current behavior creates a reaper process vertex in the graph that has a label as per the reaper method, but does not contain any unique reference to the arguments that were used to call the reaper method. As a result, the ensure() and optionallyRunSingletonProcess() methods are limited: two calls to a reaper method with different arguments are viewed as the same call. So, the reaper method truly is a "singleton process", which is expected to be run only once.

optionallyRunSingletonProcess() methods can stay this way as they have "singleton" in the name. ensure() methods should be more flexible, taking the reaper method arguments into account so a reaper method can be called multiple times on different inputs.

There is probably deep design discussion to be had here.... Taking into account the reaper method and the arguments is nice, but it does not take into account the state of the graph. There is no guarantee that the same reaper method called with the same arguments at different times will have the same effect, which I suppose is the assumption here. Is that useful? Should we enable programmers to classify reaper methods and have different behavior of ensure() depending on the class? I do not know.

For now, making this change will at least bring us more in line with vine behavior. There is no guarantee that a vine method that queries a database will return the same result when called with the same parameters at different times; the database can change in that interval. Carnival vine caching assumes that the programmer chooses to ignore such changes, refreshing only when they decide to by invalidating (deleting) cache files. Implementing a similar scheme at the reaper method level seems to make sense.
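A minimal sketch of an argument-aware ensure(), assuming a process record is keyed by the reaper method name plus its key-sorted arguments. A map stands in for the reaper process vertices, and `invalidate()` plays the role of deleting a vine cache file. `ReaperProcesses` and its methods are hypothetical, not the current Reaper API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Hypothetical: ensure() keyed by (reaper method, arguments) rather than by method alone.
class ReaperProcesses {
    private final Map<String, Object> processes = new HashMap<>();
    private int runs = 0;

    // Key: method name plus a key-sorted rendering of the arguments.
    private static String key(String method, Map<String, ?> args) {
        return method + ":" + new TreeMap<String, Object>(args);
    }

    // Run the method only if no process exists for these exact arguments.
    Object ensure(String method, Map<String, ?> args, Function<Map<String, ?>, Object> body) {
        return processes.computeIfAbsent(key(method, args), k -> {
            runs++;
            return body.apply(args);
        });
    }

    // Forget a process so the next ensure() re-runs it, like deleting a cache file.
    void invalidate(String method, Map<String, ?> args) {
        processes.remove(key(method, args));
    }

    int runs() { return runs; }
}
```

As noted above, this says nothing about the state of the graph between runs; it only records that a given method and argument set has been executed.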

Travis doc builder

Update Travis to build dynamic docs and move them to the gh-pages branch.

Build updates

  • update dependency versions
  • factor build properties into gradle.properties file
  • update carnival version number?

Carnival cleanup

The following cleanup can happen regardless of a potential refactor.

Vine

  • Vine.writeToCsvFile does not seem to be used anywhere.
  • Vine.readFromCsvFile(String) does not seem to be used anywhere.
  • Vine.readFromCsvFile(File) does not seem to be used anywhere.

IterativeCsvWriter

IterativeCsvWriter does not seem to be used anywhere.

Upgrade dependencies

With GPars removed, it should be possible to upgrade a number of Carnival dependencies. We have fallen behind on many fronts, including Groovy and Tinkerpop.

Remove KeyType

KeyType was important for Carnival v0 when there were complex data operations that consumed and produced data tables. With the introduction of the graph in Carnival v1, these operations are done in the graph. KeyType is no longer central and should be removed for simplicity's sake.

There is overlap here with secondaryIdFieldMap of GenericDataTable and MappedDataTable. Like KeyType, the secondaryIdFieldMap was important for data table operations. Because secondaryIdFieldMap depends on KeyType, it is not possible to remove KeyType without removing secondaryIdFieldMap.

Specify date format of MappedDataTable

In the base dataAdd(Map) method, MappedDataTable formats dates as per the default SqlUtils date format, which is not the same as the default SQL Server date format. Add an option to specify the date format, so that the date strings in MappedDataTable can match the default SQL Server format, enabling easy data upload via standard SQL Server clients.
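A minimal sketch of such an option, assuming a table-level formatter that callers can override. It uses `java.time` for illustration even though MappedDataTable may store `java.util.Date` values, and it assumes the SQL Server target is the `yyyy-mm-dd hh:mi:ss.mmm` style. `DateFormattingTable`, `SQL_SERVER_DEFAULT`, and `formatValue` are hypothetical names.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical option: a configurable date formatter used when values are added.
class DateFormattingTable {
    // Assumed SQL Server-style pattern: yyyy-mm-dd hh:mi:ss.mmm (24-hour clock, milliseconds).
    static final DateTimeFormatter SQL_SERVER_DEFAULT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS");

    private DateTimeFormatter dateFormat;

    DateFormattingTable(DateTimeFormatter dateFormat) { this.dateFormat = dateFormat; }

    void setDateFormat(DateTimeFormatter dateFormat) { this.dateFormat = dateFormat; }

    // What a dataAdd(Map)-style method would call for each value it stores.
    String formatValue(Object value) {
        if (value instanceof LocalDateTime) {
            return dateFormat.format((LocalDateTime) value);
        }
        return String.valueOf(value);
    }
}
```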

CacheFiles object

There are currently two styles of vines: Json and MappedDataTable. Json vines have a single unified cache file which contains both the data and meta-data. MappedDataTable has two files, a Yaml meta file and a CSV data file. The cache related vine methods for these two styles of vines differ. Json cache is a bare File. MappedDataTable cache is a DataTableFiles object. Unify these two into a single CacheFiles (or similar) object that has standardized functionality.
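A minimal sketch of the unified object, assuming both cache styles can be described as a list of backing files. The shared behavior (does the cache exist, invalidate it) lives once in the interface. `CacheFiles`, `JsonCacheFiles`, and `DataTableCacheFiles` are hypothetical names, and real cache files may carry more metadata than this.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical unified cache-file abstraction for both vine styles.
interface CacheFiles {
    List<Path> files();

    // A cache entry is usable only if every backing file is present.
    default boolean exists() {
        return files().stream().allMatch(Files::exists);
    }

    // Invalidate the cache by deleting whatever files back it.
    default void delete() throws IOException {
        for (Path p : files()) {
            Files.deleteIfExists(p);
        }
    }
}

// Json vines: a single file holding both data and metadata.
class JsonCacheFiles implements CacheFiles {
    private final Path file;
    JsonCacheFiles(Path file) { this.file = file; }
    public List<Path> files() { return List.of(file); }
}

// MappedDataTable vines: a YAML metadata file plus a CSV data file.
class DataTableCacheFiles implements CacheFiles {
    private final Path meta;
    private final Path data;
    DataTableCacheFiles(Path meta, Path data) { this.meta = meta; this.data = data; }
    public List<Path> files() { return List.of(meta, data); }
}
```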

Vine refactor

The Vine, CachingVine, and VineMethod classes/interfaces/traits are muddled.... Clean things up.

  • Move the basic call() mechanism from CachingVine to Vine. Currently all Vine has are static methods. The mechanism to call VineMethods (without caching) should be in Vine, so there is some meaning/benefit to extending the Vine class.
  • Override the Vine call() method to hook into the caching machinery. Keeping CachingVine as a trait probably makes sense, so it can be added to the man branches of Vine subclasses as needed.
  • Evaluate potential changes to VineMethod... Maybe it's ok to stay how it is for now.
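The first two bullets can be sketched as follows, assuming vine methods are registered by name and take a `Map` of arguments. Java has no traits, so a subclass that overrides `call()` stands in for the CachingVine trait; the in-memory map stands in for cache files. The names echo the classes in this issue but are hypothetical simplifications, not the current Carnival classes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Hypothetical: the plain call() mechanism lives in Vine itself.
class Vine {
    private final Map<String, Function<Map<String, Object>, Object>> methods = new HashMap<>();

    void register(String name, Function<Map<String, Object>, Object> method) {
        methods.put(name, method);
    }

    Object call(String name, Map<String, Object> args) {
        Function<Map<String, Object>, Object> method = methods.get(name);
        if (method == null) {
            throw new IllegalArgumentException("no vine method named " + name);
        }
        return method.apply(args);
    }
}

// Caching hooks in by overriding call(); a subclass stands in for the trait.
class CachingVine extends Vine {
    private final Map<String, Object> cache = new HashMap<>();

    @Override
    Object call(String name, Map<String, Object> args) {
        String key = name + ":" + new TreeMap<>(args); // key-sorted so argument order is irrelevant
        if (!cache.containsKey(key)) {
            cache.put(key, super.call(name, args));
        }
        return cache.get(key);
    }
}
```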

GenericDataTableVine*

Implement GenericDataTableVine* classes. Attempt to leverage inheritance/composition to share common code with MappedDataTableVine*.

Reasoner v2

Reasoners are very much like Reapers, except they can have inputs from the graph, whereas Reapers are expected not to read from the graph, only write to it.

ReasonerProcess

inputs() - returns the input vertices of the process

Carnival Gradle plugin split

Currently, the Carnival Gradle plugin is about building a Carnival Micronaut application. It would be helpful to have a plugin that just supplies the Carnival dependencies for library building.

  • Move plugin carnival-gradle to carnival-micronaut-app
  • Create a stripped down carnival-library plugin with just the dependencies

DataTable case sensitive mode

Currently, the methods that take bare strings as field names and identifier field values coerce the case of those strings: field names become upper case, and identifier field values become lower case. This is probably the correct default behavior, as it is safer in terms of detecting data collisions.

However, FeatureReport, which extends MappedDataTable, is restricted by this behavior. FeatureReport is useful as a class to create data exports that are intended to be imported into other systems. Case coercion is not desirable in this case.

Add a "case-sensitive" flag to DataTable with a default of false. When true, field names and identifier field values are not case-coerced. This would make the data set more flexible, at the cost of weaker protection against data integrity problems.
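A minimal sketch of the flag's effect, assuming the coercion rules described above (field names upper case, identifier values lower case). `FieldNormalizer` is a hypothetical helper, not a DataTable method. It also shows the trade-off: in the default mode "MRN" and "mrn" collapse to one field, which is how collisions get detected, while case-sensitive mode keeps them distinct.

```java
import java.util.Locale;

// Hypothetical: coercion applies only when caseSensitive is false (the proposed default).
class FieldNormalizer {
    private final boolean caseSensitive;

    FieldNormalizer(boolean caseSensitive) { this.caseSensitive = caseSensitive; }

    // Default behavior: field names become upper case.
    String fieldName(String raw) {
        return caseSensitive ? raw : raw.toUpperCase(Locale.ROOT);
    }

    // Default behavior: identifier field values become lower case.
    String idValue(String raw) {
        return caseSensitive ? raw : raw.toLowerCase(Locale.ROOT);
    }
}
```

`Locale.ROOT` keeps the coercion independent of the JVM's default locale (for example, Turkish dotted and dotless i).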
