typedb-loader's Introduction

If your TypeDB project

  • has a lot of data
  • and you want/need to focus on schema design, inference, and querying

Use TypeDB Loader to take care of your data migration for you. TypeDB Loader streams data from files and migrates it into TypeDB at scale!

Features:

  • Data Input:

    • data is streamed to reduce memory requirements
    • supports any tabular data file with your separator of choice (e.g. csv, tsv, whatever-sv...)
    • supports gzipped files
    • ignores unnecessary columns
  • Attribute, Entity, Relation Loading:

    • load required/optional attributes of any TypeDB type (string, boolean, long, double, datetime)
    • load required/optional role players (attribute / entity / relation)
    • load list-like attribute columns as n attributes (recommended procedure until attribute lists are fully supported by TypeDB)
    • load list-like player columns as n players for a relation
    • load entity if not present - if present, either do not write or append attributes
  • Appending Attributes to existing things

  • Append-Attribute-Or-Insert-Entity for entities

  • Data Validation:

    • validate input data rows and log issues for easy diagnosis of input data-related issues (e.g. missing attributes/players, invalid characters...)
  • Configuration Validation:

    • write your configuration with confidence: warnings will display useful information for fine tuning, errors will let you know what you forgot. All BEFORE the database is touched.
  • Performance:

    • parallelized asynchronous writes to TypeDB to make the most of your hardware configuration, optimized with engineers @vaticle
  • Stop/Restart (in re-implementation, currently NOT available):

    • tracking of your migration status to stop/restart, or restart after failure
  • Basic Column Preprocessing using regular expressions
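
To make the data-input and list-like column features above more concrete, here is a minimal sketch in plain Java. It is not TypeDB Loader's actual implementation: the class name `RowStreamSketch`, its methods, and the `###` list separator used in the usage note are hypothetical and chosen only for illustration. It reads a (possibly gzipped) tabular file line by line, so that only the current row is held in memory, and splits a list-like cell into n values. The naive split does not handle quoted fields that contain the separator.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.regex.Pattern;
import java.util.zip.GZIPInputStream;

// Hypothetical illustration only; not part of the TypeDB Loader API.
final class RowStreamSketch {

    // Splits one row on the column separator. Pattern.quote keeps separators
    // such as "|" literal; the -1 limit keeps trailing empty columns.
    static String[] splitRow(String line, String separator) {
        return line.split(Pattern.quote(separator), -1);
    }

    // Turns a list-like cell such as "a###b###c" into individual values,
    // trimming whitespace and skipping blank entries.
    static List<String> splitListColumn(String cell, String listSeparator) {
        List<String> values = new ArrayList<>();
        for (String part : cell.split(Pattern.quote(listSeparator))) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                values.add(trimmed);
            }
        }
        return values;
    }

    // Hands rows to the handler one at a time and returns the number of rows read.
    static long streamRows(InputStream in, boolean gzipped, String separator,
                           Consumer<String[]> rowHandler) throws IOException {
        InputStream source = gzipped ? new GZIPInputStream(in) : in;
        long rows = 0;
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(source, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                rowHandler.accept(splitRow(line, separator));
                rows++;
            }
        }
        return rows;
    }
}
```

With these helpers, `RowStreamSketch.splitListColumn("a### b ###", "###")` yields the two values `a` and `b`, each of which would then be loaded as its own attribute or role player.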

Create a Loading Configuration (example) and use TypeDB Loader

How it works:

To illustrate how to use TypeDB Loader, we will use a slightly extended version of the "phone-calls" example dataset and schema from the TypeDB developer documentation:

Configuration

The configuration file tells TypeDB Loader what things you want to insert for each of your data files and how to do it.

For examples and detailed documentation, please refer to the WIKI.

The config in the phone-calls test is a good starting example of a configuration.

Migrate Data

Once your configuration files are complete, you can use TypeDB Loader in one of two ways:

  1. As an executable command line interface - no coding required:
./bin/typedbloader load \
                -tdb localhost:1729 \
                -c /path/to/your/config.json \
                -db databaseName \
                -cm

See details here

  2. As a dependency in your own Java code:
import com.vaticle.typedb.osi.loader.cli.LoadOptions;
import com.vaticle.typedb.osi.loader.loader.TypeDBLoader;

public class LoadingData {

    public void loadData() {
        String uri = "localhost:1729";
        String config = "path/to/your/config.json";
        String database = "databaseName";

        String[] args = {
                "load",
                "-tdb", uri,
                "-c", config,
                "-db", database,
                "-cm"
        };

        LoadOptions options = LoadOptions.parse(args);
        TypeDBLoader loader = new TypeDBLoader(options);
        loader.load();
    }
}

See details here

Step-by-Step Tutorial

A complete tutorial for TypeDB version >= 2.5.0 is in progress and will be published.

An example of configuration and usage of TypeDB Loader on real data can be found in the TypeDB Examples.

A complete tutorial for TypeDB (Grakn) version < 2.0 can be found on Medium.

There is an example repository for your convenience.

Connecting to TypeDB Cluster

To connect to TypeDB Cluster, a set of options is provided:

--typedb-cluster=<address:port>
--username=<username>
--password // can be asked for interactively
--tls-enabled
--tls-root-ca=<path/to/CA/cert>
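
As a rough sketch of how these options might be assembled programmatically, the helper below builds an argument array from the flags listed above. The class `ClusterArgsSketch` and its method are hypothetical. It also assumes that the cluster options are combined with the `load` subcommand and the `-c` / `-db` flags from the Migrate Data section, and that `--tls-root-ca` is only meaningful together with `--tls-enabled`. Neither assumption is confirmed here, so check the CLI help before relying on it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; only the flag names are taken from the option list above.
final class ClusterArgsSketch {

    // Builds the argument array for a "load" run against TypeDB Cluster.
    // rootCaPath may be null when no custom root CA certificate is needed.
    static String[] clusterLoadArgs(String address, String username, String configPath,
                                    String database, boolean tlsEnabled, String rootCaPath) {
        List<String> args = new ArrayList<>();
        args.add("load");
        args.add("--typedb-cluster=" + address);
        args.add("--username=" + username);
        // No value is passed: per the option list, the password can be asked for interactively.
        args.add("--password");
        args.add("-c");
        args.add(configPath);
        args.add("-db");
        args.add(database);
        if (tlsEnabled) {
            args.add("--tls-enabled");
            // Assumption: the root CA path is only relevant when TLS is enabled.
            if (rootCaPath != null) {
                args.add("--tls-root-ca=" + rootCaPath);
            }
        }
        return args.toArray(new String[0]);
    }
}
```

The resulting array could then be passed to `LoadOptions.parse(args)` in the same way as in the Java example in the Migrate Data section.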

Compatibility Table

Ranges are [inclusive, inclusive].

| TypeDB Loader | TypeDB Driver (internal) | TypeDB | TypeDB Cluster |
|---|---|---|---|
| 1.8.0 | 2.25.6 | 2.25.x - | 2.25.x - |
| 1.7.0 | 2.18.1 | 2.18.x - 2.23.x | 2.18.x - 2.23.x |
| 1.6.0 | 2.14.2 | 2.14.x - 2.17.x | 2.14.x - 2.16.x |
| 1.2.0 - 1.5.x | 2.8.0 - 2.14.0 | 2.8.0 - 2.14.0 | N/A |
| 1.1.0 - 1.1.x | 2.8.0 | 2.8.x | N/A |
| 1.0.0 | 2.5.0 - 2.7.1 | 2.5.x - 2.7.x | N/A |
| 0.1.1 | 2.0.0 - 2.4.x | 2.0.x - 2.4.x | N/A |
| <0.1 | 1.8.0 | 1.8.x | N/A |
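
To illustrate how the inclusive ranges in the table can be read, here is a small sketch that checks whether a TypeDB version falls inside a `major.minor` range. The class `VersionRangeSketch` is hypothetical and not part of TypeDB Loader. It compares only the major and minor components, treating the patch level (or `x`) as a wildcard, so rows that pin an exact patch version are only approximated.

```java
// Hypothetical illustration of reading the [inclusive, inclusive] ranges in the table.
final class VersionRangeSketch {

    // Parses "2.14.x" or "2.14.2" into {major, minor}; the patch component is ignored.
    static int[] majorMinor(String version) {
        String[] parts = version.trim().split("\\.");
        return new int[] { Integer.parseInt(parts[0]), Integer.parseInt(parts[1]) };
    }

    // Orders two {major, minor} pairs: by major first, then by minor.
    static int compare(int[] a, int[] b) {
        return a[0] != b[0] ? Integer.compare(a[0], b[0]) : Integer.compare(a[1], b[1]);
    }

    // True if version lies in [lowest, highest], with both ends inclusive.
    static boolean inRange(String version, String lowest, String highest) {
        int[] v = majorMinor(version);
        return compare(v, majorMinor(lowest)) >= 0 && compare(v, majorMinor(highest)) <= 0;
    }
}
```

For the TypeDB Loader 1.6.0 row, `VersionRangeSketch.inRange("2.16.1", "2.14.x", "2.17.x")` is true, while `VersionRangeSketch.inRange("2.18.0", "2.14.x", "2.17.x")` is false.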

Find the Readme for GraMi for grakn < 2.0 here

Contributions

TypeDB Loader was built @Bayer AG in the Semantic and Knowledge Graph Technology Group with the support of the engineers @Vaticle.

Licensing

This repository includes software developed at Bayer AG. It is released under the Apache License, Version 2.0.

Credits

Icon in banner by Freepik from Flaticon
