This project forked from vmware/versatile-data-kit

One framework to develop, deploy and operate data workflows with Python and SQL.

License: Apache License 2.0

Shell 0.85% JavaScript 5.07% Python 35.69% Java 23.50% TypeScript 28.11% CSS 0.07% HTML 3.91% Jupyter Notebook 1.13% Dockerfile 0.08% SCSS 1.46% Mustache 0.14%

Versatile Data Kit


One framework to 🧑‍💻 Develop, ▶️ Deploy and 📊 Operate
data workflows with Python and SQL


🎯 Write shorter, more readable code.
🔄 Ready-to-use data ETL/ELT patterns.
🧩 Lego-like extensibility.

🚀 Single click deployment.
🛠 Operate and monitor.

Intro to VDK SDK | Ingestion | Transformation | Job Deployment | Job Operations | Extensibility | Support and Contributing

Introduction to the VDK SDK

  • Framework to simplify data ingestion and data processing.
  • Write any code using Python or SQL.
  • A toolset enabling you to run data jobs.

Get started with VDK SDK:

Install Quickstart VDK. The only requirement is Python 3.7+.
pip install quickstart-vdk
vdk --help
➡ Develop your First Data Job if you want to get started right away.

Data Ingestion

  • Extract data from various sources (HTTP APIs, Databases, CSV, etc.).
  • Ensure data fidelity with minimal transformations.
  • Load data to your preferred destination (database, cloud storage).
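
A minimal sketch of an ingestion step, assuming the usual VDK convention that a Python step defines `run(job_input)` and that `job_input.send_object_for_ingestion(payload=..., destination_table=...)` queues one record for loading. The `RecordingJobInput` stand-in, the sample CSV contents and the `people` table name are hypothetical; they exist only so the sketch runs without VDK installed.

```python
import csv
import io

# Hypothetical sample data standing in for a real source (HTTP API, database, CSV file).
SAMPLE_CSV = "id,name\n1,Ada\n2,Grace\n"


def run(job_input):
    """Extract rows from the CSV text and send each one for ingestion unchanged."""
    reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
    for row in reader:
        # Minimal transformation: each row is passed through as a plain dict.
        job_input.send_object_for_ingestion(
            payload=dict(row), destination_table="people"
        )


class RecordingJobInput:
    """Hypothetical stand-in for the object VDK passes to run(); it only records calls."""

    def __init__(self):
        self.sent = []

    def send_object_for_ingestion(self, payload, destination_table=None):
        self.sent.append((destination_table, payload))


if __name__ == "__main__":
    fake = RecordingJobInput()
    run(fake)
    print(fake.sent)
```

In a real data job only `run` would live in the step file; the destination is then decided by the configured ingestion plugin rather than by the step itself.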

Ingestion examples:

Ingesting data from REST API into Database
Ingesting data from DB into Database
Ingesting local CSV file into Database
Incremental ingestion using Job Properties
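
Incremental ingestion can be sketched as keeping a watermark in job properties between runs. This assumes `job_input.get_all_properties()` and `job_input.set_all_properties(...)` behave like a persisted dictionary. The `last_id` key, the `events` table, the `SOURCE_ROWS` data, the extra `source_rows` parameter and the in-memory stand-in are hypothetical.

```python
# Hypothetical source rows; a real job would query an API or database instead.
SOURCE_ROWS = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}, {"id": 3, "v": "c"}]


def run(job_input, source_rows=SOURCE_ROWS):
    """Ingest only rows newer than the watermark stored in job properties."""
    props = job_input.get_all_properties()
    last_id = int(props.get("last_id", 0))

    new_rows = [r for r in source_rows if r["id"] > last_id]
    for row in new_rows:
        job_input.send_object_for_ingestion(payload=row, destination_table="events")

    if new_rows:
        # Persist the new watermark so the next run skips these rows.
        props["last_id"] = max(r["id"] for r in new_rows)
        job_input.set_all_properties(props)
    return len(new_rows)


class InMemoryJobInput:
    """Hypothetical stand-in that keeps properties and sent payloads in memory."""

    def __init__(self):
        self._props = {}
        self.sent = []

    def get_all_properties(self):
        return dict(self._props)

    def set_all_properties(self, properties):
        self._props = dict(properties)

    def send_object_for_ingestion(self, payload, destination_table=None):
        self.sent.append(payload)
```

Running the step twice against the same source sends the rows only once, because the second run finds nothing above the stored watermark.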

Data Transformation

  • SQL and Python parameterized transformations.
  • Extensible templates for data modeling.
  • Creates a dataset or table as a product.
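
To illustrate a parameterized SQL transformation that produces a table as its output, here is a sketch using Python's built-in `sqlite3` module rather than VDK itself. The table names, the `min_amount` parameter and the `build_big_orders` helper are hypothetical.

```python
import sqlite3


def build_big_orders(conn, min_amount):
    """Rebuild the big_orders table from orders, filtered by a parameter."""
    conn.execute("DROP TABLE IF EXISTS big_orders")
    conn.execute("CREATE TABLE big_orders (customer TEXT, total REAL)")
    # The threshold is bound as a query parameter, not pasted into the SQL text.
    conn.execute(
        "INSERT INTO big_orders "
        "SELECT customer, SUM(amount) FROM orders "
        "GROUP BY customer HAVING SUM(amount) >= ? "
        "ORDER BY customer",
        (min_amount,),
    )
    conn.commit()


def demo():
    """Load a few sample orders and return the rows of the resulting table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("ann", 40.0), ("ann", 70.0), ("bob", 15.0)],
    )
    build_big_orders(conn, min_amount=100)
    return conn.execute("SELECT customer, total FROM big_orders").fetchall()
```

Dropping and rebuilding the output table keeps the transformation repeatable: running it again with the same input and parameter yields the same table.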

Get started with transforming data:

Data Modeling: Treating Data as a Product
Processing data using SQL and local database
Processing data using Kimball warehousing templates

Data Job Deployment (build, deploy, release)

VDK Control Service provides a REST API for users to create, deploy, manage, and execute data jobs in a Kubernetes runtime environment.
  • Scheduling, packaging, dependency management, deployment.
  • Execution management and monitoring.
  • Source code versioning and tracking. Fast rollback.
  • Manage state and credentials using Properties and Secrets.
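
Scheduling is typically declared in a job's configuration file. The sketch below assumes a `config.ini` with an `[owner]` section holding `team` and a `[job]` section holding `schedule_cron`; treat those key names, the sample values and the `read_schedule` helper as assumptions to check against the VDK documentation. It uses only the standard library to parse the text and verify that the schedule has five cron fields.

```python
import configparser

# Assumed layout of a data job config.ini; the values are made up for illustration.
SAMPLE_CONFIG = """
[owner]
team = my-team

[job]
schedule_cron = 11 23 5 8 1
"""


def read_schedule(config_text):
    """Return (team, cron) from the config text, validating the cron field count."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    team = parser.get("owner", "team")
    cron = parser.get("job", "schedule_cron")
    if len(cron.split()) != 5:
        raise ValueError(f"expected 5 cron fields, got: {cron!r}")
    return team, cron


if __name__ == "__main__":
    print(read_schedule(SAMPLE_CONFIG))
```

A check like this could run before deployment so that a malformed schedule is caught locally rather than after the job is submitted.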

Get started with deploying jobs in control service:

Install Local Control Service with vdk server --install
Scheduling a Data Job for automatic execution
Using VDK DAGs to orchestrate Data Jobs

Operations and Monitoring

  • Use the Operations UI to monitor and troubleshoot data workloads in production.
  • Notifications for errors during Data Job deployment or execution.
  • Route errors to the right people by classifying them as User or Platform errors.
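
Routing by error class can be sketched as follows. The exception classes, the rule that input and configuration problems count as User errors, and the recipient addresses are all hypothetical; VDK's actual classification logic is not described in this README.

```python
class UserCodeError(Exception):
    """Hypothetical: raised for problems in the job's own code or configuration."""


class PlatformError(Exception):
    """Hypothetical: raised for infrastructure problems outside the job owner's control."""


# Hypothetical recipients for each error class.
RECIPIENTS = {
    "User": "job-owners@example.com",
    "Platform": "platform-team@example.com",
}


def classify(error):
    """Return 'User' or 'Platform' for an exception."""
    if isinstance(error, PlatformError):
        return "Platform"
    if isinstance(error, (UserCodeError, ValueError, KeyError, TypeError)):
        return "User"
    # Unknown errors default to Platform so the job owner is not blamed by mistake.
    return "Platform"


def route(error):
    """Return (error class, recipient) for a notification."""
    kind = classify(error)
    return kind, RECIPIENTS[kind]
```

Defaulting unknown errors to the Platform class is one possible design choice; a team could equally decide to default to User and let job owners escalate.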

Get started with operating and monitoring data jobs:

Versatile Data Kit UI - Installation and Getting Started
VDK Operations User Interface - Versatile Data Kit

Lego-like extensibility

  • Modular: use only what you need. Extensible: build what you miss.
  • Install any plugin as a Python package using pip.
  • Plugins can enhance data processing, ingestion, job execution, and the command-line lifecycle.
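
The plugin idea can be illustrated with a tiny hook registry. This is a generic sketch in plain Python, not VDK's plugin API; `HookRegistry`, the `before_run` and `after_run` hook names and both plugin classes are hypothetical.

```python
class HookRegistry:
    """Hypothetical minimal registry: plugins are objects that may define hook methods."""

    def __init__(self):
        self._plugins = []

    def register(self, plugin):
        self._plugins.append(plugin)

    def call(self, hook_name, **kwargs):
        """Call hook_name on every plugin that implements it and collect the results."""
        results = []
        for plugin in self._plugins:
            hook = getattr(plugin, hook_name, None)
            if callable(hook):
                results.append(hook(**kwargs))
        return results


class LoggingPlugin:
    """Hypothetical plugin that implements only the before_run hook."""

    def before_run(self, job_name):
        return f"log: starting {job_name}"


class MetricsPlugin:
    """Hypothetical plugin that implements both hooks."""

    def before_run(self, job_name):
        return f"metric: runs[{job_name}] += 1"

    def after_run(self, job_name):
        return f"metric: done {job_name}"
```

Each plugin implements only the hooks it cares about, which is what makes the "use only what you need" property possible: the core calls a hook and every registered plugin that defines it takes part.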

Get started with using some VDK plugins:

Browse available plugins
➡ Interesting plugins to check out:
       Track Lineage of your jobs using vdk-lineage
       Import/Ingest or Export CSV files using vdk-csv
Write your own plugin

Support and Contributing

For support, join our Slack channel, or create an issue or pull request on GitHub to submit suggestions or changes.
If you are interested in contributing as a developer, visit the contributing page.

Contacts

Code of Conduct

Everyone involved in working on the project's source code, or engaging in any issue trackers, Slack channels, and mailing lists is expected to be familiar with and follow the Code of Conduct.
