hawkeyed520 / datavines

This project forked from datavane/datavines

Know your data better! Datavines is a next-generation data observability platform that supports metadata management and data quality checks.

Home Page: https://datavane.github.io/datavines-website/

License: Apache License 2.0

Shell 0.17% JavaScript 0.69% Java 78.18% TypeScript 20.39% HTML 0.07% Less 0.49%

Datavines

Data quality checks ensure the accuracy of data during integration and processing, and they are a core component of DataOps. DataVines is an easy-to-use data quality service platform that supports multiple metrics.

Architecture Design

[Figure: DataVines architecture diagram]

Install

Requires Maven 3.6.1 or later.

$ mvn clean package -Prelease -DskipTests

Features

Data Catalog

  • Periodically fetch data source metadata to build the data catalog
  • Regularly monitor metadata changes
  • Manage tags on metadata

Data Quality

  • 27 built-in data quality check rules
  • 4 types of data quality check rules
    • Single Table-Column Check
    • Single Table Custom SQL Check
    • Cross Table Accuracy Check
    • Two Table Value Comparison Check
  • Scheduled check tasks
  • SLA-based alerts on check results
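
To make the idea of a Single Table-Column Check more concrete, here is a minimal sketch of what a null-value check might compute. It is an illustration only, not Datavines' actual rule implementation: the class `NullCheckSketch`, the helpers `buildNullCountSql` and `passes`, and the `maxNullRatio` threshold are all hypothetical names introduced for this example.

```java
import java.util.Locale;

/** Illustrative sketch only; not Datavines' real rule code. */
public class NullCheckSketch {

    /**
     * Builds the SQL that counts rows whose column is NULL (hypothetical helper).
     * Table and column names are interpolated directly, so they must come from
     * trusted metadata, never from raw user input.
     */
    public static String buildNullCountSql(String table, String column) {
        return String.format(Locale.ROOT,
                "SELECT COUNT(*) FROM %s WHERE %s IS NULL", table, column);
    }

    /** The check passes when the share of NULL rows does not exceed maxNullRatio. */
    public static boolean passes(long nullRows, long totalRows, double maxNullRatio) {
        if (totalRows == 0) {
            return true; // an empty table has no rows that could violate the rule
        }
        return (double) nullRows / totalRows <= maxNullRatio;
    }

    public static void main(String[] args) {
        System.out.println(buildNullCountSql("orders", "user_id"));
        System.out.println(passes(5, 1000, 0.01));  // 0.5% NULLs against a 1% limit
        System.out.println(passes(50, 1000, 0.01)); // 5% NULLs against a 1% limit
    }
}
```

In a real deployment the counts would come from running the generated SQL against the data source; the sketch keeps the query building and the pass/fail decision separate so that each can be tested without a database.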

Data Profile

  • Run data profiling on a schedule and output a data profile report
  • Automatically identify column types and match them to appropriate data profile indicators
  • Monitor table row count trends
  • View data distributions

Plug-in Design

The platform is built on a plug-in design, and the following modules can be extended with user-defined plug-ins:

  • Data Source: MySQL, Impala, StarRocks, Doris, Presto, Trino, ClickHouse and PostgreSQL are already supported
  • Check Rules: 27 built-in check rules, such as null value check, non-null check and enumeration check
  • Job Execution Engine: Two execution engines, Spark and Local, are supported. The Spark engine currently supports only Spark 2.4. The Local engine is built on JDBC and does not depend on any other execution engine.
  • Alert Channel: Email is supported
  • Error Data Storage: MySQL and local files are already supported (only with the Local execution engine)
  • Registry: MySQL, PostgreSQL and ZooKeeper are already supported
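
As a rough illustration of how a plug-in based module can be organized, the sketch below registers alert channel implementations by name and looks them up at run time. It is a simplified sketch under stated assumptions, not Datavines' real extension API: the `AlertChannel` interface, its `name` and `send` methods, and the `PluginRegistrySketch` class are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Illustrative sketch of a name-based plug-in registry; not Datavines' real API. */
public class PluginRegistrySketch {

    /** Hypothetical extension point; the project's actual interfaces may differ. */
    public interface AlertChannel {
        String name();
        String send(String message);
    }

    // Registered plug-ins, keyed by the name they report.
    private final Map<String, AlertChannel> channels = new LinkedHashMap<>();

    /** Registers a plug-in; a later registration with the same name replaces the earlier one. */
    public void register(AlertChannel channel) {
        channels.put(channel.name(), channel);
    }

    /** Returns the plug-in with the given name or fails with a clear error. */
    public AlertChannel get(String name) {
        AlertChannel channel = channels.get(name);
        if (channel == null) {
            throw new IllegalArgumentException("No alert channel plug-in named: " + name);
        }
        return channel;
    }

    /** Names of all registered plug-ins, in registration order. */
    public Set<String> names() {
        return channels.keySet();
    }

    public static void main(String[] args) {
        PluginRegistrySketch registry = new PluginRegistrySketch();
        registry.register(new AlertChannel() {
            @Override public String name() { return "email"; }
            @Override public String send(String message) { return "email: " + message; }
        });
        System.out.println(registry.get("email").send("null check failed"));
    }
}
```

The same pattern would apply to the other extensible modules (data sources, check rules, engines): the core depends only on the interface, and each implementation is selected by name from configuration.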

Multiple Execute Modes

  • A web UI for configuring check jobs, running jobs, and viewing job execution logs, error data and check results
  • Job scripts can be generated online and submitted through datavines-submit.sh, so jobs can be run from an external scheduling system

Easy Deployment & High Availability

  • Few platform dependencies, easy to deploy
  • At minimum, only MySQL is required to start the project and run data quality checks
  • Horizontal scaling and automatic fault tolerance
  • Decentralized design: Server nodes can be scaled horizontally to improve performance
  • Automatic job fault tolerance ensures that jobs are neither lost nor run twice

Environmental Dependency

  1. Java runtime environment: JDK 8
  2. If the data volume is small, or you only want to verify functionality, you can use the Local (JDBC-based) engine
  3. If you want to run DataVines on Spark, make sure Spark is installed on your server

Quick Start

Click Document for more information

Development

Click Document for more information

Contribution

PRs Welcome

You can submit any ideas as pull requests or as GitHub issues.

If you're new to posting issues, please read How To Ask Questions The Smart Way (this guide does not provide actual support services for this project!) and How to Report Bugs Effectively before posting. Well-written bug reports help us help you!

Thank you to all the people who already contributed to Datavines!

License

Datavines is licensed under the Apache License 2.0. Datavines relies on some third-party components whose open source licenses are either Apache License 2.0 or compatible with it. In addition, Datavines directly references or modifies some code from Apache DolphinScheduler, SeaTunnel and Dubbo, all of which are licensed under the Apache License 2.0. Thanks to the contributors of these projects.

Social Media

  • WeChat Official Account (in Chinese, scan the QR code to follow)

Contributors

zixi0825, xxzuo, an-shi-chi-fan, vines0825, mk-site, winghv, yangyunxi, shlpeng, tgspace, zhugezifang, qiuxiuling, fuchanghai, jfly0902, lzzh1005, meitianjinbu, gooch0922, michealstranger, myiyang, lixuey, alldatafounder
