
cdap-spark's Introduction

CDAP Spark

CDAP Spark is an all-in-one library for unified plug & play analytics. It builds on Apache Spark, which has become the big data platform of choice for enterprises.

CDAP Spark covers all flavors of modern data analytics, from deep learning and machine learning to business rules and query analysis, up to comprehensive text & time series processing.


CDAP Spark externalizes modern data analytics in the form of plugins for Google CDAP data pipelines, enabling data analysts and scientists to build data-driven applications without coding.

Externalization is an appropriate means to make advanced analytics reusable and transparent, and, notably, it secures the knowledge of how enterprise data are transformed into insights, foresights and knowledge.
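To illustrate the idea of externalization, here is a minimal sketch that assumes a simplified, hypothetical plugin model; it is not the CDAP plugin API and not how CDAP Spark is implemented. Each analytics step is registered under a name, and a pipeline is assembled from a list of names (as a pipeline configuration would list them) instead of from hand-written code. The names `PluginRegistry`, `register`, `buildPipeline`, `filterNegative` and `normalizeMax` are illustrative only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical registry: maps a plugin name to a transformation over a list of values.
public class PluginRegistry {
    private final Map<String, UnaryOperator<List<Double>>> plugins = new LinkedHashMap<>();

    public void register(String name, UnaryOperator<List<Double>> plugin) {
        plugins.put(name, plugin);
    }

    // Assembles a pipeline from plugin names, applied in the given order.
    public UnaryOperator<List<Double>> buildPipeline(List<String> stageNames) {
        List<UnaryOperator<List<Double>>> stages = new ArrayList<>();
        for (String name : stageNames) {
            UnaryOperator<List<Double>> stage = plugins.get(name);
            if (stage == null) {
                throw new IllegalArgumentException("Unknown plugin: " + name);
            }
            stages.add(stage);
        }
        return input -> {
            List<Double> data = input;
            for (UnaryOperator<List<Double>> stage : stages) {
                data = stage.apply(data);
            }
            return data;
        };
    }

    public static void main(String[] args) {
        PluginRegistry registry = new PluginRegistry();
        // Toy plugin 1: drop negative values.
        registry.register("filterNegative", data -> {
            List<Double> out = new ArrayList<>();
            for (double v : data) if (v >= 0) out.add(v);
            return out;
        });
        // Toy plugin 2: divide every value by the maximum (unchanged if the maximum is 0).
        registry.register("normalizeMax", data -> {
            double max = 0.0;
            for (double v : data) max = Math.max(max, v);
            if (max == 0.0) return new ArrayList<>(data);
            List<Double> out = new ArrayList<>();
            for (double v : data) out.add(v / max);
            return out;
        });
        // The "configuration": an ordered list of plugin names, no analytics code written here.
        List<Double> result = registry
            .buildPipeline(List.of("filterNegative", "normalizeMax"))
            .apply(List.of(-1.0, 2.0, 4.0));
        System.out.println(result); // prints [0.5, 1.0]
    }
}
```

The design choice illustrated here is that the analytics logic lives in reusable, named units, while the pipeline itself is just data (a list of names), which is what makes it possible to compose pipelines without coding.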

We chose Google's CDAP because this unified environment was designed to cover all aspects of corporate data processing, from data integration & ingestion to SQL & business rules, up to machine learning & deep learning.

CDAP Spark offers more than 150 analytics plugins for CDAP based pipelines and provides the world's largest collection of visual analytics components.

Overview

Visual Analytics is supported by the following modules:

  • DL: Externalizes deep learning algorithms (adapted from Intel's Analytics Zoo) as plugins for Google CDAP data pipelines.
  • ML: Externalizes Apache Spark ML machine learning algorithms as plugins for Google CDAP data pipelines.
  • TS: Complements Apache Spark with proven time series algorithms and externalizes them as plugins for Google CDAP data pipelines.
  • Rules: Externalizes the Drools rule engine as a plugin for CDAP data pipelines.
  • SQL: Supports the application of Apache Spark compliant SQL queries in CDAP batch and stream pipelines.
  • Text: Integrates John Snow Labs' excellent Spark NLP library with Google CDAP and offers proven NLP features as plugins for CDAP data pipelines.

Background

Interested in more detailed information? Read here


cdap-spark's Issues

Cannot compile

I am unable to compile the works-core component using Maven (3.6.2):

  • The Java class de/kp/works/core/ml/RFRegressorConfig.java contains references to the old CDAP packages (co.cask.cdap instead of io.cdap.cdap).
  • That same class inherits de.kp.works.core.BaseRegressorConfig, which does not exist in master.

Other modules fail as well:

  • Works-TS doesn't compile (TSUtils is missing).
  • Works-Text doesn't compile; it has many references to the old CDAP packages (co.cask.cdap instead of io.cdap.cdap).

There may be other issues. It looks like a clean build from a git pull is not possible. Can you help?
I am looking for a Spark NLP compatible plugin for CDAP 6.2.0.
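Two of the reported errors (in RFRegressorConfig.java and in Works-Text) come from references to the old `co.cask.cdap` package prefix instead of `io.cdap.cdap`. The following is a minimal sketch of a hypothetical helper, `CdapPackageMigrator`, that rewrites that prefix in Java sources. It assumes the affected code only needs the package prefix updated; it cannot fix classes that are missing from master (such as `BaseRegressorConfig` or `TSUtils`), and the rename alone may not be enough if the code also uses APIs that changed between CDAP versions. Commit or back up the sources before running it, since it rewrites files in place.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: rewrites the old CDAP package prefix to the new one.
public class CdapPackageMigrator {
    // Matches "co.cask.cdap" only as a whole token, e.g. in "co.cask.cdap.api.annotation".
    private static final Pattern OLD_PREFIX = Pattern.compile("\\bco\\.cask\\.cdap\\b");

    // Rewrites all references to the old prefix in a single source text.
    public static String migrate(String source) {
        return OLD_PREFIX.matcher(source).replaceAll("io.cdap.cdap");
    }

    // Rewrites every regular .java file below root in place (Java 11+); returns changed files.
    public static List<Path> migrateTree(Path root) throws IOException {
        List<Path> javaFiles;
        try (Stream<Path> paths = Files.walk(root)) {
            javaFiles = paths
                .filter(Files::isRegularFile)
                .filter(p -> p.toString().endsWith(".java"))
                .collect(Collectors.toList());
        }
        List<Path> changed = new ArrayList<>();
        for (Path file : javaFiles) {
            String before = Files.readString(file);
            String after = migrate(before);
            if (!after.equals(before)) {
                Files.writeString(file, after);
                changed.add(file);
            }
        }
        return changed;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java CdapPackageMigrator <module-directory>
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path file : migrateTree(root)) {
            System.out.println("Updated " + file);
        }
    }
}
```

Matching on word boundaries keeps identifiers that merely contain the prefix (for example `xco.cask.cdap` or `co.cask.cdapx`) untouched.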
