trembita's People

Contributors

vitaliihonta

trembita's Issues

Benchmarks

Need to add JMH benchmarks for:

  • kernel (Basic, FSM, QL)
  • akka integrations (trembita vs plain akka)
  • spark integrations (trembita vs plain spark)
  • caching

Research ability to integrate Kafka Streams

Trembita can currently read data from Kafka using Akka Streams or Spark Streaming.
Need to research integrating Kafka Streams directly, so that users can use trembita without Akka or Spark in such cases.

Monitoring capabilities

Allow trembita to integrate with Grafana and Prometheus for monitoring pipeline performance and visualising the pipeline itself.

trembitaz

Integrate scalaz-zio as a separate module.
Research the performance pros and cons of using the scalaz-zio bifunctor IO.

Integration with java.util.stream

java.util.stream streams are well known and frequently used in Java projects.
They perform well, so I think we should add the ability to use them as a transport layer alongside Vector and ParVector for sequential and parallel pipelines.
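As a rough illustration of the idea (not trembita code), the sketch below shows how a sequential/parallel switch could map onto a java.util.stream pipeline from Scala. The object and method names are hypothetical, and it assumes Scala 2.12 or later so that Scala lambdas convert to Java functional interfaces.

```scala
import java.util.stream.IntStream

// Hypothetical helper, not part of trembita: runs the same computation
// on a sequential or a parallel java.util.stream.
object JavaStreamSketch {
  def sumOfSquares(n: Int, parallel: Boolean): Long = {
    val base = IntStream.rangeClosed(1, n)
    // parallel() switches the stream to parallel execution
    val stream = if (parallel) base.parallel() else base
    // The Scala lambda is converted to java.util.function.IntToLongFunction
    stream.mapToLong(i => i.toLong * i).sum()
  }
}
```

Both modes return the same result, which is the property a transport layer built on these streams would need to preserve.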

Cons Environment

Allow programmer to write same code for different possible environments.
For instance:

val pipeline: DataPipelineT[F, A, Sequential] = ???
val pipeline2: DataPipelineT[F, A, Akka[NotUsed] Or Spark] = pipeline
  .to[Akka[NotUsed]]
  .orTo[Spark](condition = ???)
  // ... followed by some possibly heavy operations

This should allow more flexible applications that can run in different environments depending on some condition (for instance, the amount of data).
Additionally, the following implicit derivation should work:

def func[E <: Environment](implicit ev: E Supports CanGroupBy) = ???
func[Akka[NotUsed] Or Spark]

And this shouldn't:

def func2[E <: Environment](implicit ev: E Supports CanSort) = ???
func2[Akka[NotUsed] Or Spark] // doesn't compile
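One way such a derivation could work is sketched below. This is a simplified model under stated assumptions, not trembita's implementation: capabilities are indexed by the environment type directly instead of going through Supports and E#Repr, Akka takes no type parameter, and it is assumed purely for illustration that only Spark has a CanSort instance. All names are stand-ins.

```scala
object OrEnvironmentSketch {
  sealed trait Environment
  sealed trait Akka extends Environment
  sealed trait Spark extends Environment
  // Hypothetical combinator: "either A or B, decided at runtime"
  sealed trait Or[A <: Environment, B <: Environment] extends Environment

  trait CanGroupBy[E <: Environment]
  object CanGroupBy {
    implicit val akka: CanGroupBy[Akka] = new CanGroupBy[Akka] {}
    implicit val spark: CanGroupBy[Spark] = new CanGroupBy[Spark] {}
    // A Or B supports an operation only when both sides support it
    implicit def or[A <: Environment, B <: Environment](
        implicit a: CanGroupBy[A], b: CanGroupBy[B]): CanGroupBy[A Or B] =
      new CanGroupBy[A Or B] {}
  }

  trait CanSort[E <: Environment]
  object CanSort {
    // Assumption for this sketch: there is deliberately no CanSort[Akka]
    implicit val spark: CanSort[Spark] = new CanSort[Spark] {}
    implicit def or[A <: Environment, B <: Environment](
        implicit a: CanSort[A], b: CanSort[B]): CanSort[A Or B] =
      new CanSort[A Or B] {}
  }

  def func[E <: Environment](implicit ev: CanGroupBy[E]): String = "groupBy available"
  def func2[E <: Environment](implicit ev: CanSort[E]): String = "sort available"

  val ok: String = func[Akka Or Spark]
  // func2[Akka Or Spark] would be rejected by the compiler here:
  // no implicit CanSort[Akka] exists, so CanSort[Akka Or Spark] cannot be derived.
}
```

Requiring evidence for both sides keeps the combined environment safe whichever branch is chosen at runtime.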

Compile time pipeline optimizer

Need to implement an optimizer for pipeline transformations that works at compile time.
For instance, we can start with trembita.ql. Spark's query analyzer is a good candidate to research.

Enrich trembitaql

Add the following methods (like in Spark SQL):

  • withColumn:
case class Foo(i: Int, x: Long, s: String)
case class Bar(ij: Long, ss: String)
val pipeline: DataPipelineT[F, Bar, E] = ???
pipeline.withColumn[Foo](_ / 2)
  • select:
case class Foo(i: Int, x: Long, s: String)
case class Bar(ij: Long, ss: String)
val pipeline: DataPipelineT[F, Foo, E] = ???
pipeline.select[Bar](a => a.i + a.x, _.s * 2)

Integrate akka http

Think about how to integrate Akka HTTP routes into trembita.
Implement after [#18] is done

Trembita QL improvements

Currently, querying a pipeline looks like this:

val pipeline: DataPipelineT[F, A, E] = ???
pipeline.query(_
  .groupBy(...)
  .aggregate(...)
)

What about making it less verbose?

val pipeline: DataPipelineT[F, A, E] = ???
val result: DataPipelineT[F, Foo, E] = pipeline
  .groupBy(...) // something like DataPipelineTGroupByClause
  .aggregate(...)  // DataPipelineTAggregateClause
  .having(...)  // DataPipelineTHavingClause

Where:

  • DataPipelineTGroupByClause - a special class providing the aggregate operation
  • DataPipelineTAggregateClause - a special class providing the having and order operations
  • DataPipelineTHavingClause - a special class providing further having and order operations

The trembita.ql package should contain implicit conversions from the ...Clause classes into DataPipelineT.
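The clause-chaining idea can be sketched on a toy model. Everything below is hypothetical: Pipeline is an in-memory stand-in for DataPipelineT with no effect type or environment, and the clause classes only approximate the proposed DataPipelineT...Clause classes (ordering is omitted).

```scala
import scala.language.implicitConversions

object QlClauseSketch {
  // Toy stand-in for DataPipelineT: a plain in-memory Vector
  final case class Pipeline[A](items: Vector[A]) {
    def groupBy[K](key: A => K): GroupByClause[K, A] =
      new GroupByClause(items.groupBy(key))
  }

  // After groupBy, only aggregate is available
  final class GroupByClause[K, A](groups: Map[K, Vector[A]]) {
    def aggregate[B](f: Vector[A] => B): AggregateClause[K, B] =
      new AggregateClause(groups.map { case (k, vs) => k -> f(vs) })
  }

  // After aggregate, having can be applied any number of times
  final class AggregateClause[K, B](val rows: Map[K, B]) {
    def having(p: B => Boolean): AggregateClause[K, B] =
      new AggregateClause(rows.filter { case (_, b) => p(b) })
  }

  // Implicit conversion from a clause back into a pipeline
  implicit def clauseToPipeline[K, B](c: AggregateClause[K, B]): Pipeline[(K, B)] =
    Pipeline(c.rows.toVector)

  // The clause chain is converted to a Pipeline by the expected result type
  def demo: Pipeline[(Boolean, Int)] =
    Pipeline(Vector(1, 2, 3, 4))
      .groupBy(_ % 2 == 0)
      .aggregate(_.sum)
      .having(_ > 4)
}
```

Here the even numbers sum to 6 and pass the having filter, while the odd numbers sum to 4 and are dropped, so demo holds the single row (true, 6).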

Deploy Pipelines

Sometimes in distributed systems, components need to be deployed as a result of user interactions.
For instance, deploying a Spark cluster for distributed computations when a user requests it.
Need to research something I'll call "Deploy pipelines".
The idea is simple: write an OutputT which will (based on the input pipeline elements) deploy something
(for instance Docker containers, k8s pods, Spark clusters, etc.)

Machine learning capabilities

Need to research how to integrate machine learning capabilities into trembita.
The easiest way is to research Spark ML and try to integrate it.
Then we need to research TensorFlow integration.
With these 2 models in hand, create a separate module, trembita-ml, as a kernel with basic abstractions.
Then integrate trembita-ml with Spark ML and TensorFlow Scala.

Supports DSL

I've recently added the following type alias to the kernel:

type Supports[E <: Environment, Op[_[_]]] = Op[E#Repr]

Currently it lets you abstract over the environment you are using:

def foo[F[_], A, E <: Environment](foo: Foo)(
  implicit canCombineByKey: E Supports CanCombineByKey,
  canReduce: E Supports CanReduce,
  hasSize: E Supports HasSize
  ...
) = ???

Need to provide an easier way to do so. For instance, something like:

type RequiredAPI[E <: Environment] = 
  Supports[E, 
    CanCombineByKey & 
    CanReduce & 
    HasSize & ...
]
def foo[F[_], A, E <: Environment: RequiredAPI](foo: Foo) = ???
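Since Scala 2 has no `&` type operator, one possible encoding of the bundled requirement is a single evidence class derived from the individual capabilities. The sketch below is an assumption-laden simplification, not the proposed API: capabilities are indexed by the environment type directly instead of E#Repr, they operate on plain Vectors, and Sequential, mean and the capability methods are illustrative names.

```scala
object RequiredApiSketch {
  sealed trait Environment
  sealed trait Sequential extends Environment

  // Hypothetical capabilities, simplified to work on Vector
  trait CanReduce[E <: Environment] {
    def reduce[A](xs: Vector[A])(f: (A, A) => A): A
  }
  object CanReduce {
    implicit val sequential: CanReduce[Sequential] = new CanReduce[Sequential] {
      def reduce[A](xs: Vector[A])(f: (A, A) => A): A = xs.reduce(f)
    }
  }

  trait HasSize[E <: Environment] {
    def size[A](xs: Vector[A]): Int
  }
  object HasSize {
    implicit val sequential: HasSize[Sequential] = new HasSize[Sequential] {
      def size[A](xs: Vector[A]): Int = xs.size
    }
  }

  // One piece of evidence bundling every capability a function needs
  final case class RequiredAPI[E <: Environment](
      canReduce: CanReduce[E],
      hasSize: HasSize[E])

  object RequiredAPI {
    // Derived automatically whenever all the parts are available
    implicit def derive[E <: Environment](
        implicit r: CanReduce[E], s: HasSize[E]): RequiredAPI[E] =
      RequiredAPI(r, s)
  }

  // A single context bound replaces the long implicit parameter list
  def mean[E <: Environment: RequiredAPI](xs: Vector[Double]): Double = {
    val api = implicitly[RequiredAPI[E]]
    api.canReduce.reduce(xs)(_ + _) / api.hasSize.size(xs)
  }
}
```

A call such as `RequiredApiSketch.mean[RequiredApiSketch.Sequential](Vector(1.0, 2.0, 3.0))` resolves the bundle implicitly; the trade-off is that each distinct combination of capabilities needs its own bundle class.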
