
php-etl's Introduction

Allons-y 👋

  • I am Oliver de Cramer, a backend PHP developer
  • I am an expert on Magento and Symfony
  • I work for Wide Agency in Switzerland

I work on

PHP ETL

One of my favorite projects, which I have been maintaining since 2018, is a PHP ETL library, along with various libraries that connect it to Symfony, EasyAdmin, and (soon) Sylius.

During my career I have worked a lot on imports and data transformation, and I have seen various libraries being used or project-specific code being written. For most of them, slight changes in the data created complex challenges, and writing tests for these cases was often near impossible because of the complexity.

So after trying other libraries and running into their limitations (grouping, doing more than one thing with the same data, ...), I ended up coding my own library, which I have used on nearly all the projects I have worked on. It is composed of multiple packages:

  • At the core is the PHP library, which has very few dependencies: PHP ETL
  • If you have a Symfony project, you can easily integrate it with the Symfony logger, dependency injection, etc. using the PHP ETL Bundle
    • If your Symfony project uses EasyAdmin, you can view execution results and launch new processes from the admin interface with PHP ETL EasyAdmin
    • If you use Sylius, you will soon have interfaces as well; work is in progress, but you can still use the Symfony bundle.

Comfy Bundle

For me, Comfy is a must-have Symfony bundle for most websites. It gives admins easy-to-use configuration. It works very well with my PHP ETL bundle and is even more useful with Sylius, for example. Like the ETL, it comes as multiple libraries.

  • At the core is the Comfy Bundle
  • If you use EasyAdmin, you can view and edit configs in a dedicated interface with the Comfy EasyAdmin Bundle
  • The Sylius plugin is done; publishing it is only a matter of time.

Associative Array Simplified

This is my first PHP library. It is less useful than it used to be now that PHP has newer operators, but if you need to manipulate associative arrays without knowing exactly which values are there, and don't wish to add lots of conditions, you should check out this library.

What I would like to work on

  • Integrating PHP-ETL with Sylius.
  • Creating an easy-to-start system for PHP-ETL based on PHP instead of YAML config files. This would provide an easier entry point without losing the flexibility of the library for the future. Some of the work is here
  • I have worked on an automated end-to-end testing tool; sadly, the code is not public. I would like to have the opportunity to make it public and to improve the interfaces by using Symfony UX more.

Old Projects I am proud of

eXpansion2

eXpansion is a server controller for the Maniaplanet game. Basically, it interacts with the game server and the players in order to enhance the experience of both the users and the server admins. It uses Symfony to achieve this.

Maniaplanet is a very diverse game where users can create their own "game mods" and we wanted the controller to be able to adapt to new game modes without having to write a ton of code. eXpansion achieves this and much more.

Sadly, the game lost most of its player base, which ended up causing me to lose interest as well, and eXpansion² was never truly finished.

Stats; Because stats are cool

Stats are cool, but these only cover what I commit on GitHub; most of my work is in private repos. I also write technical specifications and carry out quality audits, which are of course not on GitHub.

Oliverde8's Stats Oliverde8's Languages

php-etl's People

Contributors

anantrp, oliverde8

php-etl's Issues

[DOCS] Reorganise documentation

The documentation is a mess at the moment.

Most examples use the YAML syntax from alpha1 or earlier, which is close to what needs to be written but is no longer correct.

The documentation also currently describes two ways of doing things; it should focus only on the YAML approach.

[Concept] Sub Chain

With the split operation making its real debut (#10), we need a way to share operations between execution branches.

This is also necessary when developing chains with multiple common steps.

There are two ways to define these:

  1. Add new subOperations section to existing yamls where these can be listed.
  2. Allow a dedicated yaml file to be loaded

Example of definition

subChains:
  mySubChain1:
    op1: {operation: toto}
  mySubChain2:
    op1: {operation: roro}

Example of usage

myOperation:
  operation: subChain

or if the subChain is defined in a different yaml:

myOperation:
  operation: myFile.yaml:subChain

We should be able to share subChain instances. For example, if a subChain writes a file, is configured with share: true, and is used multiple times in the same chain, all the data will be written to the same file. How to handle StopItem requires more thought.

[DOCS] Add Examples

The current examples are very abstract.

We need concrete examples with sample CSV files, and scripts that anyone who forks the project can run to understand how it works.

[ItemTypes] Add new ItemType MixItem

This item contains an array of ItemInterface objects. Each item will be passed individually to the next step.

Attention needs to be split again!
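
The idea can be illustrated with a small language-neutral sketch in Python. It is not php-etl code: `Item`, `DataItem`, `MixItem` and `expand` are hypothetical names, and the recursive expansion of nested MixItems is only one possible reading of the note above.

```python
from dataclasses import dataclass
from typing import Iterator, List


class Item:
    """Hypothetical stand-in for the library's ItemInterface."""


@dataclass
class DataItem(Item):
    data: dict


@dataclass
class MixItem(Item):
    items: List[Item]


def expand(item: Item) -> Iterator[Item]:
    """Yield the items the next step should receive, one at a time."""
    if isinstance(item, MixItem):
        for inner in item.items:
            # Assumption: a MixItem may itself contain MixItems,
            # so each inner item is expanded again.
            yield from expand(inner)
    else:
        yield item


mixed = MixItem([
    DataItem({"id": 1}),
    MixItem([DataItem({"id": 2}), DataItem({"id": 3})]),
])
ids = [item.data["id"] for item in expand(mixed)]
```

With this sketch the next step would see three separate data items, in their original order, rather than one wrapper item.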

[ItemType] FileWrotenItem

This is an item that all writers will return after the StopItem, in order to let downstream steps know that a file was written.

In the future we will be able to use this to archive files, for example, or to send written files somewhere in a later step.

[ItemType] FileReadItem

This is a specific item sent by file readers to let downstream operations know that a whole file has been read.

[OPERATION] Add operation to filter data

At the moment it's possible to execute an operation on some data passing through the chain.

For example, we read a CSV file containing customers and wish to process only the customers who have subscribed to the newsletter.

filter-subscribed:
  operation: filter
  options:
    rule: {get: {field: 'is_subscribed'}}

In this example we would use the rule engine to check fields. We could also use Symfony expressions, but those are already built into the rule engine. The rule engine gives more flexibility, as it allows checks to be made without failing.
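
As a rough, language-neutral illustration of the intended behaviour (Python, not php-etl code), the sketch below keeps only the rows for which a rule returns a truthy value. `get_field` and `filter_rows` are hypothetical names; `get_field` only imitates the idea of a check that does not fail when the field is missing.

```python
from typing import Callable, Iterable, Iterator

Row = dict


def get_field(field: str) -> Callable[[Row], object]:
    """Hypothetical stand-in for a 'get field' rule.

    Returns None instead of raising when the field is missing.
    """
    return lambda row: row.get(field)


def filter_rows(rows: Iterable[Row], rule: Callable[[Row], object]) -> Iterator[Row]:
    """Pass a row on to the next step only if the rule is truthy."""
    for row in rows:
        if rule(row):
            yield row


customers = [
    {"email": "a@example.com", "is_subscribed": True},
    {"email": "b@example.com", "is_subscribed": False},
    {"email": "c@example.com"},  # field missing: skipped, no error
]
subscribed = list(filter_rows(customers, get_field("is_subscribed")))
```

Values read from a real CSV file are strings, so an actual rule would also have to decide how values such as "0" or an empty string are treated.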


Advanced use case

We can combine this with the split operation to run different processes on different data, for example:

split-base-on-subscription:
  operation: split
  options:
    execution1:
      filter-subscribed:
        operation: filter
        options:
          rule: {expression_language: {expression: "rowData.is_subscribed"}}
      custom-transformation-subscribed:
        operation: my-subscribed-operation
        options: []
    execution2:
      filter-not-subscribed:
        operation: filter
        options:
          rule: {expression_language: {expression: "!rowData.is_subscribed"}}
      custom-transformation-notsubscribed:
        operation: my-notsubscribed-operation
        options: []

Reminder: any step after split-base-on-subscription will receive all the data, both subscribed and unsubscribed, in its state before it was transformed by the operations inside the split!
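
The reminder can be made concrete with a small language-neutral sketch in Python. This is not how php-etl is implemented: `split`, `only_subscribed`, `only_unsubscribed` and `tag` are hypothetical, and giving each branch a deep copy is just one assumed way to obtain the described behaviour.

```python
import copy

processed = []  # records what each branch handled


def only_subscribed(row):
    return row if row.get("is_subscribed") else None


def only_unsubscribed(row):
    return None if row.get("is_subscribed") else row


def tag(label):
    def operation(row):
        row["segment"] = label  # modifies the branch copy only
        processed.append((label, row["email"]))
        return row
    return operation


def split(row, branches):
    """Run each branch on its own copy; return the untouched original."""
    for branch in branches:
        branch_row = copy.deepcopy(row)
        for operation in branch:
            branch_row = operation(branch_row)
            if branch_row is None:  # a filter rejected the row
                break
    return row


branches = [
    [only_subscribed, tag("subscribed")],
    [only_unsubscribed, tag("not-subscribed")],
]
rows = [
    {"email": "a@example.com", "is_subscribed": True},
    {"email": "b@example.com", "is_subscribed": False},
]
downstream = [split(row, branches) for row in rows]
```

Each branch handles only its own rows, while `downstream` still contains both rows without the `segment` key added inside the branches.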
