
bodo-examples's Introduction

Let's Learn Bodo through Examples!

Welcome to the Bodo Examples Repo! This is where you can find examples to help you get started using Bodo.

Bodo is a next-generation big-data processing engine that automatically brings supercomputing-style performance and scalability to native Python and SQL code. Bodo has several advantages over other big data transformation systems that make it one of the most performant and cost-effective solutions for large-scale data analytics, particularly ETL and ELT.

This repository teaches you to use Bodo effectively through examples. If you know SQL and Python, you already know how to use Bodo; you don't need to learn a new language or API. You just import bodo and learn a few programming techniques that improve your existing applications, saving $$$ on compute resources while delivering value in a much shorter time frame. Benchmarks have shown that Bodo can be orders of magnitude faster than competitors such as Spark.
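As a rough illustration of what "just import bodo" can look like, here is a minimal sketch that wraps an ordinary pandas function in the `bodo.jit` decorator. The function name `total_by_region`, the `orders` data, and the column names are made up for this example and do not come from the modules in this repository. The `ImportError` fallback is also an assumption, added only so the snippet still runs as plain pandas where Bodo is not installed.

```python
import pandas as pd

try:
    import bodo
    jit = bodo.jit
except ImportError:
    # Assumption for illustration: without Bodo installed, fall back to a
    # no-op decorator so the same code runs as ordinary pandas.
    def jit(func):
        return func


@jit
def total_by_region(df):
    # Regular pandas code: keep positive amounts, then sum per region.
    positive = df[df["amount"] > 0]
    return positive.groupby("region", as_index=False)["amount"].sum()


# Hypothetical sample data, not taken from the repository.
orders = pd.DataFrame(
    {"region": ["east", "west", "east", "west"], "amount": [10, 5, -3, 7]}
)

result = total_by_region(orders)
# Build a dict so the check does not depend on the order of output rows.
totals = dict(zip(result["region"], result["amount"]))
print(totals)
```

The body of the function is unchanged pandas; only the decorator is new. The examples in the modules show the patterns that matter in practice, such as reading data inside the decorated function.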

How to run these examples?

We recommend that you run these examples on the Bodo Platform. You can sign up for the platform to try it out. Some examples, such as modules 1 to 3, can run on small clusters, e.g., 2 nodes of c5.2xlarge with a total of 8 physical cores (16 vCPUs) and 32 GB of RAM, while others need larger clusters. The description provided with each example indicates the cluster size required to run it.
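The totals quoted above can be checked with a little arithmetic. The sketch below assumes the published c5.2xlarge shape of 8 vCPUs and 16 GB of RAM per node, with 2 vCPUs (hardware threads) per physical core. The `InstanceType` class and `cluster_totals` helper are hypothetical names used only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    name: str
    vcpus: int
    memory_gb: int
    threads_per_core: int = 2  # assumption: 2 vCPUs per physical core


def cluster_totals(instance: InstanceType, nodes: int) -> dict:
    """Aggregate cores, vCPUs, and memory across identical nodes."""
    total_vcpus = instance.vcpus * nodes
    return {
        "physical_cores": total_vcpus // instance.threads_per_core,
        "vcpus": total_vcpus,
        "memory_gb": instance.memory_gb * nodes,
    }


# Assumed per-node shape for c5.2xlarge: 8 vCPUs, 16 GB of RAM.
C5_2XLARGE = InstanceType("c5.2xlarge", vcpus=8, memory_gb=16)

cluster = cluster_totals(C5_2XLARGE, nodes=2)
print(cluster)
```

The same helper can be reused to compare the cluster size listed in an example's description against the instance types you have available.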

You can also run these examples locally by installing bodo on your laptop. However, we recommend using the Bodo Platform for the best experience as it provides a notebook environment with all the code available and required packages already installed for you.

What if I want to test my code with my own data?

If you want to run your application code with your own data, please refer to the instructions here on how to set up the identity and access management, policies, and credentials needed to integrate your cloud provider with the Bodo Platform. This allows Bodo to spin up EC2 instances, create a cluster, and give you access to your data within your VPC. Everything, including your data, stays in your VPC.

Modules outline

Modules 01 to 03 focus on compute-heavy data transformations through ETL applications. You will find examples with operational databases such as PostgreSQL, Oracle, and MySQL in module 01, a data warehouse such as Snowflake in module 02, and a data lakehouse example with Iceberg in module 03.

Modules 04 and 05 contain larger-scale examples covering machine learning and business use cases (financial, transportation, etc.). Finally, module 06 contains a performance comparison of Bodo vs. Spark on a set of queries derived from the TPC-H benchmark suite.

This is an open-source repository, so please consider adding your Bodo examples to it! You can contribute by creating a feature branch and submitting a pull request for us to review.
