I am developing a series of data engineering projects that aim to connect source systems, warehouse platforms, and reporting tools such as Power BI and Tableau. It is going to be a great journey. Many thanks to Dmitry from Syrfanalytics, who encouraged me to start exploring and documenting it.
The aim of this project is to learn the main functionality of two large and fast-moving ELT tools: Snowflake and dbt. I am going to concentrate on how to connect these two systems, how to marry them, and how to get the main functionality working.
I am following along with this course on Udemy: https://www.udemy.com/course/complete-dbt-data-build-tool-bootcamp-zero-to-hero-learn-dbt/?referralCode=659B6722C93EF4096D11 and supplementing it with my notes on what has worked and what hasn't.
A few words about these systems, with some help from ChatGPT.
Snowflake is a cloud-based data warehousing platform that allows users to store and analyze large volumes of data in a scalable and cost-effective way. It supports both structured and semi-structured data and is known for its simplicity and ease of use.
dbt - or "data build tool" - is an open-source software tool that helps data analysts and engineers transform data in their warehouse more effectively. It is particularly popular in the context of data analytics and data warehousing. dbt focuses on the transformation step of the data analytics workflow, allowing users to define transformations on raw data within their data warehouse.
I will use Python and SQL extensively in my development, and AWS S3 to store data.
Python is a high-level, interpreted programming language known for its readability, versatility, and ease of learning. It was created by Guido van Rossum and first released in 1991. Python has become one of the most popular programming languages, widely used in automation, data engineering, data science, and artificial intelligence.
SQL - Structured Query Language - is a standard language used for managing and manipulating relational databases. It provides a standardized way to interact with databases, enabling users to perform tasks such as querying data, updating data, inserting data, and deleting data.
**Amazon S3** - Amazon Simple Storage Service - is a scalable cloud storage service provided by Amazon Web Services (AWS). It is designed to store and retrieve any amount of data. S3 is often used for backup, archiving, content distribution, and as a data storage backend for various applications.
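To make the four SQL tasks mentioned above (querying, updating, inserting, and deleting) a bit more concrete, here is a minimal sketch that runs them from Python. It is only an illustration under my own assumptions: it uses Python's built-in `sqlite3` module with an in-memory database as a stand-in, not Snowflake, and the table name `raw_orders` and its columns are made up for the example.

```python
import sqlite3


def run_basic_sql_demo():
    """Run insert, update, delete, and query against a throwaway in-memory database."""
    # Assumption: SQLite in memory stands in for a real warehouse such as Snowflake.
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # Hypothetical table, used only for illustration.
    cur.execute(
        "CREATE TABLE raw_orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )

    # Inserting data
    cur.executemany(
        "INSERT INTO raw_orders (id, customer, amount) VALUES (?, ?, ?)",
        [(1, "alice", 10.0), (2, "bob", 25.5), (3, "alice", 4.5)],
    )

    # Updating data: change the amount of order 2
    cur.execute("UPDATE raw_orders SET amount = ? WHERE id = ?", (30.0, 2))

    # Deleting data: remove order 3
    cur.execute("DELETE FROM raw_orders WHERE id = ?", (3,))

    # Querying data: total amount per customer
    cur.execute(
        "SELECT customer, SUM(amount) FROM raw_orders "
        "GROUP BY customer ORDER BY customer"
    )
    totals = cur.fetchall()

    conn.close()
    return totals


if __name__ == "__main__":
    print(run_basic_sql_demo())
```

The SQL statements themselves are standard, so the same ideas should carry over to a warehouse, although connection details and some syntax will differ there.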