This repository is a guide for aspiring Big Data Engineers. It provides a roadmap, recommended learning resources, and a collection of open-source projects to help you build the necessary skills and gain hands-on experience.
Some of the resource links are broken, and I am working on updating them. If you want to add or fix a resource, please open a pull request.
- Introduction
- Programming Languages
- Data Processing Frameworks
- Data Storage and Querying
- Data Streaming and Messaging
- Data Orchestration and Workflow Management
- Cloud Computing
- Data Modeling and ETL/ELT
- Data Visualization and Reporting
- Soft Skills
- Projects and Certifications
- Interview Preparation
- Contributing
This section will provide an overview of the Big Data Engineering field, its importance, and the role of a Big Data Engineer.
- Python for Data Analysis - Free book from O'Reilly covering Python for data analysis.
- Python Data Science Handbook - Free book that covers the essential knowledge for working with data in Python.
- Python for Data Analysis Video Series - Video tutorials from Corey Schafer.
- Scala Programming Language - Official documentation and learning resources for Scala.
- Java Programming Language - Official Java tutorials from Oracle.
- Apache Spark Official Documentation
- Learning Spark: Lightning-Fast Data Analytics - Free book from Databricks (requires email signup).
- Spark Programming Guide - Book from O'Reilly.
- Apache Hadoop Official Documentation
- Hadoop: The Definitive Guide - Book from O'Reilly.
- PostgreSQL Tutorial - Free comprehensive PostgreSQL tutorial.
- MySQL Tutorial - Free MySQL tutorial for beginners.
- MongoDB University - Free online courses and certifications for MongoDB.
- Apache Cassandra Documentation - Official documentation for Apache Cassandra.
- HBase Reference Guide - Official reference guide for Apache HBase.
- Apache Hive Tutorial - Official Apache Hive tutorial.
- Presto Documentation - Official documentation for Presto.
- Apache Impala Documentation - Official documentation for Apache Impala.
- Apache Kafka Documentation - Official Apache Kafka documentation.
- Kafka: The Definitive Guide - Book from O'Reilly.
- Kafka Streams Documentation - Official documentation for Kafka Streams.
- Apache Flink Documentation - Official Apache Flink documentation.
- Apache Storm Documentation - Official Apache Storm documentation.
- Apache Airflow Documentation - Official Apache Airflow documentation.
- Airflow Tutorial - Official Airflow tutorial.
- Mastering Apache Airflow - Book from O'Reilly.
- AWS Big Data Services - Overview of AWS big data services.
- Amazon EMR Documentation - Official documentation for Amazon EMR.
- Amazon S3 Documentation - Official documentation for Amazon S3.
- Amazon Athena Documentation - Official documentation for Amazon Athena.
- Amazon Redshift Documentation - Official documentation for Amazon Redshift.
- Azure Data Services - Overview of Azure data and analytics services.
- Azure HDInsight Documentation - Official documentation for Azure HDInsight.
- Azure Data Lake Storage Documentation - Official documentation for Azure Data Lake Storage.
- Azure Synapse Analytics Documentation - Official documentation for Azure Synapse Analytics.
- Google Cloud Data Services - Overview of Google Cloud data and analytics services.
- Google Cloud Dataproc Documentation - Official documentation for Google Cloud Dataproc.
- Google Cloud Dataflow Documentation - Official documentation for Google Cloud Dataflow.
- Google BigQuery Documentation - Official documentation for Google BigQuery.
- Data Modeling for Data Warehouses - Book by Len Silverston and Paul Agnew.
- Data Vault Modeling Guide - Book from O'Reilly.
- ETL/ELT with Python - Book from O'Reilly.
- Apache NiFi Documentation - Official documentation for Apache NiFi.
- Talend Open Studio Documentation - Official documentation for Talend Open Studio.
- Tableau Desktop Resources - Free training resources for Tableau Desktop.
- Power BI Documentation - Official documentation for Microsoft Power BI.
- Apache Superset Documentation - Official documentation for Apache Superset.
- Problem-Solving Techniques - Resources for developing problem-solving skills.
- Effective Communication Skills - Resources for improving communication skills.
- Collaboration and Teamwork - Resources for enhancing collaboration and teamwork.
- AWS Certified Big Data Specialty - AWS specialty-level certification for big data (since renamed AWS Certified Data Analytics - Specialty).
- Azure Data Engineer Associate - Microsoft's associate-level certification for data engineering on Azure.
- Google Cloud Professional Data Engineer - Google Cloud's professional-level certification for data engineering.
- Data Engineering Interview Questions - Collection of data engineering interview questions.
- System Design Interview Questions - System design interview questions and resources.
If you have any suggestions, improvements, or additional resources to share, please feel free to contribute to this repository. Follow the Contributing Guidelines to get started.