Data Engineering and DataOps Course: IDS 706, Fall 2023 by Noah Gift
- Contact & Bio
- Co-instructors: Alfredo Deza and Derek Wales
- Course Syllabus
- Course Projects
- Week by Week Schedule
- Resources
- SQL Mastery Guide
- Question and Answer
- Guest Speakers
- Lecture Notes
- Community Recipes/Example Student Projects
- Coursera Course: Foundations of Data Engineering
- Office Hours: After class and via Zoom
Data Engineering is applied software engineering. It is not data science or computer science. As a result, this course focuses on building software systems within the domain of data. This course serves as a method of encapsulating many courses in the program. Students learn to apply Data Engineering to a real-world project. This manifests itself through several goals: development of non-linear, lifelong learning; community building; portfolio development; and software engineering best practices, including the use of AI Pair Programming assistants, DevOps, and Cloud Computing.
Upon course completion, you'll be able to:
- Create data engineering solutions using Rust and the Linux environment.
- Design binary executables for interfacing with SQL systems such as Snowflake, DataBricks, and BigQuery.
- Construct robust, efficient, and safe systems with low carbon footprints, leveraging the inherent properties of Rust for scalable efficiency.
- Use AI Pair Programming tools like GitHub Copilot, ChatGPT, AWS CodeWhisperer, and Google Bard for building sophisticated, reliable systems.
- Cultivate non-linear, lifelong learning skills.
- Assemble, share, and present persuasive portfolios using platforms like GitHub, YouTube, and LinkedIn.
- Obtain certifications in Cloud Computing and a Big Data SQL solution.
This course requires basic programming skills as well as basic Linux skills. See the optional readings/media to self-learn before class starts. You will also be required to complete a 5-week Rust bootcamp.
Expect to spend 10 to 20 hours per week on this class, including the five-week bootcamp. This is a required class, and it teaches material that prepares you for a job doing software engineering tasks in the field of data and machine learning. It is challenging and time-intensive, so please plan your Fall schedule accordingly. This class uses weekly demos because they are common in the software industry, and the class prepares you to hit the ground running in a high-pressure, demanding tech job. Additionally, by doing demos you increase your metacognitive ability, i.e., you learn what you know and what you don't know. Increasing your metacognition skills is a shortcut to mastery in real-world software engineering.
Finally, at the end of the class you will have 5 substantial projects and 15 mini-projects. This means you will have a robust portfolio of work to share with a future employer. The amount of work we do in this class is very similar to a real-world software engineering job, but you have the guard rails of tremendous support from the faculty and TAs at a world-class institution.
Answers on why this course uses Rust, GitHub Codespaces and Copilot:
- Sustainability
- What can I do to get the most out of this class?
- Level up with Rust via GitHub Copilot
- Teaching MLOps at Scale (GitHub Universe 2022)
We, as educators and students, are dedicated to fostering diversity and equity, ensuring everyone's full participation by eliminating educational obstacles. This course values the diverse experiences, backgrounds, identities, learning styles, and academic interests of each individual. The array of perspectives from our students enriches all, and we aim to approach each with openness and respect.
The primary resources for this course are the following Coursera Specializations by Noah Gift:
- Building Cloud Computing Solutions at Scale Specialization
- Python, Bash, and SQL Essentials for Data Engineering Specialization
- MLOps (Machine Learning Operations) Specialization (Link to be provided)
- Applied Data Engineering Specialization (Link to be provided)
- Systems Programming in Rust Specialization (Link to be provided)
These specializations provide comprehensive coverage of the key concepts and skills needed for this course. They include a combination of video lectures, readings, quizzes, and hands-on projects to reinforce learning and build practical skills.
This course involves several different types of interactions, which take place primarily through Microsoft Teams, GitHub, and Zoom. Please take the time to navigate through the course, become familiar with the course syllabus, structure, and content, and review the list of resources below.
Students in an online program should be able to do the following:
- Communicate via Teams discussion forums.
- Use a web browser to navigate the web, and use tools like ChatGPT.
- Use Teams as the learning management system.
- Use GitHub.
- Create demo videos.
- Write Rust code.
- Use Cloud Computing and Cloud Computing Labs.
The course is structured into several key sections, each designed to provide you with the skills and knowledge necessary to excel in the field of data engineering. The sections are as follows:
- Introduction to Data Engineering and DataOps
- Rust Programming for Data Engineering
- SQL Systems and Data Engineering
- Cloud Computing and Data Engineering
- AI Pair Programming and Data Engineering
- DevOps and Data Engineering
- Final Project and Course Review
Throughout the course, you will engage in hands-on projects, both individually and in groups, that will reinforce the concepts covered in the lectures and readings. These projects will also provide you with valuable experience in designing and implementing data engineering solutions.
Data engineering is a rapidly growing field that plays a crucial role in the modern data-driven world. By the end of this course, you will have gained a solid foundation in data engineering principles and practices, as well as the ability to apply these skills to real-world problems. Whether you are looking to advance your career in data engineering or simply want to broaden your understanding of this important field, this course will provide you with the tools and knowledge you need to succeed.