serverless-spark-workshop's Introduction

Serverless Spark Hands-On Workshop

Apache Spark is often used for interactive queries, machine learning, and real-time workloads.

Spark developers typically spend only 40% of their time writing code and the other 60% tuning infrastructure and managing clusters.

Google Cloud customers have used our auto-scaling, serverless Spark to boost productivity and reduce infrastructure costs.

This repository contains hands-on labs for Serverless Spark on GCP, built around common use cases. Working through these labs helps data engineers and data scientists who already have Apache Spark experience ramp up faster on Serverless Spark on GCP.

Check out this repository for ready-to-use, config-driven Dataproc Serverless Spark templates that solve simple but large in-cloud data tasks, including data import, export, backup, and restore, as well as bulk API operations.

Feedback From Serverless Spark Users

  • "Serverless Spark is so much easier than traditional cluster based products."
    ~ Director of Data Science at business management corporation

  • "Anytime we can go the serverless route we will. Just so much simpler and eliminates the management of the infrastructure."
    ~ Director of Data Engineering at business management corporation

  • “Serverless Spark enables us to only use the compute resources we need when we need them and all with a single click. The Spark Workshop is a great way to get hands on experience with the tools.”
    ~ Principal Data Scientist at multinational retail corporation

  • “We ran a compute-intensive Serverless Spark query in 19 mins. That same Spark query took 90 mins on a traditional cluster based product. It's ~80% faster on Serverless Spark.”
    ~ Principal Architect at multinational retail corporation

What's Covered?

| # | Modules | Focus | Feature |
|---|---------|-------|---------|
| 1 | Lab 1 - Cell Tower Anomaly Detection | Data Engineering | Serverless Spark Batch from CLI with Cloud Composer orchestration |
| 2 | Lab 2 - Wikipedia Page View Analysis | Data Analysis | Serverless Spark Batch from BigQuery UI |
| 3 | Lab 3 - Chicago Crimes Analysis | Data Analysis | Serverless Spark Interactive from Vertex AI managed notebook |
| 4 | Lab 4 - Retail Store Analytics | Data Analysis | Serverless Spark Batch from CLI with Cloud Composer orchestration and Dataproc Metastore |
| 5 | Lab 5 - Serverless Spark Streaming | Data Analysis | Serverless Spark Dataproc Batches |
| 6 | Lab 6 - Timeseries Forecasting | Data Analysis | Vertex AI notebooks with Serverless Spark session |
| 7 | Lab 7 - COVID-19 Economic Impact | Data Analysis | Vertex AI notebooks with Serverless Spark session |
| 8 | Lab 8 - Malware Detection | Data Analysis | Serverless Spark Batch from CLI with Cloud Composer orchestration |
| 9 | Lab 9 - Social Media Data Analytics | Data Analysis | Vertex AI notebooks with Serverless Spark session |
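
Several of the labs listed above run Serverless Spark batches from the CLI. As a rough sketch of what such a submission might look like, the snippet below assembles a `gcloud dataproc batches submit pyspark` command line. The helper name `build_batch_submit_command` and every value passed to it (script path, project ID, region, batch ID, bucket) are hypothetical placeholders, not taken from the labs; each lab's own instructions remain the reference for the exact commands and flags it uses.

```python
import shlex


def build_batch_submit_command(main_script, project_id, region, batch_id, deps_bucket):
    """Return the argv list for submitting a PySpark batch to Dataproc Serverless.

    All arguments are illustrative placeholders; substitute your own values.
    """
    return [
        "gcloud", "dataproc", "batches", "submit", "pyspark", main_script,
        f"--project={project_id}",
        f"--region={region}",
        f"--batch={batch_id}",          # ID to assign to the batch
        f"--deps-bucket={deps_bucket}",  # bucket used to stage local dependencies
    ]


if __name__ == "__main__":
    cmd = build_batch_submit_command(
        main_script="my_job.py",            # hypothetical PySpark script
        project_id="my-project",            # hypothetical project ID
        region="us-central1",
        batch_id="my-first-batch",          # hypothetical batch ID
        deps_bucket="gs://my-deps-bucket",  # hypothetical staging bucket
    )
    # Print a shell-safe command line; run it yourself once the values are real.
    print(shlex.join(cmd))
```

Building the command as a list and printing it, rather than executing it, keeps the sketch free of side effects; pass the list to `subprocess.run` only after replacing the placeholders.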

Credits

Some of the labs were contributed by Google Cloud partners or by Googlers.
Lab 1 - TEKsystems
Lab 2 - TEKsystems
Lab 3 - Anagha Khanolkar (@anagha-google)
Lab 4 - TEKsystems
Lab 5 - TEKsystems
Lab 6 - TEKsystems
Lab 7 - TEKsystems
Lab 8 - TEKsystems
Lab 9 - TEKsystems

Contributing

See the contributing instructions to start contributing.

License

All solutions within this repository are provided under the Apache 2.0 license. Please see the LICENSE file for more detailed terms and conditions.

Disclaimer

This repository and its contents are not an official Google Product.

Contact

Interested in doing a guided, hands-on Spark Workshop? Please fill out this form.
