Git Product home page Git Product logo

cloud-dataproc's Introduction

Google Cloud Dataproc

This repository contains code and documentation for use with Google Cloud Dataproc.

Samples in this Repository

  • codelabs/opencv-haarcascade provides the source code for the OpenCV Dataproc Codelab, which demonstrates a Spark job that adds facial detection to a set of images.
  • codelabs/spark-bigquery provides the source code for the PySpark for Preprocessing BigQuery Data Codelab, which demonstrates using PySpark on Cloud Dataproc to process data from BigQuery.
  • codelabs/spark-nlp provides the source code for the PySpark for Natural Language Processing Codelab, which demonstrates using spark-nlp library for Natural Language Processing.
  • notebooks/python provides example Jupyter notebooks to demonstrate using PySpark with the BigQuery Storage Connector and the Spark GCS Connector
  • spark-tensorflow provides an example of using Spark as a preprocessing toolchain for Tensorflow jobs. Optionally, it demonstrates the spark-tensorflow-connector to convert CSV files to TFRecords.
  • spark-translate provides a simple demo Spark application that translates words using Google's Translation API and running on Cloud Dataproc.

See each directories README for more information.

Additional Dataproc Repositories

You can find more Dataproc resources in these github repositories:

Dataproc projects

Connectors

Kubernetes Operators

Examples

For more information

For more information, review the Dataproc documentation. You can also pose questions to the Stack Overflow community with the tag google-cloud-dataproc. See our other Google Cloud Platform github repos for sample applications and scaffolding for other frameworks and use cases.

Contributing changes

Licensing

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.