Git Product home page Git Product logo

net.jgp.labs.spark's Introduction

Some Java examples for Apache Spark

Welcome to this project I started several years ago with this simple idea: let's use Spark with Java and not learn all those complex stuff like Hadoop or Scala. I am not that smart anyway...

This project is evolving in a book, creatively named "Spark in Action, 2nd edition, with Java" published by Manning.

If you want to know more, and be guided through your Java and Spark learning process, I can only recommend to read the book at Manning: Spark with Java. Find out more about Spark in Action, 2nd edition, on the Manning website. The book contains more examples, more explanation, is professionally written and edited.

Here are the repos with the book examples:

Chapter 1 Introduction to Spark and deals with basic ingestion examples.

Chapter 2 Mental model around Spark.

Chapter 3 The majestic role of the dataframe.

Chapter 4 All about laziness.

Chapter 5 WIP

Chapter 6 WIP

Chapter 7 Ingestion from files.

Chapter 8 Ingestion from databases.

Chapter 9 Ingestion from anything, right?

Chapter 10 WIP

Chapter 11 WIP

Chapter 12 WIP

Chapter 13 WIP

Chapter 14 WIP

Chapter 15 WIP

Chapter 16 WIP

Chapter 17 WIP

Chapter 18 WIP

Chapter 19 WIP

Chapter 20 WIP

In the meanwhile, this project is still live, with more raw-level examples, that may (or may not) work.

Environment

These labs rely on:

  • Apache Spark 2.4.0 (based on Scala 2.11)
  • Java 8

Notes on Branches

The master branch will always contain the latest version of Spark, currently v2.4.0.

Labs

A few labs around Apache Spark, exclusively in Java.

Organization is now in sub packages:

  • l000_ingestion: Data ingestion from various sources.
  • l020_streaming: Data ingestion via streaming. Special note on Streaming.
  • l050_connection: Connect to Spark.
  • l100_checkpoint: Checkpoint introduced in v2.1.0.
  • l150_udf: UDF (User Defined Functions).
  • l200_join: added join examples.
  • l250_map: map (in the context of mapping, not always linked to map/reduce).
  • l300_reduce: reduce.
  • l400_industry_formats: working with industry formats, limited, for now, to HL7 and FHIR.
  • l500_misc: other examples.
  • l600_ml: ML (Machine Learning).
  • l700_save: saving your results.
  • l800_concurrency: labs around concurrency access, work in progress.
  • l900_analytics: More complex examples of using Spark for Analytics.

If you would like to see more labs, send your request to [email protected] or @jgperrin on Twitter.

net.jgp.labs.spark's People

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.