sbt-spark-package's Introduction

sbt-spark-package

Sbt Plugin for Spark Packages

sbt-spark-package is an sbt plugin that aims to simplify the use and development of Spark Packages.

Please upgrade to version 0.2.4+ as spark-packages now supports SSL.

Requirements

  • sbt

Setup

The sbt way

Simply add the following to <your_project>/project/plugins.sbt:

  resolvers += "bintray-spark-packages" at "https://dl.bintray.com/spark-packages/maven/"

  addSbtPlugin("org.spark-packages" % "sbt-spark-package" % "0.2.6")

Usage

Spark Package Developers

In your build.sbt file include the appropriate values for:

  • spName := "organization/my-awesome-spark-package" // the name of your Spark Package

Please specify any Spark dependencies using sparkVersion and sparkComponents. For example:

  • sparkVersion := "2.1.0" // the Spark Version your package depends on.

Spark Core will be included by default if no value for sparkComponents is supplied. You can add sparkComponents as:

  • sparkComponents += "mllib" // creates a dependency on spark-mllib.

or

  • sparkComponents ++= Seq("streaming", "sql")
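Put together, the settings above might look like the following build.sbt fragment. This is a minimal sketch: the package name and the Spark version are the placeholder values used above, not recommendations.

```scala
// build.sbt (sketch; all values are placeholders)

// The name of your Spark Package, in "organization/name" form
spName := "organization/my-awesome-spark-package"

// The Spark version your package depends on
sparkVersion := "2.1.0"

// Creates dependencies on spark-mllib, spark-streaming and spark-sql
sparkComponents ++= Seq("mllib", "streaming", "sql")
```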

You can make a zip archive ready for a release on the Spark Packages website by calling sbt spDist. This command includes any Python files related to your package in the jar inside the archive. Once that jar is on your PYTHONPATH, you can use your Python files.

By default, the zip file will be produced in <project>/target, but you can override this by providing a value for spDistDirectory like:

spDistDirectory := file("Users") / "foo" / "Documents" / "bar"

Keep the forward slashes on Windows as well; do not switch them to backslashes.

You may publish your package locally for testing with sbt spPublishLocal.

In addition, sbt console will create a Spark Context for you, as the spark-shell does, so that you can test your code.

If you want to make a release of your package against multiple Scala versions (e.g. 2.10, 2.11), you may set spAppendScalaVersion := true in your build file.
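As a sketch of how this could fit together with sbt's standard cross-building support: crossScalaVersions is a regular sbt setting, and the exact Scala versions listed here are illustrative assumptions.

```scala
// build.sbt (sketch; the Scala versions are illustrative)

// Standard sbt setting: the Scala versions to cross-build against
crossScalaVersions := Seq("2.10.6", "2.11.8")

// Plugin setting described above, for releases against multiple Scala versions
spAppendScalaVersion := true
```

Assuming the plugin's tasks behave like ordinary sbt tasks under cross-building, running sbt +spDist would then build once per listed Scala version.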

If you cannot specify your Spark dependencies using sparkComponents (e.g. you have exclusion rules) and cannot configure them as provided (e.g. you need a standalone jar for a demo), you may set spIgnoreProvided := true so that the assembly plugin works properly.
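For illustration, a build that declares Spark directly with an exclusion rule might look like the sketch below. The excluded artifact is a hypothetical choice made for the example, and the Spark version is a placeholder.

```scala
// build.sbt (sketch; the exclusion shown is hypothetical)

// Spark declared by hand instead of through sparkComponents,
// because an exclusion rule is needed
libraryDependencies += ("org.apache.spark" %% "spark-core" % "2.1.0")
  .exclude("org.slf4j", "slf4j-log4j12")

// Plugin setting described above
spIgnoreProvided := true
```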

Including shaded dependencies

Sometimes you may require shading for your package to work in certain environments. sbt-spark-package supports publishing shaded dependencies built through the sbt-assembly plugin. To achieve this, you will need two projects, one for building the shaded dependency, and one for building the distribution ready package.

lazy val shaded = Project("shaded", file(".")).settings(
  libraryDependencies ++= (dependenciesToShade ++
    nonShadedDependencies.map(_ % "provided")), // don't include any other dependency in your assembly jar
  target := target.value / "shaded", // have a separate target directory to make sbt happy
  assemblyShadeRules in assembly := Seq(
    ShadeRule.rename("blah.**" -> "bleh.@1").inAll
  )
) // add all other settings

lazy val distribute = Project("distribution", file(".")).settings(
  spName := ..., // your spark package name
  target := target.value / "distribution",
  spShade := true, // THIS IS THE MOST IMPORTANT SETTING
  assembly in spPackage := (assembly in shaded).value, // this will pick up the shaded jar for distribution
  libraryDependencies := nonShadedDependencies // have all your non shaded dependencies here so that we can
                                               // generate a clean pom.
) // add all other settings

Now you may use distribution/spDist to build your zip file, or distribution/spPublish to publish a new release. For more details on publishing, please refer to the next section.
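The two project definitions above reference dependenciesToShade and nonShadedDependencies without defining them. A minimal sketch of what they could look like follows; every coordinate below is a made-up placeholder, to be replaced with your own dependencies.

```scala
// build.sbt (sketch; all coordinates are hypothetical placeholders)

// Libraries to relocate and bundle into the shaded jar
lazy val dependenciesToShade = Seq(
  "com.example" %% "blah" % "1.0.0"
)

// Libraries that stay as regular dependencies and appear in the generated pom
lazy val nonShadedDependencies = Seq(
  "com.example" %% "other-lib" % "2.0.0"
)
```

Both values are plain sequences of module IDs, which is what allows the shaded project to mark the non-shaded ones as provided with .map(_ % "provided").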

Registering and publishing Spark Packages

credentials

To use spRegister or spPublish to register or publish a release of your Spark Package, you have to specify your GitHub credentials. You may specify them through a file (recommended) or directly in your build file, as shown below:

credentials += Credentials(Path.userHome / ".ivy2" / ".sbtcredentials") // A file containing credentials

credentials += Credentials("Spark Packages Realm",
                           "spark-packages.org",
                           s"$GITHUB_USERNAME",
                           s"$GITHUB_PERSONAL_ACCESS_TOKEN")

More can be found in the sbt documentation.

Using these commands requires "read:org" GitHub access to authenticate ownership of the repo. See the GitHub documentation for how to generate a Personal Access Token.
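If you would rather keep the token out of the build file without maintaining a credentials file, one option is to read both values from environment variables. This is only a sketch: the variable names GITHUB_USERNAME and GITHUB_PERSONAL_ACCESS_TOKEN are assumptions, not names the plugin looks for.

```scala
// build.sbt (sketch; the environment variable names are assumptions)

credentials += Credentials(
  "Spark Packages Realm",
  "spark-packages.org",
  // Fall back to empty strings so the build still loads when the variables are unset
  sys.env.getOrElse("GITHUB_USERNAME", ""),
  sys.env.getOrElse("GITHUB_PERSONAL_ACCESS_TOKEN", "")
)
```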

spRegister

You can register your Spark Package for the first time using this plugin with the command sbt spRegister. In order to register your package, you must have logged in to the Spark Packages website at least once and supply values for the following settings in your build file:

spShortDescription := "My awesome Spark Package" // Your one line description of your package

spDescription := """My long description.
                    |Could be multiple lines long.
                    | - My package can do this,
                    | - My package can do that.""".stripMargin

credentials += // Your credentials, see above.

The homepage of your package is by default the web page of its GitHub repository. You can change the default homepage by using:

spHomepage := // Set this if you want to specify a web page other than your GitHub repository.
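As an illustration, and assuming spHomepage takes a plain string, a filled-in value could look like this. The URL is a made-up placeholder.

```scala
// build.sbt (sketch; assumes spHomepage is a String setting, URL is a placeholder)
spHomepage := "https://example.com/my-awesome-spark-package"
```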

spPublish

You can publish a new release using sbt spPublish. The HEAD commit on your local repository will be used as the git commit sha for your release. Therefore, please make sure that your local commit is indeed the version you would like to make a release for, and that you have pushed that commit to the master branch on your remote.

The required settings for spPublish are:

// You must have an Open Source License. Some common licenses can be found in: http://opensource.org/licenses
licenses += "Apache-2.0" -> url("http://opensource.org/licenses/Apache-2.0")

// If you published your package to Maven Central for this release (must be done prior to spPublish)
spIncludeMaven := true

credentials += // Your credentials, see above.

Spark Package Users

Any Spark Packages your package depends on can be added as:

  • spDependencies += "databricks/spark-avro:0.1" // format is spark-package-name:version

We also recommend that you use sparkVersion and sparkComponents to manage your Spark dependencies. In addition, you can use sbt assembly to create an uber jar of your project.
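For a project that only consumes Spark Packages, the recommendations above could be combined as in this sketch. The version numbers are the example values used earlier in this document, and the choice of the sql component is illustrative.

```scala
// build.sbt (sketch; versions and components are illustrative)

// Format is spark-package-name:version
spDependencies += "databricks/spark-avro:0.1"

// Manage the Spark dependencies through the plugin
sparkVersion := "2.1.0"
sparkComponents += "sql"
```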

Contributions

If you encounter bugs or want to contribute, feel free to submit an issue or pull request.
