spark-power-bi

A library for pushing data from Spark, SparkSQL, or Spark Streaming to Power BI.

Requirements

This library supports Apache Spark 1.4 and 1.5, and its version numbers track the Spark version: v1.4.0_0.0.7 is for Apache Spark 1.4 and v1.5.0_0.0.7 is for Apache Spark 1.5.

Power BI API

Additional details about the Power BI API, including how to register an app and authenticate, are available in the Power BI developer center. Authentication is handled via OAuth2, using your Azure AD credentials supplied as Spark properties. When pushing rows to Power BI, the library creates the target dataset and table if necessary. The Power BI service currently accepts at most 10,000 rows per call, so the library batches rows internally. The service also allows no more than 5 concurrent calls when adding rows; the library enforces this by coalescing partitions, and the limit can be tuned with the spark.powerbi.max_partitions property.
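The batching rules above can be sketched in plain Scala. This is a hypothetical illustration, not the library's code: `PowerBIBatching`, `batches` and `coalescedPartitions` are made-up names, the limits are the 10,000-row and 5-call figures quoted above, and it assumes each partition sends its batches one after another so that the partition count bounds the number of concurrent calls.

```scala
// Hypothetical sketch of the batching rules described above; not the library's implementation.
object PowerBIBatching {
  // Limits quoted for the Power BI "add rows" call
  val MaxRowsPerCall = 10000
  val MaxConcurrentCalls = 5

  // Split one partition's rows into request-sized batches (the last batch may be smaller).
  def batches[T](rows: Iterator[T], maxRows: Int = MaxRowsPerCall): Iterator[Seq[T]] =
    rows.grouped(maxRows)

  // Number of partitions left after coalescing to a cap such as spark.powerbi.max_partitions.
  // Assumption: each partition pushes its batches sequentially, so this caps concurrent calls.
  def coalescedPartitions(current: Int, maxPartitions: Int = MaxConcurrentCalls): Int =
    math.min(current, maxPartitions)
}
```

For example, 25,000 rows in a single partition would become three calls of 10,000, 10,000 and 5,000 rows, and a 200-partition RDD would be coalesced down to 5 partitions. In Spark terms this corresponds roughly to `rdd.coalesce(n).foreachPartition(...)`, with one upload per batch inside each partition.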

Scaladoc

Scaladoc is available here

Configuration

A few of the key properties are related to OAuth2. These depend on your application's registration in Azure AD.

spark.powerbi.username
spark.powerbi.password
spark.powerbi.clientid
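These are ordinary Spark properties, so they can be supplied like any other (for example with `--conf spark.powerbi.clientid=...` on `spark-shell` or `spark-submit`). A small, hypothetical pre-flight check can report which of them are missing before any data is pushed; `PowerBIConfigCheck` and `missingKeys` are illustrative names, not part of the library.

```scala
// Hypothetical helper: reports which of the OAuth2 properties listed above are missing or empty.
object PowerBIConfigCheck {
  val RequiredKeys: Seq[String] = Seq(
    "spark.powerbi.username",
    "spark.powerbi.password",
    "spark.powerbi.clientid"
  )

  // Returns the required keys that are absent or blank in the given configuration map.
  def missingKeys(conf: Map[String, String]): Seq[String] =
    RequiredKeys.filterNot(key => conf.get(key).exists(_.trim.nonEmpty))
}
```

In a Spark application the map could come from `sc.getConf.getAll.toMap`, failing fast when `PowerBIConfigCheck.missingKeys(...)` is non-empty.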

Rather than using your personal AD credentials to publish data, you may want to create a service account. You can then log on to Power BI with that account and share the datasets and dashboards with other users in your organization. Unfortunately, there is currently no other way to authenticate to Power BI; hopefully a future organization-level API token will allow publishing shared datasets without an actual AD account. You can also publish data to a Power BI group.

Setting Up Azure Active Directory

You'll need to create an application within your Azure AD in order to have a client id to publish data sets.

  1. Using the Azure management portal, open your directory and add a new Application (under the Apps tab).
  2. Select "Add an application my organization is developing".
  3. Enter any name you want and select "Native Client Application".
  4. Enter a redirect URI; this can be anything, since it won't be used.
  5. Once the app has been added, grant it permissions by clicking "Add application".
  6. Select the "Power BI Service".
  7. Add all 3 of the delegated permissions.
  8. Save your changes and use the newly assigned client id for the spark.powerbi.clientid property.
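To show where the client id ends up, here is a sketch of the form body for an OAuth2 resource-owner password grant built from the three spark.powerbi.* values. This is an assumption about how such a token could be requested (the Azure AD v1 token endpoint and the `https://analysis.windows.net/powerbi/api` resource URI), not a description of what the library does internally; `PowerBITokenRequest` and `formBody` are hypothetical names.

```scala
import java.net.URLEncoder

// Hypothetical sketch: form body for an OAuth2 password grant against Azure AD (v1 endpoint).
// The endpoint and resource URI are assumptions, not values taken from this library.
object PowerBITokenRequest {
  val TokenEndpoint = "https://login.microsoftonline.com/common/oauth2/token"
  val PowerBIResource = "https://analysis.windows.net/powerbi/api"

  private def enc(s: String): String = URLEncoder.encode(s, "UTF-8")

  // Builds an application/x-www-form-urlencoded body from the client id, username and password.
  def formBody(clientId: String, username: String, password: String): String =
    Seq(
      "grant_type" -> "password",
      "client_id"  -> clientId,
      "username"   -> username,
      "password"   -> password,
      "resource"   -> PowerBIResource
    ).map { case (k, v) => s"${enc(k)}=${enc(v)}" }.mkString("&")
}
```

The body would be POSTed to `TokenEndpoint`; URL-encoding matters because passwords and user names often contain characters such as `&` or `@`.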

Spark Core

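import com.granturing.spark.powerbi._

case class Person(name: String, age: Int)

// Parse each "name, age" line of the sample file into a Person
val input = sc.textFile("examples/src/main/resources/people.txt")
val people = input.map(_.split(",")).map(l => Person(l(0), l(1).trim.toInt))

// Push the rows to table "People" in dataset "Test" (created if it doesn't exist)
people.saveToPowerBI("Test", "People")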
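The parsing above throws inside a Spark task if any line is malformed. A more forgiving variant, sketched here with a hypothetical `PersonParser` helper, returns `None` for bad lines so they can simply be dropped.

```scala
import scala.util.Try

case class Person(name: String, age: Int)

// Hypothetical helper: parse one "name, age" line, returning None for malformed input
// instead of throwing.
object PersonParser {
  def parse(line: String): Option[Person] =
    line.split(",") match {
      case Array(name, age) => Try(age.trim.toInt).toOption.map(a => Person(name.trim, a))
      case _                => None
    }
}
```

With Spark, `input.flatMap(line => PersonParser.parse(line))` keeps only the well-formed rows before calling `saveToPowerBI`.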

SparkSQL

import com.granturing.spark.powerbi._
import org.apache.spark.sql._

val sqlCtx = new SQLContext(sc)
// Load the sample JSON file as a DataFrame
val people = sqlCtx.read.json("examples/src/main/resources/people.json")

// Write through the Power BI data source; "dataset" and "table" name the target
people.write
  .format("com.granturing.spark.powerbi")
  .options(Map("dataset" -> "Test", "table" -> "People"))
  .save()

Spark Streaming

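import com.granturing.spark.powerbi._
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.twitter.TwitterUtils

// Row types for the two Power BI tables (field names are illustrative)
case class Tweet(id: Long, created: java.util.Date, text: String, screenName: String)
case class HashTag(id: Long, hashTag: String, screenName: String)

// Runs inside main(args: Array[String]); args are the Twitter filter terms
val dataset = "Twitter" // illustrative Power BI dataset name
val sc = new SparkContext(new SparkConf())
val ssc = new StreamingContext(sc, Seconds(5))

val filters = args

val input = TwitterUtils.createStream(ssc, None, filters)

// One row per tweet, plus one row per hashtag it contains
val tweets = input.map(t => Tweet(t.getId, t.getCreatedAt, t.getText, t.getUser.getScreenName))
val hashTags = input.flatMap(t => t.getHashtagEntities.map(h => HashTag(t.getId, h.getText, t.getUser.getScreenName)))

// Each 5-second batch is pushed to the corresponding table
tweets.saveToPowerBI(dataset, "Tweets")
hashTags.saveToPowerBI(dataset, "HashTags")

ssc.start()
ssc.awaitTermination()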

Referencing As A Dependency

You can also pull the library in as a dependency with the --packages argument:

spark-shell --packages com.granturing:spark-power-bi_2.10:1.5.0_0.0.7
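Because the artifact version encodes the Spark version, the right coordinate can be derived from the running Spark version. The sketch below is hypothetical (`PowerBIPackage` and `coordinate` are made-up names) and assumes the naming pattern shown above: a Scala 2.10 build, the Spark major.minor version followed by `.0`, and library release 0.0.7.

```scala
// Hypothetical helper: builds the --packages coordinate for a Spark version, assuming the
// pattern com.granturing:spark-power-bi_2.10:<major.minor>.0_<release>.
object PowerBIPackage {
  val SupportedSparkLines = Set("1.4", "1.5")

  def coordinate(sparkVersion: String, libraryRelease: String = "0.0.7"): Option[String] = {
    // Keep only the major.minor part, e.g. "1.5.2" -> "1.5"
    val majorMinor = sparkVersion.split("\\.").take(2).mkString(".")
    if (SupportedSparkLines.contains(majorMinor))
      Some(s"com.granturing:spark-power-bi_2.10:${majorMinor}.0_${libraryRelease}")
    else None
  }
}
```

Inside a shell, `PowerBIPackage.coordinate(sc.version)` would return `None` for Spark versions other than 1.4 and 1.5, which this library does not support.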

Building From Source

The library uses SBT and can be built by running sbt package.

Contributors

granturing