GPU Enabler for Spark

This package brings GPU-related capabilities to the Spark framework. It provides the following capabilities:

  • Load and initialize a user-provided GPU kernel on executors that have NVIDIA GPU cards attached.
  • Convert the data in partitions to a columnar format so that it can be fed easily into the GPU kernel.
  • Cache data inside the GPU for better performance.
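
As a rough illustration of the row-to-columnar conversion mentioned in the list above, the following self-contained sketch turns a list of row objects into one primitive array per field. It does not use the GPU Enabler API and is not the package's actual implementation; the `Point` row type, the `PointColumns` holder, and the `toColumns` helper are hypothetical names introduced only for this example.

```java
import java.util.Arrays;
import java.util.List;

final class ColumnarSketch {
    // Hypothetical row type, used only for illustration.
    static final class Point {
        final int x;
        final double y;
        Point(int x, double y) { this.x = x; this.y = y; }
    }

    // Column-oriented view of the same data: one primitive array per field.
    static final class PointColumns {
        final int[] xs;
        final double[] ys;
        PointColumns(int[] xs, double[] ys) { this.xs = xs; this.ys = ys; }
    }

    // Copy each field of every row into the matching column array.
    static PointColumns toColumns(List<Point> rows) {
        int[] xs = new int[rows.size()];
        double[] ys = new double[rows.size()];
        for (int i = 0; i < rows.size(); i++) {
            Point p = rows.get(i);
            xs[i] = p.x;
            ys[i] = p.y;
        }
        return new PointColumns(xs, ys);
    }

    public static void main(String[] args) {
        List<Point> rows = Arrays.asList(
                new Point(1, 0.5), new Point(2, 1.5), new Point(3, 2.5));
        PointColumns cols = toColumns(rows);
        System.out.println(Arrays.toString(cols.xs));
        System.out.println(Arrays.toString(cols.ys));
    }
}
```

Contiguous primitive arrays like these can be copied to device memory in a single transfer per column, which is the usual motivation for a columnar layout on GPUs.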

Requirements

This package is compatible with Spark 1.5+ and Scala 2.10.

Spark Version    Scala Version    Compatible version of Spark GPU
1.5+             2.10             1.0.0

Linking

You can link against this library (for Spark 1.5+) in your program at the following coordinates:

Using SBT:

libraryDependencies += "com.ibm" % "gpu-enabler_2.10" % "1.0.0"

Using Maven:

<dependency>
    <groupId>com.ibm</groupId>
    <artifactId>gpu-enabler_2.10</artifactId>
    <version>1.0.0</version>
</dependency>

This library can also be added to Spark jobs launched through spark-shell or spark-submit by using the --packages command line option. For example, to include it when starting the spark shell:

$ bin/spark-shell --packages com.ibm:gpu-enabler_2.10:1.0.0

Unlike using --jars, using --packages ensures that this library and its dependencies will be added to the classpath. The --packages argument can also be used with bin/spark-submit.

Support for GPU Enabler package

  • Supports x86_64 and ppc64le
  • Supports OpenJDK and IBM JDK
  • Supports NVIDIA GPUs with CUDA (confirmed with CUDA 7.0)
  • Supports CUDA 7.0 and 7.5 (should also work with CUDA 6.0 and 6.5)
  • Supports scalar variables of primitive types and primitive arrays in RDDs

Examples

The recommended way to load and use a GPU kernel is through the following APIs, which are available in Scala and Java.

The package comes with a set of examples. They can be tried out as follows:

./bin/run-example GpuEnablerExample

The Nvidia kernel used in these sample programs is available for download here. The source for this kernel can be found here.

Scala API

// Imports that add the GPU methods to Spark RDDs
import com.ibm.gpuenabler.CUDARDDImplicits._
import com.ibm.gpuenabler.CUDAFunction

// Load a kernel function from the GPU kernel binary
val ptxURL = SparkGPULR.getClass.getResource("/GpuEnablerExamples.ptx")

val mapFunction = new CUDAFunction(
        "multiplyBy2",      // Native GPU function that multiplies a given number by 2 and returns the result
        Array("this"),      // Input arguments
        Array("this"),      // Output arguments
        ptxURL)

val reduceFunction = new CUDAFunction(
        "sum",              // Native GPU function that sums the input values and returns the result
        Array("this"),      // Input arguments
        Array("this"),      // Output arguments
        ptxURL)

// 1. Apply a transformation (multiply all the values of the RDD by 2).
//    (Note: the conversion from the row-based format to the columnar format
//           that the GPU understands is done internally.)
// 2. Trigger a reduction action (sum up all the values and return the result).
val output = sc.parallelize(1 to n, 1)
        .mapExtFunc((x: Int) => 2 * x, mapFunction)
        .reduceExtFunc((x: Int, y: Int) => x + y, reduceFunction)

Java API

// Imports needed for the GPU methods and the Spark Java API
import com.ibm.gpuenabler.*;
import java.net.URL;
import java.util.Arrays;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.Function2;
import scala.reflect.ClassTag;

// Load a kernel function from the GPU kernel binary
URL ptxURL = gp.getClass().getResource("/GpuEnablerExamples.ptx");

// Register the CUDA functions along with the order of their input and output arguments
JavaCUDAFunction mapFunction = new JavaCUDAFunction(
                "multiplyBy2",
                Arrays.asList("this"),
                Arrays.asList("this"),
                ptxURL);

JavaCUDAFunction reduceFunction = new JavaCUDAFunction(
                "sum",
                Arrays.asList("this"),
                Arrays.asList("this"),
                ptxURL);

// Create a Java CUDA RDD
JavaRDD<Integer> inputData = sc.parallelize(range, 10).cache();
ClassTag<Integer> tag = scala.reflect.ClassTag$.MODULE$.apply(Integer.TYPE);
JavaCUDARDD<Integer> jCRDD = new JavaCUDARDD(inputData.rdd(), tag);

// 1. Apply a transformation (multiply all the values of the RDD by 2).
//    (Note: the conversion from the row-based format to the columnar format
//           that the GPU understands is done internally.)
// 2. Trigger a reduction action (sum up all the values and return the result).
Integer output = jCRDD.mapExtFunc((new Function<Integer, Integer>() {
            public Integer call(Integer x) { return (2 * x); }
        }), mapFunction, tag).cacheGpu().reduceExtFunc((new Function2<Integer, Integer, Integer>() {
            public Integer call(Integer integer, Integer integer2) {
                return integer + integer2;
            }
        }), reduceFunction);
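
To sanity-check the value returned by the GPU pipeline, it can help to compute the same result on the CPU. The sketch below is a plain-Java reference that uses no Spark or GPU Enabler classes; it assumes the input is the integers 1 to n, as in the Scala example, and the `CpuReference` class and `doubleAndSum` method are hypothetical names introduced only for this illustration.

```java
import java.util.stream.IntStream;

final class CpuReference {
    // CPU equivalent of the multiplyBy2 map followed by the sum reduction,
    // assuming the input values are 1..n.
    static int doubleAndSum(int n) {
        return IntStream.rangeClosed(1, n).map(x -> 2 * x).sum();
    }

    public static void main(String[] args) {
        int n = 10;
        // For the input 1..n the result also equals the closed form n * (n + 1).
        System.out.println(doubleAndSum(n));
    }
}
```

Comparing `output` against such a reference for a few small values of n is a cheap way to confirm that the kernel and the argument ordering are wired up as intended.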

Building From Source

Pre-requisites

  • NVIDIA GPU card with support for CUDA 7.0 or later.
  • Install the CUDA drivers and runtime for your platform from here.

This library is built with Maven.

To build a JAR file, follow these steps:

  • git clone https://github.com/IBMSparkGPU/GPUEnabler.git
  • cd GPUEnabler
  • ./compile.sh

Note:

  • If mvn is not available in $PATH, export MVN_CMD="<path_to_mvn_binary>"
  • If you want to use mvn from the spark/build directory, add the "--force" argument to ./compile.sh

Testing

To run the tests, run mvn test.

On-going work

  • Leverage existing schema awareness in DataFrame/DataSet
  • Provide new DataFrame/DataSet operators to call CUDA Kernels

Contributors

josiahsams, kiszk, kmadhugit, oshixiaoxiliu
